Sunday, February 25, 2007

A quiet revolution in KM?

Many thanks to Justin Kestelyn who recently highlighted Alejandro Vargas' Blog. Alejandro has been quietly putting together a powerful Body of Knowledge on RAC and related issues [still not a "Featured Employee Blog", but it should be!].

I must say, this was a road-to-Damascus moment for me.

During the mid-to-late '90s, I was heavily involved in Content and Knowledge Management solutions for enterprises. We struggled with concepts of "Human Capital" and "Intellectual Property". In most cases, I was involved in helping companies set up the technical and organisational infrastructure to better empower "knowledge workers" to share, find and reuse information. The solutions of the day were corporate intranets, portals, search engines, mail, groupware and suchlike.

I have been somewhat removed from that scene over the past 5 years, and it was only when seeing Alejandro's blog that it really struck me that a quiet revolution has been underway in Corporate Knowledge Management (KM).

The main problems we used to face in knowledge management were cultural, not technical. How to encourage participation? How to reward contribution and reuse? How to effectively organise and locate relevant information? How to measure and align with corporate goals? After years of practice (in both senses of the word), I came to the realisation that the most important contributing factor always came down to having a pivotal community member (not necessarily the formal "leader") around which others would cluster and be inspired to act.

Ironically, the solution to such cultural issues appears to have been technical if I am reading the megatrends correctly.

We saw the first inklings of progress with wikis, which facilitate rapid collective development of a body of knowledge, but now I realise that it is with blogs and podcasts that we are in the process of making a significant step-change.

Now, blogs are nothing new. But we have predominantly seen them as a pop-culture phenomenon, creating our first crop of Web Celebs.

Alejandro's blog, however, epitomises the new breed of not-quite-corporate blogging: blogging on a specific topic of interest that relates to the author's work, but done under personal editorial control. In the past (and still true today), Alejandro's employer would have provided corporate facilities to capture and share the kind of information he is publishing. I'm sure Alejandro still diligently complies with such mandated systems, but it is his blog which is having the most positive impact on the world.

So how is corporate knowledge management evolving with the advent of blogs and podcasts? To my mind, there are two important factors to note:


  • Accessibility. If a tree falls in a forest and no one is there, does it make a sound? Similarly with knowledge management, what is the value of sharing if no one uses your work? Corporate knowledge management systems were once able to deliver the optimal audience for your work (if you tried very very hard over a very very long period of time), but the rise of the internet and blogging in particular has changed that dynamic forever.

  • Attribution (some would say "ego"). One critical factor in the success of blogs is that the primary structural organisation is by attribution. It is your collection. Secondary organisation by news aggregators, semantic web or search engines cannot dilute the fact. The days of consigning your masterpiece into the black hole of corporate knowledge are fast receding. For some, "ego" may be the prime motivation, but I believe it is usually a little more complex than that. Blogs tend to tell a story over time. Posts will be related. As a blog author, you will continually have your history of posts in front of you. Not only does this reinforce the sense of continuing narrative, it illuminates the "gaps" in your story and therefore compels further contribution to fill the void. Just as a novelist may feel compelled to complete their work no matter how unlikely it may be to find a welcoming readership, the blogger is likewise committed to continue what they start. Note that wikis are tapping into a fundamentally different attribution dynamic, one governed not so much by "ego" as by "tribal" allegiance.


"Information is power" is a hackneyed phrase. And I think it doesn't do justice to what is happening in the world today. It is too small town.

I prefer to think in terms of "potential accessibility x attribution = self-perceived value", and "referenced accessibility x attribution = actual value" i.e. power. And I think what we are seeing now is that the tools available (wikis and blogs in particular) are delivering a "value/power" formula that is starting to unleash an unprecedented wave of knowledge sharing and collaboration.

Implicit in the above is a decided shift in "editorial control". No longer is it a neat case of the knowledge worker submitting their work to the whims of the corporate machine. The blogger is in control. That is a significant challenge to the corporation, especially those for which information is the primary product they sell. Alejandro is perhaps fortunate to work for a company that makes its money from selling software ... sales that his blogging supports and enhances. But a tax or legal firm? That's a different kettle of fish, since they primarily trade in "knowledge" i.e. expert advice.

So what response should we expect from corporations and developers of "km"/blogging software? Here are a few thoughts...


  • We need to see the modern tools (blogs, wikis and podcasts) incorporated as primary sources for corporate KM strategies. That means integration between blogging tools and the corporate KM systems such as customer care/help systems, search engines, semantic webs and intranet/internet portals.
  • Locking away such tools as "internal only" resources works against the accessibility imperative. There are of course valid concerns over confidentiality. To accommodate this dilemma, metadata (and systems) need to support the concept of confidentiality. That is, employees should be able to selectively target internal, customer and public audiences when they post or broadcast. We should be able to adapt approval/moderation processes in accordance with confidentiality.
  • In a corporate setting, it doesn't take long before people start questioning the "business value" of all this blogging etc. Personally, I have a foot in both camps. On the whole, I believe very few people can justify blogging exclusively on company time. People like Tom Kyte for example [if we consider Ask Tom a specialised kind of blog]. For most of us however, it's a shared "value capture" equation. Sure, the blogs/wikis etc may enhance our contribution to the business, but a large part can also be a personal development exercise in enhancing our long term career value. In crude terms, maybe that next job will be clinched on the strength of your blog. So just like I read professional books of my choice on personal time, that's when I blog too. As employees, we need to be realistic about this and manage our time accordingly.
  • For the corporations themselves, "value capture" can be a trickier proposition. And this depends on the nature of the business too. For a company like Oracle, having employees and users positively blog about its software can arguably give a major (but unmeasurable) sales boost by enhancing "brand value". Maybe we are not quite there yet, but I can imagine a day soon where the lack of an active blogosphere around your software product will make it almost impossible to sell. However to take an extreme example, consider CNN. What if all your reporters actively blogged too? Or if all Ernst & Young's accountants popped up on blogspot? It is harder to implicitly conclude there would be a positive impact on the bottom line. At this point, I see no other solution than trying to achieve a suitable compromise. The software we use plays an important role in providing the functionality to achieve that balance faster.



All of the above may be bleeding obvious to some and well discussed in the KM journals, but everyone needs their "Ah-ha" moment, and I just had mine.

Saturday, February 24, 2007

Securing your home router #2

Just listening to Security Now #80 in which Steve and Leo discuss a JavaScript exploit that aims to compromise home routers that still have their default password.

It strikes me that it's a very short leap to combine this JavaScript approach with the broken security implementation in certain routers that I blogged on recently. In fact, the JavaScript approach overcomes the main limitation of that vulnerability, which is that it required LAN access to your router to do anything more than mischief.

Wednesday, February 21, 2007

Generating CLOB/CDATA elements with XMLDB

Generating CDATA elements with Oracle XMLDB recently got a good airing in the XMLDB forums.

I won't reiterate the discussion there, but offer a summary and some sources.

It seems the current state of affairs is that if you need to generate large text elements with XMLDB you have two options:


  • use DBMS_LOB procedural code to manually construct a CDATA element, or
  • use XMLTYPE views to construct an XML-encoded element


In both cases, to avoid the inherent size limitations, you need to be careful not to do anything that casts or converts to varchar.

Note that the XML-encoding in XMLTYPE views is automatic, and I currently don't know how to tell it not to encode but rather quote as CDATA.
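To make the difference between the two output forms concrete, here is a small sketch in C++ rather than Oracle code. It uses neither DBMS_LOB nor XMLDB, and the function names xmlEncode and wrapCdata are hypothetical, chosen just for illustration. One detail worth knowing if you build CDATA by hand: the text itself must not contain the sequence "]]>", so the sketch splits any occurrence across two CDATA sections.

```cpp
#include <cassert>
#include <string>

// XML-encode text content: the form an XML-encoded element would carry.
std::string xmlEncode(const std::string& text) {
    std::string out;
    out.reserve(text.size());
    for (char c : text) {
        switch (c) {
            case '&': out += "&amp;"; break;
            case '<': out += "&lt;";  break;
            case '>': out += "&gt;";  break;
            default:  out += c;       break;
        }
    }
    return out;
}

// Quote text as CDATA instead. A CDATA section ends at the first "]]>",
// so any occurrence inside the text is split across two sections.
std::string wrapCdata(const std::string& text) {
    static const std::string terminator = "]]>";
    std::string out = "<![CDATA[";
    std::string::size_type start = 0;
    std::string::size_type hit;
    while ((hit = text.find(terminator, start)) != std::string::npos) {
        out.append(text, start, hit - start);   // text before the terminator
        out += "]]]]><![CDATA[>";               // "]]" closes here, ">" reopens
        start = hit + terminator.size();
    }
    assert(start <= text.size());
    out.append(text, start, std::string::npos); // whatever is left over
    out += "]]>";
    return out;
}
```

Both functions work on an in-memory std::string, so they say nothing about the varchar size limits discussed above; they only show what "encoded" versus "quoted as CDATA" means for the same input.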

Some sources and examples:


  • clob-cdata.pl is a Perl script using DBI that demonstrates how to generate an XMLTYPE view over an arbitrary CLOB element, without using XMLSchema. In this case, the CLOB will be automatically XML-encoded [clob-cdata-nonschema.sql is just the plain SQL].
  • clob-cdata-schema.sql shows how you can do a similar thing, but using an XMLSchema definition.
  • clob-cdata-small.sql shows how you can create CDATA elements where the text size is small using the XMLCdata function

Letting strangers on your Wifi .. need a reason why not?

Sometime back I was hacking my wifi admin pages (to let me register a certain NTP server .. but that's another story), and in the process discovered how broken the security is on my device (an SMC SMC2804WBRP-G Barricade router).

Basically the security check - to make sure you are a valid, logged-in administrator - just redirects to the "action" page which does no further checking of your credentials.

It doesn't take a genius to figure out that if you just post directly to the "action" page you can probably bypass authentication. At least, that's what occurred to me, so I tried it and (to my surprise nonetheless) it worked. Or didn't work, depending on your point of view!

To their credit(!), the routine to reset the admin password does require you to send the existing password, but other operations have no barrier.
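The underlying design flaw is easy to model. The sketch below is plain C++ and has nothing to do with SMC's actual firmware: Request, Sessions and both handlers are made-up names for illustration. The broken handler trusts that the caller arrived via the login check, while the checked one verifies the session itself on every action.

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical request: an action name plus whatever session token was sent.
struct Request {
    std::string action;
    std::string sessionToken;   // empty when the client sent none
};

// Hypothetical session store holding the tokens of logged-in administrators.
struct Sessions {
    std::set<std::string> valid;
    bool isAuthenticated(const Request& r) const {
        return !r.sessionToken.empty() && valid.count(r.sessionToken) > 0;
    }
};

// Flawed pattern: the action page assumes the login page already vetted the
// caller, so a request posted straight to it is carried out.
bool brokenActionHandler(const Sessions&, const Request& r) {
    return !r.action.empty();   // true means "action performed"
}

// Safer pattern: every action checks the session itself before doing anything.
bool checkedActionHandler(const Sessions& s, const Request& r) {
    if (!s.isAuthenticated(r)) {
        return false;           // refuse the request
    }
    return !r.action.empty();
}
```

With a request that carries no session token, brokenActionHandler still reports the action as performed, whereas checkedActionHandler refuses it.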

Here's a simple Perl script that demonstrates how you can "own" an SMC router of this type. It basically lets you reset factory defaults, after which you know the admin password (smcadmin). The factory default has no wifi enabled, so to make any further use of the router you must be connected to a LAN port. But certainly one way to wreck your neighbour's weekend.

I reported this vulnerability to SMC and CERT, but haven't heard whether any action has been taken to fix this.

I also don't know how many other models or brands of routers are susceptible to the same fault. But take this as a warning (and the reason why I am posting this information) ... if you want to offer wifi services to others, make sure your device is not subject to this kind of flaw first!

Running Instant Client on Linux

I recently had cause to install and configure the Oracle Instant Client under Linux. As I've written before, it is a breeze to get a client up and running.

I did find however that the way the instant client deploys its files can break makefiles and so on if you are doing C/C++ development.

I wrote a simple script (see installInstantClient.sh) to install and clean up an Instant Client and take care of a few things like:


  • move executables into a /bin subdirectory
  • move libraries into a /lib subdirectory
  • create links for commonly known library names
  • create a default network/admin/tnsnames.ora
  • suggest appropriate environment settings for your .bash_profile


Note that the script is written to explicitly handle the basic+sdk+sqlplus installation. If you want to use it for a different combination of kits it will need some simple modification.

Time to revamp PL/SQL?

Welcome the year of the pig! That may be appropriate, because I can't help thinking that it's 2007 already, and high time that Oracle gave a serious revamp to doddering PL/SQL. Doddering, you say? Well, yes. The past few years have seen incredible language innovation (read Ruby, Python, even JavaScript getting a new AJAXian lease of life) but PL/SQL seems to have been left by the wayside.

I do not know exactly what Oracle have in store for us with 11g, but I sincerely hope it addresses some of my major beefs, which I'd summarise as follows.

1. Give me CLOB-sized VARCHARs
I have a thumping server with a gazillion gigabytes of memory, so why do I spend so much time working around 4000 byte or 32k limits in PL/SQL? Or worse, have my app fail randomly in production when a certain bit of data slips the limit.

These are internal RDBMS implementation details that application programmers should not be concerned with. That's not to say that application programmers shouldn't be concerned about performance, just that they shouldn't be constrained by such arbitrary fundamental restrictions.

OK, so the 4000 byte limit is a SQL thing. But once I am manipulating string data in PL/SQL, if I need a Mb, then please Oracle let me use a Mb.

This is the 21st Century guys. We're not all dealing with simple accounting data. Handling large volumes of text is de rigueur. Text, not nameless objects, be it XML, HTML or just plain ASCII/Unicode.

2. Function-style DBMS_LOB interface
Partly as a result of (1), we often need to resort to DBMS_LOB for dealing with large text. Since it has a largely procedural interface, this usually means we need to drop into PL/SQL when in fact plain SQL would have been preferred.

Rather than deal with temporary LOBs etc, I'd prefer just a function-style interface so most of the LOB handling could be done inline with SQL.

3. Get over the VARCHAR limits with XMLType and XML-related Packages
OK, maybe just a variation on the same theme, but one of the most common situations where application programmers will run into varchar limits is when working with XML. Too many of the various XML-related functions and packages are hamstrung by their lack of native support for CLOBs. In many cases, this means what can be elegantly programmed "in the lab" has no practical use because of these limits.

4. Better documentation - proper definitions, real examples
IMHO, most Oracle docs are written according to the "Anne Elk" school of documentation.

It can't get much worse than DBMS_XMLDOM, but let me take an example at random... open the PL/SQL Packages and Types ref and page down a few times. Let's look at DBMS_APPLICATION_INFO.SET_CLIENT_INFO. Parameter is client_info IN VARCHAR2, uhuh. Definition is "Supplies any additional information about the client application". HELLO? Did that actually help anyone who didn't already know what the parameter was for or how it was used?

Normally with most reference guides this is when you turn to the examples to "reverse engineer" the definition. But there are no examples as a rule in the reference docs.

You may find an example in User's Guides, but then again, you may not;) That needs to change.


OK, that's a few for starters. Got any other beefs? Please post a comment, I'd be interested to hear what you have to say.

Synchronising two directory trees

I've released an update to Chang Liu's tree-sync.pl script on CPAN (currently tree-sync-2.2.pl).

I had some problems with the original script, so this is a re-write that uses an algorithm based on File::Find module.

I've only been testing on Windows and Linux, but so far so good - it solves my immediate issue which was to have a way of maintaining a synchronised "backup" copy of various filesystems.

PS: As of Oct-2008, the tree-sync project is now on github. Use this if you want to contribute to development. Of course, releases will still be distributed for use on CPAN.

Sunday, February 18, 2007

XSL Transforms in the database

Previously, I wrote on how to extract XPath refs from an arbitrary XML document. Well, you can actually do this inside a database too - specifically Oracle 9i/10g with built-in XMLDB support.

Say we have XML data and XSL templates stored in a simple table:

CREATE TABLE x1 (item varchar(25) primary key, xml xmltype);

Where the data is stored with item="data", and the XSL template to extract paths to text is stored as item="xsl-to-text", then our transform may be executed from PL/SQL as simply as this:
DECLARE
  -- the 4000-byte buffer is an assumption for this snippet; larger output needs a CLOB
  v_out_text VARCHAR2(4000);
BEGIN
  SELECT XMLTransform(
           xml,
           (SELECT xml FROM x1 WHERE item = 'xsl-to-text')
         ).getstringval()
    INTO v_out_text
    FROM x1
   WHERE item = 'data';
  dbms_output.put_line(v_out_text);
END;
/

A full sample script is available here.

Extracting XPath refs from an XML document

I was inspired by a recent post in the XMLDB Forum to look at the question of how to extract a complete list of XPaths and the associated text node values from an arbitrary XML file. I looked into an XSLT approach which I'll describe here.

Say we have an XML file like this:
<?xml version="1.0" encoding="ISO-8859-1"?>
<Library>
<Books>
<Book>
<Author>
<Last>Perry</Last>
<First>Anne</First>
</Author>
<Title>Long Spoon Lane</Title>
</Book>
</Books>
<Members>
<Member>
<Name>Paul</Name>
<Joined>2005-11-01</Joined>
</Member>
</Members>
</Library>

And our objective is to produce a listing like this:
/Library/Books/Book/Author/Last():Perry
/Library/Books/Book/Author/First():Anne
/Library/Books/Book/Title():Long Spoon Lane
/Library/Members/Member/Name():Paul
/Library/Members/Member/Joined():2005-11-01

After some investigation and reference to sites like Path Tracing and the XSLT 1.0 spec I arrived at what I think is the simplest xsl possible:
<?xml version="1.0" encoding="windows-1252" ?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="text"/>

<xsl:strip-space elements = "*" />

<xsl:template match="text()">

<xsl:for-each select="ancestor-or-self::*">
<xsl:text>/</xsl:text>
<xsl:value-of select="name()" />
</xsl:for-each>

<xsl:text>():</xsl:text>
<xsl:value-of select="." />
<xsl:text>&#xA;</xsl:text>

<xsl:apply-templates/>

</xsl:template>

</xsl:stylesheet>

What is going on here?

Well, firstly note that we strip-spaces and then match on all text() nodes - this ensures we skip all the pure whitespace nodes.

The magic that generates the XPath is the "for-each" over all "ancestor-or-self" elements, which emits a "/" and the element name for each level. Then we simply add the text value on the end.
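For readers who think more easily in procedural terms, the same ancestor walk can be sketched outside XSLT. The following is a minimal C++ illustration, not a port of the stylesheet: the Node struct and the listPaths and sampleLibrary functions are hypothetical names, and the tree is built by hand rather than parsed from a file.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical in-memory element: a name, optional text content, and children.
struct Node {
    std::string name;
    std::string text;            // empty when the element has no text node
    std::vector<Node> children;
};

// Depth-first walk that carries the chain of ancestor names along, which is
// the role played by the ancestor-or-self axis in the stylesheet.
void listPaths(const Node& node, const std::string& parentPath,
               std::vector<std::string>& out) {
    assert(!node.name.empty());
    const std::string path = parentPath + "/" + node.name;
    if (!node.text.empty()) {
        out.push_back(path + "():" + node.text);
    }
    for (const Node& child : node.children) {
        listPaths(child, path, out);
    }
}

// Builds the Library document from the example above by hand.
Node sampleLibrary() {
    Node author{"Author", "", {Node{"Last", "Perry", {}}, Node{"First", "Anne", {}}}};
    Node book{"Book", "", {author, Node{"Title", "Long Spoon Lane", {}}}};
    Node member{"Member", "", {Node{"Name", "Paul", {}}, Node{"Joined", "2005-11-01", {}}}};
    return Node{"Library", "", {Node{"Books", "", {book}}, Node{"Members", "", {member}}}};
}
```

Calling listPaths(sampleLibrary(), "", out) on an empty vector fills it with the same five lines as the target listing, in document order.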

A variation on the XSL template that produces an XML structure instead of text is as follows. It really varies just in terms of output formatting:
<?xml version="1.0" encoding="windows-1252" ?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml"/>

<xsl:strip-space elements = "*" />

<xsl:template match="/">
<items>
<xsl:apply-templates/>
</items>
</xsl:template>

<xsl:template match="text()">
<item>
<path>
<xsl:for-each select="ancestor-or-self::*">
<xsl:text>/</xsl:text>
<xsl:value-of select="name()" />
</xsl:for-each>
</path>
<value>
<xsl:value-of select="." />
<xsl:apply-templates/>
</value>
</item>
</xsl:template>

</xsl:stylesheet>

Tuesday, February 13, 2007

Handling namespaces with DBMS_XMLDOM

The PL/SQL package DBMS_XMLDOM has been around for a while now, but it suffers from a lack of documentation and examples. Getting it to handle namespaces properly is a good example - it seems like it should be a bit more intelligent than it actually is!

Let's say we wanted to generate this:
<?xml version="1.0"?>
<a:gadgets xmlns:a="uri:a" xmlns:b="uri:b" >
<b:phones/>
</a:gadgets>

Part of the solution would look like this (enough to illustrate some key points):
...
V_CURRENT_EL := DBMS_XMLDOM.CREATEELEMENT(doc => V_DOMDOCUMENT, tagName => 'gadgets', ns => 'uri:a');
DBMS_XMLDOM.SETATTRIBUTE(elem => V_CURRENT_EL, name => 'xmlns:a', newvalue => 'uri:a');
DBMS_XMLDOM.SETATTRIBUTE(elem => V_CURRENT_EL, name => 'xmlns:b', newvalue => 'uri:b');
V_CURRENT_CHILD_NODE := DBMS_XMLDOM.makeNode(V_CURRENT_EL);
DBMS_XMLDOM.SETPREFIX(n => V_CURRENT_CHILD_NODE, prefix => 'a');
V_CURRENT_NODE := DBMS_XMLDOM.APPENDCHILD(n => V_CURRENT_NODE, newChild => V_CURRENT_CHILD_NODE);
...
Here are the few guidelines I've "inferred" about DBMS_XMLDOM behaviour (note that this is all my reverse-engineered understanding, so I may be well off base!):

1. namespaceURI/ns params of createDocument, createElement methods just declare the namespace of the entity and have no practical impact on the generated xml

So the xml output would be the same even if you changed the CREATEELEMENT call in the code above to:
V_CURRENT_EL := DBMS_XMLDOM.CREATEELEMENT(doc => V_DOMDOCUMENT, tagName => 'gadgets', ns => 'someotheruri');
NB: you can verify the namespace is internally altered using DBMS_XMLDOM.GETNAMESPACE

2. to set the xmlns attributes of the document, you need to explicitly create the attribute. You need the two SETATTRIBUTE calls in the code above.

3. to cause a node or attribute to be serialised with a namespace prefix, you need to explicitly request it with the SETPREFIX call e.g.
DBMS_XMLDOM.SETPREFIX(n => V_CURRENT_CHILD_NODE, prefix => 'a');  


4. however, the namespace prefix will only be used in the serialised xml if you also set a namespaceURI/ns (from point 1).


5. there is no validation or verification that the namespace prefix you set (point 3) matches a namespace you have declared (points 1 and 2).


As you can see, DBMS_XMLDOM is really just a very thin wrapper to programmatically generate XML at the lowest level. It leaves much of the "intelligence" for you to provide!

Bearing that in mind, you probably can think of many cases where using DBMS_XMLDOM is not the right answer, but is sort of brute force/naive.

If you have data in Oracle and want to produce some complex XML output, there are many smarter alternatives, such as defining an xmltype view over the source data and then doing an xsl transform into the desired document format. Aside from being much less tiresome than DBMS_XMLDOM, it has the benefit of separating presentation (the template) from the data. When you need to change the output format, you just change the template rather than hacking away at the DBMS_XMLDOM code.

Sunday, February 11, 2007

Complex SOAP::Lite requests - my rules for SOAP::Sanity!

Previously, I mentioned I'd come back to more complex request and response structures with SOAP::Lite.

Frankly, I haven't posted because I can't avoid the feeling that there's still a little more to unravel. Did I mention that good documentation is sparse? ;) Byrne and others have posted some good stuff (see for example the majordojo soaplite archive and this hands-on tour at builder.com), but mostly you'll find the experiences shared go along lines like "...after hacking around for a bit, I found that this worked...".

But is it possible to try and pin down at least a couple of guidelines for using SOAP::Data? Well, so far I can't claim to have solved it all, but I am able to share a few anchors I've tried to plant for my own sanity!

My Rules for SOAP::Sanity


In the following "rules", $soap is a pre-initialised SOAP::Lite object, as in:
my $soap = SOAP::Lite->uri ( $serviceNs ) -> proxy ( $serviceUrl );

1. The value of a SOAP::Data element becomes the content of the XML entity.


It may seem bleeding obvious. Nevertheless, get this idea fixed in your head and it will help for more complex structures.

So if we are calling a "getHoroscope" service with the following request structure:
<getHoroscope>
<sign>Aries</sign>
</getHoroscope>

"Aries" is the value, i.e. the content, of the XML entity called "sign". Thus our request will look like this:
$data = SOAP::Data->name("sign" => 'Aries');
$som = $soap->getHoroscope( $data );

2. To create a hierarchy of entities, use references to a SOAP::Data structure.


In (1), the content of the entity was a simple string ("Aries"). Here we consider the case where we need the content to encapsulate more XML elements rather than just a string. For example a request with this structure:
<getHoroscope>
<astrology>
<sign>Aries</sign>
</astrology>
</getHoroscope>

Here "astrology" has an XML child element rather than a string value.
To achieve this, we set the value of the "astrology" element as a reference to the "sign" SOAP::Data object:
$data = SOAP::Data->name("astrology" =>
\SOAP::Data->name("sign" => 'Aries')
);
$som = $soap->getHoroscope( $data );


3. To handle multiple child entities, encapsulate as a reference to a SOAP::Data collection.


In this case, we need our "astrology" element to have multiple children, for example:
<getHoroscope>
<astrology>
<sign>Aries</sign>
<sign>Pisces</sign>
</astrology>
</getHoroscope>

So a simple variation on (2). To achieve this, we collect the "Aries" and "Pisces" elements as a collection within an anonymous SOAP::Data object. We pass a reference to this object as the value of the "astrology" item.
$data = SOAP::Data->name("astrology" => 
\SOAP::Data->value(
SOAP::Data->name("sign" => 'Aries'),
SOAP::Data->name("sign" => 'Pisces')
)
);
$som = $soap->getHoroscope( $data );

4. Clearly distinguish method name structures from data.


This is perhaps just a style and clarity consideration. In the examples above, the method has been implicitly dispatched ("getHoroscope").
If you prefer (or need) to pass the method information to a SOAP::Lite call, I like to keep the method information distinct from the method data.

So for example, item (3) can be re-written (including some additional namespace handling) as:
$data = SOAP::Data->name("astrology" => 
\SOAP::Data->value(
SOAP::Data->name("sign" => 'Aries'),
SOAP::Data->name("sign" => 'Pisces')
)
);
$som = $soap->call(
SOAP::Data->name('x:getHoroscope')->attr({'xmlns:x' => $serviceNs})
=> $data
);

I prefer to read this than have it all mangled together.

That brings me to the end of my list of rules! I am by no means confident that there aren't more useful guidelines to be added, or that in fact the ones I have proposed above will even stand the test of time.

Nevertheless, with these four ideas clearly in mind, I find I have a fair chance of sitting down to write a complex SOAP::Lite call correctly the first time, rather than the trial and error approach I used to be very familiar with!

Safe OCCI createStatelessConnectionPool usage

I've been working a bit with Oracle's OCCI (the C++ API for Oracle Database), and stateless connection pools in particular.

I've noticed a particular behaviour that's important to be aware of when creating the connection pool (using the oracle::occi::Environment method "createStatelessConnectionPool"). The problem is that this call will fail if you have some kind of connection or TNS error, leaving you with an unusable and invalid pool.

To give a concrete example, if you create a connection pool like this:
scPool = env->createStatelessConnectionPool(...

what I find is that if the database is down (for example), this call will throw a TNS error like ORA-12541: TNS:no listener, and the scPool object is invalid (but not a null reference).

If you attempt to use the pool thereafter, e.g.:
if (scPool) cout << "MaxConn=" << scPool->getMaxConnections() << endl;

then firstly "if (scPool) .." won't protect you (because it's not null), and the getMaxConnections method call will fail with an uncatchable segmentation fault (this is Linux x86 I'm using).

The workaround of course is to null your scPool if you catch the TNS error. Then, if you want a robust application that must keep running even if the connection pool is not created, every time you try and get a connection from the pool you should also first check to see if you have a pool object to use (and if not, try and create it again).
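Here is a minimal sketch of that workaround pattern. To keep it self-contained it does not call OCCI at all: FakePool and PoolHolder are hypothetical stand-ins, the factory callback plays the role of the createStatelessConnectionPool call, and a std::exception plays the role of the thrown TNS error.

```cpp
#include <cassert>
#include <functional>
#include <stdexcept>
#include <utility>

// Hypothetical stand-in for the OCCI pool type; not the real API.
struct FakePool {
    int maxConnections;
};

// Holds the pool pointer and guarantees it is either null or usable.
class PoolHolder {
public:
    // 'factory' stands in for the pool-creation call and may throw.
    explicit PoolHolder(std::function<FakePool*()> factory)
        : factory_(std::move(factory)) {
        assert(static_cast<bool>(factory_));
    }

    // The fake pool is simply deleted; a real pool would need its own cleanup.
    ~PoolHolder() { delete pool_; }
    PoolHolder(const PoolHolder&) = delete;
    PoolHolder& operator=(const PoolHolder&) = delete;

    // Returns a usable pool, or nullptr if it cannot be created right now.
    // Every caller goes through here, so creation is retried on each request.
    FakePool* get() {
        if (pool_ == nullptr) {
            try {
                pool_ = factory_();
            } catch (const std::exception&) {
                pool_ = nullptr;   // never keep a half-built pool around
            }
        }
        return pool_;
    }

private:
    std::function<FakePool*()> factory_;
    FakePool* pool_ = nullptr;     // starts out null, never uninitialised
};
```

The point of the design is that callers never hold the raw pointer themselves: they ask the holder each time, and a null result means "no pool yet, try again later" rather than a crash.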

Tortuous to say the least!

I would have expected createStatelessConnectionPool to return a valid connection pool even if connections are not possible at that point in time, and the TNS errors to be thrown only if and when you try and get a connection from the pool.

Anyone have a view? ... bug, ER or expected?

12-Feb, an update: I've since discovered that this behaviour is true only if you set a "minimum connections" >0 for the pool. If you set "minimum connections"==0, then the behaviour is as I would expect - the pool is created without error, but you may hit an error when attempting to get a connection from the pool.