Thursday, July 01, 2004

Using functional indexes to improve XMLTYPE query performance

Oracle XDB provides excellent facilities for handling XML data within the database. One aspect is the use of XmlType columns to store XML documents as-is within a table. XmlType object methods provide the means to extract and manipulate the XML doc in PL/SQL, in particular we can use XPath queries to extract data from the XML doc as part of a SELECT expression.
For example, lets take a simple "XML document Registry" table:

CREATE TABLE myxml (doc_id NUMBER PRIMARY KEY,xmldoc XMLTYPE);

And now fill it with lots (say 10000) rows, with XML documents that have the following structure:

<document>
<Author>
<last_name>Ng</last_name>
<first_name>Francis</first_name>
</Author>
<body>the body of the document</body>
</document>

We can SELECT an element from the stored documents thus:

set timing on
set autotrace on explain
SELECT x.xmldoc.extract('/document/Author/last_name/text()').getStringVal() as last_name
FROM myxml x
WHERE x.xmldoc.extract('/document/Author/last_name/text()').getStringVal() like '%JON%';
[ ... output (3 rows in my case) ... ]
Elapsed: 00:00:18.45
Execution Plan
---------------------------------------------------
SELECT STATEMENT Optimizer=CHOOSE
TABLE ACCESS (FULL) OF 'MYXML'

This is wonderful from an application developer's perspective, but as you can see the performance can be horrendous .. 18 seconds to select 3 rows out of 10000 (just imagine all table scanning and xml parsing going on under the covers). But relief is available in the form of functional indexes. For the previous query we could define an index thus:

CREATE INDEX myxml_last_name_idx ON myxml(xmldoc.extract('/document/Author/last_name/text()').getStringVal());

Under 10g, we immediately see a drammatic improvement:

[ ... output (3 rows in my case) ... ]
Elapsed: 00:00:00.22

But under 9i, things initially don't change at all. We need to add an index hint before the optimiser will properly choose to use the index and deliver stunning performance:

SELECT /*+ index(x myxml_last_name_idx) */ x.xmldoc.extract('/document/Author/last_name/text()').getStringVal() as last_name
FROM myxml x
WHERE x.xmldoc.extract('/document/Author/last_name/text()').getStringVal() like '%JON%';
Elapsed: 00:00:00.12
Execution Plan
---------------------------------------------------
SELECT STATEMENT Optimizer=CHOOSE (Cost=2 Card=2054 Bytes=4112108)
INDEX (RANGE SCAN) OF 'MYXML_LAST_NAME_IDX' (NON-UNIQUE) (Cost=2 Card=2054 Bytes=4112108)

{in this example, note that you must use the table alias (x) in the index hint. If you use the table name, the optimiser won't get the hint}.

On metalink and the mailing lists you'll find a number of postings concerning performance of XML queries under 9i. It seems that there have been great improvements in the optimiser in 10g. As demonstrated you can solve many of the performance issues under 9i, but it may take a bit of manual tuning.
Refs:
Note:191106.1 How to use Index on XML datatype column

1 comment:

Carlos said...

Thanks a lot for you interesting post!!!
Carlos