Monday, November 24, 2008

Loading XML into eXist Using XQuery and the Sandbox

This past weekend I was tinkering with the eXist XML database. The installation went smoothly and some of the sample queries ran without problems. My next step was to load some of my own content into it.

Rather than use their web interface or desktop client, I wanted to load the documents using XQuery through their sandbox application. I thought this would be quick and easy and would allow me to compare some features of eXist to MarkLogic Server.

There is quite a bit of documentation for eXist, but the XQuery API is light on specific usage examples. I also ran into some non-obvious gotchas. Here is the XQuery code that I used to load a document into a specific collection, along with some notes below.

    declare namespace xmldb = "http://exist-db.org/xquery/xmldb";

    (: Source file as a file:// URI; spaces in the path are encoded as %20. :)
    declare variable $file as xs:string :=
        "file:///C:/Program%20Files/eXist/samples/mattio/sample.xml";
    declare variable $name as xs:string := "sample.xml";
    declare variable $collection as xs:string := "/db/test/";

    (: Create the target collection only if it does not exist yet. :)
    let $collection-status :=
        if (not(xmldb:collection-exists($collection))) then
            xmldb:create-collection("", $collection)
        else
            "Collection already exists."
    (: xs:anyURI() makes xmldb:store() treat $file as a location to load from. :)
    let $load-status := xmldb:store($collection, $name, xs:anyURI($file))
    return (
        <collection-status>{ $collection-status }</collection-status>,
        <load-status>{ $load-status }</load-status>
    )

When I started the path with C:\ or left out xs:anyURI(), I got a misleading error that implied something was wrong with my document:

XMLDB reported an exception while storing documentorg.xmldb.api.base.XMLDBException: fatal error at (1,1) : Content is not allowed in prolog. [at line 120, column 21] In call to function: sandbox:exec-query(xs:string) [134:10]
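
In hindsight the error makes some sense. As I understand the function documentation, xmldb:store() takes a plain string as the content of the new document, so the parser tries to read the text of the path as XML and gives up at line 1, column 1. An xs:anyURI is instead treated as a location to load from. Here is a minimal sketch of the two call forms. It assumes the /db/test/ collection already exists, and inline.xml is just a made-up resource name for illustration.

```xquery
declare namespace xmldb = "http://exist-db.org/xquery/xmldb";

(
    (: A string argument is taken as the document content itself.
       "inline.xml" is a hypothetical name used only for this sketch. :)
    xmldb:store("/db/test/", "inline.xml", "<greeting>hello</greeting>"),

    (: An xs:anyURI argument is taken as the location of the content. :)
    xmldb:store("/db/test/", "sample.xml",
        xs:anyURI("file:///C:/Program%20Files/eXist/samples/mattio/sample.xml"))
)
```

Passing the file path as a bare string in the second call is what I believe produces the "Content is not allowed in prolog" message.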

Here are some other notes.

  1. The xmldb namespace needs to be declared.

  2. $file uses the file:/// URI syntax. This is how you reference a document on your file system, and any space in the path must be encoded as %20.

  3. $file must be wrapped in xs:anyURI() when it is passed to xmldb:store() so that it is treated as a URI and not as a simple string.
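
If you would rather not encode the path by hand, the conversion in notes 2 and 3 can be sketched with standard XQuery functions. This is only an illustration: local:windows-path-to-uri is a helper name I made up, and it assumes a simple absolute Windows path that starts with a drive letter.

```xquery
(: Hypothetical helper: turns a Windows path such as C:\dir\file.xml
   into a file:/// URI with each path segment percent-encoded. :)
declare function local:windows-path-to-uri($path as xs:string) as xs:anyURI {
    (: "\\" is a regular expression that matches one literal backslash. :)
    let $segments := tokenize($path, "\\")
    let $encoded :=
        for $segment at $position in $segments
        return
            (: Leave the drive letter ("C:") alone so the colon is not encoded. :)
            if ($position = 1) then $segment
            else encode-for-uri($segment)
    return xs:anyURI(concat("file:///", string-join($encoded, "/")))
};

local:windows-path-to-uri("C:\Program Files\eXist\samples\mattio\sample.xml")
```

For the sample path this should produce the same URI that I typed by hand in $file, and because the function already returns an xs:anyURI, the result can be passed straight to xmldb:store().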

Thanks to Dannes and Wolfgang for their help with this. They were on the exist-open list on a Saturday.

Next up I'll load about 50 large documents to build some basic queries to review index tuning.
