Thursday, May 3, 2007

Using cts:search to Search More Than One Node Level in MarkLogic Server

I had a relatively simple requirement for building a search application running against MarkLogic Server: search multiple levels of the XML hierarchy for all files in the repository and return each matching level as its own document; the search must be case- and diacritic-insensitive. Here's the beginning of the query that did the trick:

define variable $ORIGINAL_QUERY as xs:string external

(: Search every chapter, div, entry, and section element in the repository :)
for $i in cts:search( //(chapter | div | entry | section),
    cts:word-query(
        $ORIGINAL_QUERY,
        ("case-insensitive", "diacritic-insensitive")
    )
)
return

<result id="{ $i/@local-id }">
{
$i/( content-metadata/title ),
$i/( content-metadata/subtitle ),
$i/( content-metadata/label ),
$i/( head ),
$i/( entry-head ),
$i/( content-metadata/contributor/display-name ),
$i/( content-metadata/copyright/display-date )
}
</result>

There's a lot more work to do on this: tokenizing words, tokenizing quoted strings as phrases, accepting Boolean terms, returning different values based on the node, and so on.
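As a rough idea of where the word tokenizing could go, here is a minimal sketch, not the finished query. It assumes the same external $ORIGINAL_QUERY variable and the same prolog syntax as the query above, splits the input on whitespace, and requires every word to match by wrapping one cts:word-query per word in a cts:and-query. The $options, $words, and $query names are only illustrative. Quoted phrases, Boolean terms, and an empty query string are not handled.

```xquery
(: Sketch only: split the query on whitespace and require every word to match. :)
define variable $ORIGINAL_QUERY as xs:string external

let $options := ("case-insensitive", "diacritic-insensitive")
(: normalize-space trims and collapses whitespace, so a single space is a safe delimiter :)
let $words := fn:tokenize(fn:normalize-space($ORIGINAL_QUERY), " ")
(: one word query per token, combined so that all of them must match :)
let $query := cts:and-query(
    for $w in $words
    return cts:word-query($w, $options)
)
for $i in cts:search( //(chapter | div | entry | section), $query )
return
    <result id="{ $i/@local-id }">{ $i/content-metadata/title }</result>
```

Swapping cts:and-query for cts:or-query would return nodes that match any of the words rather than all of them.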

Thanks to the people on the Mark Logic developer email list for helping with this.
