Wednesday, February 14, 2007

Searching Textml for Words with Tildes

Document searches in Textml are insensitive to case and, in most cases, to special characters. For example, searches for "José Martí" and "jose marti" should return the same result set in the same order. But there is at least one exception: ñ. A search for "quinceañera" returns a different set of results than a search for "quinceanera." Ixiasoft has reported that this is a feature based on requests from their Spanish-speaking partners and customers, which makes sense because n and ñ are different letters in the Spanish alphabet.

One solution is to add a hidden element to the document that contains the alternate spelling.
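
As a rough sketch of how the alternate spellings could be generated before the document is stored, here is some Python. The function names (alternate_spelling, hidden_terms) are my own and not part of Textml, and how the resulting terms get written into a hidden element depends on your document schema, so that part is left out.

```python
import unicodedata


def alternate_spelling(text):
    """Return text with every ñ/Ñ replaced by plain n/N."""
    # Normalize first so "n" + combining tilde is treated the same as "ñ".
    text = unicodedata.normalize("NFC", text)
    return text.replace("ñ", "n").replace("Ñ", "N")


def hidden_terms(text):
    """Return the alternate spellings of the words that contain ñ.

    These are the terms that would go into the hidden element
    (hypothetical helper; the element name is up to your schema).
    """
    alternates = []
    for word in unicodedata.normalize("NFC", text).split():
        folded = alternate_spelling(word)
        if folded != word:
            alternates.append(folded)
    return alternates
```

For a document containing "La quinceañera de España", hidden_terms would return the two folded words, and only those would need to be stored in the hidden element.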

Another solution is to use the <oneof> element. To do this, you would intercept each word at the application level and rewrite any word containing "ñ" or "n." That part of the query might look something like this:
<elem>quincea<oneof><choice>ñ</choice><choice>n</choice></oneof>era</elem>
If you have to support this particular feature, neither solution is very palatable.
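
For what it's worth, the <oneof> rewrite could be done at the application level along these lines. This is only a sketch in Python: build_oneof_elem and its expand_plain_n flag are names I made up, and the only Textml markup it relies on is the <elem>, <oneof>, and <choice> elements shown above.

```python
import unicodedata
from xml.sax.saxutils import escape

# The alternation that matches either spelling at one position.
ONEOF = "<oneof><choice>ñ</choice><choice>n</choice></oneof>"


def build_oneof_elem(word, expand_plain_n=False):
    """Wrap one search word in <elem>, expanding ñ into a <oneof> choice.

    With expand_plain_n=True, every plain "n" is expanded as well, which is
    what you would need when the user typed the word without the tilde.
    """
    # Normalize so "n" + combining tilde is handled like a single "ñ".
    word = unicodedata.normalize("NFC", word)
    targets = "ñÑnN" if expand_plain_n else "ñÑ"
    parts = []
    for ch in word:
        # Escape everything else so &, < and > cannot break the query XML.
        parts.append(ONEOF if ch in targets else escape(ch))
    return "<elem>" + "".join(parts) + "</elem>"
```

Calling build_oneof_elem("quinceañera") produces the same fragment as the example above. Note that with expand_plain_n=True every "n" in the word gets its own <oneof>, including the one in "quin", which is part of why this approach is not very appealing.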

UPDATE 1: If you try to pass the choice operator to the Textml QueryAnalyzer object in .NET, it throws an exception. For example, quincea[n,ñ]era fails as a search string. I've submitted this bug to Ixiasoft and I'm waiting for a response.

UPDATE 2: Ixiasoft has responded that this is a bug and that a fix is due to be released in their newest package, 3.6.1.1542.
