Thursday, May 10, 2007

Textml vs. MarkLogic, Part 1

I've been working with Ixiasoft's Textml Server for several years now. Recently I've also started working with Mark Logic's MarkLogic Server and I'm starting to notice differences -- some in Textml's favor and some in Mark Logic's favor.

Some of my knocks on MarkLogic compared to Textml are:
  • MarkLogic lacks a query parser. A simple set of expressions should be defined and accepted by a parser: AND, OR, NOT, a near operator, and some sort of frequency and priority operators would be fine. If you need something more complex, then build your own, but give me something to start with. (Truth be told, the one in Textml is a little flaky.)
  • MarkLogic lacks a standard way to exclude stop words from indexing and searching. It should be possible to define a list of stop words at the forest or database level.
  • MarkLogic lacks a document-focused admin interface. Textml's version of this comes in quite handy.
  • MarkLogic lacks result set counts that are both fast and accurate. I should not have to worry about whether to use xdmp:estimate(), cts:remainder(), or fn:count() just to know how many items my cts:search() returns. Just tell me. A database can do it. Textml can do it. MarkLogic needs to as well.
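
To make the query parser request concrete, here is a minimal sketch of the kind of thing I mean. It is written in Python purely for illustration and is not MarkLogic or Textml code; the operator keywords, the tuple-based tree, and every function name are assumptions of mine. It handles AND, OR, NOT, NEAR, quoted phrases, and parentheses, and it leaves out the frequency and priority operators entirely.

```python
import re

# Hypothetical operator keywords; matched case-sensitively so "and" stays a search word.
OPERATORS = {"AND", "OR", "NOT", "NEAR"}
# Tokens are parentheses, "quoted phrases", or bare words. A stray unmatched
# quote character is silently skipped by this pattern.
TOKEN_RE = re.compile(r'\(|\)|"[^"]*"|[^\s()"]+')


def tokenize(query):
    """Split a query string into parentheses, quoted phrases, and bare words."""
    return TOKEN_RE.findall(query)


class Parser:
    """Recursive-descent parser; OR binds loosest, then AND/NEAR, then NOT."""

    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self):
        token = self.peek()
        if token is None:
            raise ValueError("unexpected end of query")
        self.pos += 1
        return token

    def parse(self):
        tree = self.parse_or()
        if self.peek() is not None:
            raise ValueError(f"unexpected token: {self.peek()!r}")
        return tree

    def parse_or(self):
        node = self.parse_and()
        while self.peek() == "OR":
            self.take()
            node = ("OR", node, self.parse_and())
        return node

    def parse_and(self):
        node = self.parse_unary()
        while self.peek() in ("AND", "NEAR"):
            op = self.take()
            node = (op, node, self.parse_unary())
        return node

    def parse_unary(self):
        token = self.take()
        if token == "NOT":
            return ("NOT", self.parse_unary())
        if token == "(":
            node = self.parse_or()
            if self.take() != ")":
                raise ValueError("expected ')'")
            return node
        if token in OPERATORS or token == ")":
            raise ValueError(f"unexpected token: {token!r}")
        return ("TERM", token.strip('"'))


def parse_query(query):
    """Parse a query string into a nested tuple tree."""
    return Parser(tokenize(query)).parse()
```

Calling `parse_query('cat AND NOT dog OR "blue bird"')` yields a nested tuple with OR at the root, which application code could then walk to build whatever search expression the server expects. That translation step is deliberately left out here.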
Some of my knocks on Textml compared to MarkLogic are:
  • Textml lacks the ability to accept a large document and search/return only part of it as a result of a search. If I have a book to load, I have to figure out what my display unit in the application is going to be (an entire chapter, a smaller section of a chapter) and break up the file ahead of time. There are all sorts of reasons why that's a problem.
  • Textml lacks XQuery support. I'm just learning XQuery now, but it's pretty darn powerful. Where's the support for it?
  • Textml lacks momentum. Maybe it's just me, but the development of new features seems to have stagnated.
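
As a rough illustration of the pre-splitting chore from the first knock above, here is a minimal Python sketch that breaks a book into one standalone document per display unit before loading. The element names (`book`, `chapter`, `title`, `para`), the `book-id` and `seq` attributes, and the function name are all hypothetical; a real content model would need its own rules, and this does not touch any Textml API.

```python
import xml.etree.ElementTree as ET

# Hypothetical input: a <book> whose display units are its <chapter> children.
BOOK_XML = """<book id="b1">
  <title>Example Book</title>
  <chapter><title>One</title><para>First chapter text.</para></chapter>
  <chapter><title>Two</title><para>Second chapter text.</para></chapter>
</book>"""


def split_into_units(book_xml, unit_tag="chapter"):
    """Return one standalone XML string per display unit, tagged with its source."""
    book = ET.fromstring(book_xml)
    units = []
    for index, unit in enumerate(book.iter(unit_tag), start=1):
        # Record where the unit came from so a hit can be tied back to the book.
        unit.set("book-id", book.get("id", ""))
        unit.set("seq", str(index))
        # Drop the whitespace that followed the element inside the book.
        unit.tail = None
        units.append(ET.tostring(unit, encoding="unicode"))
    return units
```

The catch is exactly the one described above: the unit has to be chosen before loading, so switching from chapters to smaller sections later means splitting and reloading everything.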
Bottom line so far: MarkLogic Server is significantly more powerful than Textml Server. MarkLogic is more complex, and therefore more demanding to develop for, but it has a huge upside.

I'll do another post on this as I learn more about MarkLogic, if necessary.

UPDATE: MarkLogic recently released a new, very powerful search library. If you're reading this, you have to check out lib-search.


kelly said...

You might find the code here useful:

Mattio Valentino said...

Thanks Kelly. I'm actually having some active conversations with MarkLogic about this. I did end up writing my own simple parser in the meantime.