Showing posts with label Textml. Show all posts

Friday, November 21, 2008

Exporting XML Files from Textml

I had a case where documents were being created and stored dynamically in Textml Server by an application, but we wanted the physical files exported. I had a ContentServer class already in place for selecting all documents in a collection and for selecting a document by file name, which would make this easier. This was going nowhere near a production server, so reusing what I had to get this done quickly was my primary concern.

There are some ways to clean this up, but the general approach should be helpful in similar situations.

Here is the method that will return a list of all documents in a collection. This gives me the file name, which I use to get the individual documents.

private List<ListItem> SelectAllTextml()
{
    List<ListItem> myList = new List<ListItem>();
    string textmlStandardHeader = "<?xml version=\"1.0\" encoding=\"UTF-16\"?><query VERSION=\"3.6\" RESULTSPACE=\"RGuideAdmin\">\n";
    string textmlStandardFooter = "</query>";
    string textmlCollection = "<property NAME=\"collection\"><elem>" +
        this.ContextAdditionalName + "</elem></property>";
    string textmlFile = "<property NAME=\"NAME\"><elem><anystr/></elem></property>";
    string textmlQuery = textmlStandardHeader + "<andkey>" +
        textmlCollection +
        textmlFile +
        "</andkey>" + textmlStandardFooter;
    IxiaClientServices IxiaCS = new IxiaClientServices();
    IxiaServerServices IxiaSS = IxiaCS.ConnectServer(this.ContextServer);
    IxiaDocBaseServices IxiaDS = IxiaSS.ConnectDocBase(this.ContextContainer);
    IxiaSearchServices IxiaSearchS = IxiaDS.SearchServices;
    IxiaResultSpace textmlResultSpace = IxiaSearchS.SearchDocuments(textmlQuery);
    if (textmlResultSpace.Count > 0)
    {
        for (int i = 0; i < textmlResultSpace.Count; i++)
        {
            ListItem documentItem = new ListItem();
            IxiaDocument document;
            document = textmlResultSpace.Item(i);

            // Copy the document content into memory so it can be read with XPath.
            MemoryStream xmlStream = new MemoryStream();
            document.Content.SaveTo(xmlStream);
            xmlStream.Position = 0;
            XPathDocument textmlXmlDocument = new XPathDocument(xmlStream);
            XPathNavigator textmlXmlNav = textmlXmlDocument.CreateNavigator();
            // Text is the title plus the collection; Value is the guide id.
            documentItem.Text =
                textmlXmlNav.SelectSingleNode("descendant::title[1]").ToString() +
                " (" + document.Collection + ")";
            documentItem.Value =
                textmlXmlNav.SelectSingleNode("descendant::guide[1]").
                GetAttribute("id", "");

            myList.Add(documentItem);
        }
    }
    return myList;
}


Here is the method that will return a document (or documents) by file name, limited to a collection.

private List<XmlDocument> SelectTextml(string fileName)
{
    List<XmlDocument> myList = new List<XmlDocument>();
    string textmlStandardHeader = "<?xml version=\"1.0\" encoding=\"UTF-16\"?><query VERSION=\"3.6\" RESULTSPACE=\"RGuideAdmin\">\n";
    string textmlStandardFooter = "</query>";
    string textmlCollection = "<property NAME=\"collection\"><elem>" +
        this.ContextAdditionalName + "</elem></property>";
    string textmlFile = "<property NAME=\"NAME\"><elem>" + fileName + "<anystr/></elem></property>";
    string textmlQuery = textmlStandardHeader + "<andkey>" +
        textmlCollection +
        textmlFile +
        "</andkey>" + textmlStandardFooter;
    IxiaClientServices IxiaCS = new IxiaClientServices();
    IxiaServerServices IxiaSS = IxiaCS.ConnectServer(this.ContextServer);
    IxiaDocBaseServices IxiaDS = IxiaSS.ConnectDocBase(this.ContextContainer);
    IxiaSearchServices IxiaSearchS = IxiaDS.SearchServices;
    IxiaResultSpace textmlResultSpace = IxiaSearchS.SearchDocuments(textmlQuery);
    if (textmlResultSpace.Count > 0)
    {
        for (int i = 0; i < textmlResultSpace.Count; i++)
        {
            // Load each matching document into an XmlDocument.
            IxiaDocument document = textmlResultSpace.Item(i);
            MemoryStream xmlStream = new MemoryStream();
            document.Content.SaveTo(xmlStream);
            xmlStream.Position = 0;
            XmlDocument textmlXmlDocument = new XmlDocument();
            textmlXmlDocument.Load(xmlStream);
            myList.Add(textmlXmlDocument);
        }
    }

    return myList;
}
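
Both methods above build the same query by hand, and SelectTextml drops fileName into the XML without escaping it. As a rough sketch of one way to tidy that up, the helper below builds the query in one place and escapes the values first. TextmlQueryBuilder and BuildNameQuery are names made up for this example; they are not part of the Textml API or of the ContentServer class, and the query shape is simply copied from the two methods above.

```csharp
using System;
using System.Security;
using System.Text;

// Hypothetical helper, not part of the Textml API.
public static class TextmlQueryBuilder
{
    // Builds the <andkey> query used by both methods above.
    // An empty namePrefix leaves just <anystr/>, as in the select-all query.
    public static string BuildNameQuery(string collection, string namePrefix)
    {
        StringBuilder query = new StringBuilder();
        query.Append("<?xml version=\"1.0\" encoding=\"UTF-16\"?>");
        query.Append("<query VERSION=\"3.6\" RESULTSPACE=\"RGuideAdmin\">\n");
        query.Append("<andkey>");
        query.Append("<property NAME=\"collection\"><elem>");
        // SecurityElement.Escape replaces <, >, &, " and ' with XML entities.
        query.Append(SecurityElement.Escape(collection ?? ""));
        query.Append("</elem></property>");
        query.Append("<property NAME=\"NAME\"><elem>");
        query.Append(SecurityElement.Escape(namePrefix ?? ""));
        query.Append("<anystr/></elem></property>");
        query.Append("</andkey>");
        query.Append("</query>");
        return query.ToString();
    }
}
```

With something like this in place, the select-all case would call BuildNameQuery(this.ContextAdditionalName, "") and the by-name case would pass fileName as the second argument.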


Here's the code I used to go through each document returned in the collection list and save each one to an XML file.

myContentServer = new ContentServer(Server.MapPath("~/App_Data/" +
    ddlProduct.SelectedValue + ".xml"));
List<ListItem> myGuides = myContentServer.SelectAll();
if (myGuides.Count > 0)
{
    // Work out the export paths once instead of rebuilding them on every call.
    string exportVirtualPath = exportDirectory + "/" + ddlProduct.SelectedValue;
    string exportPhysicalPath = Server.MapPath(exportVirtualPath);
    if (Directory.Exists(exportPhysicalPath))
    {
        // Delete the directory and anything existing in it.
        Directory.Delete(exportPhysicalPath, true);
    }
    Directory.CreateDirectory(exportPhysicalPath);
    foreach (ListItem guide in myGuides)
    {
        List<XmlDocument> myGuide = myContentServer.Select(guide.Value);
        if (myGuide.Count == 0)
        {
            // Nothing came back for this name, so skip it rather than throw.
            continue;
        }
        XmlDocument document = myGuide[0];
        // Save the file to the export directory.
        string fileVirtualPath = exportVirtualPath + "/" + guide.Value + ".xml";
        document.Save(Server.MapPath(fileVirtualPath));
        divExportList.InnerHtml += "<br/>" + fileVirtualPath;
    }
}
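
One of the cleanups worth considering: guide.Value comes out of the document's id attribute and goes straight into a file name. Here is a minimal sketch of a helper that replaces characters the file system won't accept before the path is built. ExportPaths, ToSafeFileName and BuildExportPath are hypothetical names for illustration, and this only swaps out invalid characters; it is not meant as a complete defense against odd names.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper for illustration only.
public static class ExportPaths
{
    // Replaces any character the current file system rejects with an underscore.
    public static string ToSafeFileName(string value)
    {
        char[] invalid = Path.GetInvalidFileNameChars();
        StringBuilder safe = new StringBuilder(value.Length);
        foreach (char c in value)
        {
            safe.Append(Array.IndexOf(invalid, c) >= 0 ? '_' : c);
        }
        return safe.ToString();
    }

    // Combines the export root, the product folder and the guide id into one path.
    public static string BuildExportPath(string exportRoot, string product, string guideId)
    {
        string folder = Path.Combine(exportRoot, ToSafeFileName(product));
        return Path.Combine(folder, ToSafeFileName(guideId) + ".xml");
    }
}
```

In the export loop this would take the physical export root (the result of Server.MapPath) rather than the virtual path.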

Tuesday, November 18, 2008

Weird Bug While Porting Textml Server Code from JSP to ASP.NET

This morning I was porting an old search results page accessing Textml Server from JSP to ASP.NET. One feature implemented there is search within results. We execute this by storing the original query in the session and then, when a user asks to search within results, we pull it out and re-run it so the second query can reference the first.

We have a line like this in the JSP page...
IxiaResultSpace originalResults = 
search.SearchDocuments((String)session.getAttribute("resultQuery"));
...followed a few lines later by a line like this...
"<include TYPE=\"ResultSpace\">" + sessionID + "-ALL</include>"
All was well.

The logic of the page overall is more than a bit wonky, but we decided to port first and revise later. When done, I was getting an error that said

"vrn2nc55cxej5knnemwyzvqv-ALL is not a valid ResultSpace include /query/andkey/include at Ixiasoft.TextmlServer.ResultSpace.ExecuteQuery() at Ixiasoft.TextmlServer.ResultSpace.get_Count() at searchresults.RunSearch() in c:\Greenwood Web Sites\devsite\searchresults.aspx.cs:line 357."

What's that now?

After some trips through the debugger, poking around the documentation, and some googling (no one blogs on this thing), I went back to the old method of just writing out strings to the page. Nothing jumped out as an error and nothing worked.

By sheer chance, I decided to see what the string value of the original query was so I added...
string originalQuery = originalResults.TextmlQuery;
...to the page with the intent of displaying it somewhere for review, and suddenly the error stopped being thrown and the code functioned as expected. After making sure I had made no other changes, I tested it again. I commented out that line and the error was thrown. I put the line back in and the page ran fine. A co-worker asked whether reading the Count property forced it to work as well, and it does.

I can't explain this one.
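
The closest thing to a clue is in the stack trace, where ExecuteQuery() is called from get_Count(). That suggests, and this is only a guess, that the .NET ResultSpace does not run its query until a property is read, so the first result space had not been created on the server when the second query tried to include it. The sketch below shows that pattern with a made-up LazyResultSpace class; it is not the Textml API and makes no claim about how Ixiasoft actually implemented it.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a result space; this is not the Textml API.
public class LazyResultSpace
{
    // Names of result spaces that have actually been executed.
    private static readonly HashSet<string> registered = new HashSet<string>();
    private readonly string name;
    private bool executed;

    public LazyResultSpace(string name)
    {
        this.name = name;
    }

    // Reading Count is what runs the query and registers the name.
    public int Count
    {
        get
        {
            Execute();
            return 0; // placeholder; a real result space would return the hit count
        }
    }

    private void Execute()
    {
        if (executed) return;
        registered.Add(name);
        executed = true;
    }

    // A later query could only include a result space that has been registered.
    public static bool IsRegistered(string name)
    {
        return registered.Contains(name);
    }
}
```

Under that assumption, constructing the object does nothing on its own, and touching Count (or any other property that triggers execution) is what makes the name available to the next query.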

Monday, June 23, 2008

Load an XML File from the File System into Textml with ASP.NET

I'm building an internal application that needs to load an XML file from the file system into a specific repository path in Textml Server. Here's the method I used within a ContentServer class I created.

The various this.* properties referenced here are set elsewhere in the same class, which reads the values for the staging and production servers from a configuration file.
public bool Publish(string fileUri)
{
    try
    {
        ArrayList documents = new ArrayList(1);
        IxiaClientServices IxiaCS = new IxiaClientServices();
        IxiaServerServices IxiaSS = IxiaCS.ConnectServer(this.ContextServer);
        IxiaDocBaseServices IxiaDS = IxiaSS.ConnectDocBase(this.ContextContainer);
        IxiaDocumentServices ds = IxiaDS.DocumentServices;
        IxiaDocument document = IxiaDocument.getInstance();
        FileInfo file = new FileInfo(fileUri);

        document.Name = file.Name;
        document.MimeType = "text/xml";
        document.Collection = this.ContextAdditionalName;
        document.Content = IxiaDocument.MakeBinaryContent(file.FullName);

        documents.Add(document);
        IxiaTextmlServerError [] err = ds.SetDocuments(documents,
            (int)IxiaDocumentServices.TextmlSetDocuments.TextmlAddDocument |
            (int)IxiaDocumentServices.TextmlSetDocuments.TextmlReplaceDocument |
            (int)IxiaDocumentServices.TextmlSetDocuments.TextmlIndexDocument,
            TextmlDocumentType.TextmlDocument);

        // If there is more than one item, and that first item is
        // not null or empty, return false.
        if (err.Length > 1 && !String.IsNullOrEmpty(err[0].ToString()))
        {
            // TODO: Log each in the EventViewer
            return false;
        }
        else return true;
    }
    catch (Exception ex)
    {
        // TODO: Log in the EventViewer
        return false;
    }
}

I need to do the same into MarkLogic so I'll post that snippet here as soon as it's done.

Friday, December 21, 2007

Textml vs. MarkLogic, Part 2

I had a very good conversation yesterday with John Kreisa, the Director of Product Marketing for Mark Logic, where we talked about the differences between Textml and MarkLogic Server -- the issues I raised in a previous post, plus a few others. One topic I forgot to mention in that post, and a huge plus for MarkLogic, is the ability to influence relevancy ranking.

The best part of this conversation was that John initiated it. Now, I know it's a minor detail, and maybe even a little silly, but it's nice to know that they're being so proactive. None of the features we talked about may ever be added to the product, but at least I know my voice was heard.

I'm most hopeful that they add a document-focused admin console to the product. So far, it's the one thing I really miss (well, at least since I was able to build my own query parser).

UPDATE: MarkLogic recently released a new, very powerful search library. If you're reading this, you have to check out lib-search.

Sunday, May 27, 2007

Accessing an IxiaDocument Object from Textml in JSP

This is actually split across an application listener object and a JSP view page after a search result item is clicked on, which is why you'll see an Object being pulled from the session. You'll also see a reference to SearchUtilities, a search helper object.
String textmlRmiUrl = "rmi://servername:1099";
String textmlDomain = "DOMAIN";
String textmlUser = "username";
String textmlPassword = "password";
String textmlServer = "servername";
String textmlDocbase = "docbasename";
HashMap parms = new HashMap(1);
parms.put("ServerURL", textmlRmiUrl);
ClientServices cs = com.ixia.textmlserver.ClientServicesFactory.getInstance("RMI", parms); 
cs.Login(textmlDomain, textmlUser, textmlPassword);
// Note that there can be only one login per application
ByteArrayInputStream inputStream = null;
Object sessionResults = session.getAttribute( SearchUtilities.translateTab(tab));
IxiaServerServices ss = cs.ConnectServer(textmlServer); // Get the server Services
IxiaDocBaseServices docbase = ss.ConnectDocBase(textmlDocbase); // then, the DocbaseServices
IxiaSearchServices search = docbase.SearchServices(); // then, the SearchServices
IxiaResultSpace result = null; // then initialize the results space
result = (IxiaResultSpace)sessionResults;
IxiaDocument.Content ixiacontent = result.Item(searchDocId,"highlight").GetContent(); 
ByteArrayOutputStream outputStream = new ByteArrayOutputStream();
ixiacontent.SaveTo(outputStream);
inputStream = new ByteArrayInputStream(outputStream.toByteArray());
File xslreader = new File(xslpath);
TransformerFactory tFactory = TransformerFactory.newInstance();
Transformer transformer = tFactory.newTransformer(new StreamSource(xslreader));
transformer.setParameter("book",x); // XSL parameter
transformer.transform(new StreamSource(inputStream), new StreamResult(out));

Accessing an IxiaDocument Object from Textml in ASP.NET

Here is just one way to access a Textml IxiaDocument object returned after doing a search.

In the real code, parts of this are wrapped in try/catch blocks and split into methods where practical.
IxiaClientServices IxiaCS = new IxiaClientServices();
IxiaServerServices IxiaSS = IxiaCS.ConnectServer(textmlServer);
IxiaDocBaseServices IxiaDS = IxiaSS.ConnectDocBase(textmlDocbase);
IxiaSearchServices IxiaSearchS = IxiaDS.SearchServices;
IxiaQueryAnalyzer TextmlQueryAnalyzer = new IxiaQueryAnalyzer();
String queryEdited = TextmlQueryAnalyzer.GetXMLQueryString(queryWordsAll, "words");
String querySubmitted = textmlStandardHeader +
"<query VERSION=\"3.6\" RESULTSPACE=\"ALL\">" +
"<" + topLevelKey +">" +
textmlCollectionLae +
queryEdited +
"</" + topLevelKey + ">" +
textmlStandardSort +
textmlStandardFooter;
// Several variables defined elsewhere.
IxiaResultSpace rs = IxiaSearchS.SearchDocuments(querySubmitted); 
// The query is parsed elsewhere
// This section would be part of a loop
IxiaDocument doc = rs.Item(i, "highlight");
// Hits marked with a span of the class "highlight"
MemoryStream xmlStream = new MemoryStream();
doc.Content.SaveTo(xmlStream);
xmlStream.Position = 0;
XPathDocument textmlXmlDocument = new XPathDocument(xmlStream);
XslCompiledTransform textmlTransform = new XslCompiledTransform();
textmlTransform.Load(this.Server.MapPath("xsl/" + xslFile));
// xslFile defined elsewhere
StringWriter textmlWriter = new StringWriter();
XsltArgumentList textmlXslArg = new XsltArgumentList();
textmlXslArg.AddParam("documentLink", "", link); // XSL parameters
textmlTransform.Transform(textmlXmlDocument, textmlXslArg, textmlWriter);
divXml.InnerHtml = textmlWriter.ToString();

Thursday, May 10, 2007

Textml vs. MarkLogic, Part 1

I've been working with Ixiasoft's Textml Server for several years now. Recently I've also started working with Mark Logic's MarkLogic Server and I'm starting to notice differences -- some in Textml's favor and some in Mark Logic's favor.

Some of my knocks on MarkLogic compared to Textml are:
  • MarkLogic lacks a query parser. A simple set of expressions should be defined and accepted by a parser -- AND, OR, NOT, a near operator, some sort of frequency and priority operators, would be fine. If you need something more complex then you have to build your own, but give me something. (Truth be told, the one in Textml is a little flaky.)
  • MarkLogic lacks a built-in way to keep stop words out of indexing and searching. It should be possible to define a list of stop words at the forest or database level.
  • MarkLogic lacks a document-focused admin interface. Textml's version of this comes in quite handy.
  • MarkLogic lacks result set counts that are both fast and accurate. I should not need to worry about whether I should use xdmp:estimate(), cts:remainder(), or fn:count() to know how many items are in my cts:search(). Just tell me. A database can do it. Textml can do it. MarkLogic needs to as well.
Some of my knocks on Textml compared to MarkLogic are:
  • Textml lacks the ability to accept a large document and search/return only part of it as a result of a search. If I have a book to load, I have to figure out what my display unit in the application is going to be (an entire chapter, a smaller section of a chapter) and break up the file ahead of time. There are all sorts of reasons why that's a problem.
  • Textml lacks XQuery support. I'm just learning XQuery now, but it's pretty darn powerful. Where's the support for it?
  • Textml lacks improvements. Maybe it's just me, but the development of new features seems stagnant.
Bottom line so far: MarkLogic Server is significantly more powerful than Textml Server. MarkLogic is more complex, and therefore more demanding for development, but it has a huge upside.

I'll do another post on this as I learn more about MarkLogic, if necessary.

UPDATE: MarkLogic recently released a new, very powerful search library. If you're reading this, you have to check out lib-search.

Friday, March 30, 2007

Query Textml for Documents by Size and Collection

I wanted to search Textml for documents over a certain size and in a particular collection to help track down some problems we were having. Here's a query that does it:
<?xml version="1.0" encoding="UTF-16"?>
<query VERSION="3.6" RESULTSPACE="R1">
  <andkey>
    <property NAME="Collection">
      <elem>/encyclopedias/<anystr/></elem>
    </property>
    <property NAME="Size">
      <interval>
        <start INCLUSIVE="True">
          <number>100000</number>
        </start>
      </interval>
    </property>
  </andkey>
</query>
Note that you can also set the upper limit for the interval by using the <end> element.
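
As a sketch of that, here is a small C# helper that builds the Size property with a lower bound, an upper bound, or both. It assumes <end> takes the same INCLUSIVE attribute and <number> child as <start>; that shape is an assumption mirrored from the query above, so check it against the Textml documentation. SizeInterval is a made-up name for the example.

```csharp
using System;
using System.Globalization;
using System.Text;

// Hypothetical helper; the <end> element's shape is assumed to mirror <start>.
public static class SizeInterval
{
    // Pass null for a bound to leave that side of the interval open.
    public static string Build(long? minBytes, long? maxBytes)
    {
        StringBuilder xml = new StringBuilder("<property NAME=\"Size\"><interval>");
        if (minBytes.HasValue) AppendBound(xml, "start", minBytes.Value);
        if (maxBytes.HasValue) AppendBound(xml, "end", maxBytes.Value);
        xml.Append("</interval></property>");
        return xml.ToString();
    }

    // Writes one inclusive bound, e.g. <start INCLUSIVE="True"><number>1</number></start>.
    private static void AppendBound(StringBuilder xml, string element, long value)
    {
        xml.Append("<").Append(element).Append(" INCLUSIVE=\"True\"><number>");
        xml.Append(value.ToString(CultureInfo.InvariantCulture));
        xml.Append("</number></").Append(element).Append(">");
    }
}
```

Calling SizeInterval.Build(100000, null) reproduces the Size property from the query above, without the whitespace.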

Tuesday, February 20, 2007

View Results from the Textml QueryAnalyzer Object

I wasn't sure if I had an error in my code or if I uncovered a bug in Textml's QueryAnalyzer, so I wrote a little ASP.NET page where I could pass in a search string and see what QueryAnalyzer would do with it. This is how I found the bug with their .NET implementation of the choice operator.

Here are the ASPX and C# files:
<%@ Page Language="C#" AutoEventWireup="true" CodeFile="QueryAnalyzer.aspx.cs"
Inherits="QueryAnalyzer" %>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

<html xmlns="http://www.w3.org/1999/xhtml" >
<head runat="server">
<title>QueryAnalyzer Test Page</title>
</head>
<body>
<form id="form1" runat="server">
<asp:TextBox ID="txtQuery" runat="server"></asp:TextBox>
<asp:Button ID="btnRun" runat="server" Text="Run" /><br />
<div id="divQuery" runat="server"></div>
<div id="divError" runat="server"></div>
</form>
</body>
</html>

--

using System;
using System.Data;
using System.Configuration;
using System.Collections;
using System.Web;
using System.Web.Security;
using System.Web.UI;
using System.Web.UI.WebControls;
using System.Web.UI.WebControls.WebParts;
using System.Web.UI.HtmlControls;
using Ixiasoft.TextmlServer;
using Ixiasoft.TextmlServer.Tools;

public partial class QueryAnalyzer : System.Web.UI.Page
{
    protected void Page_Load(object sender, EventArgs e)
    {
        if (Page.IsPostBack)
        {
            divError.InnerHtml = "";
            divQuery.InnerHtml = "";
            IxiaQueryAnalyzer TextmlQueryAnalyzer = new IxiaQueryAnalyzer();
            try
            {
                // Show the XML that QueryAnalyzer produces for the search string.
                string queryEdited =
                    TextmlQueryAnalyzer.GetXMLQueryString(txtQuery.Text, "words");
                divQuery.InnerHtml = HttpUtility.HtmlEncode(queryEdited);
            }
            catch (Exception ex)
            {
                // ex.ToString() already includes the stack trace, so encode and
                // print it once instead of appending StackTrace separately.
                divError.InnerHtml += "<br />" + HttpUtility.HtmlEncode(ex.ToString()) +
                    "<br />Passed to QueryAnalyzer: " +
                    HttpUtility.HtmlEncode(txtQuery.Text);
            }
        }
    }
}

Wednesday, February 14, 2007

Searching Textml for Words with Tildes

Document searches in Textml are insensitive to case and ... in most cases ... special characters. For example, searches for "José Martí" and "jose marti" should return the same result set in the same order. But there is at least one exception to this: ñ. A search for "quinceañera" will return a different set of results than a search for "quinceanera". Ixiasoft has reported that this is a feature based on requests from their Spanish-speaking partners and customers, which makes sense because the two are different letters in the Spanish alphabet.

One solution is to add a hidden element to the document that contains the alternate spelling.

Another solution is to use the <oneof> element. To do this you would intercept each word at the application level and modify words with "ñ" or "n." That part of the query might look something like this:
<elem>quincea<oneof><choice>ñ</choice><choice>n</choice></oneof>era</elem>
If you have to support this particular feature, neither solution is very palatable.
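
For what it's worth, the <oneof> approach could be sketched in C# along these lines. EnyeExpander is a made-up name, and the element structure is just the one from the example above. Replacing only ñ reproduces that example; to cover users who type a plain n you would have to expand every n in the word, which widens the search, so that is left as an option.

```csharp
using System;
using System.Security;
using System.Text;

// Hypothetical helper for illustration; not part of the Textml API.
public static class EnyeExpander
{
    // \u00F1 is the letter ñ.
    private const string Choice =
        "<oneof><choice>\u00F1</choice><choice>n</choice></oneof>";

    // Wraps a search word in <elem>, replacing ñ (and optionally n) with a choice.
    public static string Expand(string word, bool expandPlainN)
    {
        StringBuilder result = new StringBuilder();
        foreach (char c in word)
        {
            bool isEnye = c == '\u00F1' || c == '\u00D1';
            bool isPlainN = c == 'n' || c == 'N';
            if (isEnye || (expandPlainN && isPlainN))
            {
                result.Append(Choice);
            }
            else
            {
                // Escape everything else so the word is safe inside the query XML.
                result.Append(SecurityElement.Escape(c.ToString()));
            }
        }
        return "<elem>" + result.ToString() + "</elem>";
    }
}
```

Calling EnyeExpander.Expand("quinceañera", false) gives the <elem> shown above. With expandPlainN set to true, "quinceanera" gets a choice at both n positions, so it would also match words that differ only at the first n.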

UPDATE 1: If you try to pass the choice operator to the Textml QueryAnalyzer object in .NET, it will throw an exception. For example, quincea[n,ñ]era as a search string will fail. I've submitted this bug to Ixiasoft and I'm waiting for a response.

UPDATE 2: Ixiasoft has responded that this is a bug and that the fix is due to be released in their newest package, 3.6.1.1542.