Rendition Protocol: 2009

Thursday, October 8, 2009

Backup / Export Your Google Site

Judging from this thread, the inability to easily backup/export a Google Site has been a problem for awhile. Thankfully there's a tool on Google Code you can use to grab a local copy. The tool is pretty simple, but the documentation is not entirely clear. Here's what I did for my simple site, which is private:

Host: sites.google.com
Domain: [leave blank]
Webspace: [the last part of domain name path; if the domain name is http://sites.google.com/site/mysite123, this value would be just mysite123]
Import/Export Revisions: [I left this unchecked]
Username: [my Google account, with @gmail.com at the end]
Password: [the obvious]
Choose Target Directory: [the obvious]

Here's Google's blog post that talks about the issue a bit more.

Sunday, September 27, 2009

MVC Storefront Collected Links

There are videos and posts out there about the MVC Storefront sample application for ASP.NET MVC 1.0, but they're scattered around a bit.

This list will mix the original posts from Rob Conery's blog, Weke Road, and the ASP.NET MVC video collection and some Rob's related posts.

MVC Storefront Part 19a: Windows Workflow Followup

[video]

The source code is out on Codeplex.

Friday, July 31, 2009

Unique Attribute Values Across Multiple Documents using XQuery

It's a little slow, but here's one way to get a list of all the unique attribute values across multiple XML documents using XQuery.


let $raw-values := 
  for $book in collection("abc")/(gbook|set)[@type='oeb']
  return 
    element { "book" } 
    { 
      for $value in distinct-values($book//node()/@class)
      return element { "class" } { $value }
    }
for $item in distinct-values($raw-values//class)
order by $item
return element { "uniques" } { $item }

Friday, July 10, 2009

MarkLogic XCC Layer File Open Errors

If you have library modules you're importing, the query may work fine in cq, but if you try to use the same query via the XCC layer you may get "File Open Error" messages.

One cause of this for me was the pathing in the import statement. cq seems to handle a relative path while XCC cannot, at least in MarkLogic 4.1.

I needed to change from...

import module namespace my = "http://blah.com" at "search-parser-xml.xqy",
"search-snippet.xqy";

...to...

import module namespace my = "http://blah.com" at "/search-parser-xml.xqy",
"/search-snippet.xqy";

Monday, July 6, 2009

MarkLogic, cq and Namespaces

If you import an XQuery library in cq and declare the namespace, cq gets fussy if you then try to declare your own functions. I know there are clear reasons for this, but here's what I do so I can use my own functions during testing.


xquery version "1.0-ml";

import module namespace search = "http://marklogic.com/appservices/search" 
  at "/MarkLogic/appservices/search/search.xqy";

declare namespace my="http://www.my-web-site.com/xquery";

declare variable $options-title := 
<options xmlns="http://marklogic.com/appservices/search">  
  <searchable-expression>
    collection("abc123")//(div)
  </searchable-expression>
  <transform-results apply="snippet">
    <per-match-tokens>30</per-match-tokens>
    <max-matches>1</max-matches>
    <max-snippet-chars>200</max-snippet-chars>
    <preferred-elements/>
  </transform-results>
</options>;

declare function my:do-search()
{
  search:search("food", $options-title, (), 25)
};

my:do-search()

Saturday, July 4, 2009

Saxon, Command Line, C#, and XSL 2.0

I've been using Xalan/Xerces for command line XSL transformations for years, but I've been moving farther away from Java over the years, so I wanted something .NET compatible and I wanted something XSL 2.0 compatible. I finally switched to Saxon.

I normally use the standard XML objects in my ASP.NET apps, but I'll switch to Xalan command line tools when I need the "write" extension. I can do the same with Saxon now.

C:\Program Files\Saxon.NET>bin\Transform SaxonTest.xml SaxonTest.xsl

Here is the C# code to call an XSL transformation using Saxon. This one may seem a little odd because the code doesn't save any file since what I'm doing is splitting a large XML file into multiple small files using <xsl:result-document>.

// Create a Processor instance.
Processor p = new Processor();
// Load the source document.
XdmNode node = p.NewDocumentBuilder().Build(new Uri(file));
// Create a transformer for the stylesheet.
XsltTransformer transformer = p.NewXsltCompiler().Compile(myStream).Load();
// Set the root node of the source document to be the initial context node.
transformer.InitialContextNode = node;
// BaseOutputUri is only necessary for xsl:result-document.
transformer.BaseOutputUri = new Uri(file);
// Create a serializer.
Serializer serializer = new Serializer();
transformer.Run(serializer);

Here is the stylesheet I used to leverage the XSL 2.0 equivalent of xalan:write.

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet
 version="2.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
 xmlns:fo="http://www.w3.org/1999/XSL/Format"
 xmlns:xs="http://www.w3.org/2001/XMLSchema"
 xmlns:fn="http://www.w3.org/2005/xpath-functions"
 exclude-result-prefixes="fo xs fn">

  <xsl:output method="xml" indent="yes" encoding="UTF-8" name="xmlFormat"/>

  <xsl:template match="text()" />

  <xsl:template match="/">
    <xsl:for-each select="//node()[@fragment='true']">
      <xsl:variable name="filename" select="concat( /gpg-book/@local-id, '/', @local-id, '.xml' )"/>
      <xsl:result-document href="{$filename}" format="xmlFormat">
        <pcu-gpg-book>
          <xsl:copy-of select="/gpg-book/taxonomy_pcu"/>
          <xsl:copy-of select="/gpg-book/content-metadata"/>
          <xsl:copy-of select="/gpg-book/print-pub-metadata"/>
          <xsl:copy-of select="parent::node()"/>
        </pcu-gpg-book>
      </xsl:result-document>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>

Tuesday, June 9, 2009

Find Duplicate IDs in a Table with SQL

This is a simple way to find all the rows that have duplicate IDs / values in a table.


SELECT entryid, Count(entryid) AS ecount 
FROM filelocation 
GROUP BY entryid
HAVING (Count(entryid) > 1) -- This runs after the aggregate count() function

Thursday, May 7, 2009

Getting Unique XML Element Values with XSL 1.0

Today I needed to munge some dirty XML data. I still haven't taught myself XSL/XPath 2.0 yet, so I was limited to XSL 1.0 for now. The data I had looked like this, only much, much worse.

<Subject>Value 1|Value 2</Subject>
<Subject>Value 1|Value 2</Subject>
<Subject>Value 1|Value 2</Subject>
<Time>Time Value 1</Time>
<Time>Time Value 1</Time>
<Time>Time Value 1</Time>
<Subject>Value 3|Value 4</Subject>
<Subject>Value 3|Value 4</Subject>
<Subject>Value 3|Value 4</Subject>
<Time>Time Value 2</Time>
<Time>Time Value 2</Time>
<Time>Time Value 2</Time>

I wanted two things out of that series of elements: unique strings and the value before the |. As I type this, I realize there may be a bit of a bug here, but I'll have to test it out. Here's what I did for the series of subject elements. Can you spot the bug? ;-)

<xsl:variable name="subjects" select="/fragment/index-only-subjects//Subject[not(text()=preceding-sibling::Subject/text())]/text()"/>
  <xsl:for-each select="$subjects">
    <xsl:sort select="." data-type="text"/>
      <subject>
        <xsl:choose>
          <xsl:when test="contains(.,'|')">
            <xsl:value-of select="substring-before(.,'|')"/>
          </xsl:when>
          <xsl:otherwise>
            <xsl:value-of select="."/>
          </xsl:otherwise>
        </xsl:choose>
      </subject>
  </xsl:for-each>

I'm pretty sure there's a way to do this with xsl:keys / key(), but I got this solution working first.

Sunday, May 3, 2009

Centering a div

This approach will center a <div> in a scenario where you want to have the body of your page centered on the screen, but leave the text left aligned.

body
{
 text-align: center;
 align: center;
}
 
#container
{
 text-align: left;
 margin: 0 auto;
}

align:center / margin: 0 auto are for IE/FF.

Saturday, April 4, 2009

See Unencrypted DTSRun Commands

If you come across encrypted DTSRun commands, here's a way to decrypt them.

Drop down to the command prompt and paste in the full DTSRun string:

C:\>DTSRun /~ZThisWouldBeYourBigUglyDTSRunCommandString /!X /!C

Everything after C:\> is directly from the Job's Command text box. You have to add /!X and /!C to the end. Hit Enter to run it and the results then end up in your "paste" buffer, so you can just Ctrl+V into Notepad or wherever.

This seems to fail if the DTS is connecting to a server not setup on your machine. You may see something like the error below. Just add the remote machine to Enterprise Manager.

Error: -2147467259 (80004005); Provider Error: 17 (11)
Error string: [DBNETLIB][ConnectionOpen (Connect()).]SQL Server does not exist or access denied.
Error source: Microsoft OLE DB Provider for SQL Server
Help file:
Help context: 0

Thanks to the guys over at RDA Blogs for posting about this.

T-SQL SPROC to Re-Create a Full Text Catalog

I cannot claim any ownership over this code. It's a colleague's, but it's so handy that I have to document it somewhere.

This will create an SPROC to re-create a full text catalog, which of course can then be tied to a Job.


CREATE PROCEDURE p_RecreateFT_Articles
 
AS 
 
ALTER TABLE [dbo].[Articles] WITH NOCHECK ADD 
 CONSTRAINT [PK_Articles] PRIMARY KEY  CLUSTERED 
 (
  [Article_ID]
 )  ON [PRIMARY]
 
IF EXISTS (SELECT * FROM dbo.sysfulltextcatalogs WHERE NAME = 'FT_Articles')
EXEC sp_fulltext_catalog 'FT_Articles', 'drop'

-- Create the catalog if it doesn't already exist.
IF NOT EXISTS (SELECT * FROM dbo.sysfulltextcatalogs WHERE NAME = 'FT_Articles')
EXEC sp_fulltext_catalog 'FT_Articles', 'create'

-- Add the full text index to the table 
EXEC sp_fulltext_table '[dbo].[Articles]', 'create', 'FT_Articles', 'PK_Articles'

-- Add the columns to the full text index 
EXEC sp_fulltext_column '[dbo].[Articles]', 'ArticleTitle', 'add' 
EXEC sp_fulltext_column '[dbo].[Articles]', 'ArticleBody', 'add'

-- Activate the index
EXEC sp_fulltext_table '[dbo].[Articles]', 'activate' 

-- Start population
EXEC sp_fulltext_catalog   'FT_Articles', 'start_full'

GO

Tuesday, March 31, 2009

T-SQL INSERT From One Table Into Another

This is just a simple example of doing an INSERT INTO one table from another, with a hard-coded value thrown into the mix.


INSERT INTO ProductSubjects  (ID, ProductID, SubjectID)
SELECT NEWID(), '52EB01B6-B768-4F9B-8F60-1A695A13D945', Subjects.ID 
FROM Subjects

T-SQL SELECT Using CASE and String Concatenation

Here I'm selecting the ID field followed by a concatenation of 3 different subject fields, depending on whether or not the fields are NULL.


SELECT
  ID, 
  CASE 
  WHEN Subject2 IS NULL THEN 
    Subject1
  WHEN Subject2 IS NOT NULL AND Subject3 IS NULL THEN 
    Subject1 +  ' : ' + Subject2
  ELSE Subject1 + ' : '  + Subject2 + ' : '  + Subject3
  END AS TheSubject
FROM Subjects
ORDER BY TheSubject

Saturday, March 28, 2009

Seeing Hourly Traffic with Google Analytics

Here's how to create a report in Google Analytics that will give you an hourly traffic report for your site.

Go to Custom Reporting
Click "Creat new custom report"
Open the "Site Usage" section on the left and drag Pageviews (or anything else you're looking for) over to one of the blue metric sections on the right
Open the "Visitors" section on the left and drag "Hour of the day" to the green dimension section on the right
Click edit next to the preset title and give your report a title if you want
Click Create Report

Sunday, February 8, 2009

Highlight a DIV Element onClick

This is just a quick script to highlight a DIV element when a user clicks on an enclosed link. It's a little small/silly, but I've had more than request for something like this come in.


<html>
<head>
<title>Highlight</title>
<style type="text/css">
 .highlight { background-color:#ffff00; }
</style>
</head>
<body>
<div id="div1">Row one <a href="javascript:changeHighlight('div1');">HIGHLIGHT</a></div>
<div id="div2">Row two <a href="javascript:changeHighlight('div2');">HIGHLIGHT</a></div>
<div id="div3">Row three <a href="javascript:changeHighlight('div3');">HIGHLIGHT</a></div>
<input type="hidden" id="currentDiv" />
<script language="javascript" type="text/javascript">
function changeHighlight(myElement)
{
 document.getElementById(myElement).className = 'highlight';
 var lastHighlight = document.getElementById('currentDiv').value;
 if(lastHighlight != '')
 {
  document.getElementById(lastHighlight).className = '';
 }
 document.getElementById("currentDiv").value = myElement;
}
</script>
</body>
</html>

UPDATE: This is OLD. You should look for a better solution.

Monday, January 19, 2009

Using PDFBox in C#

Last week I was trying to extract text from PDF files in an automated fashion. After some searching I found a CodeProject.com article describing how to use PDFBox in C#. As of this writing, the DLLs needed for the C# version are only in the old SourceForge version, which seems to be behind the Apache Incubator current version in Java.

For the SourceForge version, I created a console application, added references to FontBox-0.1.0-dev, ICSharpCode.SharpZipLib, IKVM.AWT.WinForms, IKVM.GNU.Classpath, IKVM.Runtime, and PDFBox-0.7.3.

Here's the sample code I used:


...
using org.pdfbox.pdmodel;
using org.pdfbox.util;
using System.IO;
...
static void Main(string[] args)
{
  // All the error catching is left for you to do.
  string[] files = {
   @"C:\9780791093757.pdf",
   @"C:\9780791095850.pdf",
   @"C:\9780816048526.pdf"
  };
  bool UseIndividualPages = false;

  foreach (string s in files)
  {
    string textFileDocument = Path.GetDirectoryName(s) + Path.DirectorySeparatorChar +
     Path.GetFileNameWithoutExtension(s) + ".txt";
  
    PDDocument pdfDocument = PDDocument.load(s);   
    PDFTextStripper pdfStripper = new PDFTextStripper();
    pdfStripper.setPageSeparator(Environment.NewLine + Environment.NewLine);
   
    if (UseIndividualPages) // Extracts one file per page in the PDF
    {
      for (int i = 1; i <= pdfDocument.getNumberOfPages(); i++)
      // getNumberOfPages() is 1-based.
      {
        pdfStripper.setStartPage(i);
        pdfStripper.setEndPage(i);
        string textFilePage = Path.GetDirectoryName(s) + Path.DirectorySeparatorChar +
          Path.GetFileNameWithoutExtension(s) + " " + i.ToString() + ".txt";
        ExtractText(pdfStripper, pdfDocument, textFilePage);
      }
    }
    else // Extracts one file for the entire PDF
    {
      ExtractText(pdfStripper, pdfDocument, textFileDocument);
    }

    pdfDocument.close();   
  }

  Console.ReadLine();
}

static void ExtractText(PDFTextStripper textStripper, PDDocument document,
  string outputFile)
{
  if (File.Exists(outputFile)) File.Delete(outputFile);
  using (StreamWriter sw = new StreamWriter(outputFile))
  {
    sw.Write(textStripper.getText(document));
    Console.WriteLine(outputFile);
  }
}

This works well enough except for one substantial problem: end-of-line hyphens are included. In the PDF, the text is flowed so long words at the end of sentences are hyphenated. When I copy and paste the text out of Adobe Reader, the hyphens are left out and the words are whole. When I use PDFBox, the hyphens appear. I expected those characters to be special characters of some sort or another, but they are regular hyphens, which means I can't even remove them with regular expressions. Unfortunate.

I haven't found a way around this problem yet.

Sunday, January 18, 2009

Optimum WiFi on Metro-North's New Haven Line

I'm on the Metro-North New Haven line quite a bit these days. I thought having access to Optimum WiFi would be a boon, but that hasn't been the case. In nearly all cases, you have to be right on the platform of each station to get access, which is of little use on the train itself.

Here are some other general notes:

If you allow automatic connections, the network will connect by itself at each staion. I found it reconnected faster when I refreshed the network list.
You do not seem to need to login at each hotspot after you've logged in the first time.
The connection is not secure, so if you're not on a secured site, all the network traffic can be sniffed easily.
At most stations, I found I have just enough time to connect and do 1 Google search or refresh a page before the train starts moving and the signal cuts out.

Here are some comments on when you can connect at various stations:

Greens Farms, on the platform only.
Westport, on the platform only.
East Norwalk, on the platform only, but signal very poor.
South Norwalk, immediately approaching platform.
Rowayton, on the platform only.
Noroton Heights, on the platform only.
Stamford, immediately approaching platform, but then signal cuts out under the building.
Riverside, approaching platform and past it, but the signal is weak.
Greenwich, immediately approaching platform.

Saturday, January 17, 2009

You'll Get the Idea

I'll refine this over time.

CCI Mini-Mag HP HV (36 grain): Very inconsistent report, but no jams.

Federal Game Shok HV (40 grain): Clean and consistent. No jams.

Remington Yellow Jacket HP (33 grain): Clean and consistent. No jams.

Remington 22 Thunderbolt (40 grain): ?

Winchester Super X Power Point HP (40 grain): ?

Rendition Protocol