Saturday, February 15, 2014

First attempt at a Web Worker for a Lunr.js index

Most of the applications I've been building lately are internal mobile web apps using jQuery Mobile, and they function offline using the Application Cache plus a bit of Local Storage. The latest one also needs to search textual data, even when the user is offline. I decided to give Lunr.js a try and it works great. The one problem I've had is that it takes a couple of seconds to index the 400+ reports I'm trying to handle on an iPad 2 running iOS 7. It's not awful, but every time that loading spinner hangs I squirm.

Enter Web Workers.

This seemed like a perfect opportunity to try out a Web Worker. The search page has a couple of other features, so I don't want the user to be forced to wait for indexing to complete before using the page and, of course, I don't want to lock up the UI at all. Index creation only has to happen once, so moving it off to a background thread made sense. The one issue to keep in mind, and it comes into play in my scenario, is that data is copied, not shared, when it passes from the main thread to the worker thread. I still notice a brief UI hitch when that copy happens, but overall it's been a huge improvement.
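To make the copy semantics concrete, here is a minimal sketch of what the receiving side ends up with when the data is sent as a JSON string. The `copyForWorker` helper is a hypothetical name used only for illustration; it imitates the serialize/parse round trip without a real worker, so it runs anywhere.

```javascript
// Hypothetical helper: imitates data being stringified on one thread
// and parsed on the other. The result is an independent copy.
var copyForWorker = function (payload) {
    return JSON.parse(JSON.stringify(payload));
};

var original = {
    reports: [{ reportId: 1, reportTitle: 'Jane Doe visited Company ABC.' }]
};
var received = copyForWorker(original);

// Changing the copy does not touch the original.
received.reports[0].reportTitle = 'changed in the worker';

console.log(original.reports[0].reportTitle); // Jane Doe visited Company ABC.
console.log(received === original);           // false
```

The cost of that copy grows with the size of the data, which is likely where the brief hitch comes from with several hundred reports.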

Here's a stripped down example with comments. You will need to run this from a web server. You cannot run it using local file:// access as far as I know.

The HTML:

<!DOCTYPE HTML>
<html>
<head>
 <title>Web Worker with Lunr.js</title>
</head>
<body>

 <p>Building...</p>

 <script type="text/javascript" src="javascript/lib/underscore-min.js"></script>
 <script type="text/javascript" src="javascript/lib/lunr.min.js"></script>
 <script type="text/javascript" src="javascript/lib/jquery-1.11.0.min.js"></script>

 <!-- SearchIndexWorker.js is not loaded here. It is loaded by new Worker(...) in SearchMobileModule.js. -->
 <script type="text/javascript" src="javascript/SearchMobileModule.js"></script>
 <script type="text/javascript" src="javascript/app.js"></script>

</body>
</html>

A small app.js file to bootstrap the process:

// You'll need to run this from a web server.

var app = app || {};

// I have two search modules: mobile and desktop.
app.isMobileDevice = true;

//  We'll use local data for this example. This is a small set.
//  The benefit comes when you have several hundred of these.
var data = {
    reports: [
        { 'reportId': 1, 'reportTitle': "Jane Doe visited Company ABC to review business renewal." },
        { 'reportId': 2, 'reportTitle': "John Smith visited XYZ, Inc. to review sales over lunch." }
    ]
};

// You'll have to decide when to init the index.
// We currently do it if the user visits the reporting section of the app.
$(function() {
 app.search.mobile.init(data.reports, $('p'));
});
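Since the index only has to be built once, it may help to guard the init call so that repeated visits to the reporting section don't spawn a second worker. This is a minimal sketch, assuming a hypothetical `once` wrapper and a stand-in `initSearch` function in place of `app.search.mobile.init` (neither is part of the original app):

```javascript
// Hypothetical wrapper: runs fn on the first call and ignores later calls.
var once = function (fn) {
    var called = false;
    return function () {
        if (called) return;
        called = true;
        fn.apply(this, arguments);
    };
};

// Stand-in for app.search.mobile.init so the sketch runs without a browser.
var initCount = 0;
var initSearch = function (reports) {
    initCount += 1;
};

var initSearchOnce = once(initSearch);
initSearchOnce([{ reportId: 1 }]);
initSearchOnce([{ reportId: 1 }]); // ignored, the index is already being built

console.log(initCount); // 1
```

Underscore, which this app already loads, ships `_.once`, which does much the same thing.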

I have two different search modules: one for desktop users, which communicates with the server to retrieve the full set of data, and one for mobile users, which is limited to the past 6 months of data and works offline. Here is the mobile one, SearchMobileModule.js:

(function (app, $, _, lunr) {

    // Private
    
    var reportsPointer = [];
    
    var index;

    // Public
    
    var search = {};
    
    search.setIndex = function (serializedIndex) {
        // Source: http://www.garysieling.com/blog/building-a-full-text-index-in-javascript
        index = lunr.Index.load(serializedIndex); 
    };

    search.init = function (reports, $uiNotice) {
        console.log('Attempting to init the Lunr.js index.');
        
        // We're going to hold a pointer to the reports.
        reportsPointer = reports;

        // Create the Web Worker.
        // If Chrome or Safari reports:
        //   'Uncaught ReferenceError: importScripts is not defined'
        // the worker script is most likely also being loaded on the page with a <script> tag.
        // importScripts only exists inside a worker, so load that file only through new Worker().
        var worker = new Worker("javascript/SearchIndexWorker.js");
        
        // Here we're adding a listener for messages coming back from the worker to us. 
        worker.addEventListener('message', function (evt) {
            // The evt.data property has the response from the worker.
            // The worker has no access to this page's globals, so it passes back the serialized index for us to load.
            search.setIndex(JSON.parse(evt.data)); 

            // We'll update the UI here for now.
            $uiNotice.html('Ready! Open the console and try something like: app.search.mobile.query("Jane")');
            console.log('The Lunr.js index is ready.');
            
            // Memory footprint will be large with all that data copied around. This kills the worker.
            worker.terminate(); 
        }, false);
        
        // Sends a message to the worker and passes it *a copy* of the data it needs. 
        // I'm sending a string to be consistent across browsers.
        worker.postMessage(JSON.stringify({ reports: reports }));
    };

    search.query = function (query) {
        // Nothing to search for, or the worker hasn't delivered the index yet.
        if (!query || !index) return [];

        // We're going to keep things simple and handle all the results here.
        // Searching with Lunr.js isn't really the point here.
        var lunrResults = index.search(decodeURIComponent(query));
        _.each(lunrResults, function(el, idx, list) {
            $('<pre>').text(JSON.stringify(_.findWhere(reportsPointer, { reportId: parseInt(el.ref, 10) }))).appendTo('p');
        });
    };

    app.search = app.search || {};
    app.search.mobile = search;

}(window.app = window.app || {}, jQuery, _, lunr));
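Because the index arrives asynchronously, a query can be issued before the worker has finished. One possible way to handle that is to hold queries until the index is set. The sketch below assumes a hypothetical `createDeferredSearch` helper and a fake index object with a `search` method standing in for the real Lunr.js index; it is not how the module above works.

```javascript
// Hypothetical helper, not part of Lunr.js: queues queries until an index is set.
var createDeferredSearch = function () {
    var index = null;
    var pending = [];

    return {
        // Call this from the worker's 'message' listener once the index is loaded.
        setIndex: function (loadedIndex) {
            index = loadedIndex;
            // Flush anything that was asked for while the worker was still busy.
            while (pending.length > 0) {
                var job = pending.shift();
                job.callback(index.search(job.query));
            }
        },
        // Runs the callback right away if the index is ready, otherwise later.
        query: function (query, callback) {
            if (!query) { callback([]); return; }
            if (index) {
                callback(index.search(query));
            } else {
                pending.push({ query: query, callback: callback });
            }
        }
    };
};

// Stand-in for a real Lunr.js index so the sketch runs anywhere.
var fakeIndex = {
    search: function (query) { return [{ ref: '1', query: query }]; }
};

var deferred = createDeferredSearch();
var seen = [];
deferred.query('Jane', function (results) { seen.push(results); });
console.log(seen.length); // 0, the index is not ready yet
deferred.setIndex(fakeIndex);
console.log(seen.length); // 1, the queued query ran
```

The trade-off is that `query` becomes callback-based rather than returning results directly, which may or may not suit the rest of the app.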

Finally, the Web Worker itself.

// This is the Web Worker script. No DOM, window, or document access at all.

// Import the scripts we'll need in this worker.
// Paths are resolved relative to this worker script, not the HTML page.
importScripts('lib/lunr.min.js', 'lib/underscore-min.js');

var index = lunr(function () {
    this.field('reportTitle');
    this.ref('reportId');
});

var buildIndex = function (reports) {
    if (!reports || reports.length === 0) return;

    for (var i = 0; i < reports.length; i++) {
        index.add(reports[i], false); // Don't emit any Lunr events.
    }
};

// This is a listener on the worker for incoming messages.
self.addEventListener('message', function (evt) {
    // evt.data has the data passed to this worker.
    var data = JSON.parse(evt.data);

    if (data.reports) {
        buildIndex(data.reports);
    }

    // Now we send a message back to the script that created this worker.
    self.postMessage(JSON.stringify(index.toJSON()));

    // Memory footprint can be large with a lot of data copied around. This kills the worker.
    self.close(); 
}, false);
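The worker above assumes the incoming message is always valid JSON with a reports array. If you want the page to find out when something goes wrong, one option is to wrap the reply in a small envelope. This is only a sketch: `handleBuildMessage`, the `ok`/`error`/`index` fields, and `fakeBuild` are hypothetical names, and the build step is faked so the example runs without Lunr.js.

```javascript
// Hypothetical handler: takes the raw string from evt.data and a build function,
// and returns the string the worker would post back.
var handleBuildMessage = function (rawData, build) {
    try {
        var data = JSON.parse(rawData);
        if (!data.reports || data.reports.length === 0) {
            return JSON.stringify({ ok: false, error: 'No reports supplied.' });
        }
        return JSON.stringify({ ok: true, index: build(data.reports) });
    } catch (err) {
        // Covers malformed JSON as well as failures inside the build step.
        return JSON.stringify({ ok: false, error: String(err.message) });
    }
};

// Stand-in for the Lunr.js build step.
var fakeBuild = function (reports) {
    return { documentCount: reports.length };
};

var good = JSON.parse(handleBuildMessage(JSON.stringify({ reports: [{ reportId: 1 }] }), fakeBuild));
var bad = JSON.parse(handleBuildMessage('not json', fakeBuild));

console.log(good.ok, good.index.documentCount); // true 1
console.log(bad.ok);                            // false
```

Inside the worker, the listener could then post back the result of `handleBuildMessage(evt.data, build)`, where `build` adds the reports and returns `index.toJSON()`, and the page would check `ok` before calling `lunr.Index.load`.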

I hope someone else finds this useful!