Lunr.js search powered by Umbraco

Lunr.js (or simply Lunr) is a powerful fulltext search engine, designed to run in the browser entirely without external dependencies.

In their own words, Lunr is “a bit like Solr, but much smaller and not as bright.” 🌞

For a site with a limited amount of content, Lunr is an excellent alternative to traditional server-side search. And being client-side only, it is particularly appealing for statically generated sites.

The Lunr site has lots of docs and guides to get you started. As an added bonus, Lunr is both open source and extensible 💖

In this post I’ll show you how Lunr can be powered by the Umbraco Delivery API, and share a few little tricks I found to boost the search performance.

This GitHub repo contains everything you need to run the samples in this post.

Content for the search index

Content is required to build a search index. To this end I’ll use an Umbraco site with the Delivery API enabled and the blog sample NuGet package installed.

Creating a Lunr index

I’ve opted for using Node.js to create the Lunr index. In the following sections I’ll go through the individual parts of the Node.js script.

Fetching content for the index

I want to index the blog posts. For each blog post, I want the Lunr index to contain:

  • The ID (for reference).
  • The title (which is the blog post name in Umbraco).
  • The excerpt (which is just a bunch of lorem ipsum in the sample).
  • The tags.

I’ll also need the path of the blog post to create links from search results later on:

import dotenv from 'dotenv';

dotenv.config();

const response = await fetch(`${process.env.UMBRACO_HOST}/umbraco/delivery/api/v2/content?filter=contentType:post&fields=properties[excerpt,tags]`);
if (!response.ok) {
    throw new Error(`The Delivery API response was not OK: ${response.status}`);
}

const data = await response.json();

let id = 1;
const posts = data.items.map((item) => ({
    id: id++,
    path: item.route.path,
    title: item.name,
    excerpt: item.properties.excerpt,
    tags: item.properties.tags
}));

The script yields a posts data structure like this:

[
  {
    "id": 1,
    "path": "/building-a-community/",
    "title": "Building a community",
    "excerpt": "Maecenas ipsum dui, lobortis non dui eleifend [...]",
    "tags": ["Community", "Inspiration", "Awesome"]
  }, {
    "id": 2,
    "path": "/set-your-content-free/",
    "title": "Set your content free",
    "excerpt": "Fusce ut mauris ornare, mollis felis ac, convallis [...]",
    "tags": ["Content", "Awesome"]
  }
]
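One thing to keep in mind: as far as I know, the Delivery API pages its results (10 items per request by default), so a site with more posts than that would need to fetch the remaining pages too. Here is a minimal sketch of how that could look with the skip and take query parameters. The fetchAllItems helper and its pageSize parameter are made up for this illustration and are not part of the sample repo, and the sketch assumes each response is shaped like { total, items }:

```javascript
// Hypothetical helper (not part of the sample repo): collects all items from a
// Delivery API content query by paging with the skip and take parameters.
// Assumes each response is shaped like { total, items }.
async function fetchAllItems(url, pageSize = 50, fetchImpl = fetch) {
    const items = [];
    let total = Infinity;
    while (items.length < total) {
        const separator = url.includes('?') ? '&' : '?';
        const response = await fetchImpl(`${url}${separator}skip=${items.length}&take=${pageSize}`);
        if (!response.ok) {
            throw new Error(`The Delivery API response was not OK: ${response.status}`);
        }
        const page = await response.json();
        if (page.items.length === 0) {
            break; // nothing more to fetch, so avoid looping forever
        }
        items.push(...page.items);
        total = page.total;
    }
    return items;
}
```

With something like this in place, data.items in the script above could be swapped for the result of calling fetchAllItems with the same Delivery API URL.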

🙋 “Hey, what’s with that id assignment”, you ask? Good question!

The Delivery API returns GUIDs as IDs for all content, which is really ideal for most cases. But in this particular case, the IDs will be used by Lunr to build an inverted index.

This means the IDs will be repeated a LOT throughout the generated index.

An integer ID is quite a lot smaller than a GUID. By swapping the GUIDs with integers, the index size is reduced by a whopping 40% 🤘
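To get a feel for why the ID length matters, here is a toy inverted index. It is not Lunr's actual index format, just a made-up illustration (the buildToyInvertedIndex function and the sample GUIDs are hypothetical). Every term maps to the refs of the documents containing it, so each ref is repeated once per distinct term in its document:

```javascript
// Toy inverted index: maps each term to the list of document refs containing it.
// This is NOT Lunr's real index format, only an illustration of ref repetition.
function buildToyInvertedIndex(docs) {
    const inverted = Object.create(null);
    for (const doc of docs) {
        // Distinct lowercased terms of the document text
        const terms = new Set(doc.text.toLowerCase().split(/\W+/).filter(Boolean));
        for (const term of terms) {
            if (!inverted[term]) {
                inverted[term] = [];
            }
            inverted[term].push(doc.ref);
        }
    }
    return inverted;
}

const texts = ['Building a community', 'Set your content free', 'A free community'];

// The same documents, once with (fake) GUID refs and once with integer refs
const withGuids = texts.map((text, i) => ({ ref: `00000000-0000-4000-8000-00000000000${i}`, text }));
const withInts = texts.map((text, i) => ({ ref: i + 1, text }));

const guidSize = JSON.stringify(buildToyInvertedIndex(withGuids)).length;
const intSize = JSON.stringify(buildToyInvertedIndex(withInts)).length;

console.log({ guidSize, intSize });
```

The exact savings depend entirely on the content, so the numbers from this toy say nothing about a real Lunr index. The point is only that the serialized size grows with ref length times the number of term occurrences.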

Pre-building the index

As Lunr is based solely on JS, indexes can be built at runtime by the consuming clients (the browsers). But it kinda goes without saying - that’s not the most performant way to go about it.

For optimal performance, the Lunr index should be pre-built at build time. This allows for serving the index to the clients in a ready-to-go state:

import lunr from 'lunr';

// "posts" is the array built in the previous section

const index = lunr(function () {
    this.ref('id');
    this.field('title');
    this.field('excerpt');
    this.field('tags');

    posts.forEach(post => {
        this.add(post);
    }, this);
});

Preparing for usage

There is one caveat with Lunr indexes: the raw index values are not stored within the index. A Lunr search result only contains the document reference, a score and some match metadata:

{
  "ref": "4",
  "score": 1.07,
  "matchData": {
    "metadata": {
      "lorem": {
        "excerpt": {}
      }
    }
  }
}

To present meaningful search results, I can’t really do without the raw index values. In other words, I need to make them available alongside the index. And that’s the last piece of the script.

First I’ll construct a raw object, which is just a lookup of the raw index values keyed by post ID:

// "posts" is the array built in the "Fetching content for the index" section

const raw = {};
posts.forEach(post => raw[post.id] = {
    path: post.path,
    title: post.title,
    excerpt: post.excerpt,
    tags: post.tags
});

The result looks like this:

{
  "1": {
    "path": "/building-a-community/",
    "title": "Building a community",
    "excerpt": "Maecenas ipsum dui, lobortis non dui eleifend [...]",
    "tags": ["Community", "Inspiration", "Awesome"]
  },
  "2": {
    "path": "/set-your-content-free/",
    ...
  }
}

Then I’ll combine the Lunr index and the raw index values in a single file, so the client can fetch the whole thing in a single request 👍

import {promises as fs} from 'fs';

// "index" is the Lunr index and "raw" is the lookup, both built in the previous sections

const searchData = {raw, index};

await fs.writeFile('./public/search-data.json', JSON.stringify(searchData));

By adding the raw index values, the resulting search-data.json grows accordingly in size, but compared to the index size, the raw property is negligible.
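On the client side, the search results then need to be resolved against the raw values. A minimal sketch of that lookup could look like this. The searchPosts helper is hypothetical (the actual component in the repo may well do this differently), and it only assumes that index has a Lunr-style search method returning hits with ref and score:

```javascript
// Hypothetical helper: resolves Lunr-style search hits to the raw post values.
// "index" is anything with a search(query) method returning [{ ref, score, ... }],
// and "raw" is the lookup of raw index values keyed by post ID.
function searchPosts(index, raw, query) {
    return index
        .search(query)
        // Skip hits that have no raw values (should not happen with matching data)
        .filter((hit) => raw[hit.ref] !== undefined)
        // Merge the raw values with the score, keeping the order of the hits
        .map((hit) => ({ ...raw[hit.ref], score: hit.score }));
}
```

In the browser, index would typically be restored from the fetched file with lunr.Index.load(searchData.index), and raw is simply searchData.raw.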

Putting it to the test

Enough talk. Let’s see this thing in action 🚀

First and foremost, the Umbraco site needs to run in order to provide content to the index builder… so fire up a terminal in /src/Server and start the site:

dotnet run

Now open up a new terminal in /src/Client and build the Lunr index with:

npm run build-search-index

You’ll find the resulting output file in /src/Client/public/search-data.json 👈

I have created a test page with a LitElement, which renders a Lunr search component based on the search-data.json file.

The Node.js setup contains an Express server to serve it all, so start that from /src/Client with:

npm run start

…and go to localhost:3000/ to try it all out 😀

So, in conclusion…

I am a sucker for technology that can be put to use without expert knowledge on the subject. And I must say I’m pretty impressed with Lunr. It’s super easy to get started, yet it boasts both a powerful query syntax and an extension API. Lunr also supports stemming for 14 languages, and you can even build your own stemmer 🌍

I will definitely be tinkering more with this in the future 🤓

If you’re able to keep the generated Lunr index at an acceptable size, I daresay this is a perfect fit for a statically generated site.

Since Lunr is an npm package, it should also be possible to build a Lunr server with Express when indexes grow too large to load on the client. Something to consider for a future blog post 🤔

For now - happy searching 💜