Tailored indexing for Umbraco Search

This week I had the privilege of being part of the latest Umbraco DevRel Deep Dive 🥳

It ended up wonderfully geekish - right up my alley.

As part of the talk with Lotte and Seb, I showed off a fair few code snippets for tweaking and tailoring content indexing in Umbraco Search. I figured it’d be appropriate to share those snippets here, alongside a little more context on the why and when.

Let’s get our geek on and explore some of the extension points in Umbraco Search 🤓

Changing the default property indexing

Some property editors store numbers, some store texts. Some texts are meant for full text searching, some are meant for keyword filtering.

Umbraco Search aims to understand this kind of intent for each property editor type. To this end, it employs a concept called property value handlers. These are essentially property value converters for Search.

If you have custom property editors, you can help Search understand the intent of the stored data by creating your own property value handlers.

What’s more, if the core property value handlers don’t fulfill your search (indexing) requirements, they can easily be replaced 🚀

For example: Search assumes that the “fixed value editors” from core (checkbox list, dropdown and radio button list) should be used for filtering, so the picked value(s) are indexed as keywords. This means that the picked value(s) won’t be available for free text search.

Now, say that I wanted radio button list properties to be both filterable (as keywords) and searchable. I could achieve that by implementing my own version of IPropertyValueHandler - like this:

using Umbraco.Cms.Core;
using Umbraco.Cms.Core.Models;
using Umbraco.Cms.Search.Core.Models.Indexing;
using Umbraco.Cms.Search.Core.PropertyValueHandlers;

namespace Site.Demos;

public class MyRadioButtonListPropertyValueHandler : IPropertyValueHandler
{
    // this property value handler can handle the radio button list from core
    public bool CanHandle(string propertyEditorAlias)
        => propertyEditorAlias is Constants.PropertyEditors.Aliases.RadioButtonList;

    public IEnumerable<IndexField> GetIndexFields(
        IProperty property,
        string? culture,
        string? segment,
        bool published,
        IContentBase contentContext)
        => property.GetValue(culture, segment, published) is string { Length: > 0 } value
            ? [
                new IndexField(
                    property.Alias,
                    new IndexValue
                    {
                        // index the value as both keyword for filtering
                        // and as text for full text searching
                        Keywords = [value],
                        Texts = [value]
                    },
                    culture,
                    segment)
            ]
            : [];
}
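
The same approach covers custom property editors. Here’s a minimal sketch, assuming a made-up editor with the alias "My.TagList" that stores its tags as a comma separated string. Both the alias and the storage format are assumptions for the sake of illustration, so adjust them to whatever your own editor does:

```csharp
using Umbraco.Cms.Core.Models;
using Umbraco.Cms.Search.Core.Models.Indexing;
using Umbraco.Cms.Search.Core.PropertyValueHandlers;

namespace Site.Demos;

public class MyTagListPropertyValueHandler : IPropertyValueHandler
{
    // "My.TagList" is a hypothetical alias for a custom property editor
    public bool CanHandle(string propertyEditorAlias)
        => propertyEditorAlias is "My.TagList";

    public IEnumerable<IndexField> GetIndexFields(
        IProperty property,
        string? culture,
        string? segment,
        bool published,
        IContentBase contentContext)
    {
        // assumption: the editor stores its tags as a comma separated string
        if (property.GetValue(culture, segment, published) is not string { Length: > 0 } value)
        {
            return [];
        }

        var tags = SplitTags(value);

        return tags.Length > 0
            ? [
                new IndexField(
                    property.Alias,
                    // index each tag as a keyword for filtering
                    new IndexValue { Keywords = tags },
                    culture,
                    segment)
            ]
            : [];
    }

    // splits the stored string into trimmed, non-empty tags
    public static string[] SplitTags(string value)
        => value.Split(',', StringSplitOptions.RemoveEmptyEntries | StringSplitOptions.TrimEntries);
}
```

If the tags should also be available for free text search, they could be added to Texts as well, just like in the radio button list example.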

Indexing additional fields

When indexing content, Search gathers two sets of data:

  • The system fields required to make Search tick, and
  • The fields for all contained properties, using the property value handler concept described above.

This is usually just fine. But sometimes you need more index data to power search.

Perhaps you have domain specific and/or contextual data, which isn’t represented in the content model. Or maybe you need to perform up-front calculations to make specific search queries more performant or even possible.

To help you do all this, Search features content indexers. These are invoked at content level whenever a piece of content is indexed.

Let’s say you have modelled products as content in Umbraco, and that products are programmatically mapped to their respective categories - that is, the product categories are not part of the content model.

By implementing IContentIndexer, you can index the category mapping for each product for subsequent querying:

using Umbraco.Cms.Core.Models;
using Umbraco.Cms.Search.Core.Models.Indexing;
using Umbraco.Cms.Search.Core.Services.ContentIndexing;

namespace Site.Demos;

public class MyProductContentIndexer : IContentIndexer
{
    public async Task<IEnumerable<IndexField>> GetIndexFieldsAsync(
        IContentBase content,
        string?[] cultures,
        bool published,
        CancellationToken cancellationToken)
    {
        if (content.ContentType.Alias is not "product")
        {
            return [];
        }

        var categories = await GetCategories(content.Key);

        return categories.Length > 0
            ? [
                new IndexField(
                    "category",
                    new IndexValue
                    {
                        // index the product categories for filtering (faceting)
                        Keywords = categories
                    },
                    Culture: null,
                    Segment: null
                )
            ]
            : []
        ;
    }

    private async Task<string[]> GetCategories(Guid productId)
        => await Task.FromResult<string[]>(["implement", "your", "own"]);
}

Unlike property value handlers, content indexers must be registered explicitly:

using Umbraco.Cms.Core.Composing;
using Umbraco.Cms.Core.DependencyInjection;
using Umbraco.Cms.Search.Core.Services.ContentIndexing;

namespace Site.Demos;

public class MySiteComposer : IComposer
{
    public void Compose(IUmbracoBuilder builder)
        => builder.Services.AddTransient<IContentIndexer, MyProductContentIndexer>();
}
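
Content indexers are also a good fit for the up-front calculations mentioned earlier. As a rough sketch, assume that products have an invariant decimal property with the alias "price", and that I want to filter products by price range. The property alias, the "priceRange" field name and the range boundaries are all made up for this example:

```csharp
using Umbraco.Cms.Core.Models;
using Umbraco.Cms.Search.Core.Models.Indexing;
using Umbraco.Cms.Search.Core.Services.ContentIndexing;

namespace Site.Demos;

public class MyPriceRangeContentIndexer : IContentIndexer
{
    public Task<IEnumerable<IndexField>> GetIndexFieldsAsync(
        IContentBase content,
        string?[] cultures,
        bool published,
        CancellationToken cancellationToken)
    {
        // assumption: products have an invariant decimal property called "price"
        if (content.ContentType.Alias is not "product" || content.HasProperty("price") is false)
        {
            return Task.FromResult<IEnumerable<IndexField>>([]);
        }

        var price = content.GetValue<decimal>("price", published: published);

        IEnumerable<IndexField> fields =
        [
            new IndexField(
                "priceRange",
                new IndexValue
                {
                    // index the pre-calculated price range for filtering (faceting)
                    Keywords = [GetPriceRange(price)]
                },
                Culture: null,
                Segment: null)
        ];

        return Task.FromResult(fields);
    }

    // maps a price to a coarse range - the boundaries are arbitrary examples
    public static string GetPriceRange(decimal price)
        => price switch
        {
            < 100m => "under-100",
            < 500m => "100-to-499",
            _ => "500-plus"
        };
}
```

This indexer would be registered in the composer the same way as MyProductContentIndexer.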

If everything else fails…

…there’s a notification for that 🙈

All jokes aside… Search fires the IndexingNotification just before indexing content. You should consider this a last resort when all other options have been exhausted… but it does have its merits in a pinch.

Manipulating property index data

You can alter all data going into the index with this notification.

For example, if your content model contains a property that must not be added to the published (public) content index, you can remove it - like this:

using Umbraco.Cms.Core.Events;
using Umbraco.Cms.Search.Core.Notifications;
using SearchConstants = Umbraco.Cms.Search.Core.Constants;

namespace Site.Demos;

public class MyIndexingNotificationHandler : INotificationHandler<IndexingNotification>
{
    public void Handle(IndexingNotification notification)
    {
        // only proceed if this is a notification for the published content index
        if (notification.IndexInfo.IndexAlias is not SearchConstants.IndexAliases.PublishedContent)
        {
            return;
        }

        // find the field that should be omitted from the index
        var fieldToOmit = notification
            .Fields
            .FirstOrDefault(field => field.FieldName == "secretPropertyAlias");

        if (fieldToOmit is null)
        {
            return;
        }

        // remove the field from the fields collection
        notification.Fields = notification
            .Fields
            .Except([fieldToOmit])
            .ToArray();
    }
}
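
The handler only runs once it has been registered. The sketch below uses the AddNotificationHandler extension from Umbraco core. I’m assuming here that Search does not register the handler for you, which is how notification handlers work elsewhere in Umbraco. The composer name is just an example:

```csharp
using Umbraco.Cms.Core.Composing;
using Umbraco.Cms.Core.DependencyInjection;
using Umbraco.Cms.Search.Core.Notifications;

namespace Site.Demos;

public class MyNotificationsComposer : IComposer
{
    // hook the handler up to the IndexingNotification
    public void Compose(IUmbracoBuilder builder)
        => builder.AddNotificationHandler<IndexingNotification, MyIndexingNotificationHandler>();
}
```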

Discarding the entire content

The IndexingNotification is a cancelable notification. This allows for cancelling the indexing of specific content altogether:

using Umbraco.Cms.Core.Events;
using Umbraco.Cms.Search.Core.Notifications;
using SearchConstants = Umbraco.Cms.Search.Core.Constants;

namespace Site.Demos;

public class MyIndexingNotificationHandler : INotificationHandler<IndexingNotification>
{
    // the ID of the super secret content type that should never be in the index
    private static readonly Guid SecretContentTypeId = Guid.Parse("6B730BA8-4560-4745-906C-FE08A2FF756C");

    public void Handle(IndexingNotification notification)
    {
        // grab the content type ID from the fields collection
        // - it is indexed as a keyword value in the core "ContentTypeId" field
        var contentTypeIdAsKeyword = notification
            .Fields
            .FirstOrDefault(field => field.FieldName is SearchConstants.FieldNames.ContentTypeId)?
            .Value
            .Keywords?
            .FirstOrDefault()
            ?? string.Empty;

        // is this the super secret content type?
        if (Guid.TryParse(contentTypeIdAsKeyword, out var contentTypeId)
            && contentTypeId.Equals(SecretContentTypeId))
        {
            // yes - cancel the notification
            notification.Cancel = true;
        }
    }
}

That’s all, folks!

Yep. Those were the extension points I covered in my chat with Lotte and Seb. Potent stuff, if put to proper use 💪

One thing is for sure: Umbraco Search is not meant as a one-size-fits-all - nor should it be. Requirements will always differ from project to project, and Search definitely needs all the extension points it can get.

I hope this sparked your interest and fueled your imagination ✨

Happy searching 💜