PostgreSQL provider for Umbraco Search

PostgreSQL provider for Umbraco Search

It’s one of those things everybody knows: Search is a NoSQL problem. Fire up an index in Elasticsearch, throw some content at it, and let it do its thing.

Relational databases can’t possibly support search beyond basic LIKE term% text queries… right?

Well. I set out to prove that wrong, by building a PostgreSQL search provider for Umbraco Search 🤓

A full-fledged search provider

I wanted a full-fledged search provider for PostgreSQL. No cutting corners, and no excuses like “…but it’s relational so that’s acceptable”.

The success criteria was to achieve full feature parity with my Elasticsearch search provider - that is:

  • Free-text querying across the four relevance tiers, with configurable boost.
  • Filtering - exact value filters and range filters (including negation).
  • Faceting - exact value facets and range facets.
  • Sorting by multiple fields and relevance score (from the relevance tiers).
  • Variance by culture and/or segment.
  • Protected content including an explicit bypass option.
  • Suggestions as an opt-in feature.
  • Multiple environments per PostgreSQL instance.

Why PostgreSQL?

I picked PostgreSQL for a few reasons:

  1. It has advanced full-text search support with tsvector and tsquery, including stemming, weighting and ranking (ts_rank), all backed by GIN indexes - an excellent match for the multi-relevance tier texts in Umbraco Search.
  2. There are a lot of hosting options, including very affordable ones.

It was also an excuse to experiment with PostgreSQL, what’s not to like?

Claude to the rescue

I had excellent help from Claude in building this search provider, and the resulting codebase bears all the hallmarks of Claude assisted coding 🤖

Normally, I’d be weary about this approach, given my lack of intimate PostgreSQL knowledge. However, my Elasticsearch search provider has an abundance of integration tests, and by porting those to the PostgreSQL implementation, functional parity was easy to track and prove 👌

A simple benchmarking program helped identify performance bottlenecks as the implementation progressed, resulting in more than a few reiterations along the way.

The result

There is no doubt that search favors NoSQL - particularly in a hyper-dynamic context like a CMS, where fields come and go in all shapes and sizes.

Functionally, the PostgreSQL search provider is entirely on par with its NoSQL counterparts. Performance… well, that’s a different matter 👀

Performance, of course, has many factors, including (but not limited to):

  • The number of documents in the index.
  • The complexity of the content model.
  • The resources available on the PostgreSQL instance.

Local benchmarks show the PostgreSQL search provider handling queries across 1000 documents with a handful of facets in ~15-20ms on average (P90). When increasing the number of documents to 5000, the average query time increases to ~60-80ms (P90).

The same benchmarks show the Elasticsearch and Typesense search providers handling queries across 1000 documents at ~6-8ms, and a significantly smaller increase in query time for 5000 documents.

So… what now?

As it turns out, it is indeed possible to create a search provider for a relational database 👏

I suppose the real question is: Would I use this myself?

Well… no. I probably wouldn’t 🙈

The PostgreSQL search provider makes sense for a small-ish site, where it can be expected to perform reasonably well. For a larger site, it’s likely a very bad fit.

The thing is… it only makes sense if you want to go out-of-process with Umbraco Search, and if you’re is on a tight budget for hosting. The latter might rhyme with smaller sites, but the need for out-of-process search probably won’t even materialize in the first place.

Also, both ExamineX and Typesense offer affordable alternatives which work with Umbraco Search - the using with the default, built-in search provider from Umbraco Search, the latter using my Typesense search provider.

Maybe it’ll be more relevant, if Umbraco some day supports PostgreSQL natively 😅

You’ll find a lot more details in the PostgreSQL search provider GitHub repo.

Happy (out-of-process) searching 💜