lnx 0.8.0 is out! Better logging, more features and cheaper to run.

lnx 0.8.0 is out and better than ever, bringing more features, better relevancy and more! It also marks 6 months since lnx's original conception as a crappy little REST API known as "cerberus", and shows just how far it's come!

It has since become a beast capable of searching 20,000 documents over 8,000 times a second and indexing a few hundred million documents in under an hour without breaking a sweat.

For full details on the update, see the changelog.

The highlights 🎉

  • Snapshot support added.
  • Massively customised and improved logging.
  • Fast-fuzzy correction now automatically adjusts its calculations when documents are deleted, rather than being append-only.
  • You can now delete documents by a query following the same logic as the search system.
  • Specific documents can now be deleted directly via the /indexes/:index/documents/:document_id route with a DELETE method.
  • Improved memory efficiency and performance for fast-fuzzy.
  • Better signal handling.
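The fast-fuzzy change in the list above (adjusting on deletes instead of being append-only) is easiest to picture as a word-frequency table that supports decrements as well as increments. The sketch below is a hypothetical illustration of that idea only; it is not lnx's actual implementation, and the class and method names are made up for this example.

```python
from collections import Counter


class WordFrequencies:
    """Hypothetical word-frequency table that supports removals, not just additions."""

    def __init__(self) -> None:
        self.counts = Counter()

    def add_document(self, text: str) -> None:
        # Count every lowercased, whitespace-separated token in the new document.
        self.counts.update(text.lower().split())

    def remove_document(self, text: str) -> None:
        # Decrement the same tokens. Counter.subtract can go to zero or below,
        # so drop those entries to keep the table clean.
        self.counts.subtract(text.lower().split())
        for word in [w for w, c in self.counts.items() if c <= 0]:
            del self.counts[word]

    def frequency(self, word: str) -> int:
        # Missing words report 0 rather than raising.
        return self.counts[word.lower()]


freqs = WordFrequencies()
freqs.add_document("hello world")
freqs.add_document("hello there")
freqs.remove_document("hello world")
print(freqs.frequency("hello"), freqs.frequency("world"))  # 1 0
```

With an append-only table, "world" would keep a count of 1 after its only document was deleted and could keep steering corrections towards a word that no longer exists in the index.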

Snapshot support 📷

It's been on the roadmap for a while, but it's finally here: a super useful tool for small indexes that lets you save the server state and restore it from previous snapshots.

That said, snapshots aren't really recommended for large indexes, as the time taken to generate a snapshot can cause issues and the snapshots themselves can become excessively large. Ultimately, it's up to you and your workload to decide; I won't stop you :)

In their simplest form, snapshots are just a compressed zip file of the server data. Realistically, you could do this manually or with some other tool, but it's nice to have the support built in.
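If you did want to do it manually, a minimal Python sketch of "zip the data directory with compression" might look like this. The data directory path is a hypothetical placeholder; lnx's real data layout and snapshot format aren't described in this post, and you would want the server stopped (or at least not writing) while you copy its files.

```python
import zipfile
from pathlib import Path


def zip_directory(data_dir: str, archive_path: str) -> None:
    """Write every file under data_dir into a deflate-compressed zip archive."""
    root = Path(data_dir)
    with zipfile.ZipFile(archive_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(root.rglob("*")):
            if path.is_file():
                # Store paths relative to the data directory so the archive
                # can be extracted anywhere.
                archive.write(path, arcname=path.relative_to(root).as_posix())


# Example (hypothetical paths):
# zip_directory("./hypothetical-data-dir", "manual-snapshot.zip")
```

The built-in snapshot support saves you from scripting this yourself and from having to get the timing right.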

Getting started with snapshots is pretty simple; there are a few basic commands:

  • --snapshot A subcommand that generates a one-off snapshot of the current server state.
  • --load-snapshot <file> Loads / recreates the server state from the given snapshot.
  • --snapshot-interval <hours> The interval at which to automatically take a snapshot, e.g. every 24 hours or every 60 hours. The one limitation is that the value must be between 1 and 255; if you need to go beyond that range, it might be best to look at an external tool or setup to create the snapshots.
  • --snapshot-directory <path> The output directory. By default this is snapshots, relative to the current working directory.
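As a sketch of how these flags combine, here is a small hypothetical Python helper that assembles them for a launch script and enforces the 1 to 255 hour interval range mentioned above. Treating both ends of the range as inclusive is an assumption, as is the binary name "lnx" in the usage line; check the actual CLI help for your build.

```python
from typing import List, Optional


def snapshot_flags(
    interval_hours: Optional[int] = None,
    directory: Optional[str] = None,
    load_snapshot: Optional[str] = None,
) -> List[str]:
    """Build the snapshot-related flags described above (hypothetical helper)."""
    flags: List[str] = []
    if load_snapshot is not None:
        flags += ["--load-snapshot", load_snapshot]
    if interval_hours is not None:
        # The post says the interval must be between 1 and 255 hours;
        # both ends are treated as inclusive here (an assumption).
        if not 1 <= interval_hours <= 255:
            raise ValueError("--snapshot-interval must be between 1 and 255 hours")
        flags += ["--snapshot-interval", str(interval_hours)]
    if directory is not None:
        flags += ["--snapshot-directory", directory]
    return flags


# "lnx" as the binary name is an assumption for illustration.
print(["lnx"] + snapshot_flags(interval_hours=24))
# ['lnx', '--snapshot-interval', '24']
```

Validating the interval up front means a typo like 300 fails in your own script rather than at server startup.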

Enjoy log in style 😎

We've moved to tracing! This brings a huge range of improvements to both the developer experience and debugging, as you'll see below:

The new default format

Pretty but compact.

Pretty logs

For the extra flamboyant, you can spice up your logs with the --pretty-logs flag. This isn't great for automatic log ingestion, but it's much nicer to read while debugging or testing.

JSON logs

I recommend using this together with the --verbose-logs flag for automatic log ingestion.

The --json-logs flag provides a super clean way to ingest logs in a common format: if a system can parse the output of tracing's JSON formatter, it can parse lnx's logs. Combined with the --verbose-logs flag, the logs also include more in-depth information such as thread names and IDs.
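To give a feel for consuming these logs, here is a minimal Python sketch that pulls the common fields out of one JSON log line. The sample line is hypothetical, and the field names (timestamp, level, target, fields.message, threadName, threadId) assume tracing-subscriber's default JSON layout; verify them against your own lnx output before relying on them.

```python
import json
from typing import Any, Dict


def summarise_log_line(line: str) -> Dict[str, Any]:
    """Extract commonly useful fields from one JSON log line (assumed layout)."""
    record = json.loads(line)
    return {
        "timestamp": record.get("timestamp"),
        "level": record.get("level"),
        "target": record.get("target"),
        # tracing's JSON formatter nests the event message under "fields".
        "message": record.get("fields", {}).get("message"),
        # Only present when thread names/IDs are enabled (e.g. via --verbose-logs).
        "thread_name": record.get("threadName"),
        "thread_id": record.get("threadId"),
    }


# Hypothetical sample line, not copied from real lnx output.
sample = (
    '{"timestamp":"2022-01-01T00:00:00.000000Z","level":"INFO",'
    '"fields":{"message":"starting server"},"target":"lnx",'
    '"threadName":"main","threadId":"ThreadId(1)"}'
)
print(summarise_log_line(sample)["message"])  # starting server
```

Using .get() everywhere keeps the parser working whether or not the verbose fields are present.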

New query design 🔍

One of my biggest pet peeves with lnx's search API was that some query kinds required specific additional context while others didn't, which made defining query kinds in the payload semi-inconsistent.

To fix this, we've moved to a system similar to Elasticsearch's, using the kind: { <data> } format. For example:

{
    "query": {
        "fuzzy": { "ctx": "Helo world" },
        "occur": "must"
    }
}
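If you build payloads programmatically, the new shape boils down to "the kind name is the key, its data is the value". A minimal Python sketch follows; only the fuzzy kind and its ctx field come from the example above, so assuming every kind takes a single ctx value is a simplification that may not hold for all kinds.

```python
import json
from typing import Any, Dict


def build_query(kind: str, ctx: str, occur: str = "must") -> Dict[str, Any]:
    """Wrap a query in the kind: { <data> } shape (assumes a single ctx field)."""
    return {"query": {kind: {"ctx": ctx}, "occur": occur}}


payload = build_query("fuzzy", "Helo world")
print(json.dumps(payload))
# {"query": {"fuzzy": {"ctx": "Helo world"}, "occur": "must"}}
```

Because every kind now carries its own data object, adding a kind-specific option later doesn't change how the other kinds are written.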

Delete by query and id 🗑️

As well as deleting by terms via the traditional /indexes/:index/documents endpoint (by passing a dictionary of column-term pairs), you can now delete by a search query, and delete specific documents directly using the /indexes/:index/documents/:document_id endpoint with a DELETE method.

Warning: Deletes via a search query are quite dangerous if misused and could lead to accidentally removing docs you didn't want to remove. Use at your own risk!

Deletes by query accept the same queries as the search API and support its full scope. They also respect the limit and offset specified by the query, so on bigger queries it may take several requests to page through and remove all the matching docs.
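For the single-document route, a minimal Python sketch of building the DELETE request with only the standard library is shown below. The base URL reuses the address from the lnx benchmark command later in this post; the index name "movies" and the document ID are hypothetical. The request is built but not sent; pass it to urllib.request.urlopen to actually issue it.

```python
from typing import Union
from urllib.parse import quote
from urllib.request import Request


def delete_document_request(base_url: str, index: str, document_id: Union[str, int]) -> Request:
    """Build (but don't send) a DELETE for /indexes/:index/documents/:document_id."""
    index_part = quote(index, safe="")
    id_part = quote(str(document_id), safe="")
    url = f"{base_url.rstrip('/')}/indexes/{index_part}/documents/{id_part}"
    return Request(url, method="DELETE")


# Hypothetical index name and document ID.
req = delete_document_request("http://127.0.0.1:6000", "movies", "12345")
print(req.get_method(), req.full_url)
# DELETE http://127.0.0.1:6000/indexes/movies/documents/12345
```

Percent-encoding both path segments keeps an index name with spaces or slashes from silently hitting a different route.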

Comparing lnx to MeiliSearch and TypeSense

On every release, we run a set of benchmarks and stress tests on lnx to find scaling and performance issues. This has caught several bugs that wouldn't otherwise have been found until someone tried to index 50GB of data in a single commit. It also allows us to compare and contrast lnx with other existing systems.

One thing to note is that the benchmarks are run on the tiny movies dataset (~18 thousand documents, a few hundred MB). Indexing large datasets on TypeSense and MeiliSearch can take quite a while and also pushes the engines slightly out of their design range, i.e. the data cannot all fit in memory (or at least, the engines aren't given enough memory for that to happen). Mostly, though, it's because running several tests on 50GB datasets with 50 million documents takes a long time.

We test at 4 levels of concurrency using the typing mode of the lnx-cli tool (a search is sent on each character addition):

  • Low concurrency (50 concurrent clients)
  • Medium concurrency (125 concurrent clients)
  • High concurrency (250 concurrent clients)
  • Extreme concurrency (400 concurrent clients)
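Typing mode is what makes these benchmarks request-heavy: each search term turns into one request per keystroke. A minimal Python sketch of that expansion is below; it illustrates the idea only and is not lnx-cli's actual code, which may, for example, treat spaces differently.

```python
from typing import Iterator


def typing_queries(phrase: str) -> Iterator[str]:
    """Yield one query per character typed, mimicking search-as-you-type."""
    for end in range(1, len(phrase) + 1):
        query = phrase[:end]
        if query.strip():  # skip prefixes that are only whitespace
            yield query


print(list(typing_queries("star")))
# ['s', 'st', 'sta', 'star']
```

Short prefixes like "s" match a large share of the dataset, so this mode also stresses the engines harder than sending only complete words would.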

Server specs:

The tests are run within Docker on a single standalone node with the following specs:

  • 10 cores provided to the container (AMD EPYC™ 7702)
  • Up to 64GB RAM (though no single service should need it all)
  • 1 TB NVMe SSD (though the tests shouldn't need anywhere near all of it)

First up, MeiliSearch

MeiliSearch just released v0.25.0 of their engine, which you can read about here, so we'll use that to test against. We're also using their example movie dataset.

CLI Command:
lnx-cli bench -a http://127.0.0.1:7700 -c {concurrency} -d ./samples/movies.json -m typing -o ./out -t ./samples/simple-search-words.json --target meilisearch --threads 4

Results:

50 Concurrency (15350 Successful requests) ->
Avg Throughput 1069.19 req/sec  Avg Latency: 46ms

125 Concurrency (38374 Successful requests) ->
Avg Throughput 1176.34 req/sec  Avg Latency: 106ms Errors/Timeouts: 1 x 408

250 Concurrency (73184 Successful requests) ->
Avg Throughput 1156.27 req/sec  Avg Latency: 210ms Errors/Timeouts: 3 x 408, 37313 x 500

400 Concurrency (116160 Successful requests) ->
Avg Throughput 1254.73 req/sec  Avg Latency: 394ms Errors/Timeouts: 27 x 408, 37313 x 500

Overall, MeiliSearch performed pretty well on the good hardware, with improved performance over the previous v0.24.0 on the 400 concurrency test, jumping from 1,000 req/sec to 1,200.

Typesense

Now, this test actually got me pretty frustrated. I haven't run TypeSense before, so I may be missing something silly, but working out how to do things like adding multiple documents in one go was a real hassle to get working to begin with, mostly because the API returns a 404 for an invalid method instead of a 405 Method Not Allowed status.

Notably, I ran into these issues:

  • Misleading error when I accidentally did a POST request :(
  • Adding documents in bulk is a whole separate system that takes newline-separated JSON, and I couldn't find a way to just give it an array of objects.
  • So much of the API behaves weirdly when you first use it, like returning 200 OK for a completely invalid payload on bulk uploads. (Do you know how many times I sat there trying to upload the docs, trying to work out why my bulk upload took 300ms and then just didn't store any documents!?)

This might be (and probably is) me being dumb and just misusing TypeSense. However, if I can run into these issues easily, I would argue that plenty of others can too, and I think it would be a good area to improve on, even if there are plenty of client libraries around for it.
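For anyone hitting the same wall, here is a minimal Python sketch of the two pieces involved: turning an array of objects into newline-delimited JSON, and scanning the import response. As far as we understand Typesense's documentation, the bulk import endpoint replies 200 OK with one JSON result per line, each carrying a success flag, which would explain a "successful" upload that stored nothing. Treat that response shape as an assumption and check it against the Typesense docs; the documents and response below are hypothetical.

```python
import json
from typing import Any, Dict, Iterable, List


def to_jsonl(documents: Iterable[Dict[str, Any]]) -> str:
    """Turn an array of objects into newline-delimited JSON, one document per line."""
    return "\n".join(json.dumps(doc) for doc in documents)


def failed_imports(response_body: str) -> List[Dict[str, Any]]:
    """Return the per-line results that did not report success (assumed format)."""
    results = [json.loads(line) for line in response_body.splitlines() if line.strip()]
    return [r for r in results if not r.get("success", False)]


# Hypothetical documents and response body.
docs = [{"id": "1", "title": "Carol"}, {"id": "2", "title": "Up"}]
print(to_jsonl(docs))
response = '{"success": true}\n{"success": false, "error": "Bad JSON."}'
print(len(failed_imports(response)))  # 1
```

Checking each line, rather than the HTTP status, is what catches a bulk upload that "succeeded" without storing anything.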

CLI Command:
lnx-cli bench -a http://127.0.0.1:8108 -c {concurrency} -d ./samples/movies.json -m typing -o ./out -t ./samples/simple-search-words.json --target typesense --threads 4

Results:

50 Concurrency (15350 Successful requests) ->
Avg Throughput 1401.17 req/sec  Avg Latency: 35ms

125 Concurrency (38374 Successful requests) ->
Avg Throughput 1012.28 req/sec  Avg Latency: 125ms

250 Concurrency (68668 Successful requests) ->
Avg Throughput 1369.76 req/sec  Avg Latency: 186ms Errors/Timeouts:

When running this benchmark, TypeSense started dropping connections randomly, which the benchmarker can't currently handle correctly, so for the most part you can consider this a partial DNF. The exception is the latency, which is based on the successful requests.

400 Concurrency (120098 Successful requests) ->
Avg Throughput 1090.93 req/sec  Avg Latency: 376ms Errors/Timeouts:

Surprisingly, there were fewer connection reset errors than in the 250 concurrent clients test, so I assume TypeSense is simply unstable beyond 250 concurrent clients on a single node. That being said, overall it did return more successful requests than MeiliSearch.

I was quite surprised, and a little disappointed, that TypeSense's performance wasn't a bit higher. I was especially surprised that it started dropping connections and became unstable at the 250 concurrency test. You may never have 250 concurrent searches/clients connecting at once, but this is only a tiny dataset of ~19k documents, about 10MB in total, which easily fits in memory.

I was also expecting a bit more performance given the added hassle of setting things up, like the schema, but realistically MeiliSearch takes the cake in this scenario, apart from TypeSense indexing data a bit faster.

lnx

Now, it would probably be unfair for me to say lnx is the easiest and simplest to set up because it's not. MeiliSearch wins on the simplicity side.

However, where lnx does shine is performance and scaling, so let's see how it goes. Full disclosure: lnx is running with fast-fuzzy enabled, as this is typically the expected default configuration now.

CLI Command:
lnx-cli bench -a http://127.0.0.1:6000 -c {concurrency} -d ./samples/movies.json -m typing -o ./out -t ./samples/simple-search-words.json --target lnx --threads 4

Results:

50 Concurrency (15350 Successful requests) ->
Avg Throughput 7657.31 req/sec  Avg Latency: 7ms

125 Concurrency (38375 Successful requests) ->
Avg Throughput 7946.92 req/sec  Avg Latency: 15ms

250 Concurrency (38374 Successful requests) ->
Avg Throughput 8639.53 req/sec  Avg Latency: 28ms

400 Concurrency (116160 Successful requests) + No Threadpool Adjustments ->
Avg Throughput 6301.08 req/sec  Avg Latency: 63ms

400 Concurrency (116160 Successful requests) + With Threadpool Adjustments ->
Avg Throughput 8204.06 req/sec  Avg Latency: 48ms

Here, "threadpool adjustments" means the index was rebuilt with an adjusted thread pool size to make better use of the CPU at higher concurrency. As you can see above, we gained nearly 2,000 extra searches a second and cut 15ms off the average latency just by changing one field!
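To put the threadpool gain in concrete terms, here is the arithmetic on the two 400-concurrency lnx runs, using the figures exactly as reported in the results.

```python
# Figures copied from the two 400-concurrency lnx runs above.
before = {"throughput": 6301.08, "latency_ms": 63}
after = {"throughput": 8204.06, "latency_ms": 48}

gain = after["throughput"] - before["throughput"]
latency_saved = before["latency_ms"] - after["latency_ms"]
print(f"+{gain:.2f} req/sec ({gain / before['throughput']:.1%}), -{latency_saved}ms avg latency")
# +1902.98 req/sec (30.2%), -15ms avg latency
```

That is roughly a 30% throughput improvement from the pool size change alone.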

Now that's what I'm talking about! 0 requests dropped and maximum performance across all concurrency tests. You can push lnx harder than this, but you may start running out of file descriptors unless you raise the limit or use another machine to generate the additional load.
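As a rough sanity check on these averages (our own addition, not something lnx-cli reports), Little's law says the average number of requests in flight equals throughput multiplied by average latency. With a fixed pool of clients each waiting on its own response, that figure should land close to the concurrency level; the latencies are rounded to whole milliseconds, so the result is only approximate.

```python
def in_flight(throughput_rps: float, latency_ms: float) -> float:
    """Little's law: average requests in flight = arrival rate * average time in system."""
    return throughput_rps * latency_ms / 1000


# (concurrency, avg throughput req/sec, avg latency ms) from the lnx runs above.
runs = [(50, 7657.31, 7), (125, 7946.92, 15), (400, 8204.06, 48)]

for clients, rps, ms in runs:
    print(f"{clients} clients -> ~{in_flight(rps, ms):.0f} requests in flight")
# 50 clients -> ~54 requests in flight
# 125 clients -> ~119 requests in flight
# 400 clients -> ~394 requests in flight
```

Each result sits close to its concurrency level, which suggests the throughput and latency figures are consistent with one another.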

Plotted Results

Overall review

We can see that lnx is considerably more performant than the existing systems around. A lot of this is due to the fast-fuzzy system, which behaves slightly differently from traditional fuzzy searching (I've explained more about it in my previous blog post). However, this also changes how the system's relevancy behaves, so lnx probably isn't for everyone. I would still highly recommend trying the fast-fuzzy system and judging its relevancy for yourself; you may even find it does a better job on short sentences with compound errors.

Although I was generally fairly disappointed with TypeSense, it is the only engine of the three to support high availability out of the box, which you may want or need. You can set lnx up for high availability, but it requires a complicated third-party/external setup with things like Kubernetes, which may not be ideal.

If you just want something stupidly easy to set up and use straight away, MeiliSearch definitely wins that round. The fact that it performed roughly the same as TypeSense with a fraction of the setup was impressive. TypeSense does beat MeiliSearch on indexing times, but this may change with future MeiliSearch releases.

Now, if you need something reasonably easy to set up that can ingest hundreds of millions of documents in minutes and scales just as well for large datasets as it does for small ones, lnx will be your friend. Unlike the others, it doesn't need all docs to fit in memory to be performant, which brings a significant cost-efficiency gain along with it.

Thank you's

As always, thank you to QuickWit for producing and maintaining Tantivy, which is the heart of lnx and where 99% of its performance comes from. If you need big-data full-text search, QuickWit is your best friend; they just released 0.2, which you can find here.