What's new in lnx 0.9

🙏 It's been well over 4 months since I last posted about a new lnx release. Sorry about all of the delays! It's been a really busy time for me recently.

Apologies over, lnx 0.9 is out! This is largely a set of quality-of-life improvements that make it easier for developers to build a robust API around lnx, along with more relevancy improvements and, as always, it's just that little bit faster.

The list of notable changes includes:

  • Synonym support was added with an easy-to-use API.
  • All loaded stop words for a given index can now be accessed via the API.
  • Schema fields can now be marked as required or optional. Optional fields which are not provided will be populated with the field's default value (for single-value fields this is null, for multi-value fields this is []).
  • Marking a field as fast is now a simple boolean rather than a single/multi enum, which removes the confusion around which option to choose. lnx will work out what variant you want :)
  • Fields now need to be marked multi: true in order to be a multi-valued field. If the field is not marked as multi-valued, but multiple values are provided, the last value in the array is taken.
  • Fast fuzzy now scores based on distance and the BM25 score, greatly improving search relevancy.

Quality of life changes

Improving the developer experience was a pretty big thing in this update. I wanted to improve the schema system so that you're able to rely on it a bit more in terms of document structure.

As mentioned above, schemas can now mark fields as optional or required, where missing optional fields will be automatically populated at query time with their defaults.

That's not all, however: returned documents will now be converted to be in line with the schema itself, rather than the original behaviour, which was to return each field as an array of values.

Before 😩
{
  "status": 200,
  "data": {
    "hits": [
      {
        "doc": {
          "title": ["The Truman Show"],
          "id": ["37165"],
          "release_date": [896922000],
          "genres": [
            "Comedy",
            "Drama"
          ],
          "overview": ["Truman Burbank is the star of <<Truncated for demo>>"]
        },
        "document_id": "17921891771372870337",
        "score": 88.80075
      }
    ],
    "count": 316,
    "time_taken": 0.0002955
  }
}

After 😊
{
  "status": 200,
  "data": {
    "hits": [
      {
        "doc": {
          "title": "The Truman Show",
          "id": "37165",
          "release_date": 896922000,
          "genres": [
            "Comedy",
            "Drama"
          ],
          "overview": "Truman Burbank is the star of <<Truncated for demo>>"
        },
        "document_id": "17921891771372870337",
        "score": 88.80075
      }
    ],
    "count": 316,
    "time_taken": 0.0002955
  }
}
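
To make the schema behaviour above a bit more concrete, here is a minimal Python sketch of the idea: take a document where every field is an array, and shape it to match the schema. This is not lnx's actual code or schema format. The `SCHEMA` layout, its `multi`/`required` keys and the `normalise` function are all made up for illustration, and I'm assuming a missing required field is simply an error.

```python
from typing import Any

# Hypothetical schema description; the key names are illustrative only,
# they are not lnx's real schema format.
SCHEMA = {
    "title": {"multi": False, "required": True},
    "genres": {"multi": True, "required": False},
    "overview": {"multi": False, "required": False},
}


def normalise(raw_doc: dict[str, list[Any]], schema: dict[str, dict[str, bool]]) -> dict[str, Any]:
    """Shape a document whose fields are all arrays so that it matches the schema."""
    out: dict[str, Any] = {}
    for name, spec in schema.items():
        values = raw_doc.get(name, [])
        if not values:
            if spec["required"]:
                # Assumption: a missing required field is rejected outright.
                raise ValueError(f"missing required field: {name}")
            # Optional fields fall back to their default:
            # [] for multi-value fields, None (null) for single-value fields.
            out[name] = [] if spec["multi"] else None
        elif spec["multi"]:
            out[name] = list(values)
        else:
            # Single-value field given several values: the last one is taken.
            out[name] = values[-1]
    return out


doc = normalise({"title": ["The Truman Show"], "genres": ["Comedy", "Drama"]}, SCHEMA)
print(doc)
```

Here `title` comes back as a plain string, `genres` stays a list, and the missing `overview` is filled in with null, which is roughly the difference between the two responses above.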

Relevancy improvements

Relevancy for fast-fuzzy queries has been improved. Before, they were simply fed into the compound_lookup function and searched like a regular text search, without really taking into account the edit distance/typos made in the word.

Now, the system will get all of the top typos for each term in the query and add them to the query, boosted by the edit distance to get to each word. This is calculated simply as boost = max_distance - distance. Words which are closer to the given query are now scored higher and therefore appear higher up in search results.
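
As a rough illustration of the boost formula, here is a small Python sketch. It is not how lnx finds its typo candidates: brute-force Levenshtein against a tiny vocabulary is just a stand-in, and `edit_distance`, `boosted_terms` and the default `max_distance` of 2 are assumptions made for the example. How the boost is then combined with the BM25 score is not shown.

```python
def edit_distance(a: str, b: str) -> int:
    """Plain Levenshtein distance via dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def boosted_terms(term: str, vocabulary: list[str], max_distance: int = 2) -> list[tuple[str, int]]:
    """Return (candidate, boost) pairs, where boost = max_distance - distance."""
    scored = []
    for word in vocabulary:
        distance = edit_distance(term, word)
        if distance <= max_distance:
            # Closer words get a larger boost; an exact match gets max_distance.
            scored.append((word, max_distance - distance))
    # Highest boost first, so the closest candidates lead the list.
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


print(boosted_terms("trumam", ["truman", "trump", "human", "show"]))
```

With the misspelt query term "trumam", the one-edit candidate "truman" ends up with a higher boost than the two-edit candidate "trump", and anything further away than `max_distance` is dropped.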

lnx-cli changes

The lnx-cli tool has been updated to support the new 0.9 instances.

Typesense has also been added to the supported comparison engines for benchmarking, though the document upload timer is woefully inaccurate for Typesense due to documents being added individually rather than in one bulk upload.

Speaking at Rust London

As a round-off to the above changes, I was also invited to speak at the Rust London meetup with our good friends at Quickwit. If you fancy watching someone who's never spoken at an event before go completely over the time limit, then do give the recording a watch.

What's next

The plan with 0.10 is to prioritise adding high availability to lnx in a way that doesn't heavily affect indexing performance, so keep an eye out for that! Other planned improvements include scoring by edit distance for standard fuzzy queries (dependent on https://github.com/quickwit-oss/tantivy/pull/1244) and a complete rewrite of the server itself, moving over to poem-openapi, which should bring with it a much-needed improvement to the documentation and maintainability of lnx.