August 17th, 2024

Xapian: Open source search engine library

Xapian is an open-source search engine library supporting multiple programming languages, offering advanced indexing and search capabilities. The latest version is 1.4.26, released in July 2024.


Xapian is an open-source search engine library written in C++ and released under the GPL v2+ license. It provides bindings for Perl, Python 2, Python 3, PHP, Java, Tcl, C#, Ruby, Lua, Erlang, Node.js, and R. Xapian is designed as a flexible toolkit that lets developers integrate advanced indexing and search capabilities into their own applications. It has built-in support for multiple weighting models and a comprehensive set of boolean query operators. For users who want a packaged search engine, Xapian offers Omega, an application built on the library that can be customized as needs evolve. The latest stable version of Xapian is 1.4.26, released on July 18, 2024; the most recent release in the older 1.2 stable series is 1.2.25, released on September 26, 2017.

- Xapian is an open-source search engine library under GPL v2+.

- It supports multiple programming languages through various bindings.

- The toolkit allows for advanced indexing and search capabilities.

- Omega is a customizable application built on Xapian for website search solutions.

- The latest stable version is 1.4.26, released in July 2024.
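
To make "weighting models" and "boolean query operators" concrete, here is a minimal pure-Python sketch of the two ideas. It does not use Xapian's API: the corpus, the function names (`build_index`, `docs_with`, `bm25`), and the `k1`/`b` values are all assumptions chosen for illustration. BM25 is one of the weighting schemes Xapian ships, but the formula below is the textbook variant rather than Xapian's exact implementation.

```python
import math
from collections import Counter

# Hypothetical toy corpus, invented for this example.
DOCS = {
    1: "xapian is a search engine library",
    2: "omega is a search application built on xapian",
    3: "recoll is a desktop search app",
}

def build_index(docs):
    """Return (postings, lengths): term -> {doc_id: tf}, and doc_id -> length."""
    postings, lengths = {}, {}
    for doc_id, text in docs.items():
        terms = text.lower().split()
        lengths[doc_id] = len(terms)
        for term, tf in Counter(terms).items():
            postings.setdefault(term, {})[doc_id] = tf
    return postings, lengths

def docs_with(postings, term):
    """Set of document ids whose text contains the term."""
    return set(postings.get(term, {}))

def bm25(postings, lengths, doc_id, terms, k1=1.2, b=0.75):
    """Textbook BM25 score of one document for a list of query terms.

    k1 and b are commonly quoted values, assumed here for illustration.
    """
    n = len(lengths)
    avg_len = sum(lengths.values()) / n
    score = 0.0
    for term in terms:
        plist = postings.get(term, {})
        tf = plist.get(doc_id, 0)
        if tf == 0:
            continue
        df = len(plist)
        idf = math.log(1 + (n - df + 0.5) / (df + 0.5))
        norm = tf + k1 * (1 - b + b * lengths[doc_id] / avg_len)
        score += idf * tf * (k1 + 1) / norm
    return score

postings, lengths = build_index(DOCS)

# Boolean operators reduce to set algebra over posting lists.
and_hits = docs_with(postings, "search") & docs_with(postings, "xapian")  # AND
or_hits = docs_with(postings, "omega") | docs_with(postings, "recoll")    # OR
not_hits = docs_with(postings, "search") - docs_with(postings, "xapian")  # AND NOT

# The weighting model then orders the boolean matches by relevance.
query = ["search", "xapian"]
ranked = sorted(and_hits,
                key=lambda d: bm25(postings, lengths, d, query),
                reverse=True)
print(ranked)
```

The boolean operators decide which documents match at all, and the weighting model decides the order in which matches are returned. In this toy corpus both matching documents contain each query term once, so the shorter document ranks first because of BM25's length normalization.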

8 comments
By @rbanffy - 8 months
I remember having used, a very long time ago, a self-hosted search engine on my library of PDFs, and it was unbelievably useful.

I dream about a similar thing that can do OCR on scanned docs and extract text from my also sprawling library of epub and mobi files. If someone builds something like this, with maybe a LOCAL LLM to extract text descriptions from photos and movies as well as indexing metadata for everything, subtitles from movies and lyrics for songs, and add that to a NAS appliance, it’d be a killer.

By @infocollector - 8 months
This project has been around and maintained for more than a decade! Small footprint, good speed. One downside might be GPL v2 for commercial use.
By @openrisk - 8 months
Also used by Recoll, the desktop search app: https://www.recoll.org/
By @dvdkon - 8 months
Xapian is nice. I've used it before to add interactive autocomplete to a Python web app, since my previous favourite, Whoosh, is unmaintained and somehow slower than grep on a folder (I remember it being pretty fast years back, I'd love to know what happened).

I'd say my favourite thing about Xapian is that it's just a simple library you can embed in any app, with no need for a separate database or JVM tuning. For simple use cases and small-to-medium datasets, it just works.

By @donio - 8 months
Love Xapian, been using it for many years via notmuch (mail) and recoll (document indexing, mainly PDFs in my case).

It's been trouble free and very performant, a real workhorse.

https://notmuchmail.org/

https://www.recoll.org/

By @jfmc - 8 months
Xapian is used in https://www.djcbsoftware.nl/code/mu/ for indexing emails.
By @importsaas - 8 months
By @Vuizur - 8 months
I once wanted to compile a program that used Xapian on Windows. It was basically impossible for mortals.

Imo people should use cross-platform alternatives.