Xapian: Open source search engine library
Xapian is an open-source search engine library supporting multiple programming languages, offering advanced indexing and search capabilities. The latest version is 1.4.26, released in July 2024.
Xapian is an open-source search engine library written in C++ and released under the GPL v2+ license. It provides bindings for many programming languages, including Perl, Python (2 and 3), PHP, Java, Tcl, C#, Ruby, Lua, Erlang, Node.js, and R. Xapian is designed as a flexible toolkit that lets developers add advanced indexing and search capabilities to their own applications. It has built-in support for multiple weighting models and a rich set of boolean query operators. For users who want a packaged search engine, Xapian offers Omega, an application built on the library that can be customized as needs evolve. The latest stable version of Xapian is 1.4.26, released on July 18, 2024; the most recent release in the previous stable series (1.2) is 1.2.25, released on September 26, 2017.
- Xapian is an open-source search engine library under GPL v2+.
- It supports multiple programming languages through various bindings.
- The toolkit allows for advanced indexing and search capabilities.
- Omega is a customizable application built on Xapian for website search solutions.
- The latest stable version is 1.4.26, released in July 2024.
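To make "boolean query operators" and "weighting models" a little more concrete, here is a minimal pure-Python sketch of the two ideas: set operations over an inverted index, and BM25-style ranking (BM25 is one of the weighting schemes Xapian provides). This is not Xapian's API or implementation. The corpus, the function names (`build_index`, `matching`, `bm25`), and the exact BM25 variant and parameter values are assumptions chosen for illustration.

```python
import math
from collections import Counter, defaultdict

# Toy corpus, made up for illustration only.
DOCS = {
    1: "xapian is a search engine library",
    2: "omega is a search application built on xapian",
    3: "lucene is a java search library",
}


def build_index(docs):
    """Build an inverted index: term -> {doc_id: term frequency}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term, tf in Counter(text.lower().split()).items():
            index[term][doc_id] = tf
    return index


def matching(index, term):
    """Return the set of document ids that contain the term."""
    return set(index.get(term, {}))


def bm25(index, docs, doc_id, terms, k1=1.2, b=0.75):
    """Score one document against query terms with a common BM25 variant.

    k1 and b are conventional illustrative defaults, not values taken
    from Xapian.
    """
    n = len(docs)
    avg_len = sum(len(text.split()) for text in docs.values()) / n
    doc_len = len(docs[doc_id].split())
    score = 0.0
    for term in terms:
        postings = index.get(term, {})
        tf = postings.get(doc_id, 0)
        if tf == 0:
            continue  # term absent from this document
        df = len(postings)  # number of documents containing the term
        idf = math.log((n - df + 0.5) / (df + 0.5) + 1.0)
        length_norm = 1 - b + b * doc_len / avg_len
        score += idf * tf * (k1 + 1) / (tf + k1 * length_norm)
    return score


index = build_index(DOCS)

# Boolean query: search AND library AND NOT java
hits = (matching(index, "search") & matching(index, "library")) - matching(index, "java")

# Ranked query: xapian OR library, best match first
query_terms = ["xapian", "library"]
candidates = set().union(*(matching(index, t) for t in query_terms))
ranked = sorted(
    candidates,
    key=lambda doc_id: bm25(index, DOCS, doc_id, query_terms),
    reverse=True,
)

print(hits)
print(ranked)
```

The boolean part only decides which documents match; the weighting part decides their order. A library such as Xapian handles both, along with persistent storage, stemming, and query parsing, which this sketch leaves out.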
Related
Open Source Python ETL
Amphi is an open-source Python ETL tool for data extraction, preparation, and cleaning. It offers a graphical interface, supports structured and unstructured data, promotes low-code development, and integrates generative AI. Available for public beta testing in JupyterLab.
ScholArxiv – an open-source aesthetic and minimal research paper explorer
ScholArxiv is an open-source app on GitHub for searching, reading, and bookmarking academic papers from arXiv, featuring offline access, summaries, and community contributions under the GNU General Public License.
Usearch: Single-File Similarity Search
USearch is a high-performance similarity search engine optimized for vectors and text, supporting multiple programming languages and platforms, claiming to be up to 10 times faster than FAISS.
Show HN: Index and search *all* your documents
The doc-parser-searcher GitHub repository offers a tool for indexing and searching documents using Apache Lucene and Tika, featuring OCR capabilities, customizable settings, and requiring Java 18 for operation.
GNU APL 1.9 Released
GNU APL 1.9 has been released, featuring bug fixes and a free implementation of ISO standard 13751. Redundant formats are unsupported, and users should access updates from Savannah SVN and GIT archives.
I dream about something similar that can do OCR on scanned docs and extract text from my equally sprawling library of epub and mobi files. If someone builds something like this, maybe with a LOCAL LLM to extract text descriptions from photos and movies, plus indexing of metadata for everything, subtitles from movies, and lyrics for songs, and adds that to a NAS appliance, it’d be a killer.
I'd say my favourite thing about Xapian is that it's just a simple library you can embed in any app, with no need for a separate database or JVM tuning. For simple use cases and small-to-medium datasets, it just works.
It's been trouble free and very performant, a real workhorse.
IMO, people should use cross-platform alternatives.