SQLite vector search extension that runs anywhere
sqlite-vec is an SQLite extension for fast vector search, supporting float, int8, and binary vectors. It is compatible with multiple platforms and easy to install across various programming languages.
The sqlite-vec GitHub repository hosts an SQLite extension for fast vector search. It is designed for efficient storage and querying of float, int8, and binary vectors, and runs on multiple platforms, including Linux, macOS, Windows, and browsers via WASM. The extension is written in pure C, has no dependencies, stores vectors in virtual tables, and can pre-filter vectors using subqueries.
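To make the difference between the three vector types concrete, here is a minimal sketch of how many bytes one 8-dimensional vector needs in each representation. The helper names, the quantization scheme (values assumed to lie in [-1, 1], sign-based binarization), and the bit order are assumptions for illustration only. They are not sqlite-vec's actual storage format.

```python
import struct

def pack_float32(vec):
    """Pack floats as little-endian 32-bit values: 4 bytes per dimension."""
    return struct.pack(f"<{len(vec)}f", *vec)

def pack_int8(vec, scale=127.0):
    """Quantize values assumed to lie in [-1, 1] to signed bytes: 1 byte per dimension."""
    clipped = [max(-1.0, min(1.0, x)) for x in vec]
    return struct.pack(f"<{len(vec)}b", *(round(x * scale) for x in clipped))

def pack_binary(vec):
    """Keep only the sign of each value: 1 bit per dimension, padded to whole bytes."""
    out = bytearray((len(vec) + 7) // 8)
    for i, x in enumerate(vec):
        if x > 0:
            out[i // 8] |= 1 << (i % 8)  # hypothetical bit order: lowest bit first
    return bytes(out)

vec = [0.1, -0.2, 0.3, -0.4, 0.5, -0.6, 0.7, -0.8]
sizes = {
    "float32": len(pack_float32(vec)),
    "int8": len(pack_int8(vec)),
    "binary": len(pack_binary(vec)),
}
print(sizes)
```

Under these assumptions, the int8 form is a quarter the size of the float32 form and the binary form is a thirty-second of it, at the cost of precision.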
Installation of sqlite-vec is straightforward across various programming languages, including Python, Node.js, Ruby, Go, Rust, Datasette, and sqlite-utils. Users can install it using package managers like pip, npm, gem, go get, and cargo.
A sample usage demonstrates how to create a virtual table for vector examples, insert sample embeddings, and perform a query to find the closest vectors based on a specified embedding. The project is supported by Mozilla and other sponsors, which contributes to its ongoing development and maintenance. For further details, users can refer to the official documentation or the GitHub repository.
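The workflow described above (create a table, insert embeddings, query for the closest vectors) can be sketched with Python's standard-library sqlite3 module alone. This is a stand-in, not sqlite-vec's API: it uses an ordinary table, JSON-encoded vectors, and a hypothetical user-defined `l2_distance` function, and it scans every row, which is what a brute-force search amounts to.

```python
import json
import math
import sqlite3

def l2_distance(a_json, b_json):
    """Euclidean distance between two JSON-encoded vectors of equal length."""
    a = json.loads(a_json)
    b = json.loads(b_json)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

db = sqlite3.connect(":memory:")
# Register the Python function so it can be called from SQL.
db.create_function("l2_distance", 2, l2_distance)
db.execute("CREATE TABLE examples (id INTEGER PRIMARY KEY, embedding TEXT NOT NULL)")

# Made-up sample embeddings, for illustration only.
samples = {
    1: [0.0, 0.0, 0.0, 0.0],
    2: [1.0, 0.0, 0.0, 0.0],
    3: [0.0, 2.0, 0.0, 0.0],
    4: [3.0, 3.0, 3.0, 3.0],
}
db.executemany(
    "INSERT INTO examples (id, embedding) VALUES (?, ?)",
    [(i, json.dumps(v)) for i, v in samples.items()],
)

# Brute force: compute the distance to every stored vector, keep the 2 nearest.
query = json.dumps([0.9, 0.1, 0.0, 0.0])
nearest = db.execute(
    "SELECT id, l2_distance(embedding, ?) AS distance "
    "FROM examples ORDER BY distance LIMIT 2",
    (query,),
).fetchall()
print(nearest)
```

A real extension does the distance computation in native code over a compact binary layout, which is where most of the speed comes from. The query shape is the part this sketch is meant to show.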
Related
DuckDB: Vector Similarity Search Extension
The vss extension in DuckDB enhances vector similarity search with HNSW indexing for ARRAY columns. Users can optimize queries with distance metrics but should be cautious due to limitations and experimental features.
Vectorlite: Fast Vector Search for SQLite
Vectorlite is a runtime-loadable extension for SQLite enabling fast vector search with hnswlib on Windows, MacOS, and Linux. It supports SIMD acceleration, various distance types, and customizable HNSW parameters. Installation via `pip install vectorlite-py` in Python is suggested for usage. The GitHub page offers examples, API references, benchmarks, and more for detailed exploration.
Show HN: SQLite Transaction Benchmarking Tool
The "sqlite-bench" GitHub project tests SQLite transaction behavior. It provides code, compilation guidelines, and running instructions. To protect your SSD, run benchmarks on an in-memory filesystem first. A Docker image is available.
Sqlitefs: SQLite as a Filesystem
The sqlitefs GitHub repository provides a tool to mount SQLite database files as filesystems on Linux and MacOS, enabling standard filesystem operations with SQLite databases for easier data manipulation.
SQLite-vec v0.1.0: a vector search SQLite extension that runs everywhere
sqlite-vec v0.1.0 is a new SQLite extension for vector search, supporting multiple programming languages and operating systems. It focuses on brute-force search, with future updates planned for ANN indexing.
- Users express eagerness to try sqlite-vec for various projects, including recommendation engines and text analysis.
- There are discussions about installation challenges and compatibility across different Python environments.
- The author, AlexG, engages with the community, providing insights into the extension's portability and performance.
- Some users share their experiences with similar tools, indicating a competitive landscape in vector search solutions.
- Questions arise regarding specific features, such as maximum vector size and potential use cases in other applications.
sqlite-vec works on macOS, Linux, Windows, Raspberry Pis, in the browser with WASM, and (theoretically) on mobile devices. I focused a lot on making it as portable as possible. It's also pretty fast - benchmarks are hard to do accurately, but I'd be comfortable saying that it's a very, very fast brute-force vector search solution.
One experimental feature I'm working on: you can directly query vectors that are held in memory as a contiguous block (e.g. a NumPy array), without any copying or cloning. You can see the benchmarks for that feature here under "sqlite-vec static", and it's competitive with faiss/usearch/duckdb https://alexgarcia.xyz/blog/2024/sqlite-vec-stable-release/i...
I'd like to do a benchmark to compare it with sqlite-vec, but I guess it is not a fair comparison given that sqlite-vec uses brute-force only.
One thing I'd recommend is to include recall rate in your benchmark data.
A brute-force approach is a good starting point, but it doesn't scale to serious production workloads.
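The recall rate mentioned above is usually reported as recall@k: the fraction of the true k nearest neighbours that a search actually returned. A minimal sketch, assuming both result lists are ordered by increasing distance (the function name and the sample ids are hypothetical):

```python
def recall_at_k(approx_ids, exact_ids, k):
    """Fraction of the exact top-k neighbours that appear in the approximate top-k."""
    if k <= 0:
        raise ValueError("k must be positive")
    exact_top = set(exact_ids[:k])
    if not exact_top:
        return 0.0
    hits = len(exact_top.intersection(approx_ids[:k]))
    return hits / len(exact_top)

# Made-up ids: ground truth from an exhaustive scan vs. results from an ANN index.
exact = [7, 3, 9, 1, 4]
approx = [7, 9, 2, 3, 8]
recall = recall_at_k(approx, exact, k=5)
print(recall)
```

A brute-force search scores 1.0 by construction, because it is the ground truth. That is why speed comparisons against an approximate index are hard to read without the recall figure alongside them.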
I’m writing a new vector search SQLite Extension - https://news.ycombinator.com/item?id=40243168 - May 2024 (85 comments)
I've been looking for something like this for a while.
My pyenv python3.12.2's sqlite won't load extensions even after installing with what I think are the correct command line flags. Argh!
My brew installed python3.12's sqlite will load extensions though, so I can proceed.
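A quick way to tell whether a given Python build can load SQLite extensions at all is to probe for `enable_load_extension` on a connection. CPython omits this method when the interpreter was built without loadable-extension support. The helper name below is hypothetical, and the check only covers whether the capability exists, not whether any particular extension will load.

```python
import sqlite3

def can_load_extensions():
    """Return True if this Python's sqlite3 module allows loading extensions."""
    conn = sqlite3.connect(":memory:")
    try:
        if not hasattr(conn, "enable_load_extension"):
            # The interpreter was built without loadable-extension support.
            return False
        conn.enable_load_extension(True)
        conn.enable_load_extension(False)  # restore the safer default
        return True
    except (AttributeError, sqlite3.Error):
        return False
    finally:
        conn.close()

supported = can_load_extensions()
print("extension loading supported:", supported)
```

Running this under each interpreter (for example the pyenv and Homebrew builds mentioned above) shows which one to use before installing anything else.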