August 23rd, 2024

Show HN: No-Code ETL Framework for Vector Databases

VectorETL is a lightweight ETL framework that converts data from various sources into vector embeddings and loads them into vector databases. It supports batch processing, installs via pip, and comes with detailed documentation.

VectorETL is a lightweight ETL framework that converts data from a range of sources into vector embeddings and stores them in one of several vector databases. It is particularly suited to semantic search and recommendation systems. Its modular architecture accommodates different data sources, embedding models, and vector databases, and it supports batch processing and configurable chunking for text data. It can be installed via pip or directly from its GitHub repository. Users run the ETL process with a command that references a configuration file, which is structured into three sections: source, embedding, and target. An example configuration shows how to extract data from a PostgreSQL database, embed it using an OpenAI model, and store it in Pinecone. The project encourages contributions and offers detailed documentation for users who want to implement or improve vector search capabilities.
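As a rough illustration of the three-section layout described above, the sketch below models a configuration as a plain Python dictionary and checks that the source, embedding, and target sections are all present. Every key and value inside the sections (`type`, `query`, `provider`, `model`, `database`, `index`) is a hypothetical placeholder, and `missing_sections` is a helper invented for this example; none of it is taken from VectorETL's documented schema.

```python
# The three top-level sections named in the summary above.
REQUIRED_SECTIONS = ("source", "embedding", "target")

# Illustrative placeholders only: these keys and values are assumptions,
# not VectorETL's actual configuration schema.
example_config = {
    "source": {"type": "postgresql", "query": "SELECT id, body FROM articles"},
    "embedding": {"provider": "openai", "model": "text-embedding-3-small"},
    "target": {"database": "pinecone", "index": "articles"},
}


def missing_sections(config: dict) -> list:
    """Return the required top-level sections absent from config, in order."""
    return [name for name in REQUIRED_SECTIONS if name not in config]


if __name__ == "__main__":
    problems = missing_sections(example_config)
    if problems:
        raise SystemExit("config is missing sections: " + ", ".join(problems))
    print("config has all three sections")
```

Checking the top-level shape before starting a run is a cheap way to fail early; the real framework may validate its configuration differently.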

- VectorETL is designed for converting data into vector embeddings for vector databases.

- It supports various data sources, embedding models, and vector databases.

- The framework allows for batch processing and configurable chunking of text data.

- Installation is straightforward via pip or GitHub.

- Contributions to the project are welcomed, and comprehensive documentation is available.
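To make the chunking and batch-processing points above more concrete, here is a minimal sketch of how text might be split into overlapping chunks and then grouped into batches before being sent to an embedding model. The function names `chunk_text` and `batched`, and the parameters `chunk_size`, `overlap`, and `batch_size`, are assumptions made for this example; they are not VectorETL's API, and the framework may chunk on different boundaries (for instance tokens or sentences rather than characters).

```python
from typing import Iterator, List


def chunk_text(text: str, chunk_size: int = 200, overlap: int = 20) -> List[str]:
    """Split text into character chunks of at most chunk_size,
    with consecutive chunks sharing `overlap` characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        # Stop once this chunk reaches the end of the text, so no
        # trailing chunk consists solely of already-covered characters.
        if start + chunk_size >= len(text):
            break
        start += step
    return chunks


def batched(items: List[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of items, each at most batch_size long."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for i in range(0, len(items), batch_size):
        yield items[i:i + batch_size]


if __name__ == "__main__":
    chunks = chunk_text("abcdefghij", chunk_size=4, overlap=1)
    for batch in batched(chunks, batch_size=2):
        # A real pipeline would call the embedding model once per batch here.
        print(batch)
```

Overlap keeps some shared context across chunk boundaries, and batching reduces the number of calls made to the embedding service; both values would normally be tuned per dataset.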

Related

Show HN: txtai: open-source, production-focused vector search and RAG

txtai is a versatile embeddings database for semantic search, LLM orchestration, and language model workflows. It supports vector search with SQL, RAG, topic modeling, and more. Users can create embeddings for various data types and use language models for diverse tasks. txtai is open source and supports multiple programming languages.

txtai: Open-source vector search and RAG for minimalists

txtai is a versatile tool for semantic search, LLM orchestration, and language model workflows. It offers features such as vector search with SQL, topic modeling, and multimodal indexing, and supports a variety of language model tasks. It is built with Python and is open source under the Apache 2.0 license.

SQLite-vec v0.1.0: a vector search SQLite extension that runs everywhere

sqlite-vec v0.1.0 is a new SQLite extension for vector search, supporting multiple programming languages and operating systems. It focuses on brute-force search, with future updates planned for ANN indexing.

SQLite vector search extension that runs anywhere

sqlite-vec is an SQLite extension for fast vector search, supporting float, int8, and binary vectors. It is compatible with multiple platforms and easy to install across various programming languages.

I built a vector embedding database in Go for learning purposes

VecDB is a vector embedding database for educational and production use. It uses a key-value model for vector storage, supports raw vector and text embedding operations, and offers customizable server configuration.
