September 17th, 2024

Hybrid Search with PostgreSQL and Pgvector

Hybrid search combines multiple search methods in PostgreSQL with pgvector to improve the relevancy of vector similarity search results. It can improve metrics such as recall, index size, and query latency, and it uses reciprocal ranked fusion to merge results.

Hybrid search combines different search methods to improve the relevancy of results in vector similarity searches. The approach is particularly useful in PostgreSQL with the pgvector extension, where it can improve key metrics such as recall, index size, and query latency. Recall measures how many of the relevant results a search actually returns, and boosting it often involves trade-offs with the other metrics. Hybrid search runs multiple search methods, ranks the results from each, and merges them into a final ranking. A popular scoring method for this is reciprocal ranked fusion (RRF), which scores each result by the reciprocal of its rank in each list and sums those scores across lists.

The blog post implements hybrid search using PostgreSQL's full-text search alongside vector similarity search. It outlines the necessary setup, including creating a PostgreSQL database, generating random data, and computing vector embeddings. The author demonstrates how to create indexes for both full-text and vector searches, and provides SQL queries that run each search individually and then together as a hybrid search. The results indicate that vector searches can identify semantic relationships while full-text searches can pinpoint exact phrase matches, so combining the two improves search outcomes.

- Hybrid search enhances relevancy in vector similarity searches by combining multiple search methods.

- PostgreSQL with pgvector is used to implement hybrid search effectively.

- Reciprocal ranked fusion (RRF) is a key scoring method for merging results from different search methods.

- The approach can improve key metrics like recall, index size, and query latency.

- Combining vector and full-text searches can yield better results than using either method alone.
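
To make the merging step concrete, here is a minimal Python sketch of reciprocal ranked fusion. It is an illustration and not the article's SQL: the function name, the sample ids, and the smoothing constant k = 60 (a commonly used default) are all assumptions.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of ids into one list, best first.

    Each id earns 1 / (k + rank) from every list it appears in,
    where rank starts at 1. k = 60 is an assumed, commonly used default.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties keep first-seen order (sorted is stable).
    return sorted(scores.items(), key=lambda item: -item[1])


# Hypothetical row ids returned by each search, best match first.
vector_hits = [3, 1, 7, 9]   # e.g. from a vector similarity query
text_hits = [7, 3, 8]        # e.g. from a full-text query

fused = reciprocal_rank_fusion([vector_hits, text_hits])
for doc_id, score in fused:
    print(doc_id, round(score, 5))
```

Ids that appear in both lists (3 and 7 here) accumulate two contributions and rise above ids found by only one method, which is how the combination can surface results that either search alone would rank lower.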

BM42 – a new baseline for hybrid search

Qdrant introduces BM42, combining BM25 with embeddings to enhance text retrieval. Addressing SPLADE's limitations, it leverages transformer models for semantic information extraction, promising improved retrieval quality and adaptability across domains.

Hybrid Search in CrateDB - ranking and scoring calculations in pure SQL

CrateDB's hybrid search enhances relevancy using kNN, BM25, and geospatial search. It integrates semantic and lexical searches, improving results in contexts like e-commerce through advanced query structuring and ranking techniques.

Postgres as a Search Engine

Postgres can function as a search engine by integrating full-text, semantic, and fuzzy search techniques, enhancing retrieval quality and allowing for effective ranking and relevance tuning within existing databases.

Understanding Pgvector's HNSW Index Storage in Postgres

The article analyzes the HNSW index in pgvector for PostgreSQL, detailing its structure, metadata, optimizations for space efficiency, and a C parser that converts the index into JSON for visualization.

PGVector's Missing Features

Trieve's blog post outlines PGVector's limitations in vector search, including issues with required words, performance, and support for sparse vectors, suggesting dedicated solutions like Trieve for advanced search needs.

1 comment

By @anshumankmr - 5 months

Hey, is there a way to do this in LangChain? I have been trying to figure out how to do this because I am working on a chatbot for the same.
