July 29th, 2024

Hybrid Search in CrateDB - ranking and scoring calculations in pure SQL

CrateDB's hybrid search enhances relevancy using kNN, BM25, and geospatial search. It integrates semantic and lexical searches, improving results in contexts like e-commerce through advanced query structuring and ranking techniques.

CrateDB's hybrid search combines multiple search algorithms to improve the relevance and accuracy of results. It supports three primary search functions: k-nearest neighbors (kNN) search, BM25 (full-text) search, and geospatial search. Hybrid search is particularly effective when it pairs semantic search, which captures the meaning and context of a query, with lexical search, which matches the query's literal keywords. For instance, in an e-commerce context, a user searching for "gpu ASUS" benefits from results that match both the product type and the brand.

BM25 is a bag-of-words algorithm that ranks documents by how often the query keywords occur in each one, how rare those keywords are across the collection, and how long the document is relative to the average document length. CrateDB builds on Lucene, which allows various search customizations, including fuzziness and analyzers. Vector search represents data as dense vectors and measures similarity by how close those vectors lie to one another. This method is also useful for clustering and recommendations.
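The two scoring ideas above can be sketched in a few lines of Python. This is a minimal illustration, not CrateDB's or Lucene's actual implementation: it assumes whitespace tokenization, the textbook BM25 formula with the common defaults k1 = 1.2 and b = 0.75, and cosine similarity as the proximity measure. The function names are hypothetical.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each document against the query with the textbook BM25 formula.

    Assumption: naive lowercase/whitespace tokenization stands in for a real analyzer.
    """
    tokenized = [doc.lower().split() for doc in docs]
    n = len(tokenized)
    avgdl = sum(len(tokens) for tokens in tokenized) / n
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            # Rare terms get a higher inverse document frequency.
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            # Length normalization: long documents are penalized relative to avgdl.
            norm = tf[term] + k1 * (1 - b + b * len(tokens) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)


def knn(query_vec, vectors, k):
    """Return the indices of the k vectors most similar to query_vec."""
    ranked = sorted(
        range(len(vectors)),
        key=lambda i: cosine_similarity(query_vec, vectors[i]),
        reverse=True,
    )
    return ranked[:k]
```

Calling `bm25_scores("gpu ASUS", docs)` on a small product list gives the highest score to a document containing both keywords and a score of zero to one containing neither, while `knn` orders documents purely by vector proximity.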

Hybrid search can be implemented through techniques like convex combination, which computes a weighted sum of the scores produced by the different search methods, or reciprocal rank fusion (RRF), which merges results using only each document's rank position and ignores the raw scores. Both methods aim to produce a single, more relevant result set by combining the strengths of different search approaches.
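Both fusion techniques are short enough to sketch directly. The following is an illustrative Python version, not the article's SQL: the min-max normalization step, the `alpha` weight, and the RRF constant k = 60 (a commonly used default) are assumptions, and the function names are hypothetical.

```python
def min_max_normalize(scores):
    """Rescale a {doc_id: score} mapping to the range [0, 1]."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def convex_combination(lexical, semantic, alpha=0.5):
    """Weighted sum of normalized scores.

    alpha weights the semantic score and (1 - alpha) the lexical score,
    so the two weights always sum to 1. Documents missing from one
    result set contribute 0 for that method.
    """
    lex = min_max_normalize(lexical)
    sem = min_max_normalize(semantic)
    docs = set(lex) | set(sem)
    return {
        doc: alpha * sem.get(doc, 0.0) + (1 - alpha) * lex.get(doc, 0.0)
        for doc in docs
    }


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists of doc ids; only positions matter, not scores."""
    fused = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            fused[doc] = fused.get(doc, 0.0) + 1.0 / (k + rank)
    return fused
```

The design trade-off is visible in the signatures: convex combination needs comparable scores, which is why a normalization step is assumed here, whereas RRF only needs the order of each result list and therefore works even when BM25 and vector scores live on very different scales.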

To execute a hybrid search, users can structure the query with common table expressions (CTEs): one CTE per search method, joined into a single result set that is then ranked by the combined score. This approach is particularly beneficial for applications that require nuanced search capabilities, such as e-commerce and data analytics.
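The CTE structure can be sketched with Python's built-in `sqlite3` module so that it runs anywhere. This is a stand-in, not CrateDB syntax: the tables `lexical_hits` and `semantic_hits` are hypothetical and hold precomputed scores where a real query would run the full-text and kNN searches, and `UNION ALL` with `GROUP BY` is used in place of a join between the two CTEs. The fusion step is RRF with an assumed constant of 60.

```python
import sqlite3

# One CTE per search method assigns ranks; a third combines them.
HYBRID_QUERY = """
WITH lexical AS (
    SELECT doc_id, ROW_NUMBER() OVER (ORDER BY score DESC) AS rnk
    FROM lexical_hits
),
semantic AS (
    SELECT doc_id, ROW_NUMBER() OVER (ORDER BY score DESC) AS rnk
    FROM semantic_hits
),
combined AS (
    SELECT doc_id, rnk FROM lexical
    UNION ALL
    SELECT doc_id, rnk FROM semantic
)
SELECT doc_id, SUM(1.0 / (? + rnk)) AS rrf_score
FROM combined
GROUP BY doc_id
ORDER BY rrf_score DESC, doc_id
"""


def demo_connection():
    """Build an in-memory database with hypothetical precomputed search hits."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE lexical_hits (doc_id INTEGER, score REAL);
        CREATE TABLE semantic_hits (doc_id INTEGER, score REAL);
        INSERT INTO lexical_hits VALUES (1, 2.5), (2, 1.0), (3, 0.4);
        INSERT INTO semantic_hits VALUES (2, 0.95), (3, 0.90), (4, 0.50);
        """
    )
    return conn


def hybrid_rrf(conn, k=60):
    """Return (doc_id, rrf_score) rows, best match first."""
    return conn.execute(HYBRID_QUERY, (k,)).fetchall()
```

With the demo data, document 2 ranks first because it appears near the top of both lists, even though it is the best hit in only one of them.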
