September 18th, 2024

What if GitHub had vector search?

A demo project adds semantic search to GitHub issue search using Manticore Search's vector search. It aims to improve accuracy and relevance for natural-language queries, and the article expects GitHub search to move toward a hybrid of semantic and keyword search.

GitHub's traditional keyword search often returns irrelevant results, particularly when users phrase queries in natural language. The limitation is most evident in issues and pull requests, where precise details matter. To address this, a demo project has been built on Manticore Search, an open-source search engine that supports vector search and allows customizable semantic search. Semantic search goes beyond keyword matching by taking the context and intent of a query into account, which improves accuracy and the user experience.

In the GitHub issue search demo, vector search has shown promising results: it recognizes synonyms and related terms, so users find relevant issues more efficiently and see fewer irrelevant ones. Traditional keyword search remains effective for specific queries, so the future of GitHub search is likely to be a hybrid model that combines the strengths of semantic and keyword search. The goal is a more intuitive and productive search experience for developers, with better collaboration and knowledge sharing across projects.
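To make the idea of vector search concrete, here is a minimal sketch in plain Python. It is not the demo's code and does not use Manticore Search; the issue titles, the three-dimensional vectors, and the function names are all hypothetical, and a real system would get its vectors from an embedding model. It only illustrates how ranking by cosine similarity can surface an issue that shares no keywords with the query.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction to compare
    return dot / (norm_a * norm_b)


def vector_search(query_vec, index, k=2):
    """Return the k titles whose vectors are most similar to query_vec."""
    scored = [(cosine_similarity(query_vec, vec), title)
              for title, vec in index.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [title for _, title in scored[:k]]


# Hypothetical embeddings, written by hand for illustration only.
ISSUES = {
    "App crashes on startup": [0.9, 0.1, 0.0],
    "Program fails to launch": [0.8, 0.2, 0.1],
    "Add dark mode to settings": [0.0, 0.1, 0.9],
}

# Pretend this is the embedding of the query "software won't open".
query = [0.85, 0.15, 0.05]
results = vector_search(query, ISSUES, k=2)
print(results)
```

With these toy vectors, both startup-failure issues rank ahead of the dark mode request, even though the query text shares no words with either title.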

- GitHub's search struggles with natural language queries, leading to irrelevant results.

- Manticore Search introduces semantic search to improve search accuracy and context understanding.

- The integration of vector search allows for more relevant results by recognizing synonyms and related terms.

- A hybrid search model combining semantic and keyword searches is anticipated for GitHub's future.

- Enhanced search capabilities aim to boost developer productivity and collaboration.
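The article does not say how a hybrid model would merge keyword and semantic results. One common technique for combining ranked lists is reciprocal rank fusion (RRF), sketched below as an assumption rather than as the method the demo or GitHub uses. The issue IDs and the two input rankings are hypothetical, and `k=60` is a conventional default for the RRF constant.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of IDs into one list.

    Each list contributes 1 / (k + rank) to an item's score, so items
    that rank well in more than one list rise to the top.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest score first; ties are broken by ID for a stable order.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical results from two separate searches for the same query.
keyword_hits = ["issue-12", "issue-7", "issue-3"]
semantic_hits = ["issue-7", "issue-42", "issue-12"]

fused = reciprocal_rank_fusion([keyword_hits, semantic_hits])
print(fused)  # ['issue-7', 'issue-12', 'issue-42', 'issue-3']
```

Issues found by both searches end up first, while an issue found by only one search still appears lower in the list. RRF uses only rank positions, so the keyword and vector scores do not need to be on the same scale.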

3 comments
By @donhardman - 4 months
This is interesting — has anyone tried something like this with Elasticsearch or maybe Milvus/Meilisearch/Typesense? I know those are popular for vector searches, but I haven't seen anything specific for improving GitHub searches like this.

I'm curious how this compares performance-wise, especially when it comes to large repositories with tons of issues and PRs. Also, how scalable is it? I feel like semantic search has a lot of potential here, but does anyone know if GitHub itself has plans to integrate something similar?

By @costco - 5 months
This is cool but would be even cooler if it could do code similarity search. The cost would probably be prohibitively high though unless you only did like 10000 repos. https://github.com/microsoft/CodeBERT/tree/master/UniXcoder#...