GraphRAG (from Microsoft) is now open-source!
GraphRAG, now available on GitHub, improves question-answering over private datasets through structured retrieval and response generation. In Microsoft's evaluation it outperforms naive RAG on comprehensiveness and diversity, using an LLM-extracted knowledge graph and hierarchical summaries at a competitive token cost.
GraphRAG, a graph-based approach to retrieval-augmented generation (RAG) facilitating question-answering over private datasets, is now available on GitHub. The tool offers structured information retrieval and response generation superior to naive RAG methods. It utilizes a large language model to extract a knowledge graph from text documents, enabling semantic analysis and hierarchical data summaries. Community summaries derived from the graph aid in answering global questions comprehensively by considering the entire dataset. Evaluation against naive RAG and source text summarization shows GraphRAG's superiority in terms of comprehensiveness and diversity, with competitive performance at lower token costs. Ongoing research aims to optimize GraphRAG's efficiency and response quality, exploring methods to reduce upfront indexing costs and customize prompts. By releasing GraphRAG publicly, Microsoft Research aims to enhance data understanding at a global level and welcomes community feedback for further improvements.
Related
Surprise, your data warehouse can RAG
A blog post by Maciej Gryka explores "Retrieval-Augmented Generation" (RAG) to enhance AI systems. It discusses building RAG pipelines, using text embeddings for data retrieval, and optimizing data infrastructure for effective implementation.
Show HN: R2R V2 – A open source RAG engine with prod features
The R2R GitHub repository offers an open-source RAG answer engine for scalable systems, featuring multimodal support, hybrid search, and a RESTful API. It includes installation guides, a dashboard, and community support. Developers benefit from configurable functionalities and resources for integration. Full documentation is available on the repository for exploration and contribution.
It looks like they just provide in the prompt a number of examples that follow the schema they want [2].
[1] https://github.com/microsoft/graphrag/blob/main/graphrag/ind...
[2] https://github.com/microsoft/graphrag/blob/main/graphrag/ind...
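Supplying schema-following examples in the prompt is ordinary few-shot prompting. A minimal sketch of the idea follows; the function name, the instruction wording, and the output format are all hypothetical and are not taken from the linked GraphRAG prompt files.

```python
def build_extraction_prompt(examples, text):
    """Assemble a few-shot prompt from (source_text, expected_output) pairs.

    The layout here is an illustrative assumption, not GraphRAG's format.
    """
    parts = [
        "Extract entities and relationships from the text.",
        "Follow the output format shown in the examples.",
    ]
    # Each example shows the model the schema we want it to imitate.
    for i, (source, output) in enumerate(examples, start=1):
        parts.append(f"Example {i}:\nText: {source}\nOutput: {output}")
    # The real input goes last, with the output left open for the model.
    parts.append(f"Text: {text}\nOutput:")
    return "\n\n".join(parts)


examples = [
    ("Ada works at Acme.", "(Ada, PERSON); (Acme, ORG); (Ada, works_at, Acme)"),
    ("Bob lives in Oslo.", "(Bob, PERSON); (Oslo, PLACE); (Bob, lives_in, Oslo)"),
]
prompt = build_extraction_prompt(examples, "Carol founded Initech.")
```

The model never sees a formal schema here; it only sees outputs that happen to follow one, which is why the quality of the examples matters so much.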
Disclaimer: I'm the author of txtai
[1] https://neuml.hashnode.dev/introducing-the-semantic-graph
[2] https://neuml.hashnode.dev/generate-knowledge-with-semantic-...
[3] https://neuml.hashnode.dev/build-knowledge-graphs-with-llm-d...
[4] https://neuml.hashnode.dev/advanced-rag-with-graph-path-trav...
Knowledge graphs don't replace traditional semantic search, but they do unlock a whole new set of abilities when performing RAG, like both traversing down extremely long contexts and traversing across different contexts in a coherent, efficient way.
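To make "traversing across different contexts" concrete, here is a toy sketch: a breadth-first search over a small hand-written graph that collects the text attached to each node along the path. The graph, the snippets, and the function name are made up for illustration; this is not how txtai or GraphRAG implement traversal.

```python
from collections import deque


def shortest_path(graph, start, goal):
    """Breadth-first search; returns the node list from start to goal, or None."""
    if start == goal:
        return [start]
    parents = {start: None}  # also serves as the visited set
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor not in parents:
                parents[neighbor] = node
                if neighbor == goal:
                    # Walk parent links back to the start, then reverse.
                    path = [goal]
                    while parents[path[-1]] is not None:
                        path.append(parents[path[-1]])
                    return path[::-1]
                queue.append(neighbor)
    return None


# Hypothetical graph: nodes are concepts, edges link related concepts.
graph = {
    "GraphRAG": ["knowledge graph"],
    "knowledge graph": ["GraphRAG", "community summary"],
    "community summary": ["knowledge graph", "global question"],
    "global question": ["community summary"],
}
# Hypothetical text snippet attached to each node.
snippets = {
    "GraphRAG": "GraphRAG is a graph-based approach to RAG.",
    "knowledge graph": "An LLM extracts a knowledge graph from documents.",
    "community summary": "Summaries are derived from graph communities.",
    "global question": "Global questions consider the entire dataset.",
}

path = shortest_path(graph, "GraphRAG", "global question")
context = [snippets[node] for node in path]
```

The resulting `context` list strings together snippets that a plain similarity search might not return for a single query, since the intermediate nodes are reached through edges rather than through embedding distance.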
The only catch with KGs is that they're garbage in, garbage out, and I've found my feeble attempts at using LLMs to generate graphs sorely lacking. I can't wait to try this out.
At indexing time:
- Run an LLM over every data point multiple times ("gleanings") to extract entities and construct a graph index
- Run an LLM over the graph multiple times to create clusters ("communities")
At query time:
- Run the LLM across all clusters, creating an answer from each and score them
- Run the LLM across all but the lowest scoring answers to produce a "global answer"
...aren't the compute requirements here untenable for any decent sized dataset?
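One rough way to size this concern is to count LLM calls. The sketch below follows the steps as described in this comment, not GraphRAG's actual code; the class, the function names, and the example numbers are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PipelineShape:
    num_chunks: int        # text chunks ("data points") in the corpus
    gleanings: int         # extra extraction passes per chunk
    num_communities: int   # clusters found in the graph
    community_passes: int  # LLM passes per community at indexing time


def indexing_calls(p: PipelineShape) -> int:
    """LLM calls at indexing time: extraction passes plus community passes."""
    extraction = p.num_chunks * (1 + p.gleanings)
    communities = p.num_communities * p.community_passes
    return extraction + communities


def query_calls(p: PipelineShape, reduce_calls: int = 1) -> int:
    """LLM calls per query: one answer per community, then the final reduce."""
    return p.num_communities + reduce_calls


# Made-up corpus: 10,000 chunks, 2 gleanings each, 500 communities.
shape = PipelineShape(num_chunks=10_000, gleanings=2,
                      num_communities=500, community_passes=1)
print(indexing_calls(shape))  # 30500
print(query_calls(shape))     # 501
```

Under these assumptions indexing is a one-time cost that grows linearly with corpus size, while the per-query cost grows with the number of communities rather than the number of chunks.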
For that we wrote the GraphRAG-SDK, which also generates a stable ontology: https://github.com/FalkorDB/GraphRAG-SDK
While building the backend for this, I have focused on a composable set of APIs suitable for machine consumption, i.e., to act as agentic tools.
I was looking for a good RAG framework to process the large number of PDFs I have crawled, so that the agents can then design and run simulations. This comes at just the right time! I am looking forward to trying it out.