July 2nd, 2024

GraphRAG (from Microsoft) is now open-source!

GraphRAG, now available on GitHub, improves question-answering over private datasets through structured retrieval and response generation. According to the article, it outperforms naive RAG methods on the comprehensiveness and diversity of its answers, at a competitive token cost.

GraphRAG, a graph-based approach to retrieval-augmented generation (RAG) facilitating question-answering over private datasets, is now available on GitHub. The tool offers structured information retrieval and response generation superior to naive RAG methods. It utilizes a large language model to extract a knowledge graph from text documents, enabling semantic analysis and hierarchical data summaries. Community summaries derived from the graph aid in answering global questions comprehensively by considering the entire dataset. Evaluation against naive RAG and source text summarization shows GraphRAG's superiority in terms of comprehensiveness and diversity, with competitive performance at lower token costs. Ongoing research aims to optimize GraphRAG's efficiency and response quality, exploring methods to reduce upfront indexing costs and customize prompts. By releasing GraphRAG publicly, Microsoft Research aims to enhance data understanding at a global level and welcomes community feedback for further improvements.

21 comments
By @sansseriff - 7 months
I find it interesting that their entity extraction method for building a knowledge graph does not use or require one of the 'in-vogue' extraction libraries like instructor, Marvin, or Guardrails (all of which build off of pydantic). They just tell the llm to list graph nodes and edges in a list, and do some basic delimiter parsing, and load the result right into a networkx graph [1]. Is this because GPT-4 and the like have become very reliable at following specific formatting instructions, like a certain .json schema?

It looks like they just provide in the prompt a number of examples that follow the schema they want [2].

[1] https://github.com/microsoft/graphrag/blob/main/graphrag/ind...

[2] https://github.com/microsoft/graphrag/blob/main/graphrag/ind...
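To make the "basic delimiter parsing" idea concrete, here is a minimal sketch in plain Python. It is not GraphRAG's actual parser: the record separator `##`, the field separator `|`, the record layout, and the `parse_graph` helper are all assumptions chosen for illustration, and the graph is held in ordinary dicts rather than networkx.

```python
import re

RECORD_DELIM = "##"  # assumed record separator, for illustration only
FIELD_DELIM = "|"    # assumed field separator, for illustration only


def parse_graph(llm_output: str):
    """Parse delimiter-separated records into node and edge dicts (hypothetical helper)."""
    nodes = {}  # name -> {"type": ..., "description": ...}
    edges = {}  # (source, target) -> {"description": ..., "weight": ...}
    for record in llm_output.split(RECORD_DELIM):
        # Each record is expected to be wrapped in parentheses.
        match = re.search(r"\((.*)\)", record, re.DOTALL)
        if not match:
            continue  # ignore any text that is not a record
        fields = [f.strip().strip('"') for f in match.group(1).split(FIELD_DELIM)]
        kind = fields[0].lower()
        if kind == "entity" and len(fields) >= 4:
            nodes[fields[1].upper()] = {"type": fields[2], "description": fields[3]}
        elif kind == "relationship" and len(fields) >= 5:
            source, target = fields[1].upper(), fields[2].upper()
            try:
                weight = float(fields[4])
            except ValueError:
                weight = 1.0  # fall back when the model emits a non-numeric weight
            edges[(source, target)] = {"description": fields[3], "weight": weight}
    return nodes, edges


# Made-up model output in the assumed format.
sample = (
    '("entity"|Alice|PERSON|Engineer at Acme)##'
    '("entity"|Acme|ORGANIZATION|A fictional company)##'
    '("relationship"|Alice|Acme|Alice works at Acme|8)'
)
nodes, edges = parse_graph(sample)
```

If networkx is installed, the two dicts could be loaded into an `nx.Graph()` with `add_node` and `add_edge`. The appeal of this style is that malformed records are simply skipped, so no schema-validation library is needed.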

By @a_wild_dandan - 7 months
I am ecstatic that Microsoft open sourced this. After watching the demo video[1], my mind raced with all of the possibilities that GraphRAG unlocks. I'm planning to try GraphRAG + Llama3 on my MacBook, since it has 96GB of unified (V)RAM. I think this tool could be a legit game changer.

[1] https://www.youtube.com/watch?v=r09tJfON6kE

By @glesperance - 7 months
For those like me that were looking for something more substantial on the GraphRag Method -> https://arxiv.org/pdf/2404.16130
By @dmezzetti - 7 months
txtai has been working in the graph-vector space since 2022. Building semantic graphs with vector similarity for example. [1] [2] [3] [4]

Disclaimer: I'm the author of txtai

[1] https://neuml.hashnode.dev/introducing-the-semantic-graph

[2] https://neuml.hashnode.dev/generate-knowledge-with-semantic-...

[3] https://neuml.hashnode.dev/build-knowledge-graphs-with-llm-d...

[4] https://neuml.hashnode.dev/advanced-rag-with-graph-path-trav...
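As a rough illustration of "building semantic graphs with vector similarity", the sketch below links any two items whose embeddings are close enough. It does not use txtai's API: the `build_semantic_graph` helper, the `min_score` threshold, and the three-dimensional toy vectors are all assumptions made up for the example.

```python
import math
from itertools import combinations


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def build_semantic_graph(vectors, min_score=0.8):
    """Connect every pair of items whose similarity is at least min_score (hypothetical helper)."""
    graph = {key: {} for key in vectors}  # adjacency dict: node -> {neighbor: score}
    for a, b in combinations(vectors, 2):
        score = cosine(vectors[a], vectors[b])
        if score >= min_score:
            graph[a][b] = score
            graph[b][a] = score
    return graph


# Toy embeddings; a real system would get these from an embedding model.
toy_vectors = {
    "graph databases": [0.9, 0.1, 0.0],
    "knowledge graphs": [0.8, 0.2, 0.1],
    "sourdough baking": [0.0, 0.1, 0.9],
}
semantic_graph = build_semantic_graph(toy_vectors)
```

Here the two graph-related items end up connected and the baking item stays isolated. The all-pairs loop is quadratic, so a larger collection would need an approximate nearest-neighbor index instead.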

By @laborcontract - 7 months
I've been waiting for this.

Knowledge graphs don't replace traditional semantic search, but they do unlock a whole new set of abilities when performing RAG, like both traversing down extremely long contexts and traversing across different contexts in a coherent, efficient way.

The only thing about KGs is that it's garbage-in-garbage-out, and I've found my feeble attempts at using LLMs to generate graphs sorely lacking. I can't wait to try this out.

By @dweinus - 7 months
If I understand the paper right...

At indexing time:

- run LLM over every data point multiple times ("gleanings") for entity extraction and constructing a graph index

- run an LLM over the graph multiple times to create clusters ("communities")

At query time:

- Run the LLM across all clusters, creating an answer from each and score them

- Run the LLM across all but the lowest scoring answers to produce a "global answer"

...aren't the compute requirements here untenable for any decent sized dataset?
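The query-time steps described above can be sketched as a map-reduce over community summaries. This is only an illustration of the shape of the computation, not GraphRAG's code: `fake_llm_answer` is a hypothetical stand-in that scores by word overlap instead of calling a model, and the `min_score` cutoff is an assumed parameter.

```python
def fake_llm_answer(question, summary):
    """Hypothetical stand-in for an LLM call: returns (partial answer, score 0-100)."""
    overlap = len(set(question.lower().split()) & set(summary.lower().split()))
    return f"From this community: {summary}", min(100, overlap * 25)


def global_answer(question, community_summaries, min_score=50):
    """Map over communities, drop low-scoring partial answers, then reduce."""
    # Map step: one model call per community summary.
    partials = [fake_llm_answer(question, s) for s in community_summaries]
    # Filter step: keep only answers scored at or above the cutoff, best first.
    kept = sorted(
        (p for p in partials if p[1] >= min_score),
        key=lambda pair: pair[1],
        reverse=True,
    )
    # Reduce step: a real system would make one more model call here;
    # this sketch just concatenates the surviving partial answers.
    answer = " ".join(text for text, _ in kept)
    llm_calls = len(partials) + 1  # map calls plus the single reduce call
    return answer, llm_calls


summaries = [
    "pricing changes for the enterprise plan",
    "hiring plans for the research team",
    "enterprise plan support tickets",
]
answer, calls = global_answer("what changed in the enterprise plan", summaries)
```

In this shape the query cost grows linearly with the number of communities (one call each, plus one reduce call), which is the cost the question above is asking about.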

By @gkorland - 7 months
The GraphRAG project is great and really shows why Vector Databases can provide a full RAG solution when it comes to non-trivial search queries. But in order to build a full and accurate Knowledge Graph, we found out you need more than just loading the text into an LLM.

For that we wrote the GraphRAG-SDK, which also generates a stable Ontology. https://github.com/FalkorDB/GraphRAG-SDK

By @throwaway4aday - 7 months
This is awesome! I've done a lot of little projects exploring the use of graphs with LLMs and it's great to see that this approach really pays off. Stupid me for trying to prematurely optimize when the solution is just prompt engineering and burning a bunch of tokens on multiple passes. Going to give this a try and see if my jaw drops. If it's as good as it looks then I'll have to put in the work to get it out of Python-land.
By @michaelnny - 7 months
Anyone know how to use GraphRAG to build the knowledge graph on a large collection of private documents, where some might have complex structure (tables, links to other docs), and the content or terms in one document could be related to other documents as well?
By @darksaints - 7 months
LlamaIndex has something called the Knowledge Graph RAG Query engine. Is this related in any way?
By @ftkftk - 7 months
I've been looking forward to playing with this since reading the paper. I was considering implementing it myself based on the paper but I figured the code would just be a few weeks behind and patience did indeed pay off :)
By @boywitharupee - 7 months
how different is this compared to Facebook's open-source tool Faiss[1]?

[1] https://github.com/facebookresearch/faiss/

By @loufe - 7 months
I find the Russo-Ukrainian war an interesting choice of topic for the example. I could see it being an intentional choice as a means to target military data analysis contracts.
By @bitsinthesky - 7 months
So can someone explain how this is different/superior to Raptor RAG? I don't have the current concentration to figure it out for myself...
By @shreezus - 7 months
This is great - I have been interested in KG-enhanced RAG for some time and think there is a lot of potential in this space!
By @fumeux_fume - 7 months
Knowledge graph or just a graph? My fear is that the term is being borrowed to help hype more AI products.
By @ravi1krkr - 7 months
Can we use GraphRAG with Ollama and other open-source embedding models instead of OpenAI and Azure OpenAI?
By @piotrrojek - 7 months
Does anyone know how to run it against Ollama?
By @justanotheratom - 7 months
does it support multi-tenancy?
By @malux85 - 7 months
I’m the creator of https://atomictessellator.com

While building the backend of this, I have focused on building a composable set of APIs suitable for machine consumption, i.e., to act as agentic tools.

I was looking for a good RAG framework to process the large number of PDFs I have crawled, so the agents can then design and run simulations. This comes at just the right time! I am looking forward to trying it out.