What Is a Knowledge Graph?
Knowledge graphs structure real-world entities and their relationships, enhancing applications like search engines and AI. They can be tailored for specific uses, with property graph databases like Neo4j offering design advantages.
A knowledge graph is a structured representation of real-world entities and their interconnections, typically stored in a graph database. It organizes data into nodes (representing entities) and relationships (showing how entities are connected), along with organizing principles that provide a framework for understanding the data. Knowledge graphs are valuable for various applications, including enhancing search engines, supporting real-time applications, and grounding generative AI for improved question-answering. While some may perceive knowledge graphs as complex systems that integrate vast datasets, they can also be designed for specific use cases with a narrower focus.
The Google Knowledge Graph exemplifies this concept by organizing information about entities to deliver contextually relevant search results. Key components of a knowledge graph include nodes, relationships, and organizing principles, which can be simple or complex depending on the use case. Ontologies, which define concepts and relationships within a domain, can serve as organizing principles but are not always necessary. Implementing a knowledge graph in a property graph database, like Neo4j, offers advantages such as ease of design, flexibility, and superior performance for complex queries compared to alternatives like RDF databases.
- Knowledge graphs represent entities and their relationships in a structured format.
- They can be tailored for specific use cases, simplifying their design and implementation.
- Google’s Knowledge Graph is a prominent example of how knowledge graphs enhance search functionality.
- Property graph databases, such as Neo4j, are ideal for implementing knowledge graphs due to their intuitive structure and performance benefits.
- Ontologies can be used as organizing principles but are not always required for effective knowledge graph design.
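As a rough illustration of those components, here is a minimal sketch that builds and queries a two-entity graph with the official neo4j Python driver. The connection URI, credentials, and the Person/Company model are illustrative assumptions, not taken from the article.

```python
# Minimal property-graph sketch: nodes are entities, relationships connect
# them, and labels act as a simple organizing principle. Connection details
# and the data model are made up for illustration.
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

def build_tiny_graph(tx):
    # MERGE keeps the script idempotent if it is run more than once.
    tx.run(
        """
        MERGE (p:Person {name: $person})
        MERGE (c:Company {name: $company})
        MERGE (p)-[:WORKS_FOR {since: $since}]->(c)
        """,
        person="Ada", company="Acme", since=2021,
    )

def coworkers_of(tx, name):
    # A relationship-centric question: who shares an employer with this person?
    result = tx.run(
        """
        MATCH (p:Person {name: $name})-[:WORKS_FOR]->(c)<-[:WORKS_FOR]-(other:Person)
        RETURN other.name AS coworker
        """,
        name=name,
    )
    return [record["coworker"] for record in result]

with driver.session() as session:
    session.execute_write(build_tiny_graph)
    print(session.execute_read(coworkers_of, "Ada"))
driver.close()
```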
Related
GraphRAG (from Microsoft) is now open-source!
GraphRAG, a GitHub tool, enhances question-answering over private datasets with structured retrieval and response generation. It outperforms naive RAG methods, offering semantic analysis and diverse, comprehensive data summaries efficiently.
Knowledge Graphs in RAG: Hype vs. Ragas Analysis
The analysis questions Microsoft's GraphRAG paper on knowledge graphs in RAG systems, revealing Neo4j's potential over FAISS in context retrieval. The study scrutinizes metrics and aims for precise quantification beyond vague claims.
Build a Knowledge Graph-Based Agent with Llama 3.1, Nvidia Nim, and LangChain
A knowledge graph-based agent using Llama 3.1 and Neo4j enhances retrieval-augmented generation by accurately retrieving drug side effects through dynamic queries and structured data from the FDA's database.
- There is a debate over the effectiveness of Neo4j and its query language, Cypher, with some users expressing frustration and preferring other standards-based solutions.
- Several commenters highlight the importance of shared ontologies and the challenges of integrating disparate data sources into a cohesive knowledge graph.
- Users share their experiences with knowledge graphs in various applications, from product searches to historical research, emphasizing their versatility.
- Some commenters advocate for alternative approaches, such as Datalog or Prolog, suggesting they may offer more flexibility than traditional graph databases.
- There is a recognition of the evolving role of knowledge graphs in conjunction with AI technologies, particularly in enhancing data retrieval and understanding.
While knowledge graphs are useful in many ways, personally I wouldn't use Neo4J to build a knowledge graph as it doesn't really play to any of their strengths.
Also, I would rather stab myself with a fork than try to use Cypher to query a concept graph when better standards-based options are available.
The overall DX is quite nice. The apoc-extended set of plugins[0] makes it very seamless to work with embeddings and LLMs during local dev/testing. The Graph Data Science package comes preloaded with a series of community detection algorithms[1] like Louvain and Leiden.
Performance has been very, very good as long as your strategy to enter the graph is sound and you've structured your graph in such a way that you can meaningfully traverse the adjacent properties/nodes.
We've currently deployed the Community edition to AWS ECS Fargate using AWS Copilot + EFS as a persistent volume. There were some kinks with respect to the docs, but it works great otherwise.
It's worth a look for any teams that are trying to improve their RAG or are exploring GRAG in general. It's not a silver bullet; you still need to have some "insight" into how to process your input data source for the graph to do its magic. But the combination of the built-in graph algorithms and the ergonomics of Cypher make it possible to perform certain types of queries and "explorations" that would otherwise be either harder to optimize or more expensive in a relational store.
[0] https://neo4j.com/labs/apoc/5/ml/openai/
[1] https://neo4j.com/docs/graph-data-science/current/algorithms...
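For anyone curious what that looks like in practice, here is a rough sketch of running one of those community-detection algorithms from Python. It assumes a local instance with the Graph Data Science plugin installed; the projection name, node label, relationship type, and the `name` property are placeholders.

```python
# Sketch of the community-detection workflow described above, assuming the
# Graph Data Science plugin is installed. Names below are placeholders.
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

with driver.session() as session:
    # 1. Project the stored graph into GDS's in-memory graph catalog.
    session.run("CALL gds.graph.project('docGraph', 'Chunk', 'RELATES_TO')")

    # 2. Stream Louvain community assignments (Leiden works the same way
    #    via gds.leiden.stream).
    records = session.run(
        """
        CALL gds.louvain.stream('docGraph')
        YIELD nodeId, communityId
        RETURN gds.util.asNode(nodeId).name AS node, communityId
        ORDER BY communityId
        """
    )
    for record in records:
        print(record["node"], record["communityId"])

driver.close()
```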
RDFox is a tool that uses Datalog internally. RelationalAI uses a Datalog-based approach. Another example is Mangle Datalog, my own humble open source project that can be found on GitHub.
The language in the article about relational being "non-native graph" is a bit biased. With some developer attention, there are massive opportunities to store data in a distributed manner, and with the right indices querying can be fast. Though to be fair, good performance will always need developer attention.
Take a bunch of tables and convert each row into a tuple (rowkey, columnName, value). Now take the union of all the tables.
^ knowledge graph
That’s it…but it’s not very useful yet. It becomes more useful if you apply a shared ontology during the import, i.e., translate all the columns into the same namespace. Suppose we had a “contacts” table with columns {“first name”, “last name”, …} and an “events” table with columns {“participant given name”, “participant family name”, …}: basically, you need to unify the word you use to describe the concept “first name”/“given name”/whatever across all sources.
This can be cool/useful because you now only need one table (of triples) to describe all your structured data, but it’s also a pain because you may need to perform lots of self-joins or recursive queries to recover your data in order to do useful things with it. The final table has a very simple “meta” schema; you erase each individual source’s schema and push it down into the data itself.
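A quick pure-Python illustration of this construction, with made-up tables and a toy ontology mapping:

```python
# Flatten two tables into (rowkey, column, value) triples, mapping the
# source-specific column names onto shared ontology terms. Data is made up.
contacts = [{"first name": "Ada", "last name": "Lovelace"}]
events = [{"participant given name": "Ada", "event": "GraphConf"}]

# Shared ontology: per-source column name -> canonical property name.
ontology = {
    "first name": "givenName",
    "participant given name": "givenName",
    "last name": "familyName",
    "event": "eventName",
}

def to_triples(table_name, rows):
    for i, row in enumerate(rows):
        rowkey = f"{table_name}/{i}"
        for column, value in row.items():
            yield (rowkey, ontology.get(column, column), value)

triples = list(to_triples("contacts", contacts)) + list(to_triples("events", events))

# Recovering per-row records now means re-grouping (a self-join on rowkey),
# which is the "pain" mentioned above.
records = {}
for rowkey, prop, value in triples:
    records.setdefault(rowkey, {})[prop] = value
print(records)
```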
When you try to grasp any complex topic, your brain starts to build and connect a fuzzy network of topics, their respective positive or negative correlations, and of course the weights between the connections.
Once you have unfuzzied the picture in your head, you realize that the network is active and dynamic, that it has different "modes" of operation, and that some weights and correlations can change over time while others stay static.
Mastering the dynamics of the knowledge graph is the final step in understanding it.
LLMs are great, but knowledge graphs are IMO indispensable to tame their shortcomings.
Prolog is better than datalog in a lot of ways: CLPZ, abduction, homoiconicity, being able to choosing search strategies for different problems, tabling, etc.
There's been some work to integrate prolog with LLMs:
https://swi-prolog.discourse.group/t/llm-swi-prolog-and-larg...
https://aneeshsathe.com/2024/05/10/dancing-on-the-shoulders-...
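As a toy sketch of the general "LLM writes facts, Prolog checks them" pattern (not the specific approach in the linked posts), assuming SWI-Prolog's swipl is on PATH and with hand-written clauses standing in for LLM output:

```python
# Toy pipeline: treat some Prolog clauses as if an LLM had extracted them
# from text, then use SWI-Prolog to run a query over them. Assumes `swipl`
# is installed and on PATH; the facts are made up.
import subprocess
import tempfile

llm_extracted = """
parent(byron, ada).
parent(ada, anne).
grandparent(G, C) :- parent(G, P), parent(P, C).
"""

with tempfile.NamedTemporaryFile("w", suffix=".pl", delete=False) as f:
    f.write(llm_extracted)
    path = f.name

# Prove a goal over the extracted clauses and print each answer.
goal = "forall(grandparent(byron, X), (write(X), nl)), halt"
out = subprocess.run(
    ["swipl", "-q", "-g", goal, path],
    capture_output=True, text=True, check=True,
)
print(out.stdout)  # -> anne
```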
With LLMs enabling easy, if noisy, KG creation, extracting knowledge into a computable form will lead to advances.
Drug discovery already uses the tech heavily; I wouldn't be surprised if it expands to more domains quickly now.
I have not yet found an application that combines all those functions, and I have been considering building one myself.
Knowledge Graph (disambiguation) https://en.wikipedia.org/wiki/Knowledge_Graph_(disambiguatio...
Knowledge graph: https://en.wikipedia.org/wiki/Knowledge_graph :
> In knowledge representation and reasoning, a knowledge graph is a knowledge base that uses a graph-structured data model or topology to represent and operate on data. Knowledge graphs are often used to store interlinked descriptions of entities – objects, events, situations or abstract concepts – while also encoding the free-form semantics or relationships underlying these entities. [1][2]
> Since the development of the Semantic Web, knowledge graphs have often been associated with linked open data projects, focusing on the connections between concepts and entities. [3][4] They are also historically associated with and used by search engines such as Google, Bing, Yext and Yahoo; knowledge-engines and question-answering services such as WolframAlpha, Apple's Siri, and Amazon Alexa; and social networks
Ideally, a Knowledge Graph - starting with maybe a "personal knowledge base" in a text document format that can be rendered to HTML with templates - can be linked with other data about things with correlate-able names; ideally you can JOIN a knowledge graph with other graphs, provided the node and edge relations carry enough schema and URIs to make the JOIN possible.
A knowledge graph is a collection of nodes and edges (or nodes and edge nodes) with schema, so that it is query-able and JOIN-able.
A Named Graph URI may be the graphid ?g of an RDF statement in a quadstore:
?g ?s ?p ?o // ?o_datatype ?o_lang
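A small sketch of that quad idea in Python using rdflib's Dataset, where each triple lives in a named graph whose URI plays the role of ?g; the URIs below are placeholders:

```python
# Named graphs in a quadstore-style Dataset: every statement is addressable
# as (graph, subject, predicate, object). URIs are placeholders.
from rdflib import Dataset, Literal, Namespace, URIRef

EX = Namespace("http://example.org/")
ds = Dataset()

# Put statements into a named graph; the graph URI is the ?g of the quad.
g = ds.graph(URIRef("http://example.org/graphs/people"))
g.add((EX.ada, EX.knows, EX.grace))
g.add((EX.ada, EX.fullName, Literal("Ada Lovelace", lang="en")))

# Query across graphs: each row binds the named-graph id alongside s, p, o.
# Shared URIs across separately published graphs are what make them JOIN-able.
for row in ds.query("SELECT ?g ?s ?p ?o WHERE { GRAPH ?g { ?s ?p ?o } }"):
    print(row.g, row.s, row.p, row.o)
```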
It works great. I’m not a db expert but the flexibility and explicitness of the graph scheme clicks for me. It took me a while to come around on cypher but now that I’m there it makes sense.
A few years ago, a Good Samaritan on HN told me my bookmarks (I had about 10,000 at the time) were my "knowledge graph".
I had no idea what that was, but upon researching the concept, I was mind-blown by the simple truth of what I had been told.
Since then I became even more rapacious with my bookmarking (and especially editing their "Name" field to add tags and keywords), and I have about 30,000 bookmarks now.
And they truly are my knowledge graph. More so than the 1,000 or so text files where I store my notes on various topics. Mainly because of the Bookmarks Search feature. Let's say I want to refresh my understanding of Permutations, Combinations, Factorials (like I wanted to do, and did, yesterday). All I have to do is enter those keywords into Bookmarks Search and I instantly get the best (i.e. most relevant and intuitive for me) articles I've ever found on these topics (because I've bookmarked and tagged/keyworded them in the past). I find that more and more bookmarks end up with 404's these days, but then there's Internet Archive. Invaluable.
Which brings me to Chrome Bookmarks. There seems to have been no innovation in the last 15 years (other than the time they replaced the prior [and post/current] system with the horrible "Cards", and thankfully reversed course due to the howling protests and decided to instead offer "Cards" as a Chrome Extension [which is apparently not popular]). For example:
- One still cannot exclusively search folder names.
- One cannot search within a single folder only (by extension, one cannot search within selected multiple folders only).
- One cannot use Regex to search, or do any kind of Fuzzy Search.
- One cannot download (cache) a copy of a bookmarked page (in case the page 404s in the future) to store permanently in some Zotero-type storage system.
- When adding a new bookmark, one cannot conduct a search (using text keywords) for the folder one wants to store it in.
I realize that some of these features are available via third-party Chrome Extensions and the like. However, I stopped relying on third party anything after I relied on one for tagging bookmarks and this third party Extension decided one day to 404 on me (it shut down).
Please Google do more with Chrome Bookmarks.
Rant over.
(Addendum 1: Regarding the text files: yeah, I've tried Zettelkasten software. The one that came closest to my liking was FeatherWiki. However, in the end I continued with plain ol' text files because they're the most Lindy [plain text files are likely going to be the last format rendered unreadable by whatever destroys civilization as we know it].)
(Addendum 2: there seem to be some people out there who have uncanny storage and retrieval systems. E.g. @Balajis on X comes to mind. When he's in a debate/argument on X, he has an uncanny ability to pull out relevant contextual material, instantly, from his back pocket when the situation demands it. Let's hope they share their systems.)