August 16th, 2024

What Is a Knowledge Graph?

Knowledge graphs structure real-world entities and their relationships, enhancing applications like search engines and AI. They can be tailored for specific uses, with property graph databases like Neo4j offering design advantages.


A knowledge graph is a structured representation of real-world entities and their interconnections, typically stored in a graph database. It organizes data into nodes (representing entities) and relationships (showing how entities are connected), along with organizing principles that provide a framework for understanding the data. Knowledge graphs are valuable for various applications, including enhancing search engines, supporting real-time applications, and grounding generative AI for improved question-answering. While some may perceive knowledge graphs as complex systems that integrate vast datasets, they can also be designed for specific use cases with a narrower focus. The Google Knowledge Graph exemplifies this concept by organizing information about entities to deliver contextually relevant search results. Key components of a knowledge graph include nodes, relationships, and organizing principles, which can be simple or complex depending on the use case. Ontologies, which define concepts and relationships within a domain, can serve as organizing principles but are not always necessary. Implementing a knowledge graph in a property graph database, like Neo4j, offers advantages such as ease of design, flexibility, and superior performance for complex queries compared to alternatives like RDF databases.

- Knowledge graphs represent entities and their relationships in a structured format.

- They can be tailored for specific use cases, simplifying their design and implementation.

- Google’s Knowledge Graph is a prominent example of how knowledge graphs enhance search functionality.

- Property graph databases, such as Neo4j, are ideal for implementing knowledge graphs due to their intuitive structure and performance benefits.

- Ontologies can be used as organizing principles but are not always required for effective knowledge graph design.
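
The three components named in the summary (nodes, relationships, and an organizing principle) can be illustrated with a small in-memory sketch. This is plain Python, not Neo4j or any graph database API; the `Node`, `Relationship`, and `Graph` classes, the labels, and the sample data are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Node:
    """An entity; the label acts as a very simple organizing principle."""
    id: str
    label: str


@dataclass(frozen=True)
class Relationship:
    """A typed, directed connection between two nodes."""
    start: str
    type: str
    end: str


@dataclass
class Graph:
    nodes: dict = field(default_factory=dict)
    relationships: list = field(default_factory=list)

    def add_node(self, node_id, label):
        self.nodes[node_id] = Node(node_id, label)

    def relate(self, start, rel_type, end):
        # Refuse relationships that point at unknown entities.
        if start not in self.nodes or end not in self.nodes:
            raise KeyError("both endpoints must exist before relating them")
        self.relationships.append(Relationship(start, rel_type, end))

    def neighbors(self, node_id, rel_type):
        """Return the ids reachable from node_id over one hop of rel_type."""
        return [r.end for r in self.relationships
                if r.start == node_id and r.type == rel_type]


# Illustrative sample data only.
graph = Graph()
graph.add_node("ada", "Person")
graph.add_node("babbage", "Person")
graph.add_node("analytical_engine", "Machine")
graph.relate("ada", "WROTE_ABOUT", "analytical_engine")
graph.relate("babbage", "DESIGNED", "analytical_engine")

print(graph.neighbors("ada", "WROTE_ABOUT"))
```

A real graph database adds indexing, persistence, and a query language on top of this shape; the sketch only shows how little structure a narrowly scoped knowledge graph strictly needs.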

AI: What people are saying
The comments on the article about knowledge graphs reveal a range of perspectives and insights on their implementation and utility.
  • There is a debate over the effectiveness of Neo4j and its query language, Cypher, with some users expressing frustration and preferring other standards-based solutions.
  • Several commenters highlight the importance of shared ontologies and the challenges of integrating disparate data sources into a cohesive knowledge graph.
  • Users share their experiences with knowledge graphs in various applications, from product searches to historical research, emphasizing their versatility.
  • Some commenters advocate for alternative approaches, such as Datalog or Prolog, suggesting they may offer more flexibility than traditional graph databases.
  • There is a recognition of the evolving role of knowledge graphs in conjunction with AI technologies, particularly in enhancing data retrieval and understanding.
15 comments
By @kmerroll - 2 months
Good article on the high-level concepts of a knowledge graph, but it has some concerning mischaracterizations of the core functions of ontologies in supporting the class schema, and it continues to disparage competing standards-based (RDF triple-store) solutions. That the author omits the updates for property annotations using RDF* is probably not an accident, and the article glosses over the issues with their proprietary, clunky query language.

While knowledge graphs are useful in many ways, personally I wouldn't use Neo4j to build a knowledge graph as it doesn't really play to any of their strengths.

Also, I would rather stab myself with a fork than try to use Cypher to query a concept graph when better standards-based options are available.

By @CharlieDigital - 2 months
I've been working on an implementation of graph RAG (GRAG) using Neo4j as the underlying store.

The overall DX is quite nice. The apoc-extended set of plugins[0] makes it very seamless to work with embeddings and LLMs during local dev/testing. The Graph Data Science package comes preloaded with a series of community detection algorithms[1] like Louvain and Leiden.

Performance has been very, very good as long as your strategy to enter the graph is sound and you've structured your graph in such a way that you can meaningfully traverse the adjacent properties/nodes.

We've currently deployed the Community edition to AWS ECS Fargate using AWS Copilot + EFS as a persistent volume. There were some kinks with respect to the docs, but it works great otherwise.

It's worth a look for any teams that are trying to improve their RAG or are exploring GRAG in general. It's not a silver bullet; you still need to have some "insight" into how to process your input data source for the graph to do its magic. But the combination of the built-in graph algorithms and the ergonomics of Cypher make it possible to perform certain types of queries and "explorations" that would otherwise be either harder to optimize or more expensive in a relational store.

[0] https://neo4j.com/labs/apoc/5/ml/openai/

[1] https://neo4j.com/docs/graph-data-science/current/algorithms...

By @burakemir - 2 months
In addition to labelled property graphs and triples, a list of approaches to knowledge graphs should consider facts (tuples) that are connected via common values as a form of graph, with Datalog queries to query them. This is a lot more flexible than either approach IMHO and also more easily connected to existing relational data.

RDFox is a tool that uses Datalog internally. RelationalAI uses a datalog based approach. Another example is Mangle Datalog, my own humble open source project that can be found on GitHub.

The language in the article about relational being "non native graph" is a bit biased. With some developer attention, there are massive opportunities to store data in a distributed manner, and with the right indices querying can be fast. Though to be fair, good performance will always need developer attention.
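
To make the "facts connected via common values" idea concrete, here is a minimal sketch in Python of naive bottom-up evaluation for two Datalog-style rules. It is not how RDFox, RelationalAI, or Mangle work internally; the `reports_to` and `chain` predicates and the sample facts are hypothetical.

```python
# Facts are plain tuples: (predicate, subject, object).
facts = {
    ("reports_to", "alice", "bob"),
    ("reports_to", "bob", "carol"),
    ("reports_to", "carol", "dana"),
}


def derive_chain(facts):
    """Naively evaluate two Datalog-style rules until nothing new appears:

        chain(X, Y) :- reports_to(X, Y).
        chain(X, Z) :- chain(X, Y), reports_to(Y, Z).
    """
    # First rule: every base fact is also a chain fact.
    derived = {("chain", s, o) for (p, s, o) in facts if p == "reports_to"}
    while True:
        # Second rule: join chain and reports_to on the shared value Y.
        new = {
            ("chain", x, z)
            for (_, x, y1) in derived
            for (p, y2, z) in facts
            if p == "reports_to" and y1 == y2
        }
        if new <= derived:  # fixpoint reached
            return derived
        derived |= new


chain = derive_chain(facts)
print(sorted(chain))
```

The graph here is implicit: two tuples are "connected" simply because they share a value, and the recursive rule plays the role of a path traversal.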

By @cgearhart - about 2 months
A knowledge graph is really just a projection of structured data from disparate sources into a common schema.

Take a bunch of tables and convert each row into a tuple (rowkey, columnName, value). Now take the union of all the tables.

^ knowledge graph

That’s it…but it’s not very useful yet. It becomes more useful if you apply a shared ontology during the import, i.e., translate all the columns into the same namespace. Suppose we had a “contacts” table with columns {“first name”, “last name”, …} and an “events” table with columns {“participant given name”, “participant family name”, …}. Basically, you need to unify the word you use to describe the concept “first name”/“given name”/whatever across all sources.

This can be cool/useful because you now only need one table (of triples) to describe all your structured data, but it’s also a pain because you may need to perform lots of self-joins or recursive queries to recover your data in order to do useful things with it. The final table has a very simple “meta” schema, and you erase the schema from each individual source so you can push the schema into the data.
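
The recipe above can be sketched in a few lines of Python. The table contents, the `id` key column, and the shared property names `givenName` and `familyName` are assumptions made up for this example; the comment does not specify them.

```python
# Two source tables with different column names for the same concepts.
contacts = [{"id": "c1", "first name": "Ada", "last name": "Lovelace"}]
events = [{"id": "e1",
           "participant given name": "Ada",
           "participant family name": "Lovelace"}]

# Hypothetical shared ontology: source column name -> common property name.
SHARED_NAMES = {
    "first name": "givenName",
    "participant given name": "givenName",
    "last name": "familyName",
    "participant family name": "familyName",
}


def to_triples(table_name, rows, key="id"):
    """Turn each row into (rowkey, columnName, value) triples, one per column."""
    result = []
    for row in rows:
        row_key = f"{table_name}/{row[key]}"
        for column, value in row.items():
            if column == key:
                continue
            # Translate the column into the shared namespace when we know it.
            result.append((row_key, SHARED_NAMES.get(column, column), value))
    return result


def recover_row(triples, row_key):
    """Rebuild one source row; the triple-table equivalent of a self-join."""
    return {p: o for (s, p, o) in triples if s == row_key}


# The "knowledge graph" is just the union of all the triples.
triples = to_triples("contacts", contacts) + to_triples("events", events)
print(recover_row(triples, "events/e1"))
```

Note how `recover_row` has to scan the whole triple list to reassemble a single record, which is the cost the comment describes: the schema has moved into the data, so getting a row back takes extra work.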

By @findthewords - 2 months
For me a knowledge graph is a complex network.

When you try to grasp any complex topic your brain starts to build and connect a fuzzy network of topics and their respective positive or negative correlations and of course the weights between the connections.

Once you have unfuzzied the picture in your head you realize that the network is active and dynamic and that this network has different "modes" of operation and that some weights and correlations can change over time, while others are always static.

Mastering the dynamics of the knowledge graph is the final step in understanding it.

By @nemo44x - 2 months
What's great about knowledge graphs and property graphs in general is once you really get it (and it's not too difficult, especially if you come from a CS background) you start to see graphs all over the place. It's a really nice way to work with data for certain classes of problems. Once you get "enough" data in and "enough" of a variety of things connected, you start to see remarkable relationships emerge.
By @alexgunnarson - 2 months
We employ a knowledge graph at Deft (https://shopdeft.com) to enable searches over ~1M products, amounting to about 1B triples. Because of the complexity of the queries involved, the expressiveness of our data model — supporting n-ary/reified relations, negation, disjunction, linguistic vagueness, etc. — and our real-time latency targets, we built a graph DB engine "from scratch" (certain components are of course from open-source projects). Even RedisGraph wasn't fast enough for the purpose; ours (Deftgraph) is 700x faster on our queries thanks to some SOTA optimizations from various recent papers. You'll notice on our site that the overall search latency is generally acceptable but not great; the vast proportion of that latency comes from 1) LLMs and 2) a less-optimized other graph DB, Datomic, that we still store some of our data in for legacy reasons.

LLMs are great, but knowledge graphs are IMO indispensable to tame their shortcomings.

By @ristos - about 2 months
I would say that a better alternative to graph dbs would be prolog or datalog, because they're expressive enough to describe hypergraphs.

Prolog is better than datalog in a lot of ways: CLPZ, abduction, homoiconicity, being able to choose search strategies for different problems, tabling, etc.

There's been some work to integrate prolog with LLMs:

https://swi-prolog.discourse.group/t/llm-swi-prolog-and-larg...

By @ic_fly2 - 2 months
As someone running Neo4j in production, I can only warn that the DBs are a pain and need a lot more care and love than Postgres or Oracle DBs, even much larger ones. Maybe their cloud offerings are better, but they are quite expensive.
By @openrisk - 2 months
It will be quite a plot twist if Graph RAG paves the way for making knowledge graphs / semantic networks and the like cutting edge again... New "AI" meets old "AI" etc.
By @boredgargoyle - 2 months
Wrote this a while ago. KGs are definitely the next new old thing.

https://aneeshsathe.com/2024/05/10/dancing-on-the-shoulders-...

With LLMs enabling easy, if noisy, KG creation, extracting knowledge into a computable form will lead to advances.

Drug discovery already uses the tech heavily, wouldn’t be surprised if it expands to more domains quickly now.

By @fjfaase - 2 months
An interesting use of Knowledge Graphs is research into historical documents, such as when doing genealogical research or researching some historic event, person or location. In those applications, sources often do not have direct references (a person's name in one document cannot always be identified with 100% certainty) or contradict each other (one source gives a different date than another). In this case another layer is needed. There is some need for attaching a source identification, the actual document (scans), an author and/or an authority to a source. If you are extracting information from historical documents, you might need to transcribe the contents, and in that case it would be nice to be able to mark parts of the text, to quickly verify the source of a fact.

I have not yet found an application that combines all those functions and I have been considering building one myself.
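
One possible shape for that extra layer, as a rough Python sketch: every fact is stored as a claim that points at its source and carries a confidence value and the supporting excerpt. All class names, fields, identifiers, and sample records below are hypothetical and do not describe any existing application.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Source:
    id: str
    author: str
    scan_path: str  # hypothetical location of the scanned document


@dataclass(frozen=True)
class Claim:
    subject: str       # the entity we believe the source refers to
    predicate: str
    value: str
    source_id: str
    confidence: float  # how sure we are the source means this subject
    excerpt: str       # transcribed passage that supports the claim


def find_contradictions(claims):
    """Group claims by (subject, predicate) and keep those with several values."""
    values = defaultdict(set)
    for claim in claims:
        values[(claim.subject, claim.predicate)].add(claim.value)
    return {key: vals for key, vals in values.items() if len(vals) > 1}


# Invented sample records for illustration.
sources = {
    "s1": Source("s1", "parish clerk", "scans/register_p12.png"),
    "s2": Source("s2", "census taker", "scans/census_p3.png"),
}
claims = [
    Claim("person:jan", "birth_date", "1812-03-04", "s1", 0.9,
          "Jan, son of Pieter, baptised 4 March 1812"),
    Claim("person:jan", "birth_date", "1812-04-03", "s2", 0.6,
          "Jan, born 3 April 1812"),
    Claim("person:jan", "birth_place", "Leiden", "s1", 0.9,
          "baptised in Leiden"),
]

contradictions = find_contradictions(claims)
print(contradictions)
```

Keeping the excerpt and the scan location on each claim is what lets a reader jump from a disputed fact straight back to the passage that produced it.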

By @westurner - 2 months
> free copy of the O’Reilly book "Building Knowledge Graphs: A Practitioner’s Guide"

Knowledge Graph (disambiguation) https://en.wikipedia.org/wiki/Knowledge_Graph_(disambiguatio...

Knowledge graph: https://en.wikipedia.org/wiki/Knowledge_graph :

> In knowledge representation and reasoning, a knowledge graph is a knowledge base that uses a graph-structured data model or topology to represent and operate on data. Knowledge graphs are often used to store interlinked descriptions of entities – objects, events, situations or abstract concepts – while also encoding the free-form semantics or relationships underlying these entities. [1][2]

> Since the development of the Semantic Web, knowledge graphs have often been associated with linked open data projects, focusing on the connections between concepts and entities. [3][4] They are also historically associated with and used by search engines such as Google, Bing, Yext and Yahoo; knowledge-engines and question-answering services such as WolframAlpha, Apple's Siri, and Amazon Alexa; and social networks

Ideally, a Knowledge Graph - starting with maybe a "personal knowledge base" in a text document format that can be rendered to HTML with templates - can be linked with other data about things with correlate-able names; ideally you can JOIN a knowledge graph with other graphs, which is possible if the Node and Edge Relations with Schema and URIs allow it.

A knowledge graph is a collection of nodes and edges (or nodes and edge nodes) with schema so that it can be queried and JOINed with other graphs.

A Named Graph URI may be the graphid ?g of an RDF statement in a quadstore:

  ?g ?s ?p ?o   // ?o_datatype ?o_lang
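
The quad pattern above can be sketched in plain Python, with `None` standing in for an unbound variable such as `?s`. This is not a real quadstore or SPARQL engine; the `urn:` identifiers, the `ex:` property names, and the helper functions are invented for the example.

```python
# Each statement is (graph, subject, predicate, object), mirroring ?g ?s ?p ?o.
quads = [
    ("urn:graph:contacts", "urn:person:ada", "ex:givenName", "Ada"),
    ("urn:graph:events", "urn:person:ada", "ex:attended", "urn:event:1"),
    ("urn:graph:events", "urn:event:1", "ex:name", "Evening lecture"),
]


def match(quads, g=None, s=None, p=None, o=None):
    """Return quads matching the pattern; None means 'any value'."""
    pattern = (g, s, p, o)
    return [
        quad for quad in quads
        if all(want is None or want == got for want, got in zip(pattern, quad))
    ]


def join_on_subject(quads, graph_a, graph_b):
    """Pair statements from two named graphs that share a subject URI."""
    return [
        (qa, qb)
        for qa in match(quads, g=graph_a)
        for qb in match(quads, g=graph_b)
        if qa[1] == qb[1]
    ]


pairs = join_on_subject(quads, "urn:graph:contacts", "urn:graph:events")
print(pairs)
```

The join only works because both graphs use the same URI for the same person, which is the point about shared schema and URIs made above.
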
By @loughnane - 2 months
I’ve got a django side project that uses neo4j. I use it to map out the static content in the domain space and a postgres database that handles more transactional stuff.

It works great. I’m not a db expert but the flexibility and explicitness of the graph scheme clicks for me. It took me a while to come around on cypher but now that I’m there it makes sense.

By @profsummergig - 2 months
A rant about Chrome Bookmarks Manager (it's on-topic, I promise).

A few years ago, a Good Samaritan on HN told me my bookmarks (I had about 10,000 at the time) were my "knowledge graph".

I had no idea what that was, but upon researching the concept, I was mind-blown by the simple truth of what I had been told.

Since then I became even more rapacious with my bookmarking (and especially editing their "Name" field to add tags and keywords), and I have about 30,000 bookmarks now.

And they truly are my knowledge graph. More so than the 1,000 or so text files where I store my notes on various topics. Mainly because of the Bookmarks Search feature. Let's say I want to refresh my understanding of Permutations, Combinations, Factorials (like I wanted to do, and did, yesterday). All I have to do is enter those keywords into Bookmarks Search and I instantly get the best (i.e. most relevant and intuitive for me) articles I've ever found on these topics (because I've bookmarked and tagged/keyworded them in the past). I find that more and more bookmarks end up with 404's these days, but then there's Internet Archive. Invaluable.

Which brings me to Chrome Bookmarks. There seems to have been no innovation in the last 15 years (other than the time they replaced the prior [and post/current] system with the horrible "Cards", and thankfully reversed course due to the howling protests and decided to instead offer "Cards" as a Chrome Extension [which is apparently not popular]). For example:

- One still cannot exclusively search folder names.

- One cannot search within a single folder only (by extension, one cannot search within selected multiple folders only).

- One cannot use Regex to search, or do any kind of Fuzzy Search.

- One cannot download (cache) a copy of a bookmarked page (in case the page 404s in the future) to store permanently in some Zotero-type storage system.

- When adding a new bookmark, one cannot conduct a search (using text keywords) for the folder one wants to store it in.

I realize that some of these features are available via third-party Chrome Extensions and the like. However, I stopped relying on third party anything after I relied on one for tagging bookmarks and this third party Extension decided one day to 404 on me (it shut down).

Please Google do more with Chrome Bookmarks.

Rant over.

(Addendum 1: Regarding the text files: yeah, I've tried Zettelkasten software. The one that came closest to my liking was FeatherWiki. However, in the end I continued with plain ol' text files because they're the most Lindy [plain text files are likely going to be the last format rendered unreadable by whatever destroys civilization as we know it].)

(Addendum 2: there seem to be some people out there who have uncanny storage and retrieval systems. E.g. @Balajis on X comes to mind. When he's in a debate/argument on X, he has an uncanny ability to pull out relevant contextual material, instantly, from his back pocket when the situation demands it. Let's hope they share their systems.)