Vectors & Graphs
Ontologies and Knowledge Graphs offer a way to connect embedding vectors to structured knowledge, enhancing their meaning and explainability. Let's delve into how this works.
Imagine LLM embedding vectors as coordinates for points in space. On a 2D map, longitude and latitude help you find your position in the world; with an LLM, dimensions like 'fruitiness' and 'techiness' help you find a word's position in language. For instance, 'banana' and 'grape' score high on fruitiness but low on techiness, unlike 'Microsoft' and 'Google', which lie at the opposite end of the spectrum. 'Apple' is a bit more complex: does it refer to the tech giant or the fruit? This ambiguity underscores the challenge, especially since modern embedding vectors have thousands of dimensions and take context into account.
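To make the intuition concrete, here is a toy sketch in Python. The coordinates are invented purely for illustration (real embeddings are learned, not hand-assigned, and have thousands of dimensions), but the geometry is the same: similar words point in similar directions.

```python
import math

# Toy 2D 'word-space': each word gets made-up (fruitiness, techiness)
# coordinates. Illustrative values only.
embeddings = {
    "banana":    (0.90, 0.10),
    "grape":     (0.85, 0.05),
    "Microsoft": (0.05, 0.95),
    "Google":    (0.10, 0.90),
    "Apple":     (0.50, 0.60),  # ambiguous: fruit or tech giant?
}

def cosine_similarity(a, b):
    """1.0 = same direction, 0.0 = unrelated (orthogonal)."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

# 'Apple' sits between the fruit cluster and the tech cluster.
print(cosine_similarity(embeddings["Apple"], embeddings["banana"]))     # ~0.72
print(cosine_similarity(embeddings["Apple"], embeddings["Microsoft"]))  # ~0.80
```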
Understanding high-dimensional spaces is a tall order for the human mind. We can visualise the idea of ‘word-space’ in two dimensions and stretch our imagination to a third. But envisioning anything beyond that takes us into what's known as the LLM's 'latent space': a highly abstract realm where each word (or word part) occupies a unique position across a multitude of dimensions.
This latent space is so complex we have to treat it like a black box. Identifying the specific index in an embedding vector that is responsible for 'fruitiness' is not possible. This opacity understandably limits our willingness to trust these models with critical tasks.
LLMs can explain the meaning of their outputs in natural language, but this method often lacks precision. Techniques like Retrieval-Augmented Generation (RAG) and integrated function calls can return tables of data with far sharper accuracy, but they also reveal a stark divide between the LLM's language processing and the concrete results of an SQL query. There is no real connection between the model and the data at a conceptual level.
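A minimal sketch of that divide, assuming a hypothetical get_share_price tool and an invented table and price: the host executes the query and pastes the raw rows back into the prompt as text, so the model only ever sees the result, never the meaning of the data.

```python
import sqlite3

# Hypothetical in-memory database; the schema and figures are invented.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE companies (name TEXT, share_price REAL)")
conn.execute("INSERT INTO companies VALUES ('Apple', 123.45)")

def get_share_price(company: str) -> list[tuple]:
    """A 'tool' the LLM can call: precise, but conceptually opaque to the model."""
    return conn.execute(
        "SELECT name, share_price FROM companies WHERE name = ?", (company,)
    ).fetchall()

# The LLM emits a function call; the host runs it and returns the rows as text.
print(get_share_price("Apple"))  # [('Apple', 123.45)]
```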
I firmly believe that ontologies and knowledge graphs will play an increasingly vital role in bridging this model-data divide:
🔵 By capturing the core concepts on which your organisation operates within an ontology, you clarify the classes you care about.
🔵 By vectorising these classes, you can identify the concepts associated with a specific piece of text (see the sketch after this list).
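Here is a minimal sketch of those two steps, assuming the sentence-transformers library as one arbitrary choice of embedding model; the ontology classes and their descriptions are invented examples.

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # one choice among many

# A tiny invented ontology: class name -> natural-language definition.
ontology_classes = {
    "Company": "A commercial organisation that sells goods or services.",
    "Fruit": "An edible, seed-bearing product of a plant.",
    "Person": "An individual human being.",
}

# Vectorise each class definition once, up front.
class_vectors = {
    name: model.encode(desc, normalize_embeddings=True)
    for name, desc in ontology_classes.items()
}

def nearest_class(text: str) -> str:
    """Return the ontology class whose vector lies closest to the text's vector."""
    v = model.encode(text, normalize_embeddings=True)
    # With normalised vectors, the dot product is the cosine similarity.
    return max(class_vectors, key=lambda name: float(v @ class_vectors[name]))

print(nearest_class("Apple reported record quarterly revenue."))  # likely 'Company'
print(nearest_class("A ripe apple fell from the tree."))          # likely 'Fruit'
```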
Furthermore, Knowledge Graphs allow discrete entities to be mapped to these concepts, creating a connection between instance data and ontological classes. For instance, the query 'What is Apple's share price?' not only locates the appropriate 'Apple' in the LLM's latent space but also links it to a node with the label ‘apple’ and a type of ‘company’. With tight enough integration, this could help the LLM explain what it is ‘thinking’ about.
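A hypothetical sketch of that last step: the graph below is just a dictionary, and its labels, types, and ticker are invented, but it shows how a mention plus the class inferred from context (as in the previous sketch) resolves to a single node that the LLM's answer can be grounded in.

```python
# A toy knowledge graph: node id -> properties. All values are invented.
knowledge_graph = {
    "apple_company": {"label": "apple", "type": "company", "ticker": "AAPL"},
    "apple_fruit": {"label": "apple", "type": "fruit"},
}

def link_entity(mention: str, context_class: str) -> dict:
    """Pick the node whose label matches the mention and whose type
    matches the class inferred from the surrounding text."""
    for node in knowledge_graph.values():
        if node["label"] == mention.lower() and node["type"] == context_class:
            return node
    raise KeyError(f"no {context_class!r} node labelled {mention!r}")

# 'What is Apple's share price?' embeds near the 'company' concept,
# so the mention resolves to the tech giant, not the fruit.
print(link_entity("Apple", "company"))
# {'label': 'apple', 'type': 'company', 'ticker': 'AAPL'}
```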
This more fluid transition from text to data is subtle but profound: it narrows the gap between continuous and discrete knowledge representations, opening exciting possibilities for improving the accuracy, explainability and trust of the models that we work with.
⭕ https://www.knowledge-graph-guys.com/blog/continuous-and-discrete