Making Metadata Meaningful
Metadata is essentially "data about data." It provides descriptive, structural, and contextual information, making other data easier to understand, locate, and use effectively. By capturing essential details—such as a dataset's origin, structure, purpose, relationships, and meaning—metadata enables data to be organised and contextualised.
Here is a key insight: Semantics gives data meaning. By adding semantics to the metadata, we make all metadata meaningful.
Network To System
According to the Free Energy Principle, the boundary of a system is coupled to its environment in a way that selectively regulates information exchange. This selective coupling helps the system resist entropy, increasing order and minimising uncertainty. Rosen goes further, suggesting that in living systems, the presence of this boundary enables internal Closure To Efficient Causation. Efficient causation, an Aristotelian term, refers to the immediate source of change in a system—the "cause" that directly brings about an effect. In other words, only a system with a boundary can possess agency, autonomy, and self-determination.
The Prototype Trap
If you’ve struggled to take your LLM project from prototype to production, and even tried RAG but still didn’t achieve the accuracy you needed, it might be time to consider GraphRAG. GraphRAG combines the power of retrieval-augmented generation with the structure of knowledge graphs, delivering more reliable and accurate results.
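To make the pattern concrete, here is a minimal, self-contained sketch of the GraphRAG idea: vector similarity finds the seed entities, and the knowledge graph then supplies connected facts to ground the LLM's answer. The tiny graph, the 2-D embeddings, and all entity names below are purely illustrative, not any particular GraphRAG implementation.

```python
# GraphRAG sketch: vector retrieval finds seed entities, then the
# knowledge graph supplies connected facts as structured context.
import math

# Toy knowledge graph: node -> list of (relation, neighbour) edges.
GRAPH = {
    "Acme Corp": [("acquired", "Widget Ltd"), ("headquartered_in", "London")],
    "Widget Ltd": [("produces", "widgets")],
    "London": [],
    "widgets": [],
}

# Toy embeddings (2-D for readability; real systems use hundreds of dimensions).
EMBEDDINGS = {
    "Acme Corp": (0.9, 0.1),
    "Widget Ltd": (0.8, 0.3),
    "London": (0.1, 0.9),
    "widgets": (0.7, 0.4),
}

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

def graph_rag_context(query_vec, top_k=1, hops=1):
    """Retrieve top-k seeds by vector similarity, then expand the graph."""
    seeds = sorted(EMBEDDINGS,
                   key=lambda n: cosine(query_vec, EMBEDDINGS[n]),
                   reverse=True)[:top_k]
    facts, frontier = [], list(seeds)
    for _ in range(hops):
        next_frontier = []
        for node in frontier:
            for rel, nbr in GRAPH.get(node, []):
                facts.append(f"{node} --{rel}--> {nbr}")
                next_frontier.append(nbr)
        frontier = next_frontier
    return seeds, facts

seeds, facts = graph_rag_context((0.85, 0.2))
print(seeds)   # the most similar entity
print(facts)   # structured facts to include in the LLM prompt
```

The key difference from plain RAG is the second step: instead of stuffing the prompt with raw text chunks, the retrieved entity's graph neighbourhood supplies explicit, verifiable relationships.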
Yin and Yang
LLMs like ChatGPT have taken the world by storm, but for enterprises, they are only half of the equation. Knowledge Graphs (KGs) are the other half, providing the reliability and structured understanding that LLMs lack.
The AI Iceberg
When discussing AI, we often focus on the algorithms—the visible 'tip of the iceberg.' But let's not forget what's submerged: a complex framework of data pipelines. The engineers who build and maintain these pipelines work tirelessly to clean and aggregate vast datasets. In fact, most of the grunt work often goes into this data preparation.
The Ontological Index
Traditional filtering is typically based on tabular (rows in a database) or tree-like (JSON documents) data formats. The landscape changes significantly when the data itself is structured as a graph. When employing HNSW in a graph-based setup, both continuous vectors and discrete facets become vertices in the same graph. This allows for more nuanced relationships and more efficient alignment. Furthermore, the upper layers within HNSW represent a form of compression. With your data in a graph, you can move beyond the classic HNSW node-degree compression algorithms to consider more semantic forms of compression, which take domain-specific ontologies into account.
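The "upper layers as compression" point can be illustrated with a deliberately simplified two-layer search: a sparse upper layer of hub vertices routes the query to a small neighbourhood in the dense lower layer. Real HNSW uses many probabilistically sampled layers and greedy graph traversal rather than fixed buckets; the hubs and points below are invented for illustration.

```python
# Simplified two-layer search illustrating the HNSW routing intuition:
# a sparse upper layer of "hub" vertices narrows the search so the
# dense lower layer is only scanned locally.
import math

def dist(a, b):
    return math.dist(a, b)

# Lower layer: all points. Upper layer: a sparse sample acting as hubs.
POINTS = {f"p{i}": (float(i), float(i % 3)) for i in range(12)}
HUBS = ["p0", "p4", "p8"]                # sparse upper layer
BUCKETS = {h: [] for h in HUBS}          # lower-layer neighbourhoods
for name, vec in POINTS.items():
    nearest_hub = min(HUBS, key=lambda h: dist(vec, POINTS[h]))
    BUCKETS[nearest_hub].append(name)

def search(query):
    """Route via the upper layer, then scan only one neighbourhood."""
    hub = min(HUBS, key=lambda h: dist(query, POINTS[h]))
    return min(BUCKETS[hub], key=lambda n: dist(query, POINTS[n]))

print(search((5.2, 2.1)))  # nearest point found without a full scan
```

A semantic variant of this scheme would choose the upper-layer vertices not by sampling but from ontology concepts, so that routing follows domain meaning rather than pure geometry.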
Semantic Compression
Vectors need Graphs! Embedding vectors are a pivotal tool when using Generative AI. While vectors might initially seem an unlikely partner to graphs, their relationship is more intricate than it first appears.
GraphRAG
Data leaders are adapting to the profound shift brought about by GenAI. As organizations incorporate AI into their data strategies, Graph Retrieval-Augmented Generation is emerging as a transformative solution, bridging the gap between AI and Data. This post explores GraphRAG and how it integrates into your broader data strategy.
Vectors & Graphs
Ontologies and Knowledge Graphs offer a way to connect embedding vectors to structured knowledge, enhancing their meaning and explainability.
The Great Compression
We are witnessing an era of information compression, spearheaded by large language models (LLMs) that proficiently process web text. These LLMs handle an inconceivably vast array of word combinations, reducing them to a mere trillion parameters. Embedding models, like text-embedding-ada-002, further condense this into 1536 dimensions. Reflect on this when using Retrieval-Augmented Generation (RAG): the essence of the web's information, distilled into 1536 coordinates.
Data Connectivity and the Free Energy Principle
Think of your organisation as a living entity—its survival depends on a well-defined information boundary, which functions like a semi-permeable membrane. Karl Friston's Free Energy Principle (FEP) models this boundary as a Markov Blanket and says that to sustain itself, a system must minimise its free energy.
Continuous and Discrete
We can think of information existing in a continuous stream or in discrete chunks. Large Language Models (LLMs) fall under the category of continuous knowledge representation, while Knowledge Graphs belong to the discrete realm. Each approach has its merits, and understanding the implications of their differences is essential.
How Ontologies Can Unlock LLM for Business
This post delves into the transformative capabilities of Large Language Models, such as GPT-4, and examines the crucial role that the structured intelligence of ontologies can play in deploying LLMs in production environments.
Graph of Thought
Imagine the next phase: a ‘Graph of Thought’ where thoughts are modelled as nodes connected by edges. Directed Acyclic Graphs (DAGs) have revolutionised data pipeline orchestration tools by modelling the flow of dependencies in a graph without circular loops. Unlike trees, DAGs can model paths that fork and then converge back together! This is a game-changer, and surely it's only a matter of time before LLM thought prompting embraces this powerful ability too.
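The fork-and-converge shape that trees cannot express is easy to see in code. This sketch uses Python's standard-library graphlib to order a four-step "thought" DAG; the step names are invented for illustration.

```python
# A fork-and-converge DAG: "plan" forks into two branches that both
# feed a final "synthesise" step -- a shape no tree can express.
from graphlib import TopologicalSorter

# Each step maps to the set of steps it depends on (graphlib convention).
thought_graph = {
    "plan": set(),
    "branch_a": {"plan"},
    "branch_b": {"plan"},
    "synthesise": {"branch_a", "branch_b"},  # the paths converge here
}

# static_order() yields a valid execution order: dependencies first.
order = list(TopologicalSorter(thought_graph).static_order())
print(order)  # "plan" comes first, "synthesise" last; branch order may vary
```

This is exactly the property pipeline orchestrators exploit: independent branches can run in parallel, and the converging step waits for both.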
LLMs + Ontologies
This collaborative partnership between LLMs and ontologies establishes a reinforcing feedback loop of continuous improvement. As LLMs help generate better ontologies faster and more dynamically, the ontologies, in turn, elevate the performance of LLMs by offering a more comprehensive context of the data and text they analyse.
How do you make your data more intelligent?
Concentrate on the fundamentals first. Don’t get distracted by side projects. AI needs data! You can now buy general intelligence, but only you can provide your private data. However, every organisation's data is currently disconnected and poorly organised. If you want to use all this intelligence in a way that is meaningful to your organisation, then you must first get your data into a shape that is ready for use with AI.
Your Ontological Core
There are pragmatic techniques organisations can use here: they can connect their data by giving each data point a URL and structure their data by linking these URLs to concepts in shared ontologies. Linked Data compresses very well, and you can use it to finetune a private model that reflects your organisation's identity. This is your solid core. Your solid core projects out holographically onto the surface boundary of your organisation, and in an AI-driven world, it will increasingly come to define you.
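Both techniques can be shown in a few lines of plain Python that emit RDF triples in N-Triples syntax. The example.org URLs and the worksFor property are illustrative placeholders; schema.org is a real shared vocabulary.

```python
# Sketch of the two techniques: give each data point its own URL,
# then link those URLs to concepts in a shared ontology.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

def to_ntriples(rows):
    """Emit one N-Triples line per (subject, predicate, object) IRI triple."""
    return [f"<{s}> <{p}> <{o}> ." for s, p, o in rows]

triples = [
    # 1. Connect: every data point gets its own URL and can link to others.
    ("https://example.org/data/employee/42",
     "https://example.org/ont/worksFor",
     "https://example.org/data/org/acme"),
    # 2. Structure: link the URL to a concept in a shared ontology.
    ("https://example.org/data/employee/42",
     RDF_TYPE,
     "https://schema.org/Person"),
]

for line in to_ntriples(triples):
    print(line)
```

Because every statement reuses the same URLs and a small shared vocabulary, the resulting dataset is highly regular, which is precisely what makes Linked Data compress so well.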
Data, Graphs and AI
In an enterprise context, AI models unlock their true potential not merely through the vast quantities of information they process, but when that information is relevant, connected, cleaned, structured, and enriched with semantics—all tasks that lie at the heart of data management expertise. Yet, the reverse is equally true: our data strategies gain direction and sophistication when guided by the insights and capabilities AI brings to the table. Moreover, AI can help you connect, clean, structure, and enrich your data with semantic metadata.
Network of Networks
Taking control of how you prepare your data gives you agency. The key to preparing your data for AI is enhancing its compressibility. This involves focusing on the relationships between data items, not just the items themselves, because better connectivity equals better compressibility.
You can add meaningful connections to your data in two main ways: by giving each data point its own URL so items can link to one another, and by linking those URLs to concepts in shared ontologies.
Data Products + Ontologies
Many organisations are reorganising their data around data products, and the most advanced are connecting their data with knowledge graphs for use with Gen AI. These efforts should not be separate endeavours; the real value lies in bridging them.
This is where DPROD comes in. It is a freely available semantic ontology that defines data products, serving as both a specification and a first small step in creating a Distributed Knowledge Graph.
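As a flavour of what this looks like, here is an illustrative Turtle fragment describing a data product with DPROD. The example.org URIs and the "Customer 360" product are invented, and property names follow the DPROD draft, so check the published specification before relying on them.

```turtle
@prefix dprod:   <https://ekgf.github.io/dprod/> .
@prefix dcterms: <http://purl.org/dc/terms/> .

# Hypothetical data product described with the DPROD ontology.
<https://example.org/data-products/customer-360>
    a dprod:DataProduct ;
    dcterms:title "Customer 360" ;
    dprod:outputPort <https://example.org/data-products/customer-360/port> .
```

Because each product description is itself RDF, publishing many of them across an organisation yields exactly the Distributed Knowledge Graph the post describes.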