The AI Iceberg
When discussing AI, we often focus on the algorithms, the visible 'tip of the iceberg.' But let's not forget what's submerged: a complex framework of data pipelines. The engineers who build and maintain these pipelines work tirelessly to clean and aggregate vast datasets. In fact, most of the grunt work goes into this data preparation.
🔵 Seeing the Big Picture
The secret to the success of Large Language Models (LLMs) is more than just smart algorithms, and more than just having lots of data. It's also about having quality data that is connected in meaningful ways. It is these connections that allow the algorithms to find interesting abstractions. But how are data engineers supposed to provide this connectivity within their own organisations? Engineers often lack a way to see how their individual contributions fit into the broader data landscape. Architecturally, we are frequently left working in the dark.
🔵 Empowering Connectivity
The truth is that many established organisations are like beggars sitting on top of a goldmine. They already have a vast treasure trove of carefully collected and curated internal data, but that data is currently fractured and fragmented: it lacks overall organisation and semantic consistency. The trick is to weave the organisation's data into one coherent system. I think this can be accomplished by bringing conceptual semantics closer to the data engineers. Engineers have a natural affinity for order and efficiency. Empowered with a view of the bigger picture, we will naturally apply ourselves to connecting the dots.
🔵 Keep Your Eyes on the Ball
The technological landscape is evolving at breakneck speed. Don't get dazzled by the latest shiny AI model; instead, remember to think about the data that underlies it. If your organisation is of a reasonable size, you will already own lots of data, so focus first on connecting what you already have. This will vastly accelerate all of your AI projects. Here are some practical guidelines:
⭕ Utilise Graph Structures: Maximise data connectivity by organising as much of your data as possible into graph structures (a minimal sketch follows this list).
⭕ Shift Data Publication Left: One team can't handle all data integration tasks. Distribute responsibilities for better efficiency. The Data Mesh articulates this point very well.
⭕ Shift Schema Right: A central team should manage a shared ontology and ensure data publishers adhere to consistent semantics wherever possible (the second sketch below shows one way to check adherence). A failure here leads to data chaos. In my opinion, the Data Mesh currently falls short on this aspect.
⭕ Leverage AI for Semantic Organisation: Use LLMs in tandem with ontologies to create a reinforcing feedback loop (see the final sketch below). AI can help organise your data better, which in turn makes your AI smarter.
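To make the graph-structures guideline concrete, here is a minimal sketch using Python's rdflib that links a customer record and an order record from two separate source systems into one graph. The namespace, classes and identifiers are invented for illustration; the point is simply that the connecting triple turns two silos into one queryable structure.

```python
# A minimal sketch of organising fragmented records into a graph (rdflib).
# The EX namespace, classes and identifiers are invented for illustration.
from rdflib import Graph, Literal, Namespace, RDF, RDFS

EX = Namespace("https://example.org/ontology/")

g = Graph()
g.bind("ex", EX)

# A customer record, e.g. sourced from a CRM export.
g.add((EX.customer_42, RDF.type, EX.Customer))
g.add((EX.customer_42, RDFS.label, Literal("Acme Ltd")))

# An order record, e.g. sourced from a separate billing system.
g.add((EX.order_1001, RDF.type, EX.Order))
g.add((EX.order_1001, EX.totalAmount, Literal(249.99)))

# The connection that turns two silos into one graph.
g.add((EX.order_1001, EX.placedBy, EX.customer_42))

# Query across what used to be two separate datasets.
for row in g.query(
    "SELECT ?order ?name WHERE { ?order ex:placedBy ?c . ?c rdfs:label ?name }"
):
    print(row.order, row.name)
```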
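For the shift-schema-right guideline, here is a hedged sketch of one way a central team might enforce shared semantics: data products are validated against SHACL shapes published alongside the ontology. The shapes and property names are invented; pyshacl is one library that performs this kind of check.

```python
# A sketch of central schema enforcement: validate a data product against
# SHACL shapes published by the ontology team. Shapes and properties are invented.
from rdflib import Graph
from pyshacl import validate

shapes = Graph().parse(data="""
@prefix sh: <http://www.w3.org/ns/shacl#> .
@prefix ex: <https://example.org/ontology/> .

ex:CustomerShape a sh:NodeShape ;
    sh:targetClass ex:Customer ;
    sh:property [ sh:path ex:customerId ; sh:minCount 1 ] .
""", format="turtle")

data = Graph().parse(data="""
@prefix ex: <https://example.org/ontology/> .

ex:customer_42 a ex:Customer .   # missing ex:customerId on purpose
""", format="turtle")

conforms, _, report_text = validate(data, shacl_graph=shapes)
print(conforms)      # False: the publisher has drifted from the shared schema
print(report_text)   # human-readable report to send back to the publishing team
```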
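And finally, a sketch of the reinforcing loop between LLMs and ontologies: the model proposes a mapping from a raw column to a shared ontology class, the ontology keeps the answer honest, and accepted mappings enrich the graph that future prompts draw on. Here `ask_llm` is a placeholder for whichever model API you use, and the ontology terms and column names are invented.

```python
# A sketch of the LLM + ontology feedback loop. `ask_llm` is a placeholder for
# whatever model API you use; ontology terms and column names are invented.
ONTOLOGY_CLASSES = {"Customer", "Order", "Product"}  # published by the central team

def ask_llm(prompt: str) -> str:
    """Placeholder: call your LLM of choice and return its answer."""
    raise NotImplementedError

def propose_mapping(column_name: str, sample_values: list[str]) -> str | None:
    """Ask the LLM to map a raw column to a shared ontology class, and only
    accept answers that actually exist in the ontology."""
    prompt = (
        f"Map the column '{column_name}' with sample values {sample_values} "
        f"to exactly one of these classes: {sorted(ONTOLOGY_CLASSES)}. "
        "Answer with the class name only."
    )
    answer = ask_llm(prompt).strip()
    return answer if answer in ONTOLOGY_CLASSES else None  # the ontology keeps the LLM honest

# Accepted mappings get added to the graph, which in turn gives future
# prompts (and future models) better-connected context to work from.
```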
🔵 Semantic Layer Architecture: https://www.knowledge-graph-guys.com/blog/the-semantic-layer
🔵 Learning as Compression: https://www.knowledge-graph-guys.com/blog/the-great-compression