Semantic Compression
Vectors need Graphs! Embedding vectors are a pivotal tool in Generative AI. While vectors might initially seem an unlikely partner for graphs, their relationship is more intricate than it first appears.
🔵 Embedding Vectors
We can think of embedding vectors as multidimensional coordinates: much like longitude and latitude help you pinpoint a location on a two-dimensional map, embedding vectors serve as navigational markers in a multidimensional realm that can encompass words, images, or sounds. To use GenAI effectively, we often need to store these vectors in some kind of index that we can query. It turns out that graphs can be utilised to transform these embedding vectors into a layered, navigable map that makes the queries more efficient.
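To make the coordinate analogy concrete, here is a minimal sketch of storing vectors in a toy index and querying it for the nearest neighbour by brute force. The four-dimensional vectors and the words attached to them are made up for illustration; real embeddings have hundreds or thousands of dimensions, but the idea is the same.

```python
import numpy as np

# Toy 4-dimensional "embeddings" used as coordinates in vector space.
index = {
    "cat": np.array([0.9, 0.1, 0.8, 0.2]),
    "dog": np.array([0.8, 0.2, 0.7, 0.3]),
    "car": np.array([0.1, 0.9, 0.2, 0.8]),
}

query = np.array([0.88, 0.12, 0.78, 0.22])  # imagine this is the embedding of "kitten"

# Brute-force query: the nearest coordinate is the most similar item.
nearest = min(index, key=lambda word: np.linalg.norm(index[word] - query))
print(nearest)  # -> cat
```

This works, but it compares the query against every stored vector, which is exactly the cost problem the next section addresses.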
🔵 Distance Measures
Various measures (such as Euclidean or cosine distance) can be used to calculate the similarity or distance between vectors. However, a brute-force computational approach is neither time- nor cost-effective. This is where the 'Hierarchical Navigable Small World' (HNSW) algorithm comes into play. HNSW constructs a multi-layered hierarchical graph from the embedding vectors. The top layers, with fewer nodes, act as convenient entry points, facilitating navigation through the graph of nearest neighbours in vector space and thus speeding up the search process.
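As an illustration, here is a minimal sketch using the open-source hnswlib package, one common HNSW implementation. The data is random and the parameters (M, ef_construction, ef) are illustrative rather than recommended settings.

```python
import numpy as np
import hnswlib

dim, n = 64, 10_000
data = np.random.random((n, dim)).astype(np.float32)  # stand-in embeddings

# Build the layered navigable graph over the vectors.
index = hnswlib.Index(space="cosine", dim=dim)         # cosine distance
index.init_index(max_elements=n, ef_construction=200, M=16)
index.add_items(data, np.arange(n))

# Query: greedy descent from the sparse top layer to the dense bottom layer.
index.set_ef(50)                                       # search breadth (>= k)
labels, distances = index.knn_query(data[:1], k=5)
print(labels, distances)
```

Compared with a brute-force scan over all n vectors, the layered graph lets each query touch only a small fraction of the data.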
🔵 Continuous to Discrete
How does HNSW convert these continuous vectors into discrete nodes layered in a hierarchical graph? Firstly, it uses a nearest-neighbour search to calculate the distance between vectors and identify each vector's closest 'K' neighbours. Each vector becomes a node with edges connecting it to those nearest neighbours. If you set 'K' to 'three', then each node will add links to its three closest neighbours. In some ways, 'K' sets the discrete boundary. Collectively, all these nodes and edges form a graph. Some nodes in the graph will have links to more nodes than others (we refer to this as the node's degree). The bottom layer of the hierarchy encompasses all nodes; each node is also assigned to the higher layers with exponentially decreasing probability, so every layer above is sparser than the one below. You can have as many layers as you like, but generally, the top layer will only have a few entry points.
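The step from continuous coordinates to a discrete graph can be sketched directly. This toy version uses exact rather than approximate neighbour search and skips the layering: it simply builds the bottom-layer 'K'-nearest-neighbour graph and reports node degrees.

```python
import numpy as np

rng = np.random.default_rng(42)
vectors = rng.random((20, 8))   # 20 toy embeddings in 8 dimensions
K = 3

# Pairwise Euclidean distances between all vectors.
dist = np.linalg.norm(vectors[:, None, :] - vectors[None, :, :], axis=-1)
np.fill_diagonal(dist, np.inf)  # a node is never its own neighbour

# Discretisation: each vector becomes a node linked to its K closest neighbours.
edges = {i: np.argsort(dist[i])[:K].tolist() for i in range(len(vectors))}

# Degree: count how many links point at each node.
degree = np.zeros(len(vectors), dtype=int)
for neighbours in edges.values():
    for j in neighbours:
        degree[j] += 1

print(edges[0])      # the three nearest neighbours of node 0
print(degree.max())  # the best-connected node's (in-)degree
```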
🔵 Knowledge Compression
Having these hub-like, well-connected nodes in the upper layers diminishes the risk of succumbing to local minima in the search space. However, the true brilliance here is that discretisation acts as a form of compression. There's a growing realisation that compression and understanding are deeply intertwined, and it's this compression that hastens the search process.
The revelation is that this discretisation process also occurs on the input side. We do it when we coin a new word in the dictionary and enumerate its synonyms. And here is the kicker: we can enact it for our data too, by creating new classes in our Ontology!
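One hedged illustration of discretising on the input side: cluster the embeddings and give each cluster a name that could become a new ontology class. The data, the cluster count, and the class names below are entirely hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Toy embeddings: three loose groups of points in 5-dimensional space.
vectors = np.vstack([
    rng.normal(loc=0.0, scale=0.1, size=(10, 5)),
    rng.normal(loc=1.0, scale=0.1, size=(10, 5)),
    rng.normal(loc=2.0, scale=0.1, size=(10, 5)),
])

# Discretise the continuous vectors into three classes.
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(vectors)

# Hypothetical new ontology classes, one per cluster.
class_names = {0: "ConceptA", 1: "ConceptB", 2: "ConceptC"}
for item, cluster in list(enumerate(labels))[:5]:
    print(f"item {item} -> {class_names[cluster]}")
```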
➡ HNSW Paper: https://arxiv.org/abs/1603.09320
➡ Continuous and Discrete: https://www.knowledge-graph-guys.com/blog/continuous-and-discrete
➡ Transformers and GNNs: https://www.knowledge-graph-guys.com/blog/transformers-and-gnns