Transformers and GNNs
It is widely recognised that Large Language Models, such as GPT, are built on the Transformer architecture. What is less widely acknowledged is that Transformers themselves can be viewed as a type of Graph Neural Network.
The value of this insight is that it opens an exciting possibility: the chance to bridge the gap between structured and unstructured data, and with it the door to far more powerful and versatile models.
Whilst text is often regarded as unstructured, Transformers in fact reveal the complex semantic structure that underlies language. Graphs, meanwhile, are not merely adept at handling this level of complexity but ideally suited to it. It therefore seems logical to conclude that, by combining the two, we may be able to break down the supposed barrier between structured and unstructured data.
Let’s consider how this might work. Transformers analyse sentences by assigning importance to each word in relation to others, helping them predict or generate the next words in a sentence. This 'attention mechanism' evaluates pairwise interactions between all tokens in a sequence, and these interactions can be seen as edges in a complete graph. Thus, Transformers can be thought of as graph-based models where tokens represent nodes and attention weights represent edges.
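To make the analogy concrete, here is a minimal sketch in NumPy, under simplifying assumptions: a toy four-word sentence, random (untrained) projection weights, a single attention head, and no positional encoding or masking. It shows that the attention matrix produced by self-attention can be read directly as the weighted adjacency matrix of a complete graph over the tokens, and that the attention output is a GNN-style message-passing step over that graph.

import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)

tokens = ["the", "cat", "sat", "down"]   # toy sentence: each token is a node
n, d = len(tokens), 8                    # sequence length, embedding size

X = rng.normal(size=(n, d))              # token embeddings (node features)
W_q = rng.normal(size=(d, d))            # query/key/value projections
W_k = rng.normal(size=(d, d))            # (random here; learned in a real model)
W_v = rng.normal(size=(d, d))

Q, K, V = X @ W_q, X @ W_k, X @ W_v
A = softmax(Q @ K.T / np.sqrt(d))        # n x n attention matrix

# A is the weighted adjacency matrix of a complete graph over the tokens:
# A[i, j] is the weight of the edge from token i to token j.
for i, src in enumerate(tokens):
    for j, dst in enumerate(tokens):
        print(f"{src} -> {dst}: {A[i, j]:.2f}")

# Message passing, GNN-style: each node aggregates its neighbours' values,
# weighted by the edges. This is exactly the attention output.
out = A @ V                              # n x d updated node features

In other words, each layer of a Transformer updates every token by aggregating information from every other token along weighted edges, which is precisely the message-passing pattern that defines a Graph Neural Network operating on a fully connected graph.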
In my view, the integration of Transformers with Graph Neural Networks holds great potential for advancing the field of machine learning. Crucially, within our organisations, the perceived distinction between our text and our databases may no longer need to exist, as full data integration becomes a realistic prospect. Suddenly, powerful AI built on our own data does not seem like such a distant goal after all.