AI Hype
There has been a lot of AI hype recently around OpenAI’s new "Strawberry" project, which they have allegedly demoed first to senior execs at Uber and now to the Feds. We've come a long way from the early days of AI with peer-reviewed papers, to just papers, to papers that were half marketing, and now to reading the tea leaves of various leaked, uncorroborated rumours! Nevertheless, I can’t help being interested, because it touches upon areas I’m passionate about: reasoning, data, and graph path traversal.
"Strawberry" is the new project name for the leaked Q* algorithm. Information is vague and uncorroborated, but the consensus is that the algorithm attempts to address the issue of LLMs’ inability to carry out true formal reasoning.
The leaking of the Q* algorithm coincided with a period of high volatility within OpenAI, culminating in the sacking and subsequent reinstatement of Sam Altman, and ultimately the departure of Chief Scientist Ilya Sutskever. The leak also generated a lot of speculation about the name. The ‘Q’ part seemed relatively uncontroversial, with most commentators agreeing that it was likely a reference to Q-learning. Q-learning is a type of reinforcement learning: it is a model-free algorithm, meaning it doesn’t require a model of the environment’s dynamics; instead, it learns the value of actions directly from experience, by interacting with the environment and updating its estimates.
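For anyone who hasn’t met it, here is a minimal sketch of tabular Q-learning on a toy one-dimensional walk I’ve made up purely for illustration (the environment, constants, and names are all my own placeholders, nothing to do with whatever OpenAI may or may not be doing). The thing to notice is that the agent never consults a model of the environment; it only updates a table of action values from the transitions it observes.

```python
import random

# Toy set-up (illustrative only): states 0..4, actions -1/+1,
# reward 1 for reaching the goal state 4.
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.2   # learning rate, discount, exploration rate
N_STATES, GOAL = 5, 4
ACTIONS = (-1, +1)

Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}  # the Q-table

def step(state, action):
    """Environment dynamics: move, clip to the grid, reward 1 at the goal."""
    next_state = max(0, min(N_STATES - 1, state + action))
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward, next_state == GOAL

for _ in range(500):                                  # episodes of pure interaction
    state = random.randrange(N_STATES - 1)            # start anywhere except the goal
    done = False
    while not done:
        # Epsilon-greedy action selection.
        if random.random() < EPSILON:
            action = random.choice(ACTIONS)
        else:
            action = max(ACTIONS, key=lambda a: Q[(state, a)])
        next_state, reward, done = step(state, action)
        # The Q-learning update uses only the observed transition
        # (state, action, reward, next_state) -- no model of the environment.
        best_next = max(Q[(next_state, a)] for a in ACTIONS)
        Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])
        state = next_state

# The learned greedy policy should point towards the goal from every state.
print({s: max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(N_STATES)})
```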
The ‘Star’ part has two main explanations within the community:
🔵 A* Graph Algorithm: The A* algorithm can efficiently find the shortest path between two nodes in a weighted graph, which piques my interest! (There’s a small sketch of it just after this list.)
🔵 STaR Paper: STaR (‘Self-Taught Reasoner’) demonstrates how an LLM can generate reasoning guesses (rationales) in a supervised learning environment where the correct answers are known. The rationales that lead to correct answers are then stored and used to fine-tune the model (also sketched after this list).
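Taking the first explanation first, here is a minimal, illustrative A* sketch on a toy weighted graph. The graph and the zero heuristic are my own placeholders (with a zero heuristic, A* degenerates into Dijkstra’s algorithm); with an informative admissible heuristic it expands far fewer nodes than a blind search.

```python
import heapq

def a_star(graph, start, goal, heuristic):
    """Expand nodes in order of f = g (cost so far) + h (heuristic estimate to goal)."""
    frontier = [(heuristic(start), 0, start, [start])]      # entries are (f, g, node, path)
    best_g = {start: 0}
    while frontier:
        f, g, node, path = heapq.heappop(frontier)
        if node == goal:
            return g, path
        for neighbour, cost in graph.get(node, []):
            new_g = g + cost
            if new_g < best_g.get(neighbour, float("inf")):
                best_g[neighbour] = new_g
                heapq.heappush(frontier,
                               (new_g + heuristic(neighbour), new_g,
                                neighbour, path + [neighbour]))
    return None                                             # no path exists

# Toy weighted graph; the zero heuristic makes this run equivalent to Dijkstra.
graph = {"A": [("B", 1), ("C", 4)], "B": [("C", 1), ("D", 5)], "C": [("D", 1)]}
print(a_star(graph, "A", "D", heuristic=lambda n: 0))       # (3, ['A', 'B', 'C', 'D'])
```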
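And for the second explanation, a rough sketch of the filtering loop as I read the STaR paper: the model proposes a rationale plus an answer, and only rationales whose answers match the known ground truth are kept for fine-tuning (the paper also adds a ‘rationalization’ step that retries failed problems with the answer as a hint, which I’ve left out). The toy_generate stub below is a stand-in for a real LLM call, not any actual API.

```python
def star_round(generate, questions, answer_key):
    """One bootstrapping round: keep only rationales whose final answer is correct."""
    curated = []
    for question in questions:
        rationale, answer = generate(question)     # LLM proposes reasoning + an answer
        if answer == answer_key[question]:         # supervised check against ground truth
            curated.append({"question": question,
                            "rationale": rationale,
                            "answer": answer})
    return curated                                 # this set is used to fine-tune the next model

# Trivial stub in place of a real model, just to show the data flow.
toy_generate = lambda q: ("2 + 2 = 4 because two pairs make four", "4")
print(star_round(toy_generate, ["What is 2 + 2?"], {"What is 2 + 2?": "4"}))
```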
The second explanation seems more probable because it aligns with Ilya Sutskever’s paper "Let’s Verify Step by Step," which combines generated reasoning data with active learning. Additionally, rumours abound that all the AI labs are employing ‘good old-fashioned human’ reasoners to curate their reasoning datasets.
In truth, both guesses could be false, but I find the speculation about the meaning of ‘*/Star’ interesting in itself, as I think it reflects a division of opinion within the community. The A* explanation comes from those who believe the probabilistic nature of LLMs means they will never be able to reason on their own, whereas the STaR explanation comes from those who believe LLMs can learn to reason given enough reasoning data.
Thus, I think we can draw useful information from all the AI hype-chatter: Firstly, there is consensus that current LLMs don’t reason well, and that this is a significant problem. Secondly, there is a debate: Will reasoning come from an external algorithm that sits outside the transformer, or will it be possible to generate enough reasoning data for transformers to ‘grok’ reasoning?