Can LLMs Reason?
The question of whether Large Language Models (LLMs) can truly reason or are merely "faking it" is crucial for estimating the future progress of foundational models.
Deductive reasoning fundamentally involves a process of logical entailment, where proposition A entails proposition B, and B entails C, creating a chain of deductive inferences. In the real world, feedback loops can occur where C might, in turn, influence or entail A. However, setting aside such loops and incompleteness theorems, we can define deductive reasoning as the ability to achieve provable closure within a symbolic system. In such a system, if the premises are true and the reasoning valid, the conclusions are necessarily true. Thus, reasoning in this context ensures logical validity and truth preservation, offering certainty.
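As a toy illustration of what "closure" means here (a sketch of a small symbolic system, not of anything happening inside an LLM), the snippet below computes everything entailed by a set of premises under simple implication rules via forward chaining. Every conclusion it derives is guaranteed to hold whenever the premises do:

```python
# Toy forward-chaining engine: derive everything entailed by the premises.
# Facts are strings; rules are (antecedent, consequent) pairs meaning "A entails B".

def deductive_closure(premises, rules):
    derived = set(premises)
    changed = True
    while changed:
        changed = False
        for antecedent, consequent in rules:
            # If the antecedent is established, the consequent must also hold.
            if antecedent in derived and consequent not in derived:
                derived.add(consequent)
                changed = True
    return derived

premises = {"A"}
rules = [("A", "B"), ("B", "C")]  # A entails B, B entails C

print(deductive_closure(premises, rules))  # prints the closure {'A', 'B', 'C'} (set order may vary)
```

Given true premises and valid rules, nothing probabilistic enters the picture: the derived set is certain.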
This type of reasoning poses a challenge for LLMs. Given their probabilistic nature, it seems unlikely they can achieve true deductive closure. That could be a significant limitation for organisations wanting to deploy LLMs in real-world production environments where certainty is required for key aspects of their operations.
However, not all reasoning is deductive. Inductive reasoning, for instance, involves drawing general conclusions from specific observations, leading to probable but not certain outcomes. LLMs excel at this kind of statistical generalisation, and at reproducing familiar syllogistic patterns. While we may not fully understand how LLMs compress and generalise information, it’s clear they do so effectively, which is why they are so useful.
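For contrast, here is an equally toy sketch of inductive reasoning: generalising from a finite sample to a conclusion that is probable, revisable, and never certain. The swan observations and the 90% support threshold are illustrative assumptions, nothing more:

```python
from collections import Counter

# Specific observations: colours of swans seen so far.
observations = ["white", "white", "white", "white", "black"]

counts = Counter(observations)
most_common_colour, count = counts.most_common(1)[0]
support = count / len(observations)

# Inductive step: a general but defeasible conclusion, with an estimated confidence.
if support > 0.9:
    print(f"Most swans are {most_common_colour} (support: {support:.0%})")
else:
    print(f"Probably most swans are {most_common_colour}, "
          f"but the evidence is weak (support: {support:.0%})")
```

A single new observation can overturn the conclusion, which is exactly what separates this mode of reasoning from deductive closure.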
Therefore, while LLMs are capable of performing inductive reasoning, they will likely struggle with true deductive reasoning. There is, however, a caveat: LLMs may learn to mimic deductive reasoning so convincingly that it becomes difficult to tell whether they are truly reasoning or merely simulating it. Currently, many AI labs are likely training the next generation of models on large reasoning datasets, hoping that, with sufficiently vast datasets, deep networks, and human reviewers, these models will approximate reasoning to a degree that is functionally indistinguishable from true reasoning. Huge amounts of money and resources are being spent on the bet that simply scaling up will work.
If I had to guess, I’d say this approach will not fully succeed. Reasoning can naturally take you well outside the distribution of your training set: logical entailment can lead anywhere, and once you reintroduce recursive, nonlinear loops, a chain of entailments can continue indefinitely.
Finally, I wish to introduce a third category of reasoning: abductive reasoning, which involves collecting observations, making deductions from background knowledge, and selecting the hypothesis that most plausibly explains what was observed. A practical implementation might integrate the inductive curve-fitting capabilities of large language models with the formal inference mechanisms of knowledge graphs, creating a dynamic neural-symbolic loop. Based on my experience, this approach holds considerable promise.
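A minimal sketch of such a loop follows, under heavy assumptions: everything here is a hypothetical stand-in, with `propose_hypotheses` playing the role of an LLM call and `consistent_with_graph` the role of a knowledge-graph query, and the wet-grass candidates and plausibility scores invented for illustration. The neural side induces candidate explanations for an observation, the symbolic side rejects any that contradict the graph, and the most plausible survivor is selected:

```python
# A minimal abductive loop: an LLM-style proposer generates candidate explanations,
# a knowledge-graph-style checker filters out inconsistent ones, and the most
# plausible surviving hypothesis is selected.

def propose_hypotheses(observation):
    """Stand-in for an LLM call: return (hypothesis, plausibility) pairs."""
    # Hand-written candidates for the classic wet-grass example.
    return [
        ("it rained last night", 0.70),
        ("the sprinkler ran", 0.25),
        ("a pipe burst under the lawn", 0.05),
    ]

def consistent_with_graph(hypothesis):
    """Stand-in for a knowledge-graph query: is the hypothesis a known cause?"""
    known_causes_of_wet_grass = {"it rained last night", "the sprinkler ran"}
    return hypothesis in known_causes_of_wet_grass

def abduce(observation):
    candidates = propose_hypotheses(observation)                              # inductive step (neural)
    survivors = [(h, p) for h, p in candidates if consistent_with_graph(h)]   # deductive filter (symbolic)
    if not survivors:
        return None
    return max(survivors, key=lambda hp: hp[1])[0]                            # most plausible explanation

print(abduce("the grass is wet"))  # -> "it rained last night"
```

In a real system the loop would also run in the other direction, with accepted hypotheses written back into the graph so that later symbolic checks become stricter over time.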