LLMs do not have mental models and therefore lack real understanding of concepts and ideas
Large Language Models (LLMs), including ChatGPT, generate responses by predicting the most likely next token based on patterns learned from vast amounts of text.
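To make "predicting the most likely next token" concrete, here is a minimal sketch in Python. The prompt, the tiny vocabulary, and the probabilities are invented for illustration and do not come from any real model:

```python
import random

# Toy next-token distribution for the prompt "The capital of France is".
# The probabilities are made up for illustration, not taken from a real model.
next_token_probs = {
    "Paris": 0.92,
    "a": 0.03,
    "located": 0.02,
    "the": 0.02,
    "Lyon": 0.01,
}

def sample_next_token(probs: dict[str, float]) -> str:
    """Sample one continuation token according to its probability."""
    tokens = list(probs.keys())
    weights = list(probs.values())
    return random.choices(tokens, weights=weights, k=1)[0]

prompt = "The capital of France is"
print(prompt, sample_next_token(next_token_probs))
# Usually prints "... Paris" -- not because the model "knows" geography,
# but because that continuation is statistically dominant in the training text.
```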
LLMs don’t form internal representations of concepts or causal relationships the way humans do, and therefore cannot possess real understanding. Without mental models, they can’t simulate or manipulate concepts the way humans can. They lack the capacity for grounded reasoning, so their “understanding” is often superficial and purely statistical in nature.
This means they often fail at seemingly simple tasks like multi-digit multiplication: written-out multiplications are sparse in the training data, so the model does not "know" how to multiply. Another example is the now-famous "How many 'r's are in the word 'strawberry'?" question, which many models fail to answer correctly.
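Both failure cases are trivial for ordinary code, which is exactly the point: a few deterministic lines solve what pattern-matching over text does not. The numbers below are just an arbitrary example:

```python
# Tasks that are trivial as procedures but unreliable as pure next-token prediction.

def count_letter(word: str, letter: str) -> int:
    """Count exact occurrences of a letter in a word."""
    return word.lower().count(letter.lower())

def multiply(a: int, b: int) -> int:
    """Exact multi-digit multiplication -- no training data required."""
    return a * b

print(count_letter("Strawberry", "r"))  # 3
print(multiply(48372, 9157))            # 442942404
```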
Ambiguity of language
This lack of mental models is exacerbated by the inherent ambiguity of natural language. Words can be interpreted in different ways, so using natural language both as the definitive input and as the output format is inherently flawed. A person might ask a clarifying question to make sure they have understood the context and intended meaning, while an LLM might simply output an answer that misses the point because it misread the true meaning.
Reasoning models try to solve this issue
Reasoning models use chain-of-thought thinking and internal reasoning to compensate for this lack of real understanding. They may also invoke external tools (e.g. Python for math or data analysis) to answer questions that the LLM itself cannot answer. This does not bring "true" understanding, but it is an improvement.
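A rough sketch of how such tool use can look from the outside. The routing logic and the `call_llm` function are hypothetical placeholders, not any specific vendor's API:

```python
# Simplified tool-routing loop: the model decides *that* a calculation is needed,
# and a deterministic tool produces the actual answer.

def call_llm(prompt: str) -> str:
    # Placeholder: imagine this returns either a final answer
    # or a tool request like "TOOL: calculator 48372 * 9157".
    return "TOOL: calculator 48372 * 9157"

def calculator(expression: str) -> str:
    # Deterministic math instead of token prediction (handles "a * b" only here).
    a, _, b = expression.split()
    return str(int(a) * int(b))

def answer(question: str) -> str:
    response = call_llm(question)
    if response.startswith("TOOL: calculator"):
        expression = response.removeprefix("TOOL: calculator").strip()
        return calculator(expression)
    return response

print(answer("What is 48372 * 9157?"))  # 442942404
```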
Agents as the next evolution
Autonomous agents build on reasoning models by introducing the ability to plan, remember intermediate results, and work iteratively toward complex goals.
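Stripped down to its core, the loop behind that description looks roughly like this. All functions here are illustrative placeholders, not a real agent framework:

```python
# Minimal agent loop: plan, act, remember intermediate results, iterate.

def plan(goal: str, memory: list[str]) -> str:
    """Ask the model for the next step, given the goal and what has been done so far."""
    return f"next step toward: {goal} (given {len(memory)} prior results)"

def execute(step: str) -> str:
    """Run the step, possibly via external tools (search, code execution, APIs)."""
    return f"result of '{step}'"

def is_done(memory: list[str], max_steps: int = 5) -> bool:
    """Stop condition -- a real agent would check whether the goal is reached."""
    return len(memory) >= max_steps

def run_agent(goal: str) -> list[str]:
    memory: list[str] = []          # intermediate results the agent can refer back to
    while not is_done(memory):
        step = plan(goal, memory)   # planning
        result = execute(step)      # acting, often by calling tools
        memory.append(result)       # remembering
    return memory

print(run_agent("summarize last quarter's sales data"))
```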
In my opinion, agents (as well as reasoning models) can, for the most part, solve the problem of lacking real understanding by using external tools and other methods. It may well be that future iterations also gain better long-term planning capabilities, which they currently lack.