World models and AI: Mission impossible?
A natural next step in improving LLMs and generative AI may turn out to be a dead end

While different AI systems continue to surprise and shock people, there is a growing sense that the current approaches to improving the underlying technology are capping out. The buzz is in new applications of the same underlying LLM (large language model) systems, rather than fundamentally new AI capabilities. An increasingly common view among leaders in the industry is that scaling to ever-larger training data sets is no longer leading to major improvements and that, instead, AI systems need to start building what are known in the field as ‘world models’. This has been stated publicly by, among others, Elon Musk and the CEO of DeepMind, Demis Hassabis.
From my philosophical point of view, this makes perfect sense. It is an explicit move to address a fundamental issue I have already written about: LLMs have no access to, or understanding of, a real world behind the language, pixels or data. Many of the ongoing issues with generative AI systems, especially the persistent problems with hallucinations, can be traced back to this limitation. LLMs may be able to understand how words or images fit together, but they have no understanding of how the world behind them works.
As I’ve written before, humans (and likely animals) encode their understanding of the world in various theories, pictures, mental models and abstract representations. This isn’t how current AI systems work, and the clue is in the name: LLMs are ‘Large Language Models’, not models of the world. Copying the knowledge structures humans use therefore makes intuitive sense.
The important question, however, isn’t whether adding ‘world models’ is a good idea. It is whether it will work, or is even possible. My prediction is that it is much more difficult than people expect - so much so that I suspect it may prove impossible. There is the challenge of building accurate and useful world models (science has spent hundreds of years on this and only got so far). But connecting those world models to the language (and pictures) that are native to LLMs is just as challenging.
What is a ‘world model’?
To be able to understand, or at least make reliable predictions that go beyond current data, any AI system (or person) needs some kind of reliable representation of the world they are focused on. In AI research, this type of representation is called a ‘world model’ and has various definitions.
One definition from a prominent proponent is that a world model is a “persistent, stable, updatable (and ideally up-to-date) internal representation of some set of entities within some slice of the world.” Another definition, this time from DeepMind, is that a world model for a particular agent (could be human or AI) is a “predictive model of its environment”. The important caveat here is that the environment we care about is the real world.
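To make those definitions slightly more concrete, here is a minimal sketch of what a ‘predictive model of its environment’ might look like in code. Everything in it - the class names, the toy constant-velocity physics, the bare-bones update step - is my own invention for illustration, not any particular lab’s design.

```python
# A minimal, hypothetical sketch of the 'world model' idea: a persistent,
# updatable internal state plus a way to predict how it evolves.
# None of these names come from a real system.

from dataclasses import dataclass


@dataclass
class State:
    """Whatever slice of the world the agent cares about."""
    position: float
    velocity: float


class WorldModel:
    """A persistent, updatable representation the agent can query."""

    def __init__(self, dt: float = 1.0):
        self.dt = dt  # how far ahead one prediction step looks

    def predict(self, state: State) -> State:
        # Toy stand-in for "knowing how the environment evolves":
        # constant-velocity motion over one time step.
        return State(
            position=state.position + state.velocity * self.dt,
            velocity=state.velocity,
        )

    def update(self, predicted: State, observed: State) -> None:
        # A real system would correct its internal representation when
        # predictions and observations diverge; omitted here.
        pass


model = WorldModel()
print(model.predict(State(position=0.0, velocity=2.0)))
```

The point is only the shape of the thing: an internal state, a way to roll it forward, and a way to correct it against what actually happens.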
You might assume that this is the sort of thing that large AI systems already build from their training data, given they have access to all the data we can possibly throw at them. However, it has been demonstrated in various ways that they don’t, which is unsurprising from a structural philosophical point of view.
A research paper published last year provides a nice example.1 The researchers trained an LLM on the orbital mechanics of solar systems so it could predict the trajectories of planets orbiting a sun. The training data was (from a human perspective) huge - data from 10 million simulated solar systems. The encouraging part of the results was that the system made highly accurate predictions for any given solar system. However, the underlying physics it used looked nothing like the correct Newtonian mechanics (which both describes the real world and was used to generate the simulated data). Moreover, the physics it used was highly instance-specific and varied across different solar systems. In short, it could predict specific instances accurately but had no coherent grasp of the underlying physics and could not generalise.
The authors describe the problem as follows:
A foundation model [their term for an LLM or similar system] uses datasets to output predictions given inputs, whereas a world model describes state structure implicit in that data.
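To see what a genuine world model for that domain looks like, here is a hedged sketch of my own (not taken from the paper): Newton’s law of gravitation plus a crude numerical integrator. Because the law itself is encoded, one small function covers every solar system - exactly the generalisation the trained model lacked.

```python
# My own illustrative sketch of a Newtonian world model, not code from the paper.

G = 6.674e-11  # gravitational constant, m^3 kg^-1 s^-2


def newtonian_step(x, y, vx, vy, star_mass, dt):
    """Advance a planet one time step under its star's gravity (Euler method)."""
    r3 = (x * x + y * y) ** 1.5       # |r|^3, with the star at the origin
    ax = -G * star_mass * x / r3      # acceleration from Newton's law
    ay = -G * star_mass * y / r3
    return x + vx * dt, y + vy * dt, vx + ax * dt, vy + ay * dt


# The same function predicts orbits around our Sun or any simulated star,
# because the rule, not the individual trajectories, is what is stored.
state = (1.496e11, 0.0, 0.0, 2.978e4)  # roughly Earth's orbital radius and speed
for _ in range(10):
    state = newtonian_step(*state, star_mass=1.989e30, dt=3600.0)
print(state)
```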
To explain the underlying dynamics, it is helpful to take a simpler example and consider words like ‘left’ and ‘right’. These terms describe specific spatial relationships that are tied to a speaker’s location and perspective. Intuitively, you can learn as many uses of the words as you like, but unless you grasp the relevant spatial concepts, we would never say you really understand them. That is, there is a gap between what you can learn directly from the language that uses the words ‘left’ and ‘right’ and their spatial meaning, as encoded in a world model.
We can see this difference play out in two very different ways of learning and applying the words ‘left’ and ‘right’. One is to see how they are used together in sentences and learn rules about what follows when people use these words in combination with others. The other is to grasp the spatial meaning, translate sentences containing the words into spatial representations (either in our heads or by drawing a picture), and then predict on that basis. Think of the difference between a student who crams for a maths or physics exam by memorising which rule to apply when, and a student who understands the underlying theories and concepts.
LLMs and gen AI are like the student cramming without understanding, which is one reason why they require so much data. And, like that student, they don’t really understand what they are studying, because they have no world model to work with.
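To make the two routes concrete, here is a toy sketch; the scenario and every name in it are mine, purely for illustration. The first function works ‘left’ out from a small spatial model of speaker and object; the second just pattern-matches on sentences it has already seen.

```python
# Illustrative toy only: a spatial ('world model') route versus a
# memorised-rules route to handling 'left' and 'right'.

def left_of(speaker_pos, speaker_facing, object_pos):
    """Decide from geometry whether the object is to the speaker's left.

    speaker_facing is a direction vector (fx, fy); the object is to the
    left if it lies on the counter-clockwise side of that heading.
    """
    dx = object_pos[0] - speaker_pos[0]
    dy = object_pos[1] - speaker_pos[1]
    fx, fy = speaker_facing
    return fx * dy - fy * dx > 0  # sign of the 2-D cross product


# The cramming route: remember which sentences licensed which conclusions.
memorised_rules = {
    "the cup is left of the plate": "the plate is right of the cup",
}

def predict_from_text(sentence):
    # Works only for word patterns seen before; no spatial grasp involved.
    return memorised_rules.get(sentence, "no idea")


print(left_of((0, 0), (0, 1), (-2, 3)))                   # True: it is to the left
print(predict_from_text("the cup is left of the plate"))  # memorised answer
print(predict_from_text("the fork is left of the bowl"))  # "no idea"
```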
Are world models feasible?
The research about Newtonian physics explains why people are excited about adding ‘world models’ to existing gen AI systems. The underlying LLM (or transformer) methodology does not build coherent world models, and without them any knowledge of the real world will be limited. Given we do know how to build computerised models of Newtonian physics, it makes sense to combine the two and add world models into the broader LLM architecture. The obvious result should be to harness the capabilities of LLMs while improving their accuracy about the real world.
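On paper the combination looks like a simple pipeline. The sketch below is entirely hypothetical - the function names and the hard-coded example are mine - but it shows the intended division of labour: language in, a grounded problem handed to a world model, and a simulated answer turned back into language.

```python
# A hypothetical sketch of an 'LLM plus world model' pipeline. The names
# ground(), simulate() and verbalise() are inventions for illustration.

def ground(question: str) -> dict:
    """Map language onto entities and quantities the world model understands.

    This is the step the rest of this post argues we do not know how to do
    in general; here it only recognises one hard-coded kind of question.
    """
    if "dropped from" in question:
        return {"kind": "free_fall", "height_m": 20.0}
    raise ValueError("don't know how to ground this sentence")


def simulate(problem: dict) -> dict:
    """The world model proper: apply known physics to the grounded problem."""
    if problem["kind"] == "free_fall":
        g = 9.81  # m/s^2
        return {"fall_time_s": (2 * problem["height_m"] / g) ** 0.5}
    raise ValueError("no model for this kind of problem")


def verbalise(result: dict) -> str:
    """Turn the model's answer back into language, the LLM's home turf."""
    return f"It would take about {result['fall_time_s']:.1f} seconds to fall."


print(verbalise(simulate(ground("How long does a ball dropped from 20 m take to land?"))))
```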
Intuitively it sounds easy, but there are many deep challenges. I have no particular insight into the likely technical approaches, so I will come at it from a philosophical perspective. Boiled down to its simplest structure, this approach requires us to build or train accurate world models and to connect them in a coherent and consistent way to the linguistic and pictorial functions of LLMs. The first is difficult enough - scientists have been working at it for centuries with amazing, but still limited, results. The second is far more difficult than people generally assume.
Connecting language to entities or relationships in the world is easy for humans to do but incredibly difficult to explain precisely. There is a long history within philosophy of people trying to do this, with limited success. If we cannot explain how humans manage this, then it is unlikely we will be able to program computers to do the same thing.
Meaning - a vexed problem
Many of the hardest problems to solve are those phenomena we find so obvious and intuitive that we don’t realise there is anything mysterious about them. A good example is consciousness. It is so normal, intimate and familiar that it can be hard to appreciate that we don’t know where it comes from. A similar example is the ‘meaning’ of a word in the philosophy of language. What exactly is the meaning of a word? How is it determined or decided? How do we learn it?
To abridge a very long story (the entry on Theories of Meaning in the Stanford Encyclopedia of Philosophy runs to over 23,000 words and ends with a list of questions rather than answers), we don’t really know. There are lots of things going on and we don’t really know how they fit together - and this particularly matters because understanding the meaning of words and sentences is crucial for figuring out whether they are true. We can illustrate some of the core issues with what should be a simple case of meaning: proper names. And let’s use the example of everyone’s favourite person today, Donald Trump.
Clearly, the meaning of the name ‘Donald Trump’ is the person who is currently the US President. In philosophical language, the real person is the referent of the name. Put that way, it seems quite simple. However, complications quickly arise.
1. For one, there isn’t just one person with the name ‘Donald Trump’. How do we determine which of them is the referent in a given situation?
2. There are many other names that refer to the same person: ‘POTUS’, ‘Leader of the Free World’, ‘Orange Man’, ‘Drumpf’, ‘45’ and plenty more. If the meaning of these words is the person they refer to, it should follow that all these names mean the same thing. Yet to say that ‘POTUS’ and ‘Orange Man’ mean the same thing simply seems wrong, even if they currently refer to the same person.
3. We often use names of people in hypotheticals or counterfactuals. Perhaps: “If it hadn’t been for the Covid pandemic, Donald Trump would have been re-elected in 2020.” How can the meaning of a word be a real person today when the sentence is about a counterfactual that never happened?
I could go on, but you get the picture. Connecting even a simple word like a proper name to its real-world referent is difficult. It gets harder with more complex cases like the word ‘chair’. As I’ve written before, it’s essentially impossible to come up with a coherent, universal definition of the word chair (especially as distinct from a similar word like stool). And then we get to really hard words like power, or love, or the ‘global order’.
Two things should stand out from this extremely brief introduction. First, we all manage to understand the meanings of words, use them and decide whether sentences are true, without any effort, every day. Second, we don’t understand how any of this works in any precise way. Ordinarily that wouldn’t matter, except that here we aren’t dealing with humans, who do all of this instinctively.
We are trying to teach artificial computer systems how to connect words with entities within world models - that is, what the words mean. To train a very different type of system to do the same thing, we need to understand precisely how it is done. I may be proven wrong, but the fact that we cannot explain this for humans means we don’t even know how to start with AI systems.
Moreover, we have already faced this problem with computers and given up. The original dream of machine translation was to teach computers the meanings of words and sentences. That approach didn’t get anywhere, and instead language translation came to rely on statistical analyses of words, sentences and phrases. The underlying models are increasingly sophisticated and accurate, but they use the same approach, and have the same limitation, as LLMs. They are like students who have crammed and practised enough to answer the maths or physics questions without actually understanding the theories.
World models - a dead end?
So I find myself in an interesting position. I strongly believe that adding world models to the architecture of gen AI systems is essential for them to improve and become more accurate and intelligent. This is how humans (and likely animals) operate. The first wrinkle is that, as far as I can tell, we have no clue how to connect world models to the language that is native to LLMs. We don’t know how it works for humans, and so we don’t know how to code it for machines.
A second wrinkle is that we don’t know how we humans build coherent world models outside a limited set of fields where we already have good mathematical theories (like Newtonian physics). But that is a story for another post.
If you want to read an explanation rather than the original paper, I’d recommend

