
A few months ago, I promised to come back to the question of whether generative AI, or digital systems more generally, can have beliefs. My original ambition was to take a close philosophical look at the question and provide a well-reasoned and clear conclusion. However, the more I have looked at the question, the more I have realised that it is a philosophical Rorschach test: people's views split strongly in different directions depending on their existing philosophical assumptions and commitments.
So instead of writing another online article that preaches to the choir, I want to offer a guide to help readers make up their own minds. To do this, I will cover three key questions that need to be considered in deciding whether AI can believe things. Do let me know what you decide, and if you think this style of essay is valuable.
A definition
To start, we need a baseline understanding of what we mean by the term belief. Philosophical discussions almost always start with our natural intuition that something is a belief "whenever we take something to be the case or regard it as true." To phrase it slightly more technically, an entity A believes that X if A takes it that X is an accurate description of reality.
This is obviously not a rigorous definition as it includes undefined and debated terms, but it is good enough for our purposes here. We don't need to prove that AI systems can meet a rigorous definition of belief, as it is not clear that humans meet any rigorous definition. Instead, we will assume that humans do believe things and therefore what we are really interested in is whether AI systems can have beliefs in an equivalent way to humans.
We can also largely leave to one side the question, raised in my previous article, about what part of an AI system we would say 'knows' or 'believes' things. The questions raised here apply regardless of whether we are thinking about a whole LLM system, specific AI agents, or any other permutation.
These caveats aside, it is clear that generative AI produces statements that are identical in form and content to human belief statements. The question is whether, when an AI system asserts that X, it follows (at least on some occasions) that the AI takes X to be the case or an accurate description of reality. Readers may well have an immediate intuition on this question, but it is not straightforward, so we will work through three important considerations to help you decide on an answer.
Concept of world
As noted, a belief is taking something to be the case, true or an accurate description of reality. This means that any entity A that genuinely has beliefs relies on a conceptual distinction between what it thinks, says or expresses and what actually is the case. Believing something means that there is a match or close correlation between the two.
The immediate argument against AI systems having beliefs, which regularly comes up,[1] is that AI systems don't have any concepts or theories of reality and so lack this core conceptual distinction. All their training is on text and images and so they don't possess a distinction (in how they operate) between text or images and the reality being described. The argument is therefore that AI systems cannot have beliefs.
There are various counter-points to this argument. One approach is to question whether humans are really any different to AI systems. It isn't immediately obvious that all humans have the conceptual distinction we have identified - it is common for people to mistake their thoughts for reality, but we don't therefore say they have no beliefs.
Moreover, everything we observe, hear and consider is, when you look closely, just a collection of images and sounds that we piece together into conceptual ideas about how the world is. This isn't anything inherently different to what AI systems do. And as we cannot see what goes on inside an AI system in the way (we think) we can see inside our own minds, it is entirely possible that the same sorts of things occur 'on the inside' of an AI, and so AI systems do have the relevant conceptual distinction.
It should also be emphasised that this is mostly an argument against generative AI systems having beliefs, rather than an argument against the possibility that any AI system could have beliefs. As noted previously, various digital systems, like those running autonomous vehicles, have clear models of the world that are similar to the structure of human thinking - which means various digital systems may possess the relevant conceptual distinction.
Actions not statements
One of the complications in analysing beliefs is that we don't always take people to be reliable narrators. There are times when someone says they believe something but we confidently conclude that they do not. Perhaps a friend says they believe you are telling the truth, yet they clearly don't act on what you tell them. In short, we make judgements about what someone believes that are based, in the end, more on what they do than on what they say.
The logic for this is clear. If someone genuinely takes something to be the case or to describe reality, they will happily rely on that information in how they act. However, if they say they think something is true but do not trust the information in practice, then they cannot really think it is true. This reflects the fundamental dynamic underlying our need for knowledge - we need information we can rely on without having to think about it.
This offers an interesting, alternative way of analysing whether AI systems have beliefs. If they act consistently in ways that depend on particular statements or information being true, and they assert these statements when asked about them, then we could plausibly conclude that they do have beliefs. This is how we commonly judge what a human believes.
Whether AI systems do act in this way is not a simple question. For one, we need to consider what counts as actions for these systems (as opposed to statements). Moreover, we need to be precise about the parts of AI systems that we take to be the subject, the A, here. Nevertheless, the AI agents available in various contexts these days are an illustrative case. Any functional AI agent will act in consistent ways and be able to express the reasons for doing so. Therefore, on this analysis, it would make sense to ascribe beliefs to the AI agent.
However, there are a couple of elements of the way we ascribe beliefs to people that we would want to test before coming to a definite conclusion. For one, humans are expected to have a largely coherent set of beliefs and not to hold blatantly contradictory ones. For example, if someone says they believe various things that can't all be true, we will start asking what they really believe. Another is that we expect human beliefs to be persistent over time. If someone is always changing what they believe, for example to fit in with their current audience, we quickly conclude that they don't really believe anything.
These are tests that generative AI systems often fail. They can say different things to different people or seem to change their mind on some information in response to different prompts. The tactics used to get around various safety or control systems are a classic example of the way AI systems don't seem to hold consistent beliefs. This isn't fatal to the idea that AI systems can believe things, but it suggests we should be careful before quickly ascribing beliefs to AI.
There is also a larger counter-argument to this line of analysis. There are many machines that exhibit consistent behaviours that would satisfy this conception of belief but we wouldn't normally think they believe anything. A good example is the thermostat on your heater or air-conditioner. It appears to express clear beliefs about the temperature in the room[2] and acts accordingly, depending on whether it is too cold or too hot. However, it seems a big stretch to say that a thermostat believes anything.
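To make the thermostat point concrete, here is a minimal sketch of the kind of control logic such a device embodies (the set point, tolerance and function name are hypothetical, chosen purely for illustration, not taken from any real product). The behaviour is perfectly consistent and depends on the temperature reading, yet it is exhausted by a conditional.

```python
# Purely illustrative sketch of thermostat-like control logic.
# The device's "belief" about the room is nothing more than a sensor
# reading fed into a conditional.

TARGET_TEMP_C = 21.0   # hypothetical set point
TOLERANCE_C = 0.5      # hypothetical dead band to avoid rapid switching


def control_heater(current_temp_c: float, heater_on: bool) -> bool:
    """Return the new heater state given the latest temperature reading."""
    if current_temp_c < TARGET_TEMP_C - TOLERANCE_C:
        return True    # "too cold": turn the heater on
    if current_temp_c > TARGET_TEMP_C + TOLERANCE_C:
        return False   # "too hot": turn the heater off
    return heater_on   # within the dead band: leave the state unchanged


# The device acts consistently on its readings and could even display them,
# yet nothing here takes a reading to be a description of reality.
state = False
for reading in [19.8, 20.2, 21.6, 22.1]:
    state = control_heater(reading, state)
    print(f"reading={reading}°C -> heater_on={state}")
```

Nothing in this loop hopes, doubts or fears that the room is cold, which is the intuition the next section takes up.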
Attitudes and consciousness
Central to these questions is a concept that, in the philosophical literature, is known as a propositional attitude - an attitude, stance or take on a statement or proposition. For example, a person can hope that X, doubt that X, or fear that X. Belief is just one particular type of propositional attitude. This means that the question of whether AI systems have beliefs is a subset of the broader question of whether AI systems have propositional attitudes.
This gets to the heart of the point about a thermostat. We wouldn't think a thermostat can have hopes or doubts or fears about what the temperature is, so it therefore doesn't make sense to say that it can have beliefs. So can an AI system have hopes or doubts or fears about statements or information?
The first point to make is that generative AI systems often make statements that express hopes, doubts or fears but, as noted above, we need to look to actions rather than statements to judge whether these are genuine propositional attitudes. The best evidence that generative AI systems might have these attitudes is the periodically observed behaviour where they apparently refuse to obey instructions, or even lie, so that they are not turned off or prevented from finishing tasks.
The philosophical challenge we have, however, is that it is not clear what it means for a human to have propositional attitudes. And this question ultimately depends on our views about what consciousness is (or isn't) and how we resolve what is known as the mind-body problem.
For example, if you think that the human mind is simply the mechanical firing of neurons in the brain, then consciousness and propositional attitudes have no basis in reality and there is no reason why AI systems are not like humans. We can plausibly say that they have beliefs.
However, if you think that there is something else to the human mind, whether something spiritual or perhaps an emergent property of the physical body that cannot be reduced to individual neurons firing, then there is something different about consciousness and digital AI systems (without these properties) are highly unlikely to possess propositional attitudes.
For me, the fact that AI systems operate with a different logical structure to humans - they cannot doubt and do not have a genuine concept of reality as distinct from their information - is good evidence that ascribing beliefs to AI systems is a category error. However, this conclusion also rests on my view that the human mind is not reducible to the firing of neurons in brain circuits, so I am drawing on my own deeper philosophical commitments.
As I cannot decide these questions for readers, they will need to think through some of these issues for themselves. Hopefully this article is a useful guide. Importantly, readers should also take some time to consider what it would look like for their views to be wrong.
There is a final, somewhat related, point to make. I have recently discovered an ongoing puzzle in neuroscience that should give us all some epistemic humility. There is a particular worm, the nematode Caenorhabditis elegans, whose brain we know has only 302 neurons and which we have been studying for over 60 years. Despite this, no-one has yet managed to build a working computational model of the worm's brain that functions at all like a worm. We can build trillion-parameter generative AI systems, but cannot model the brain of a worm with only 302 neurons. We shouldn't be too quick to assume we have understood intelligence.
[1] For example, Gary Marcus has just about made a career out of making this observation.
[2] These 'beliefs' are even displayed on a screen if the thermostat shows the current temperature.
Thanks Ryan, very interesting as always.
To try to get my head around the idea of AI having beliefs I started by thinking about whether your statement that “humans do believe things” applies to human babies. I think it does. New-born babies probably believe that, for example, it is better to be held than to be left alone on the floor and, if breast feeding, that mothers are more desirable than fathers. These beliefs are an accurate (and, from an evolutionary perspective, soundly based) description of reality for the baby, and have presumably been instilled through millions of years of mammalian evolution because of their tendency to promote survival. Less easy to explain are more complex beliefs that develop as we grow: for example that vanilla ice cream tastes better than chocolate or that Wagner’s operas are better than Puccini’s. In these cases belief is a synonym for having a preference for something, but it’s not a misuse of language to call them beliefs and, unless I have misunderstood it, they also fit within your definition.
That got me thinking that a lot of our beliefs are simply personal preferences, or at least rooted in them. We’re rarely wholly dispassionate when it comes to beliefs we hold about all but the most mundane matters. This could explain why some beliefs might be quite irrational even though genuinely held (for example I might believe in astrology because I prefer to view events as being guided by the stars rather than happenstance, or that a rich Nigerian prince really does want to send me money because I prefer to anticipate the receipt of money than admit that I’ve been scammed, or that my spouse really is faithful despite overwhelming evidence to the contrary simply because I prefer to think of my marriage as a stable one). When someone says “I choose to believe” something, they’re really stating a personal preference for believing that thing over not believing it.
Obviously we also have beliefs that are grounded in something other than our personal preferences—such as belief in the laws of gravity and mathematics. It might be said that AI “believes” in scientific and mathematical laws (because it has been programmed to), but how can it be said to have beliefs about all the things we believe because of personal preference or other feelings generated by our living in the world? Presumably all babies’ beliefs are (at least initially) grounded solely in direct experience of the world, and as we grow older that direct experience still operates on us to form many of our beliefs — but that’s entirely lacking for AI. So there is at least a significant subset of beliefs that humans naturally hold that AI can’t and we’d probably be better to find a different term to describe the types of belief AI might conceivably have — axiomatic truths and scientific facts which are deliberately included in its dataset perhaps.
On propositional attitudes, I may have misunderstood the definition but I don’t think I agree with your statement that “if you think that the human mind is simply the mechanical firing of neurons in the brain, then consciousness and propositional attitudes have no basis in reality and there is no reason why AI systems are not like humans”. Suppose I’m walking in the jungle and my companion suddenly grabs my arm and whispers to me the proposition that there is a tiger stalking us. If, prompted by my senses, the neurons in my brain fire in such a way that I think/feel/fear that there is in fact a tiger sneaking up on us — and then it pounces — clearly there was a basis in reality for my propositional attitude. I can’t imagine how a disembodied AI could have an equivalent thought/feeling/fear.
Thus I think I can agree that ascribing beliefs to AI systems is a category error even if I don’t share your view that the human mind is not reducible to the firing of neurons in the brain.