[📝 Paper 1] [📝 Paper 2] [📈 Data at WandB]
Large language models (LLMs) like ChatGPT are about to revolutionize our linguistic practices. These systems are highly proficient in writing, processing, and analyzing texts. ChatGPT, for example, can pass an MBA exam, write a business plan, improve code, argue fervently for (or against) a policy proposal, explain why you should definitely (not) buy a given product, and so on. While all these impressive feats suggest that we're approaching AGI, they also raise the question of whether LLMs have proper beliefs at all. For example: Which beliefs are we supposed to attribute to an LLM given that it will convincingly argue for a claim while strongly denying it in a slightly different context?
This is, broadly speaking, the research question we’ve been addressing during the last 12 months, and our findings have now been published in two twin papers (co-authored with Kyle Richardson from AI2) where we investigate LLMs as “doxastic” and “epistemic” agents.
The output of LLMs tends to be inconsistent: The claims they set forth in generated texts and the answers they give to diverse questions are often logically contradictory. That is a widely acknowledged fact (and a shortcoming of LLMs).
Why is this so?
In “Judgment aggregation, discursive dilemma and reflective equilibrium: Neural language models as self-improving doxastic agents” (Frontiers in Artificial Intelligence), we hypothesize that LLMs’ views are inconsistent because they are acquired (during pre-training), in line with the mere-exposure effect, from a highly diverse training corpus. Standard pre-training of LLMs doesn’t include a mechanism for critically reflecting on and systematically revising the views the LLM picks up. Accordingly, the answers it is disposed to give merely mirror what the model has “seen” during training (language acquisition); they are based on ‘hearsay’ only.
Now, even if the training corpus consists of texts that are each internally consistent, the aggregation of these different author perspectives can be inconsistent. That is what is referred to as the “Discursive Dilemma” in social choice theory.
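To see how the Discursive Dilemma works, here is a minimal, purely illustrative Python sketch (not code from the papers): three internally consistent “authors” judge the premises p and q as well as the conclusion p & q, yet aggregating their views by proposition-wise majority voting produces an inconsistent collective judgment set.

```python
# Toy illustration of the Discursive Dilemma (illustrative only):
# each individual judgment set is consistent, the majority aggregate is not.

authors = [
    {"p": True,  "q": True,  "p & q": True},   # consistent
    {"p": True,  "q": False, "p & q": False},  # consistent
    {"p": False, "q": True,  "p & q": False},  # consistent
]

def majority(sentence: str) -> bool:
    """Accept a sentence iff more than half of the authors accept it."""
    votes = sum(author[sentence] for author in authors)
    return votes > len(authors) / 2

aggregate = {s: majority(s) for s in ["p", "q", "p & q"]}
print(aggregate)
# {'p': True, 'q': True, 'p & q': False} -- logically inconsistent,
# although every individual author is consistent.
```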
We have been able to confirm, through computational experiments, that LLMs in fact obtain their initial judgments by aggregating views they’ve seen in training texts, and that the resulting discursive dilemmas can actually explain the inconsistencies in an LLM’s overall belief system.
Can LLMs get rid of inconsistencies?
Our idea is that LLMs which master inference (drawing a conclusion from a set of premises, irrespective of whether the premises and conclusion are considered true or false) just have to apply that reasoning skill to their very own beliefs in order to gradually correct them.
The gradual correction consists in writing down one’s beliefs, writing down what follows from these beliefs, and reading the generated sentences: Technically, the models train on auto-generated texts.
That approach loosely resembles the idea of reflective equilibration.
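The following hedged toy sketch illustrates that loop; the belief base, the inference rule, and the stopping criterion are all invented for illustration and are not the papers’ actual training pipeline, in which a language model generates texts from its beliefs and is then fine-tuned on them.

```python
# Toy sketch of reflective equilibration (illustrative only): the agent
# writes down what follows from its current beliefs, adopts the generated
# sentences, and repeats until nothing changes anymore.

from typing import Dict, List, Tuple

# Belief base: sentence -> current binary judgment
beliefs: Dict[str, bool] = {
    "p": True,
    "p -> q": True,
    "q": False,   # clashes with the two beliefs above
}

# Inference patterns the model is assumed to master: (premises, conclusion)
rules: List[Tuple[List[str], str]] = [
    (["p", "p -> q"], "q"),   # modus ponens
]

def reflect(current: Dict[str, bool]) -> Dict[str, bool]:
    """One reflection step: derive what follows from the believed sentences
    and adopt it (standing in for training on the auto-generated text)."""
    revised = dict(current)
    for premises, conclusion in rules:
        if all(revised.get(p, False) for p in premises):
            revised[conclusion] = True
    return revised

# Iterate until a fixed point -- a (local) reflective equilibrium -- is reached.
while True:
    revised = reflect(beliefs)
    if revised == beliefs:
        break
    beliefs = revised

print(beliefs)  # {'p': True, 'p -> q': True, 'q': True}
```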
Our computational experiments corroborate the hypothesis that language models can improve their belief base through self-training on auto-generated texts. Inconsistencies are substantially reduced (if not fully resolved) through simple reflective equilibration.
So far, we have modeled the judgments of a language model as “binary” or “full” beliefs: A sentence is either believed to be true or false.
How does the picture change if we allow for degrees of beliefs (credences)?
In the technically more demanding second twin paper “Probabilistic Coherence, Logical Consistency, and Bayesian Learning: Neural Language Models as Epistemic Agents” (PLOS ONE), we reproduce our computational analysis while modeling LLMs’ belief systems in terms of credences (rather than binary beliefs).
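To give a rough sense of what probabilistic coherence requires, here is a small sketch that checks a simple necessary condition (it is not the coherence measure used in the paper, and the numbers are made up): credences over p, q and p & q can only stem from a genuine probability function if the credence in the conjunction doesn’t exceed the credence in either conjunct and the implied credence in the disjunction doesn’t exceed 1.

```python
# Rough coherence check for credences (a necessary condition only,
# not the paper's measure; numbers are made up).

credences = {"p": 0.9, "q": 0.8, "p & q": 0.4}

def is_incoherent(c: dict) -> bool:
    conjunction_too_high = c["p & q"] > min(c["p"], c["q"])
    disjunction_exceeds_one = c["p"] + c["q"] - c["p & q"] > 1.0
    return conjunction_too_high or disjunction_exceeds_one

print(is_incoherent(credences))
# True: the implied credence in "p or q" would be 0.9 + 0.8 - 0.4 = 1.3 > 1.
```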
We still find that self-training on auto-generated texts improves the models’ belief systems: incoherence in the credences is substantially reduced through reflective equilibration.
In addition, we study whether an LLM can accommodate novel evidence in its belief system. Here, too, self-training (reflective equilibration) is pivotal: It’s only through self-training on auto-generated texts that a newly acquired belief is rationally propagated through the entire belief system.
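For orientation, Bayesian conditionalization is the normative benchmark for accommodating new evidence; the sketch below (with made-up numbers, not code from the paper) shows how the credence in a single hypothesis is updated on a piece of evidence. The hard part for a language model, and the point of the self-training step, is that all logically related credences have to be revised in step with such an update.

```python
# Illustrative Bayesian update (made-up numbers): the credence in a
# hypothesis h is conditionalized on newly acquired evidence e.

prior_h = 0.3              # prior credence in hypothesis h
p_e_given_h = 0.9          # how strongly h predicts the evidence e
p_e_given_not_h = 0.2      # how likely e is if h is false

# Law of total probability: overall credence in the evidence
p_e = p_e_given_h * prior_h + p_e_given_not_h * (1 - prior_h)

# Bayes' rule: posterior credence in h after learning e
posterior_h = p_e_given_h * prior_h / p_e
print(round(posterior_h, 3))  # 0.659
```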
Written on February 9th, 2023 by Gregor Betz