This paper seems to be about the limits of accurately classifying true and false statements in LLMs.
No, that's not what it's about, and I'm really not sure where you're picking that reading up. It discusses the limits on our ability to model the representations, not the model's inherent ability to classify. Tegmark's recent interest has been entirely in linear representations of world models in LLMs; see the other paper he co-authored a few weeks before this one on the representation of space and time: Language Models Represent Space and Time.
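To make that concrete, here's a minimal sketch (not the paper's actual code) of the kind of linear probe this line of work fits on a model's activations: a plain logistic regression over hidden-state vectors for statements labeled true or false. The activations and labels below are random placeholders standing in for real extracted ones.

```python
# Hypothetical linear-probe sketch: is "true vs. false" linearly decodable from
# a model's hidden states? Real work would use activations from a specific layer
# of an LLM; here they are random placeholders just to show the shape of the setup.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
activations = rng.normal(size=(1000, 4096))  # [n_statements, hidden_dim] placeholder
labels = rng.integers(0, 2, size=1000)       # 1 = true, 0 = false (placeholder)

X_tr, X_te, y_tr, y_te = train_test_split(activations, labels, random_state=0)
probe = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print("held-out probe accuracy:", probe.score(X_te, y_te))  # ~0.5 on random data
```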
This seems unsurprising since the way LLMs work is essentially taking a probabilistic walk through an array of every possible next word or token based on multidimensional analysis of patterns of each.
That's not how they work. You are confusing their training with their operation. They are trained to predict the next token, but how they accomplish that is far more complex and opaque. Training is well understood; operation is not, especially in the largest models. That said, Anthropic has made good headway in the past few months by treating the network as virtual neurons mapped onto the lower-dimensional actual nodes and looking at activations around features rather than individual neurons.
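A rough sketch of that distinction, using gpt2 as a tiny stand-in model (nothing specific to the paper): the training objective really is just next-token cross-entropy, while operation is an autoregressive loop over whatever internal computation the trained network has learned, and that internal computation is the opaque part.

```python
# Sketch of training objective vs. operation, assuming the Hugging Face
# transformers library and gpt2 as a small stand-in (the argument is about
# much larger LLMs).
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

batch = tok("The Eiffel Tower is in", return_tensors="pt")

# Training-style objective: shift-by-one cross-entropy on the next token.
loss = model(**batch, labels=batch["input_ids"]).loss
print("next-token prediction loss:", loss.item())

# Operation: autoregressive generation. *How* the network maps context to these
# next-token logits internally is the part that is not well understood.
out = model.generate(**batch, max_new_tokens=8, do_sample=True)
print(tok.decode(out[0]))
```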
Llama-13B is the best
It's definitely not the best, and I'm not sure where you got that impression.
Because this is multidimensional and it's AI finding the patterns, there are patterns being matched beyond the simplistic examples I've been offering as analogues: patterns that humans cannot see, patterns that extend beyond the simple, obvious correlations we humans might spot in the training data.
All LLM activations are multidimensional. That's how the networks work: multidimensional vectors in a virtual network fuzzily mapping onto the underlying nodes and layers. But you seem to think that because it's complex modeling of language relationships, it can't also be modeling world models? I'm not really clear what point you're trying to make here.
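For concreteness, here's what "multidimensional activations" means in practice (again using gpt2 as a tiny illustrative stand-in): every token at every layer is a high-dimensional vector, and those per-layer vectors are exactly what a probe like the sketch above would be fit on.

```python
# Inspect the per-layer, per-token activation vectors of a small causal LM.
# gpt2 is just an illustrative stand-in; larger models have far wider vectors.
import torch
from transformers import AutoModel, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModel.from_pretrained("gpt2")

with torch.no_grad():
    out = model(**tok("Paris is north of Madrid.", return_tensors="pt"),
                output_hidden_states=True)

# One tensor per layer (plus the embedding layer), each [batch, seq_len, hidden_dim].
for i, h in enumerate(out.hidden_states):
    print(f"layer {i}: shape {tuple(h.shape)}")
```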
Again, there are many papers pointing to how LLMs build world models abstracted from their input, from the Othello-GPT paper and the follow-up by a DeepMind researcher to Tegmark's two recent papers. This isn't an isolated paper but part of a broader trend. Saying this isn't actually happening means claiming that multiple researchers across Harvard, MIT, and the institutions leading development of the tech are all getting it wrong.
And none of the LLM papers these days are peer reviewed, because no one is waiting months to publish in a field moving so quickly that your findings will likely be secondary or uninteresting by the time they appear. For example, both Stanford's model collapse paper and Are Emergent Abilities of Large Language Models a Mirage? were published to arXiv rather than peer-reviewed journals, and both got a ton of attention, partly because negative takes on LLMs get more press coverage these days. Go ahead and point to an influential LLM paper from the last year published in a peer-reviewed journal rather than on arXiv. Even Wei's CoT paper, probably the most influential of the past two years, was published there.