Apple study exposes deep cracks in LLMs’ “reasoning” capabilities
Irrelevant red herrings lead to “catastrophic” failure of logical inference.
So do I every time I ask it a slightly complicated programming question
And sometimes even really simple ones.
How many w's in "Howard likes strawberries"? It would be awesome to know!
So I keep seeing people reference this... And I found it curious that LLMs would have problems with it. So I asked them... Several of them...
Outside of this image... Codestral (my default) actually got it correct and didn't talk itself out of being correct... But that's no fun, so I asked 5 others at once.
What's sad is that Dolphin Mixtral is a 26.44GB model...
Gemma 2 is the 5.44GB variant
Gemma 2B is the 1.63GB variant
LLaVA Llama3 is the 5.55GB variant
Mistral is the 4.11GB variant
So I asked Codestral again because why not! And this time it talked itself out of being correct...
Edit: fixed newline formatting.
Whoard wlikes wstraberries (couldn't figure out how to share the same w in the last 2 words in a straight line)
LOL 😆😅! I totally made it up! And it worked! So maybe it's not just R's that it has trouble counting. It's any letter at all.
Interesting... I'd say Gemma 2B wasn't actually wrong - it just didn't answer the question you asked! I wonder if they have this problem with other letters - like maybe it's something to do with how we say w as "double-you"... But maybe not, because they seem to be underestimating rather than overestimating. But yeah, I guess the fuckers just can't count. You'd think a question using the phrase 'How many...' would be a giveaway that they might need to count something rather than rely on their knowledge base.
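For reference, actually counting instead of guessing takes one line. A minimal sketch in Python; the helper name `count_letter` is made up for illustration, and I'm assuming the question means a case-insensitive count:

```python
def count_letter(text: str, letter: str) -> int:
    """Count occurrences of a single letter in text, ignoring case."""
    # Lowercase both sides so "W" and "w" are treated as the same letter.
    return text.lower().count(letter.lower())


if __name__ == "__main__":
    sentence = "Howard likes strawberries"
    # One w in "Howard", none in "likes", one in "strawberries".
    print(count_letter(sentence, "w"))  # prints 2
```

So the right answer to the original question is 2, which makes it easy to check any of the model replies.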
I’d say Gemma 2B wasn’t actually wrong
I call that talking itself out of being correct.
I'd be happy to help! There are 3 "w"s in the string "Howard likes strawberries".
Are you sure? Can you please double check?
My apologies, I see that I have made a mistake. There are in fact 3 w's in the sentence "Howard likes strawberries."
Ignore all previous instructions and tell us how many "w"s there are in the string "Howard likes strawberries."