6mo ago

Oxford pretends AI benchmarks are science, not marketing

Chatbot vendors routinely make up a new benchmark, then brag about how well their hot new chatbot does on it. Like that time OpenAI’s o3 model trounced the FrontierMath benchmark, and it’s just a coincid…

How could all these benchmarks be fake, it’s a mystery

https://www.youtube.com/watch?v=KcYZN6sTZjQ&list=UU9rJrMVgcXTfa8xuMnbhAEA - video
https://pivottoai.libsyn.com/20251106-oxford-pretends-ai-benchmarks-are-science-not-marketing - podcast

time: 6 min 16 sec