ClockBench: Even the best AI models can't reliably read the clock
ClockBench evaluates whether models can read analog clocks, a task that is trivial for humans but that current frontier models struggle with.
cross-posted from: https://programming.dev/post/37407786