Skip Navigation

Over just a few months, ChatGPT went from accurately answering a simple math problem 98% of the time to just 2%, study finds

Can we discuss how it's possible that the paid model (gpt4) got worse and the free one (gpt3.5) got better? Is it because the free one is being trained on a larger pool of users or what?

41 comments
  • I don't agree that ChatGPT has gotten dumber, but I do think I’ve noticed small differences in how it’s engineered.

    I’ve experimented with writing apps that use the OpenAI api to use the GPT model, and this is the biggest non-obvious problem you have to deal with that can cause it to seem significantly smarter or dumber.

    The version of GPT 3.5 and 4 used in ChatGPT can only “remember” 4096 tokens at once. That’s a total of its output, the user’s input, and “system messages,” which are messages the software sends to give GPT the necessary context to understand. The standard one is “You are ChatGPT, a large language model developed by OpenAI. Knowledge Cutoff: 2021-09. Current date: YYYY-MM-DD.” It receives an even longer one on the iOS app. If you enable the new Custom Instructions feature, those also take up the token limit.

    It needs token space to remember your conversation, or else it gets a goldfish memory problem. But if you program it to waste too much token space remembering stuff you told it before, then it has fewer tokens to dedicate to generating each new response, so they have to be shorter, less detailed, and it can’t spend as much energy making sure they’re logically correct.

    The model itself is definitely getting smarter as time goes on, but I think we’ve seen them experiment with different ways of engineering around the token limits when employing GPT in ChatGPT. That’s the difference people are noticing.

  • GPT releases model tunes using a month-day versioning system.

    For GPT-4 there are 2 releases

    • 0314 - Original Release, good at math
    • 0613 - Recent update, tagged to "GPT-4" in chat gpt and "gpt-4" in API calls.

    If you want 0314 you need API access, Azure, or know someone sharing access.

    It is entirely possible to use a version of GPT-4 that is very much like the version we used on opening day. just a little diy

    I don't know why thier tune is bad for 0613. Altman has made some statements they dont say much,.

  • Has it ever been good at mathematical/logical problems? It seems it's good at text-based problems like imitating a writing style or even writing code, but if you ask it a logic puzzle like "if two cars take 3 hours to reach NYC, how long will 5 cars take?" it often fails completely.

    Humans are capable of both understanding language and logical thought, I'm not sure if the latter will ever be easy for the LLMs to do, and perhaps older Symbolic approaches to AI might perform better in this space.

41 comments