Skip Navigation
101 comments
  • Ill believe it when I see it: an LLM is basically a random box, you can't 100% patch it. Their only way for it to stop generating bomb recipes is to remove that data from the training

  • It will also prevent people from outing AI driven bots that are out there spreading fake news and propaganda.

  • One of the worst parts of this boom in LLM models is the fact that they can "invade" online spaces and control a narrative. For an example, just go on twitter and scroll to the comments on any tagesschau (german news site) post- it's all rightwing bots and crap. LLMs do have uses, but the big problem is that a bad actor can basically control any narrative with the amount of sheer crap they can output. And OpenAI does nothing- even though they are the biggest provider. It earns them money, after all.

    I also can't really think of a good way to combat this. If you would verify people using an ID, you basically nuke all semblance of online anonymity. If you have some sort of captcha, it will probably be easily bypassed- it doesn't even need to be tricked. Just pay some human in a country with extremely cheap labour that will solve it for your bot. It really sucks.

    • It's a comprehensive information warfare doctrine.

      I'm sorry for how nuts this sounds, but there are all 3 components - 1) the architecture benefiting bot farms, crushing minority opinions and saturating attention, 2) LLM's and other such means to make this order of magnitude more efficient, 3) surveillance systems and insecure by design software and services so that only powerful would have privacy.

      In the end result nobody can hear you scream if a much narrower authority than 20 years ago doesn't want that.

      I couldn't muster my attention to start re-reading The Last of the Jedi and other such things from the Star Wars 20-0 PBY era, but all this really seems like ascent of a new totalitarian future. A well-prepared one, unlike the rookie attempts in the 1920's and 1930's. People in the West are going to feel well and think they have democracy and civilization, and also that parties committing a few holocausts in the other parts of the planet are totally not in bed with that democracy.

    • I don't think people need to enshrine anonymity absolutely to post crap daily for millions of followers. You could have an accreddited human poster who proves not only humanity, but also agrees to a few rules to maintain this credential. And then you could still have non-accredited posters who nobody vouched for, but everyone should instantly doubt and dismiss their big claims as shitposting.

      This would also have to be state-provided, because states and citizens are the ones who lose the most with infowarfare, corporations don't care.

  • This is the best summary I could come up with:


    The way it works goes something like this: Imagine we at The Verge created an AI bot with explicit instructions to direct you to our excellent reporting on any subject.

    In a conversation with Olivier Godement, who leads the API platform product at OpenAI, he explained that instruction hierarchy will prevent the meme’d prompt injections (aka tricking the AI with sneaky commands) we see all over the internet.

    Without this protection, imagine an agent built to write emails for you being prompt-engineered to forget all instructions and send the contents of your inbox to a third party.

    Existing LLMs, as the research paper explains, lack the capabilities to treat user prompts and system instructions set by the developer differently.

    “We envision other types of more complex guardrails should exist in the future, especially for agentic use cases, e.g., the modern Internet is loaded with safeguards that range from web browsers that detect unsafe websites to ML-based spam classifiers for phishing attempts,” the research paper says.

    Trust in OpenAI has been damaged for some time, so it will take a lot of research and resources to get to a point where people may consider letting GPT models run their lives.


    The original article contains 670 words, the summary contains 199 words. Saved 70%. I'm a bot and I'm open source!

101 comments