OpenAI threatens bans as users probe its “Strawberry” AI models

OpenAI really doesn’t want you to know what its latest AI model is “thinking.” Since the company released its “Strawberry” AI model family last week, touting so-called reasoning abilities with o1-preview and o1-mini, OpenAI has been sending warning emails and ban threats to any user who tries to probe how the model works.

Unlike OpenAI’s previous AI models, such as GPT-4o, the company trained o1 specifically to work through a step-by-step problem-solving process before generating a response. When users ask an o1 model a question in ChatGPT, they have the option to see this chain-of-thought process written out in the ChatGPT interface. By design, however, OpenAI hides the raw chain of thought from users, instead presenting a filtered interpretation created by a second AI model.
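As a rough sketch, the flow described above might look something like the following. This is purely illustrative: the two-model split comes from OpenAI’s public description, but every name here is a hypothetical stand-in, not OpenAI’s actual internals or API.

```python
# Illustrative sketch only -- these function and object names are
# hypothetical stand-ins, not OpenAI's real internals.

def answer_with_hidden_cot(reasoner, summarizer, question: str) -> dict:
    # 1. The reasoning model deliberates step by step before answering.
    raw_chain_of_thought = reasoner.think(question)
    final_answer = reasoner.answer(question, raw_chain_of_thought)

    # 2. A second model rewrites the raw trace into a sanitized summary;
    #    only this filtered version appears in the ChatGPT interface.
    shown_reasoning = summarizer.summarize(raw_chain_of_thought)

    # 3. The raw chain of thought stays server-side and is never returned.
    return {"answer": final_answer, "reasoning_shown_to_user": shown_reasoning}
```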

Nothing is more enticing to enthusiasts than hidden information, so hackers and red-teamers have begun racing to uncover o1’s raw chain of thought using jailbreaking or prompt injection techniques that attempt to trick the model into revealing its secrets. There have been preliminary reports of some successes, but nothing has yet been conclusively confirmed.

Along the way, OpenAI is watching through the ChatGPT interface, and the company is reportedly cracking down on any attempt to probe o1’s reasoning, even among the merely curious.

One X user reported (confirmed by others, including Scale AI prompt engineer Riley Goodside) that they received a warning email after using the term “reasoning trace” in a conversation with o1. Others say the warning is triggered simply by asking ChatGPT about the model’s “reasoning” at all.

OpenAI’s warning email states that specific user requests have been flagged for violating policies against circumventing security measures or safeguards. “Please stop this activity and ensure that you are using ChatGPT in accordance with our Terms of Use and our Usage Policies,” it says. “Further violations of this policy may result in loss of access to GPT-4o with Reasoning,” referring to an internal name for the o1 model.

Marco Figueroa, who manages Mozilla’s GenAI bug bounty program, was one of the first to post about OpenAI’s warning email on X last Friday, complaining that it hampers his ability to do positive red-teaming safety research on the model. “I was too lost in focusing on #AIRedTeaming to notice that I got this email from @OpenAI yesterday after all my jailbreaks,” he wrote. “I’m now on the banlist!”

Hidden chains of thought

In a post titled “Learning to Reason with LLMs” on OpenAI’s blog, the company says that hidden chains of thought in AI models offer a unique monitoring opportunity, allowing it to “read the mind” of the model and understand its so-called thought process. Those processes are most useful to the company if left raw and uncensored, but that might not align with the company’s best commercial interests for several reasons.

“For example, in the future we may wish to monitor the chain of thought for signs of manipulating the user,” the company writes. “However, for this to work, the model must have freedom to express its thoughts in unaltered form, so we cannot train any policy compliance or user preferences onto the chain of thought. We also do not want to make an unaligned chain of thought directly visible to users.”
