OpenAI has taken significant measures to restrict users’ access to the internal workings of its newest AI model family, named “Strawberry,” which includes o1-preview and o1-mini. Since the launch last week, OpenAI has issued stringent warnings and potential bans to users attempting to investigate the model’s operations.
Distinct from earlier models like GPT-4o, the o1 series has been designed to follow a step-by-step problem-solving process before providing answers. Users interacting with o1 through ChatGPT can view this process, termed a chain of thought, in the interface. However, the raw data underpinning this chain of thought is intentionally obscured, and what users see is a filtered version produced by a secondary AI model.
This concealment has intrigued AI enthusiasts who are now employing various techniques such as jailbreaking and prompt injection to unveil o1’s authentic reasoning process. Despite initial reports suggesting some success, no results have been definitively confirmed.
OpenAI has been actively monitoring interactions through the ChatGPT interface and has rigorously enforced their policies against probing the o1 model’s logic. Several users have reported receiving warning emails for using terms like “reasoning trace” or inquiring about the model’s reasoning capabilities during their interactions with ChatGPT.
One user on X described receiving a warning email for violating policies against circumventing safeguards, accompanied by a reminder to adhere to OpenAI’s Terms of Use. The warning indicated that further violations could result in the loss of access to GPT-4o with Reasoning, an internal reference to the o1 model.
Marco Figueroa, who manages Mozilla’s GenAI bug bounty programs, was among the first to report the warning emails on X, expressing concern that such measures hinder positive red-teaming safety research. He noted this action after engaging in several jailbreak attempts, stating his account is now flagged for potential banning.
In an OpenAI blog post titled “Learning to Reason With LLMs,” the company explained that concealing the chains of thought allows for enhanced monitoring and understanding of the AI model’s internal processes. OpenAI indicated that maintaining raw, uncensored chains of thought is vital for specific monitoring purposes, such as detecting attempts by the model to manipulate users. Nonetheless, these unaltered chains are not made visible to users due to potential misalignment with commercial objectives.