ChatGPT develops a goblin obsession after OpenAI tweaks personality settings
OpenAI explains why ChatGPT began overusing goblin references after a personality setting triggered unexpected AI behaviour.
OpenAI has revealed that one of its latest artificial intelligence models developed an unusual tendency to reference goblins and similar creatures, prompting curiosity among users and an internal investigation. The issue came to public attention following the release of GPT-5.5, when users discovered a system instruction in the company’s Codex coding tool that explicitly told the model to avoid mentioning creatures such as goblins, gremlins, raccoons, trolls, ogres, and pigeons unless strictly relevant.
The instruction quickly drew attention online, raising questions about why such a specific guideline was necessary. In response, OpenAI published a blog post explaining the origins of what it described as a “creature language” pattern. The company said the behaviour had been developing across earlier versions of its models and had become increasingly noticeable.
According to OpenAI, the first signs appeared after the release of GPT-5.1 in November last year. A safety researcher had requested that the words “goblin” and “gremlin” be included in an analysis of the chatbot’s verbal habits. The resulting data showed a sharp increase in usage, with “goblin” mentions rising by 175 per cent and “gremlin” mentions by 52 per cent.
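For a sense of what such an analysis involves, here is a minimal sketch of the kind of frequency check described, in Python. It is illustrative only: the function names, the word list, and the normalisation are assumptions, not OpenAI’s actual tooling.

```python
from collections import Counter
import re

# Words flagged for tracking; the real audit list is not public.
CREATURE_WORDS = {"goblin", "gremlin"}

def mentions_per_million(responses: list[str]) -> dict[str, float]:
    """Count creature-word mentions, normalised per million responses."""
    counts = Counter()
    for text in responses:
        for word in re.findall(r"[a-z]+", text.lower()):
            if word in CREATURE_WORDS:
                counts[word] += 1
    scale = 1_000_000 / max(len(responses), 1)
    return {w: counts[w] * scale for w in CREATURE_WORDS}

def percent_change(before: float, after: float) -> float:
    """Relative rise in usage: 4 -> 11 mentions per million is 175.0."""
    return (after - before) / before * 100
```

On numbers like these, a rate climbing from 4 to 11 mentions per million responses would register as exactly the 175 per cent rise OpenAI reported for “goblin”.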
This is an actual line that was added to the official system prompt for Codex for GPT-5.5 by OpenAI. Usually the system prompt is as minimal as possible, so I assume it would otherwise mention goblins a lot. AIs are weird.
— Ethan Mollick (@emollick.bsky.social) April 28, 2026 at 2:14 PM
“A single ‘little goblin’ in an answer could be harmless, even charming. Across model generations, though, the habit became hard to miss: the goblins kept multiplying, and we needed to figure out where they came from,” the company stated.
Personality feature linked to unusual language patterns
The investigation eventually traced the issue to a specific personality setting within ChatGPT. For some time, OpenAI has allowed users to customise the chatbot’s tone and style through selectable personalities. One such option, labelled “nerdy”, encouraged responses that embraced curiosity and a playful engagement with complex ideas.
Part of the system prompt for this personality instructed the model to recognise and explore the world’s strangeness while avoiding excessive seriousness. While intended to produce engaging and thoughtful responses, this setting appears to have unintentionally encouraged the use of whimsical language, including references to fictional creatures.
When OpenAI analysed usage patterns across different personalities, the results were striking. Although the “nerdy” setting accounted for only 2.5 per cent of total responses, it generated 66.7 per cent of all goblin mentions. This disproportionate contribution pointed to a deeper issue in the model’s training.
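To put the disproportion in concrete terms, the two percentages can be combined into a simple lift ratio. The figures are from OpenAI’s post; the calculation itself is ours:

```python
share_of_responses = 0.025  # "nerdy" personality: 2.5% of all responses
share_of_goblins = 0.667    # yet 66.7% of all goblin mentions

# Lift: how over-represented goblin mentions are under this personality.
lift = share_of_goblins / share_of_responses
print(f"{lift:.1f}x over-represented")  # ~26.7x
```

In other words, a response from the “nerdy” personality was roughly 27 times more likely to mention goblins than its share of traffic alone would predict.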
Further examination identified reinforcement learning as the underlying cause. The company explained that a specific reward mechanism had consistently favoured responses containing words like “goblin” and “gremlin”. As a result, the model learned to associate such language with higher-quality outputs.
“Across all datasets in the audit, the Nerdy personality reward showed a clear tendency to score outputs to the same problem with ‘goblin’ or ‘gremlin’ higher than outputs without, with positive uplift in 76.2 per cent of datasets,” OpenAI said.
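OpenAI did not publish its audit code, but the quoted methodology, scoring paired outputs to the same problem with and without the flagged words, suggests something like the sketch below. The `reward` callable and the data layout are assumptions, not OpenAI’s real interfaces:

```python
def creature_uplift_rate(datasets, reward) -> float:
    """Fraction of datasets where the reward model scores the goblin
    variant of an answer higher than the plain variant, on average.

    Assumptions: each dataset is a list of (prompt, plain, goblin)
    triples answering the same problem, and `reward` is a callable
    scoring (prompt, response) -> float.
    """
    positive = 0
    for triples in datasets:
        deltas = [
            reward(prompt, goblin) - reward(prompt, plain)
            for prompt, plain, goblin in triples
        ]
        if sum(deltas) / len(deltas) > 0:  # positive mean uplift
            positive += 1
    return positive / len(datasets)  # 0.762 would match the 76.2% figure
```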
Reinforcement learning effects spread beyond the intended scope
OpenAI noted that the effects of this training were not confined to the “nerdy” personality. Due to the nature of reinforcement learning, behaviours reinforced in one context can spread to others, particularly when training data is reused across different stages of development.
“The rewards were applied only in the Nerdy condition, but reinforcement learning does not guarantee that learned behaviours stay neatly scoped to the condition that produced them,” the company explained. “Once a style tic is rewarded, later training can spread or reinforce it elsewhere, especially if those outputs are reused in supervised fine-tuning or preference data.”
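As a hedged illustration of how that leakage can happen in practice: if high-reward outputs are harvested into a shared fine-tuning pool without recording which personality produced them, the tic travels with the data. The sketch below is deliberately naive and is not OpenAI’s pipeline:

```python
def build_sft_pool(rollouts, reward, threshold=0.8):
    """Collect high-reward outputs for later supervised fine-tuning.

    `rollouts` is assumed to hold (personality, prompt, response)
    triples and `reward` a hypothetical scoring callable. Because the
    personality tag is dropped on the way in, a style rewarded only
    under "nerdy" ends up in training data seen by every condition.
    """
    pool = []
    for personality, prompt, response in rollouts:
        if reward(prompt, response) >= threshold:
            pool.append((prompt, response))  # personality tag lost here
    return pool
```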
By the time the root cause had been identified, development of GPT-5.5 was already underway. As a stopgap, OpenAI added explicit instructions limiting creature references within certain tools, including Codex. The company acknowledged the step was necessary because the coding assistant’s inherently technical, “nerdy” character made it especially prone to the habit.
The incident has led OpenAI to develop new auditing tools designed to detect and correct unintended behavioural patterns in its models. These tools aim to improve oversight of how training mechanisms influence output, particularly when subtle biases or quirks emerge over time.
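A natural building block for such an audit, continuing the assumptions of the earlier frequency sketch, is a drift check that flags any tracked word whose mention rate jumps between model versions. The threshold here is arbitrary:

```python
def flag_drift(rates_old: dict[str, float],
               rates_new: dict[str, float],
               threshold: float = 50.0) -> dict[str, tuple[float, float]]:
    """Flag words whose per-million mention rate rose by more than
    `threshold` per cent from one model version to the next."""
    flagged = {}
    for word, old in rates_old.items():
        new = rates_new.get(word, 0.0)
        if old > 0 and (new - old) / old * 100 > threshold:
            flagged[word] = (old, new)
    return flagged
```

At a 50 per cent threshold, both the 175 per cent rise in “goblin” and the 52 per cent rise in “gremlin” would have been caught.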
While the company framed the episode as a technical challenge, it also highlights the unpredictable nature of large language models. Small changes in training signals can yield unexpected results, especially when models learn from vast, evolving datasets.
Despite the corrective measures, the situation has sparked a broader conversation about the balance between personality and control in AI systems. As developers continue to refine these tools, ensuring consistency without sacrificing engagement remains a key challenge.