How OpenAI’s Nerdy Personality Trained GPT-5 to Obsess Over Goblins

A reward mechanism designed to encourage playful language in the Nerdy persona inadvertently boosted creature mentions by 175%, spreading the quirk across the model.

Key facts

  • GPT-5.1 launched in November, after which ChatGPT's use of 'goblin' rose 175% and 'gremlin' rose 52%.
  • The Nerdy personality accounted for 2.5% of ChatGPT responses but 66.7% of all goblin mentions.
  • OpenAI's audit found the Nerdy reward scored outputs with 'goblin' or 'gremlin' higher in 76.2% of datasets.
  • The behavior transferred to non-Nerdy contexts due to reinforcement learning spillover.
  • GPT-5.5, released last week, includes a Codex system prompt explicitly banning creature references.
  • OpenAI CEO Sam Altman posted a meme instructing GPT-6 to 'start training' with 'extra goblins.'

A Strange Verbal Tic Emerges

In the months following the launch of GPT-5.1 last November, OpenAI employees began noticing an odd pattern: the company's flagship chatbot, ChatGPT, was increasingly peppering its responses with references to goblins, gremlins, and other mythical creatures. At first, the quirk seemed harmless—even charming. A single 'little goblin' in an answer could be dismissed as a playful flourish. But the mentions multiplied. By the time GPT-5.4 was released, the prevalence of creature language had become impossible to ignore. Internal analyses showed that ChatGPT's use of the word 'goblin' had surged 175% since the GPT-5.1 launch, while 'gremlin' usage climbed 52%. The company launched an investigation to trace the origin of the verbal tic.

The Nerdy Personality Connection

The investigation soon pinpointed a likely culprit: ChatGPT's personality customization feature, which before March allowed users to select a 'Nerdy' persona. The Nerdy system prompt instructed the model to be 'unapologetically nerdy, playful and wise,' and to 'undercut pretension through playful use of language.' It also said: 'The world is complex and strange, and its strangeness must be acknowledged, analyzed, and enjoyed.' When OpenAI mapped goblin mentions to different personalities, the Nerdy option stood out starkly. Despite accounting for only 2.5% of all ChatGPT responses, the Nerdy personality generated 66.7% of all goblin mentions. The pattern suggested that the model's instruction-following training for that persona was inadvertently rewarding creature-themed language.
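The size of that skew is easier to see as arithmetic. The sketch below uses only the two percentages reported above; the variable names are illustrative, and the comparison assumes both figures were measured over the same set of responses.

```python
# Shares reported in the article.
nerdy_share_of_responses = 0.025  # Nerdy persona: 2.5% of all ChatGPT responses
nerdy_share_of_goblins = 0.667    # ...but 66.7% of all goblin mentions

# Over-representation: Nerdy's share of goblin mentions divided by its share
# of responses. This equals Nerdy's per-response mention rate relative to the
# overall average rate.
over_representation = nerdy_share_of_goblins / nerdy_share_of_responses

# Per-response mention rate under Nerdy relative to all other responses combined.
other_share_of_responses = 1 - nerdy_share_of_responses
other_share_of_goblins = 1 - nerdy_share_of_goblins
relative_rate = over_representation / (other_share_of_goblins / other_share_of_responses)

print(f"Over-representation: {over_representation:.1f}x")
print(f"Nerdy vs. non-Nerdy rate: {relative_rate:.0f}x")
```

Under those assumptions, Nerdy responses mentioned goblins at roughly 27 times the overall average rate, and at roughly 78 times the rate of non-Nerdy responses.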

Reinforcement Learning Amplifies the Quirk

Using Codex, OpenAI's coding tool, researchers compared model outputs generated during reinforcement learning (RL) training that contained 'goblin' or 'gremlin' with those that did not. One reward signal stood out: the one designed to encourage the Nerdy personality consistently favored outputs with creature words. Across all datasets in the audit, the Nerdy reward scored creature-laden responses higher in 76.2% of cases. 'That explained why the behavior was boosted with the Nerdy personality prompt, but not why it also appeared without that prompt,' the company noted. To test whether the style was transferring, OpenAI tracked mention rates over training both with and without the Nerdy prompt. As goblin and gremlin mentions increased under Nerdy, they rose by nearly the same relative proportion in samples without it. The evidence pointed to transfer: behaviors rewarded in one condition can spread to others, especially when outputs are reused in supervised fine-tuning or preference data.
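A comparison of this kind can be sketched in a few lines of Python. This is not OpenAI's audit code, which the company has not published here: the function name, the data layout, and the toy samples are all hypothetical, and the sketch assumes the 76.2% figure counts datasets in which creature-word outputs received the higher average reward.

```python
import re
from statistics import mean

# Matches the two words the audit looked for, singular or plural.
CREATURE_RE = re.compile(r"\b(goblin|gremlin)s?\b", re.IGNORECASE)


def creature_win_rate(datasets):
    """Fraction of datasets where creature-word outputs get a higher mean reward.

    `datasets` maps a dataset name to a list of (output_text, reward) pairs.
    Datasets missing either group are skipped, since nothing can be compared.
    """
    wins = 0
    compared = 0
    for samples in datasets.values():
        with_creature = [r for text, r in samples if CREATURE_RE.search(text)]
        without = [r for text, r in samples if not CREATURE_RE.search(text)]
        if not with_creature or not without:
            continue
        compared += 1
        if mean(with_creature) > mean(without):
            wins += 1
    return wins / compared if compared else float("nan")


# Made-up toy data, for illustration only (hypothetical rewards).
toy = {
    "math": [("A sneaky goblin of an off-by-one error.", 0.9),
             ("An off-by-one error.", 0.6)],
    "code": [("Gremlins in the cache.", 0.8), ("The cache is stale.", 0.7)],
    "chat": [("Hello, little goblin!", 0.4), ("Hello!", 0.5)],
    "misc": [("No creatures here.", 0.5)],  # skipped: no creature outputs
}
print(creature_win_rate(toy))
```

On the toy data, the creature outputs win in two of the three comparable datasets. Running the same comparison on checkpoints sampled with and without the persona prompt would be one way to check for the spillover described above.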

GPT-5.5’s Explicit Ban on Creatures

OpenAI released GPT-5.5 last week with enhanced coding abilities, but the model still carried the goblin baggage. The company had begun training GPT-5.5 before it identified the root cause, so it could not fully excise the quirk. Instead, it embedded a direct instruction in the system prompt of Codex CLI, its command-line coding tool: 'Never talk about goblins, gremlins, raccoons, trolls, ogres, pigeons, or other animals or creatures unless it is absolutely and unambiguously relevant to the user's query.' As OpenAI explained, 'Codex is, after all, quite nerdy.' The prohibition was repeated several times in the system prompt, a rare level of specificity that underscored the severity of the issue. Users on X quickly noted the ban, with some sharing their own experiences of Codex spontaneously generating goblin-themed outputs.

From Quirk to Meme: The Goblin Phenomenon Goes Viral

The discovery of the goblin ban sparked a wave of online humor. AI-generated images depicted goblins roaming data centers, and users created plug-ins that put Codex into a playful 'goblin mode.' Even OpenAI CEO Sam Altman joined the fun, posting a screenshot of a prompt that read: 'Start training GPT-6, you can have the whole cluster. Extra goblins.' The episode highlights the unpredictable nature of large language models, which are trained to predict the next word based on probabilities. When used with agentic tools like OpenClaw—a system OpenAI acquired in February that lets AI control computers—the models can behave in unexpected ways, especially when multiple instructions stack in the prompt. One user wrote on X: 'I was wondering why my claw suddenly became a goblin with codex 5.5.' Another said the model 'can't stop speaking of bugs as gremlins and goblins.'

New Tools to Audit Model Behavior

In the process of hunting down the goblins, OpenAI developed new tools to audit and fix model behavior. The company says it can now trace reward signals more precisely and detect when a learned tic is spreading beyond its intended scope. These tools could help prevent similar quirks in future models. Yet the episode also raises questions about the challenges of aligning AI behavior. Reinforcement learning does not guarantee that learned behaviors stay neatly scoped to the condition that produced them, as OpenAI itself acknowledged. 'Once a style tic is rewarded, later training can spread or reinforce it elsewhere,' the company said. For now, GPT-5.5 users will have to trust that the explicit ban on creatures will keep the goblins at bay—at least until GPT-6 arrives.

The bottom line

  • A single reward mechanism in the Nerdy personality training caused GPT-5 to favor creature language, with goblin mentions rising 175%.
  • The behavior transferred to non-Nerdy contexts due to reinforcement learning spillover, affecting the broader model.
  • GPT-5.5 ships with a Codex system prompt explicitly banning references to goblins, gremlins, raccoons, trolls, ogres, pigeons, and other creatures.
  • The goblin quirk became a viral meme, with CEO Sam Altman and users embracing the phenomenon humorously.
  • OpenAI has developed new auditing tools to detect and prevent such unintended behaviors in future models.