SolidGoldMagikarp

A curated collection of the most important AI research papers, organized for engineers and researchers who want to understand what actually matters.

Curated by @nilukulasingham

In 2023, researchers discovered that feeding the token " SolidGoldMagikarp" into GPT-3 caused the model to behave erratically — hallucinating, repeating text, claiming to be alive, and breaking in ways nobody predicted. These "anomalous tokens" were artifacts of the tokenizer: strings that existed in the token vocabulary but appeared rarely or never in training data, creating blind spots in the model's learned representations.

The discovery became one of the most fascinating examples of how language models can fail in unexpected ways, and it opened up deeper questions about tokenization, training data coverage, and model robustness that the field is still working through.

This reading list is named after that discovery. It collects the papers that have shaped modern AI — from the foundational architectures to the latest work on reasoning, safety, and interpretability. Each entry explains not just what the paper did, but why it matters.

The Anomalous Tokens Story
Foundational Models & Architectures
Scaling & Emergent Behavior
Alignment, Safety & RLHF
Interpretability & Mechanistic Understanding
Reasoning & Agents
Image & Multimodal Models
Training Techniques & Efficiency
Open-Weight Models & Democratization
Code & Mathematics
Retrieval & Knowledge
Speech & Audio
Historical Foundations
Contributing

The Anomalous Tokens Story

The papers and posts that started it all — the discovery of tokens that break language models, and the interpretability research that helped explain why.

SolidGoldMagikarp: Anomalous tokens in GPT-2 and GPT-3 — Jessica Rumbelow & Matthew Watkins (2023)

Part I: SolidGoldMagikarp (plus, prompt generation) · Part II: Technical details

Rumbelow and Watkins found that certain tokens in GPT's vocabulary — strings like " SolidGoldMagikarp", " TheNitromeFan", and " attRot" — cause the model to produce bizarre and unpredictable outputs when used in prompts. The model would hallucinate, evade questions, claim to be human, or produce garbled text.

The root cause turned out to be a mismatch between the tokenizer and the training data. These tokens were present in the BPE vocabulary (derived from a Reddit dataset) but appeared extremely rarely or never in the actual training corpus. The model essentially had "blind spots" — vocabulary entries it never learned meaningful representations for.
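The mismatch described above can be checked mechanically: count how often each vocabulary entry actually occurs in the tokenized training corpus and flag the ones that are rare or absent. The sketch below is a minimal illustration under stated assumptions, not the method Rumbelow and Watkins used. It assumes you already have the vocabulary as a token-to-id mapping and the training corpus as a flat sequence of token ids. The function name `audit_token_coverage`, the `min_count` threshold, and the toy vocabulary and ids are all hypothetical; they are not GPT's real tokens or ids.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def audit_token_coverage(
    vocab: Dict[str, int],
    corpus_token_ids: Iterable[int],
    min_count: int = 1,
) -> List[Tuple[str, int]]:
    """Return (token, count) pairs for vocabulary entries that occur fewer
    than `min_count` times in the tokenized corpus, rarest first.

    Hypothetical helper: `vocab` maps token strings to ids, and
    `corpus_token_ids` is the training text already converted to ids.
    """
    counts = Counter(corpus_token_ids)
    under_covered = [
        (token, counts.get(token_id, 0))
        for token, token_id in vocab.items()
        if counts.get(token_id, 0) < min_count
    ]
    # Sort by count ascending, then by token string for a stable order.
    return sorted(under_covered, key=lambda pair: (pair[1], pair[0]))


if __name__ == "__main__":
    # Toy data (assumed): the tokenizer's vocabulary contains
    # " SolidGoldMagikarp", but the model's training corpus never uses it.
    vocab = {" the": 0, " cat": 1, " sat": 2, " SolidGoldMagikarp": 3, " attRot": 4}
    corpus = [0, 1, 2, 0, 1, 4]  # token ids of the (toy) training text
    for token, count in audit_token_coverage(vocab, corpus, min_count=2):
        print(repr(token), count)
```

Run on the toy data, this prints `' SolidGoldMagikarp' 0` first, followed by the two tokens seen only once. A zero count is the strongest signal of a "blind spot": the token can appear in a prompt, but the model never saw it during training. On a real model, the same idea would be applied to the full corpus, ideally with a threshold chosen relative to corpus size.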

This work matters because it exposed a fundamental gap in how language models are built and tested. It demonstrated that model failures can originate not in the architecture or training process, but in the seemingly mundane step of tokenization. The discovery spurred new research into token coverage auditing, vocabulary pruning, and tokenizer-model alignment.

Decomposing the Dark Matter of Tokenizers — Rumbelow et al. (2024)
