Abilities are data contracts
The strawberry counting guide is valuable because it makes the ability concrete. The task is not magic reasoning; it is a generated conversation pattern, a target behavior, and a training stage that teaches nanochat to approach a narrow problem in a repeatable way.
The guide uses synthetic conversations, prompt variation, explicit spelling, and a Python double-check to teach a small model a behavior it was bad at. That is the right mental model for small specialists: the ability is a data contract with examples, triggers, target outputs, and failure modes.
Tokenization is not incidental
The guide explicitly breaks words into characters because tokenization hides the thing the model must count. That maps directly to the RPG-state experiment: the labels, delimiters, and status channels are not superficial formatting. They are the model's interface to the world.
When tiny models fail, the failure often looks like intelligence drifting. Underneath, it is frequently an interface problem: fuzzy tokens, overloaded labels, missing negative examples, or a task shape that does not make the right intermediate state visible.
Identity is also trainable behavior
The identity guide shows the same pattern applied to persona and self-description. Karpathy describes generating synthetic multi-turn conversations and mixing them into midtraining and SFT so nanochat learns what it is supposed to know about itself.
That is not just flavor. In applied systems, identity includes tool boundaries, refusal style, domain commitments, and what the assistant should claim or avoid claiming. For small models, those behaviors should be trained and evaluated like any other ability.
How this loops back to the pico model
The RPG experiment is an ability-training problem wearing a game costume. The model is not learning everything about games. It is learning a narrow transition grammar and then revealing where the grammar is under-specified.
- Generate targeted examples: cover each action, status, cooldown, and boundary case.
- Vary triggers without blurring labels: prompt diversity helps; semantic drift hurts.
- Train recovery, not just success: malformed states and contradictory inputs belong in the eval set.
- Keep the output contract inspectable: a tiny model is only useful if failures are easy to spot.