Nanochat / SLM Series - Part 1

Why Tiny Specialists Matter

A 64 MB RPG-state model is the small end of the same curve nanochat makes visible: constrained tasks, explicit token formats, failure-first evals, and capability-per-megabyte.

Part 1 of 4.
[Figure: Initial experiment snapshot showing compact model output for RPG state transitions from the pico-LLM training loop.]

The question under the experiment

The original field note asked a deliberately narrow question: how much useful behavior can fit inside a model small enough to feel almost unserious? A 64 MB model running RPG combat logic is not a general assistant. That is the point. It is a test of whether a tiny learned system can absorb a constrained grammar, emit plausible state transitions, and expose where the representation breaks.

That question now has a larger context. The SLM survey frames small language models as a serious research lane for accessible, affordable, efficient intelligence, while Phi-3 shows that compact models can matter in practical deployment contexts instead of serving only as toy baselines.

Series thesis: the useful frontier is not only larger models. It is the ability to train, evaluate, and compose smaller models that do one thing clearly enough to trust.

From prompt to game engine behavior

The model is trained on a compact, template-driven format representing turn-based RPG state. Inputs encode turn counters, status effects, cooldown slots, and actions like poison_strike, ignite, heal, and guard. The model is trained to emit a canonical next-state block in response.

[Figure: Template-driven prompt shape and turn sequence example used for turn-by-turn combat state prediction in pico-LLM RPG training.]
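
To make the idea of a template-driven format concrete, here is a minimal sketch of how one turn might be serialized into a prompt. The tag names (`<turn>`, `<actor>`, `<target>`, `<action>`, `<next>`), the `hp=`/`st=`/`cd=` field layout, and the `Unit` class are assumptions made for illustration; only the action names come from the text above, and the experiment's actual template may look different.

```python
from dataclasses import dataclass, field


# Hypothetical state container; field names are illustrative, not the article's.
@dataclass
class Unit:
    name: str
    hp: int
    statuses: dict = field(default_factory=dict)   # status label -> turns remaining
    cooldowns: dict = field(default_factory=dict)  # action name -> turns until ready


def encode_unit(u: Unit) -> str:
    # Sort keys so the same state always serializes to the same string.
    st = ",".join(f"{k}:{v}" for k, v in sorted(u.statuses.items())) or "none"
    cd = ",".join(f"{k}:{v}" for k, v in sorted(u.cooldowns.items())) or "none"
    return f"{u.name} hp={u.hp} st={st} cd={cd}"


def encode_prompt(turn: int, actor: Unit, target: Unit, action: str) -> str:
    # One field per line; the trailing <next> tag marks where the model
    # would be expected to start writing the next-state block.
    lines = [
        f"<turn> {turn}",
        f"<actor> {encode_unit(actor)}",
        f"<target> {encode_unit(target)}",
        f"<action> {action}",
        "<next>",
    ]
    return "\n".join(lines)


hero = Unit("hero", 30, cooldowns={"heal": 2})
slime = Unit("slime", 12, statuses={"poison": 3})
prompt = encode_prompt(4, hero, slime, "poison_strike")
print(prompt)
```

The design choice worth noting is determinism in the encoder: sorted keys and an explicit `none` placeholder mean every state has exactly one spelling, which keeps a tiny model from spending capacity on equivalent orderings.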

That makes the model behave like a fuzzy state-transition engine: not deterministic code, but learned transitions with enough structure to produce coherent combat outcomes under normal conditions.

[Figure: Field legend and token semantics for the RPG status channels: the tiny format details that decide whether training behaves or drifts.]
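
For contrast with the learned, fuzzy version, a deterministic transition function for the same kind of state might look like the sketch below. Every number in it (damage values, status durations, cooldown lengths, the guard rule) is invented for illustration, as are the dictionary keys; the article does not specify its combat rules. A reference engine like this is one plausible way to produce canonical next-state targets and to score a model's predictions, not necessarily how the experiment did it.

```python
import copy

# All rules and numbers here are hypothetical placeholders.
ACTIONS = {
    "poison_strike": {"damage": 4, "apply": ("poison", 3), "cooldown": 2},
    "ignite": {"damage": 2, "apply": ("burn", 2), "cooldown": 3},
    "heal": {"heal": 5, "cooldown": 2},
    "guard": {"apply_self": ("guard", 1), "cooldown": 1},
}
TICK_DAMAGE = {"poison": 2, "burn": 3}  # damage dealt per turn by each status


def step(state: dict, action: str) -> dict:
    """Return the next state for one actor-target exchange; the input is not mutated."""
    nxt = copy.deepcopy(state)
    actor, target = nxt["actor"], nxt["target"]
    spec = ACTIONS[action]
    if actor["cd"].get(action, 0) > 0:
        raise ValueError(f"{action} is still on cooldown")

    # 1. Resolve the action itself.
    damage = spec.get("damage", 0)
    if "guard" in target["st"]:
        damage //= 2  # a guarding target takes half damage
    target["hp"] = max(0, target["hp"] - damage)
    actor["hp"] = min(actor["max_hp"], actor["hp"] + spec.get("heal", 0))
    if "apply" in spec:
        label, turns = spec["apply"]
        target["st"][label] = turns
    if "apply_self" in spec:
        label, turns = spec["apply_self"]
        actor["st"][label] = turns

    # 2. Tick the target's status effects and drop the expired ones.
    for label in list(target["st"]):
        target["hp"] = max(0, target["hp"] - TICK_DAMAGE.get(label, 0))
        target["st"][label] -= 1
        if target["st"][label] <= 0:
            del target["st"][label]

    # 3. Advance existing cooldowns, then start the one for the action just used.
    actor["cd"] = {a: t - 1 for a, t in actor["cd"].items() if t > 1}
    actor["cd"][action] = spec["cooldown"]
    nxt["turn"] += 1
    return nxt


state = {
    "turn": 4,
    "actor": {"name": "hero", "hp": 30, "max_hp": 30, "st": {}, "cd": {"heal": 2}},
    "target": {"name": "slime", "hp": 12, "max_hp": 12, "st": {"poison": 1}, "cd": {}},
}
next_state = step(state, "poison_strike")
print(next_state["target"])
```

Comparing a model's emitted block field by field against the output of a function like `step` gives an exact-match signal per field, which is more informative than a single pass/fail on the whole block.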

Where nanochat changes the frame

nanochat is useful here because it treats small-model training as an end-to-end system instead of a mystery box. The repo covers tokenization, pretraining, finetuning, evaluation, inference, and a chat UI, which is exactly the pipeline view a tiny specialist needs.

The important contrast is scale discipline. My RPG experiment starts at the very small, domain-specific end. nanochat shows the same discipline at a more general LLM scale: change the model depth, keep the pipeline coherent, evaluate with comparable metrics, and learn where the curve bends.
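
A rough way to see what "change the model depth" means in megabytes is the back-of-envelope sketch below. It uses a common rule of thumb for decoder-only transformers (about 12 × d_model² weights per block, plus the embedding table) and is not nanochat's actual sizing logic. The width, vocabulary size, and depths are hypothetical, and the article does not say what precision or configuration the 64 MB figure refers to.

```python
def approx_params(depth: int, d_model: int, vocab_size: int) -> int:
    # Rule of thumb: each block has ~4*d^2 attention weights and ~8*d^2 MLP
    # weights (4x expansion). Biases, norms, and positional parameters are
    # ignored, and input/output embeddings are assumed to be tied.
    return 12 * depth * d_model**2 + vocab_size * d_model


def size_mb(params: int, bytes_per_param: int) -> float:
    # Decimal megabytes (1 MB = 1,000,000 bytes).
    return params * bytes_per_param / 1_000_000


def params_for_budget(budget_mb: int, bytes_per_param: int) -> int:
    # How many weights fit in a given file-size budget.
    return budget_mb * 1_000_000 // bytes_per_param


# Hypothetical sweep: fixed width and vocabulary, varying depth.
for depth in (2, 4, 8, 16):
    n = approx_params(depth, d_model=256, vocab_size=4096)
    print(f"depth={depth:2d}  params={n:>10,}  fp32={size_mb(n, 4):6.2f} MB")

print("64 MB at 4 bytes/param:", params_for_budget(64, 4))
print("64 MB at 2 bytes/param:", params_for_budget(64, 2))
```

Under these assumptions a 64 MB budget holds on the order of 16 million 32-bit weights or 32 million 16-bit weights, which is the scale at which every choice about vocabulary and format shows up directly in the parameter count.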

The useful failures

The most useful part of the note is not that the model works but where its behavior degrades: label collapse, token drift, and boundary confusion when malformed or overloaded labels are introduced.

[Figure: Failure case in which a regen-focused perturbation destabilizes sequence tracking and produces unstable output.]

These failures mark the limits of what the model can compress and show where its representation breaks down. In practical training terms, they tell you what to fix next in data format, token conventions, and eval coverage.
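
One way to turn failures like these into eval coverage is a structural checker that tags what went wrong in each emitted block. The sketch below assumes a hypothetical next-state layout (`<turn>`, `<actor>`, `<target>` lines with `hp=`, `st=`, `cd=` fields) and a hypothetical status label set; the failure tags are neutral names chosen here, not the article's taxonomy, though unknown labels and changed entity names are roughly the kind of symptoms described above.

```python
import re

EXPECTED_FIELDS = ("turn", "actor", "target")           # hypothetical block layout
KNOWN_STATUSES = {"poison", "burn", "regen", "guard"}   # hypothetical label set
LINE_RE = re.compile(r"^<(\w+)> (.*)$")
UNIT_RE = re.compile(r"^(\w+) hp=(\d+) st=(\S+) cd=(\S+)$")


def check_block(output: str, expected_names: dict) -> list:
    """Return a list of failure tags; an empty list means the block is well formed."""
    failures = []
    fields = {}
    for line in output.strip().splitlines():
        m = LINE_RE.match(line)
        if not m:
            failures.append("unparseable_line")
            continue
        fields[m.group(1)] = m.group(2)

    # Structural checks: every expected field present, nothing extra.
    for name in EXPECTED_FIELDS:
        if name not in fields:
            failures.append(f"missing_field:{name}")
    for name in fields:
        if name not in EXPECTED_FIELDS:
            failures.append(f"unknown_field:{name}")

    if "turn" in fields and not fields["turn"].isdigit():
        failures.append("bad_value:turn")

    # Per-unit checks: value syntax, entity continuity, known status labels.
    for role in ("actor", "target"):
        if role not in fields:
            continue
        m = UNIT_RE.match(fields[role])
        if not m:
            failures.append(f"bad_value:{role}")
            continue
        unit_name, _hp, statuses, _cooldowns = m.groups()
        if unit_name != expected_names[role]:
            failures.append(f"entity_changed:{role}")
        if statuses != "none":
            for item in statuses.split(","):
                label = item.split(":")[0]
                if label not in KNOWN_STATUSES:
                    failures.append(f"unknown_status:{label}")
    return failures


names = {"actor": "hero", "target": "slime"}
good = "<turn> 5\n<actor> hero hp=30 st=none cd=heal:1\n<target> slime hp=6 st=poison:2 cd=none"
bad = "<turn> 5\n<actor> hero hp=30 st=none cd=heal:1\n<target> hero hp=6 st=poisn:2,regen:1 cd=none"
good_result = check_block(good, names)
bad_result = check_block(bad, names)
print(good_result, bad_result)
```

Counting tags across a batch of perturbed prompts gives a per-failure-type rate, which points more directly at a data-format or token-convention fix than a single accuracy number does.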

[Figure: Label collapse in action: output structure starts to unravel once token semantics blur.]
[Figure: Poison-state perturbation: partial retention of effects with broken entity continuity in the predicted state transition.]

What this means for custom model training

[Figure: Closing snapshot from the pico-LLM experiment thread: tiny models, grounded constraints, useful lessons.]

Sources behind the argument