Signal-to-system playbooks
Practical guidance for teams deciding when a frontier signal deserves a trained model, a dataset, an eval gate, or a safer rollout path.
Start with decision quality, then move into data quality, then enforce ruthless evaluation before launch.
How to decide between prompting, RAG, fine-tuning, or full custom training without burning six weeks for a 4% gain.
Read playbook →
Build a dataset that reflects real user behavior, edge cases, and failure modes instead of happy-path vanity examples.
Read playbook →
A practical eval stack: offline evals, scenario tests, red-team checks, and hard release thresholds that block bad launches.
Read playbook →
A plain-language decision matrix for selecting the lightest approach that achieves your target behavior and reliability.
Read playbook →
A four-part field series connecting Eric's pico-LLM experiment to nanochat, small-language-model research, and practical ability training.
The preserved 64 MB RPG-state experiment reframed around constrained domains, token contracts, failure-first evals, and capability-per-megabyte.
Read part 1 →
How nanochat's depth dial turns training into a comparable family of compute-optimal models instead of one-off checkpoint luck.
Read part 2 →
What changes when GPT-2-level capability becomes cheap enough to repeat, and why data and evals become the real constraint.
Read part 3 →
How synthetic data, token-visible task design, and identity tuning turn small models into useful narrow specialists.
Read part 4 →