Custom Model Training Playbooks

Nanochat / SLM Series

Field notes and a four-part series connecting Eric's local-inference experiments, pico-LLM work, nanochat, small-language-model research, and practical ability training.

Field note Local AI TCO

Local AI: When TCO is cash positive?

A Jetson Nano field note on local AI economics, always-on generation, payback periods, and when edge inference starts saving more than it costs.

Read field note →

Field note Prompt stacking

Gardeners of the Latent Space

A prompt-craft field note on queued agents, latent-space gardening, and pruning model work into practical operating loops.

Read field note →

Field note Agent workflow

Hermes Kanban Slays Your Ticketing System

A field note on why agent-native Kanban, triage, and human unblock points make multi-step AI work feel less like Jira and more like a live operating system.

Read field note →

Field note Writing loop

Write As You Go. Schedule Ahead.

A project-writing field note on capturing lessons while the system is still warm, drafting ahead, and turning publication into a reusable learning loop.

Read field note →

Field note Pixel art coaching

Pixel Art Technique Coaching with 5.4.L

A short model-training note on using the generated artifact itself to expose pixel-art technique cues the model can reuse.

Read field note →

Field note OpenClaw + nanochat

GPU Day for OpenClaw

A short SLM field note on handing the GPU to an OpenClaw instance for nanochat experiments in intelligence per megabyte.

Read field note →

Field note NPU

Tinkering with a NPU

A first look at the Qualcomm Hexagon NPU in the Snapdragon X Elite (X1E-80-100) as a local-inference path for low-cost agentic tokens.

Read field note →

Field note Jetson Nano

Learning by Osmosis on the Jetson Nano

A rainy-day local-hardware note on testing whether a Jetson Nano can carry useful learning loops close to the bench.

Read field note →

Field note Qwen

qwen3.5:0.8b One-Shot App Planning

A short local-inference field note on a tiny Qwen model sketching a full C# app plan on older NVIDIA hardware in under a second.

Read field note →

Field note Qwen Fine-tuning

Qwen Franklin: Benjamin Franklin Fine-Tuned Model

A field note on custom Qwen LoRAs, Benjamin Franklin model voice, pseudo-SWE-Bench experiments, and narrow-model usefulness.

Read field note →

Field note Ternary weights

Ternary Weights Training Field Note

A short model-training note on constraining weights to -1, 0, or 1 and using overnight run stats to pressure-test generalization.

Read field note →

Field note NPU + Gemma

NPU and Gemma4

A local-inference field note on Gemma, ONNX, battery-powered AI, and the harness needed to turn a small local model into a useful agent.

Read field note →

Field note ONNX visualization

ONNX Pipeline Visualization

A debugging field note on 3D pipeline views, runtime charts, and tooling ideas for finding where inference performance actually drops.

Read field note →

Field note QNN + NPU

Snapdragon Hexagon: Learning about QNN

A QNN field note on HTP, HMX, HVX, ONNX graph partitioning, and why Gemma4 workloads may still route through a hot CPU.

Read field note →

Field note NPU prefill

Still Need to Fix the Prefill Operation on NPU

A short benchmark note on why selected NPU runs can still show CPU token work when prompt prefill has not moved to the accelerator.

Read field note →

Field note Vision benchmark

Gemma E4B Vision: Snapdragon Benchmarks

A local Snapdragon benchmark note showing how a Gemma vision harness compares CPU, GPU, and NPU profiles on year-old hardware.

Read field note →

Field note GPU dispatch

Gemma4 on Snapdragon: GPU case

A Gemma4 field note on moving a custom Snapdragon harness from slow token/sec baselines toward usable GPU-backed local inference.

Read field note →

Field note Gemma harness

Experimenting with a Custom Gemma Harness

A short field note on Snapdragon NPU experiments, local Gemma sessions, and the slash commands that make a tiny-model harness usable.

Read field note →

Field note GPU dispatch

Gemma4 GPU Dispatch Field Note

A short progress note on making local Gemma4 workloads target the GPU instead of drifting into CPU fallback.

Read field note →

Field note Vision harness

Gemma4 Harness for Vision on GPU/NPU/CPU

A local Gemma4 vision harness for progressive screenshot inspection, ONNX sessions, and cheap edge inference across GPU, NPU, and CPU targets.

Read field note →

Field note Inference benchmarks

Local Inference Puzzle

A benchmark field note on why CPU, GPU, and NPU numbers can tell a surprising story until the ONNX graph, kernels, and DirectML path are made visible.

Read field note →

Part 1 Pico models

Why Tiny Specialists Matter

The preserved 64 MB RPG-state experiment reframed around constrained domains, token contracts, failure-first evals, and capability-per-megabyte.

Read part 1 →

Part 2 nanochat

The Depth Dial and Miniseries

How nanochat's depth dial turns training into a comparable family of compute-optimal models instead of one-off checkpoint luck.

Read part 2 →

Part 3 Economics

GPT-2 Economics Under $100

What changes when GPT-2-level capability becomes cheap enough to repeat, and why data and evals become the real constraint.

Read part 3 →

Part 4 Abilities

Training Small Model Abilities

How synthetic data, token-visible task design, and identity tuning turn small models into useful narrow specialists.

Read part 4 →

Operating Principles

Use the smallest sufficient intervention: if prompt design solves it, do not train.
Ground decisions in production pain: train against real failures, not vibes.
Version everything: data, prompts, eval sets, and model artifacts need traceability.
Gate every release: no pass on evals means no launch, even when deadlines scream.
Measure drift continuously: a model can degrade quietly while dashboards still look pretty.
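
As a rough illustration of the "gate every release" principle, here is a minimal Python sketch of an eval gate that blocks a launch when any metric is missing or below its threshold. The names `GateRule`, `failed_gates`, and `can_launch`, along with the metric names and thresholds, are hypothetical and chosen only for this example; they are not taken from the playbooks themselves.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GateRule:
    """One release gate: a metric name and the minimum score it must reach."""
    metric: str     # hypothetical metric name, e.g. "exact_match"
    minimum: float  # hypothetical threshold


def failed_gates(scores: dict[str, float], rules: list[GateRule]) -> list[str]:
    """Return a human-readable reason for every gate that does not pass."""
    failures = []
    for rule in rules:
        value = scores.get(rule.metric)
        if value is None:
            # A metric that was never measured counts as a failure.
            failures.append(f"{rule.metric}: missing")
        elif value < rule.minimum:
            failures.append(f"{rule.metric}: {value:.3f} < {rule.minimum:.3f}")
    return failures


def can_launch(scores: dict[str, float], rules: list[GateRule]) -> bool:
    """No pass on evals means no launch."""
    return not failed_gates(scores, rules)


if __name__ == "__main__":
    rules = [GateRule("exact_match", 0.90), GateRule("format_valid", 0.99)]
    candidate = {"exact_match": 0.85}  # illustrative scores, not real results
    for reason in failed_gates(candidate, rules):
        print("BLOCKED:", reason)
    print("launch allowed:", can_launch(candidate, rules))
```

Treating a missing metric as a failure is a deliberate choice in this sketch: an eval that silently did not run should block the release just as a low score would.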

Custom Model Training

Core Playbooks

Local Vision Model: Examples and thoughts

Local Vision Tools: Independence Town

When Custom Training Is Actually Worth It

Dataset Design and Curation

Evaluation and Release Gates

Fine-Tune vs RAG vs Prompting

Recent field records

Z Image Turbo on the Jetson

Krea2, OSS image model, Jetson Nano

Corporate Goblin LoRA on Qwen
