Character Persistence

Reference Images Are Not Memory

A generated portrait helps, but it does not absolve the harness of doing the accounting.

Every person, portrait, message, relationship, and name discussed here is generated. These are fictional harness artifacts, not real people or real relationships.

The Temptation

Once the character ledger started duplicating names, the obvious answer was: fine, give the harness reference images.

This is a reasonable instinct. If the model keeps wobbling on who a person is, show it the person. Store a reference sheet. Preserve the face, outfit, posture, and a few close-up details. Then, when the character appears later, use the reference to keep identity from sliding sideways.

That does help. It is also not enough.

A reference image is a visual anchor, not a memory system. It can keep a fictional neighbor from changing hair color every three frames. It can make a spouse-like contact look less freshly sampled. It can give the renderer something concrete to imitate. But a generated world does not only need faces. It needs permissions, roles, locations, timing, relationship state, clothing rules, and a policy for when the reference is allowed to matter.

Otherwise the portrait becomes a very nice-looking sticker on top of a weak ledger.

Figure: Exploration 07 persistent character reference `suburban-house-john-001_mark`, a generated reference sheet for the fictional neighbor contact Mark Reynolds.

What the Reference Knows

The Mark Reynolds record is useful because it shows the difference between appearance and identity. The image gives the renderer a body and style. The manifest gives the harness a role.

```json
{
  "id": "mark",
  "name": "Mark Reynolds",
  "kind": "neighbor",
  "relationship": "next-door neighbor and casual friend",
  "defaultLocations": ["sidewalk", "front_yard", "driveway"],
  "visitRules": [
    "May appear on the sidewalk, front yard edge, driveway, or just outside the open garage.",
    "May step into the garage only if John invites him or if he is returning a borrowed tool.",
    "Should not enter the hallway, kitchen, bedroom, or living room without a clear invitation and reason."
  ]
}
```

The portrait by itself cannot enforce any of that. A face cannot tell the model whether Mark belongs in the kitchen. It cannot decide whether a phone-only contact should become physically present. It cannot distinguish a driveway chat from an interior social visit.

The reference image answers: what should Mark look like if he appears? The manifest answers: should Mark appear here at all?

Working rule: references should be subordinate to state. If the visual reference and the visit rules disagree, the visit rules should win.
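One way to picture the working rule is a gate that runs before the reference image is consulted at all. This is a minimal sketch, not the harness's actual code: `may_appear`, `choose_reference`, and the `invited`, `returning_tool`, and `reason` arguments are hypothetical names, and the location tags are hand-translated from Mark's prose visit rules (the "just outside the open garage" case is left out for brevity).

```python
# Hypothetical sketch of "state beats reference". Tags are hand-translated
# from Mark's prose visitRules; none of these names come from the harness.
OPEN_LOCATIONS = {"sidewalk", "front_yard", "driveway"}
INTERIOR = {"hallway", "kitchen", "bedroom", "living_room"}
MARK_REFERENCE = "suburban-house-john-001_mark"


def may_appear(location, invited=False, returning_tool=False, reason=None):
    """Return True only if the visit rules permit Mark at this location."""
    if location in OPEN_LOCATIONS:
        return True
    if location == "garage":
        # Garage needs an invitation from John or a borrowed tool to return.
        return bool(invited or returning_tool)
    if location in INTERIOR:
        # Interior rooms need both a clear invitation and a reason.
        return bool(invited and reason)
    return False  # unknown locations are denied by default


def choose_reference(location, **scene):
    """Give the renderer a reference only after the rules say yes."""
    return MARK_REFERENCE if may_appear(location, **scene) else None
```

With this shape, a hallway frame never receives the reference in the first place, so the renderer has nothing to imitate where Mark is not allowed.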

The Failure Pattern

The most dangerous version of this failure is not an ugly portrait. It is a convincing portrait in the wrong place.

Generated images can make people look real enough that the viewer stops asking whether the ledger earned their presence. That is exactly why the harness needs boring constraints. A mail carrier can approach the front entry for delivery. A neighbor can wave from the sidewalk. A spouse can be represented as a phone contact if away from home. A brother can enter the garage if the scene establishes a visit. These are not story details first. They are containment rules.

If those rules are missing, the model can use the reference image as permission to instantiate the character whenever the next frame feels socially plausible. That is how a generated world gets crowded with people who look consistent but arrive without cause.
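The containment rules named above are small enough to write down as a table. The sketch below is an assumption about how that might look: the kind names, location tags, scene-fact strings, and the `presence` function are all hypothetical, chosen only to mirror the four examples in the text.

```python
# Hypothetical containment table. Each fictional character kind lists how it
# may be present, where, and which scene fact must already be established.
CONTAINMENT = {
    "mail_carrier": {"mode": "physical", "locations": {"front_entry"},
                     "requires": "delivery"},
    "neighbor": {"mode": "physical", "locations": {"sidewalk"},
                 "requires": None},
    # The text only covers a spouse who is away; a spouse at home is not
    # modeled here and falls through to "absent".
    "spouse": {"mode": "phone", "locations": set(),
               "requires": "away_from_home"},
    "brother": {"mode": "physical", "locations": {"garage"},
                "requires": "visit_established"},
}


def presence(kind, location, scene_facts):
    """Return 'physical', 'phone', or 'absent' for a proposed appearance."""
    rule = CONTAINMENT.get(kind)
    if rule is None:
        return "absent"  # no rule means no permission
    if rule["requires"] is not None and rule["requires"] not in scene_facts:
        return "absent"  # the scene has not earned this character yet
    if rule["mode"] == "phone":
        return "phone"
    return "physical" if location in rule["locations"] else "absent"
```

The default is absence. A character shows up only when a rule and an established scene fact both say so, which is the opposite of letting social plausibility decide.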

It is the same old problem in a better outfit. Pretty continuity can hide causal discontinuity.

What Should Change

The next harness should treat a character reference as one part of a bundle. The bundle needs a canonical ID, aliases, appearance capsule, default outfit, allowed location tags, allowed location IDs, visit rules, and recent evidence.
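As a rough sketch, the bundle could be a single record with the reference image as just one optional field. The field names follow the list above, but the class, the `answers_to` and `resolve` helpers, and the placeholder values are assumptions for illustration. The appearance and outfit strings are deliberately left as placeholders because the manifest excerpt does not include them.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class CharacterBundle:
    """Hypothetical bundle: the reference image is one field among many."""
    canonical_id: str
    aliases: List[str]
    appearance_capsule: str
    default_outfit: str
    allowed_location_tags: Set[str]
    allowed_location_ids: Set[str]
    visit_rules: List[str]
    recent_evidence: List[str] = field(default_factory=list)
    reference_image: Optional[str] = None

    def answers_to(self, name: str) -> bool:
        wanted = name.strip().lower()
        return (wanted == self.canonical_id.lower()
                or wanted in {a.lower() for a in self.aliases})


def resolve(name, bundles):
    """Map a name or alias to one canonical ID; None if unknown or ambiguous."""
    hits = {b.canonical_id for b in bundles if b.answers_to(name)}
    return hits.pop() if len(hits) == 1 else None


mark = CharacterBundle(
    canonical_id="mark",
    aliases=["Mark Reynolds", "Mark"],
    appearance_capsule="(short text summary of the reference sheet)",
    default_outfit="(default outfit description)",
    allowed_location_tags={"sidewalk", "front_yard", "driveway"},
    allowed_location_ids=set(),  # scene-specific IDs are not given in the text
    visit_rules=["May step into the garage only if John invites him "
                 "or if he is returning a borrowed tool."],
    reference_image="suburban-house-john-001_mark",
)
```

Routing every name through something like `resolve` is one way to stop the ledger from minting a second Mark when a caption says "Mark Reynolds" instead of "mark".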

The bundle also needs a notion of confidence attached to each identification. If the renderer produces someone who looks enough like Mark but puts him in the hallway, the analyzer should not simply say Mark is present. It should say something closer to: possible Mark-like character, location violates visit rules, reject or repair.
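A hedged sketch of that analyzer output follows. Everything here is assumed for illustration: the `Verdict` record, the `assess` function, the idea of a numeric similarity score, the 0.8 threshold, and the "review" action for weak matches are not taken from the harness.

```python
from dataclasses import dataclass


@dataclass
class Verdict:
    label: str   # what the analyzer is willing to claim
    action: str  # "accept", "reject_or_repair", or "review"
    note: str    # why


def assess(character_id, similarity, location, allowed_locations,
           match_threshold=0.8):
    """Combine a visual match score with the visit rules.

    similarity is a hypothetical 0..1 score from comparing the frame
    against the stored reference; the threshold is an arbitrary assumption.
    """
    if similarity < match_threshold:
        return Verdict("unrecognized character", "review",
                       "no reference match above threshold")
    if location not in allowed_locations:
        # Looks right, stands in the wrong place: do not confirm identity.
        return Verdict(f"possible {character_id}-like character",
                       "reject_or_repair",
                       "location violates visit rules")
    return Verdict(character_id, "accept", "match and location both allowed")


allowed = {"sidewalk", "front_yard", "driveway"}
hallway = assess("mark", 0.93, "hallway", allowed)
```

A strong visual match never upgrades the label on its own in this sketch. Identity is confirmed only when the location is also permitted.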

This is less fun than letting the world improvise. It is also the difference between a sequence of compelling images and a world that can survive inspection.

Reference images are valuable. They are just not memory. Memory is the ledger that tells the reference when it is allowed to enter the room.