## Summary
Multi-scope landing of today's trilogy-drones work bundled into one PR per operator preference. Three substantive workstreams + one carry-forward docs commit:
1. `drones rehydrate --run <id>` for orphan recovery: closes the local-telemetry gap we hit on PR #2880 today. Implements [AI-58 (D3)](https://linear.app/builder-team/issue/AI-58/d3-local-telemetry-resilience-when-local-parent-process-dies) of the [v0.6 trace + eval epic (AI-56)](https://linear.app/builder-team/issue/AI-56/v06-trace-eval-infrastructure-for-trilogy-drones).
2. B7.9 OFAT model-selection bake-off scaffolding: a reproducible 7-cell experiment harness for cost-quality comparison across Opus / Codex / Composer at the impl / reviewer / addresser steps. Cells 1 + 2 fully graded; cells 3-7 in flight.
3. B9.10 drone task spec (KLAIR-2772) for migrating `generate_gm_commentary` + `generate_cf_plan` off `spec.bu_mips`: already fired today as PR #2877 in Klair.
4. klair-pr-review skill rubric polish: anti-inflation patterns + per-dimension High cap + #2857 retro anchor + mandatory self-check tags. Carry-forward commit from May 22 that never landed on main.
## Why it's needed
### 1. drones rehydrate (AI-58)
Real bug we hit today during the B7.9 OFAT experiment:
1. The orchestrator's first, failed `Run-AllCells` attempt fired a Composer agent, `bc-9864b557-…`.
2. I killed the local PowerShell parent to fix the orchestrator's cwd / $ErrorActionPreference bugs.
3. Cloud agent kept running independently and opened PR #2880 about 4 minutes later.
4. The local `events/run-350eda6c-….jsonl` only ever recorded `run_started`; no `pr_opened` was written because the local SDK subscriber had died.
5. `pnpm drones list-active` kept reporting the run as in flight at 25 min, 80 min and later. We only discovered the PR because the user pointed at it manually.
This is a fundamental gap in the local-first telemetry design. `drones rehydrate` closes it by querying the cloud-side run state via `Agent.getRun` and synthesising the missing terminal event into the local JSONL.
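For illustration, the orphan condition above reduces to a scan of the local JSONL: a `run_started` with no terminal event recorded. The sketch below is hypothetical and is not the code in `src/rehydrate.ts`; it assumes each line is a JSON object carrying its kind in a `type` field, and it treats `pr_opened`, `run_failed` and `cancelled` as the terminal set because those are the events this PR names.

```typescript
// Terminal event types named in this PR; treating exactly these three as
// terminal is an assumption of this sketch.
const TERMINAL_EVENTS: ReadonlySet<string> = new Set(["pr_opened", "run_failed", "cancelled"]);

type LocalRunState =
  | { kind: "terminal"; event: string } // nothing to rehydrate
  | { kind: "orphan-candidate" } // run_started seen, no terminal event yet
  | { kind: "empty" }; // no recognisable events at all

// Hypothetical helper: classify the body of a local events/<run-id>.jsonl file.
// Assumes one JSON object per line with the event kind in a `type` field.
function classifyLocalRun(jsonl: string): LocalRunState {
  let started = false;
  for (const line of jsonl.split("\n")) {
    if (line.trim() === "") continue;
    let parsed: unknown = undefined;
    try {
      parsed = JSON.parse(line);
    } catch {
      // A parent killed mid-write can leave a truncated last line; skip it.
      continue;
    }
    if (typeof parsed !== "object" || parsed === null) continue;
    const type = (parsed as { type?: unknown }).type;
    if (typeof type !== "string") continue;
    if (TERMINAL_EVENTS.has(type)) return { kind: "terminal", event: type };
    if (type === "run_started") started = true;
  }
  return started ? { kind: "orphan-candidate" } : { kind: "empty" };
}
```

Skipping unparseable lines is deliberate in the sketch: the same parent death that creates an orphan can also leave a half-written final line.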
### 2. B7.9 OFAT experiment
Until today, the harness's cost-quality story relied on "trust me, v0.5 is good." The OFAT scaffolding gives us a reproducible methodology for demonstrating:
* Cost-per-AC by model at each pipeline step (pending drones enrich backfill).
* Reproducibility of catch-rate for spec-level bugs (the H1 SSE-wiring finding has now been caught by the reviewer pipeline on 3 different implementer-model PRs — strong cross-validation).
* Implementer / reviewer / addresser model substitutability: early data (N=1 per cell) shows Codex 5.3 at reasoning=high,fast=false is functionally equivalent to Opus 4.7 on this task; the cost-per-quality answer is pending.
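The 7-cell count follows from a one-factor-at-a-time layout: one baseline cell, then each of the three pipeline steps swapped to each of the two alternative models in turn (1 + 3 × 2 = 7). The sketch below generates such a matrix. The all-Opus baseline and the step-to-cell ordering (impl in cells 2-3, reviewer in 4-5, addresser in 6-7) are assumptions inferred from this PR's cell references; the authoritative matrix lives in `experiments/README.md`, and the lowercase model labels are hypothetical.

```typescript
// Hypothetical labels for the three model families compared in this PR.
type Model = "opus" | "codex" | "composer";
type Step = "impl" | "reviewer" | "addresser";

interface Cell {
  id: number;
  impl: Model;
  reviewer: Model;
  addresser: Model;
  varied: Step | null; // which step differs from the baseline; null for cell 1
}

// One-factor-at-a-time: every non-baseline cell changes exactly one step's model.
function ofatCells(baseline: Model, alternatives: readonly Model[]): Cell[] {
  const steps: readonly Step[] = ["impl", "reviewer", "addresser"];
  const cells: Cell[] = [
    { id: 1, impl: baseline, reviewer: baseline, addresser: baseline, varied: null },
  ];
  for (const step of steps) {
    for (const model of alternatives) {
      const cell: Cell = {
        id: cells.length + 1,
        impl: baseline,
        reviewer: baseline,
        addresser: baseline,
        varied: step,
      };
      cell[step] = model; // the single factor changed in this cell
      cells.push(cell);
    }
  }
  return cells;
}
```

Because every non-baseline cell differs from the baseline in one step only, any quality difference in a cell can be attributed to that step's model, at the cost of not measuring interaction effects between steps.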
### 3. B9.10 task spec
Drone task spec already fired today as PR #2877 in Klair. Committing the spec into trilogy-drones for parity with the other tasks/klair/*.md specs — reproducible drone fires require the spec to be source-controlled.
### 4. klair-pr-review skill rubric polish
Carry-forward commit from May 22 (PR #2857 retro work) that never landed on main because it was on the post-merge feat/v05-… branch. Rebased onto origin/main as part of this PR.
## Changes
### Commit 1 — feat(rehydrate): `drones rehydrate --run <id>` for orphan recovery (AI-58)
| File | LOC | Purpose |
|---|---|---|
| src/rehydrate.ts | +430 | Core module |
| src/rehydrate.test.ts | +349 | 8 vitest cases |
| src/cli.ts | +54 | New rehydrate command + import + RehydrateOpts interface. Zero touches to runCmd / reviewCmd / addressCmd / runner code paths — purely additive. |
| README.md | +31 | New "Orphan recovery" section |
Decision table for the synthesised event:
| Cloud status | Synthesised event |
|---|---|
| finished + git.branches[0].prUrl present | pr_opened with that PR URL + branch |
| finished without a PR URL | run_failed ("likely reviewer fan-out child or addresser turn — inspect cloud agent page") |
| error | run_failed status="error" |
| cancelled | cancelled ("local parent likely died") |
| running | no-op; wait + retry, or drones cancel |
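The decision table maps onto a small pure function. A minimal sketch, assuming a hypothetical `CloudRun` type that carries only the two fields the table reads (`status` and `git.branches[0].prUrl`); the real `Agent.getRun` response type and event payload fields are not reproduced here, and the `reason` strings are paraphrased.

```typescript
// Hypothetical, trimmed view of the fields the decision table reads from Agent.getRun.
interface CloudRun {
  status: "finished" | "error" | "cancelled" | "running";
  git?: { branches: Array<{ name?: string; prUrl?: string }> };
}

type SynthesisedEvent =
  | { type: "pr_opened"; prUrl: string; branch?: string }
  | { type: "run_failed"; status?: "error"; reason: string }
  | { type: "cancelled"; reason: string };

// Returns the terminal event to append, or null when the run is still going.
function synthesiseTerminalEvent(run: CloudRun): SynthesisedEvent | null {
  switch (run.status) {
    case "finished": {
      const first = run.git?.branches[0];
      if (first?.prUrl) {
        return { type: "pr_opened", prUrl: first.prUrl, branch: first.name };
      }
      // Finished without a PR: likely a reviewer fan-out child or addresser turn.
      return { type: "run_failed", reason: "finished without a PR URL; inspect the cloud agent page" };
    }
    case "error":
      return { type: "run_failed", status: "error", reason: "cloud run errored" };
    case "cancelled":
      return { type: "cancelled", reason: "local parent likely died" };
    case "running":
      return null; // no-op: wait and retry, or `drones cancel`
  }
}
```

Keeping the mapping free of I/O means each table row can be unit-tested directly from a fixture object.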
Idempotent: running it twice produces the same JSONL state. Never throws: all error paths funnel into `RehydrateResult { exitCode: 1, summary: "..." }`. Stream counters are set to zero on synthesised events because `Agent.getRun` doesn't expose per-turn data; `drones enrich` backfills cost / tokens.
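Both guarantees can be sketched as a wrapper over three injected operations. `RehydrateDeps`, its method names and the `counters` field are hypothetical seams chosen so the guarantees are testable without a network; only the `RehydrateResult { exitCode, summary }` shape comes from this PR. The exit code for the still-running case is also an assumption.

```typescript
// The result shape named in this PR.
interface RehydrateResult {
  exitCode: 0 | 1;
  summary: string;
}

// Hypothetical seams so the guarantees can be exercised without a network.
interface RehydrateDeps {
  readTerminalEvent(runId: string): Promise<string | null>; // from local JSONL
  fetchTerminalEvent(runId: string): Promise<{ type: string } | null>; // null = still running
  appendEvent(runId: string, event: Record<string, unknown>): Promise<void>;
}

async function rehydrate(runId: string, deps: RehydrateDeps): Promise<RehydrateResult> {
  try {
    // Idempotency: a terminal event already on disk means there is nothing to do.
    const existing = await deps.readTerminalEvent(runId);
    if (existing !== null) {
      return { exitCode: 0, summary: `already contains terminal event '${existing}'` };
    }
    const event = await deps.fetchTerminalEvent(runId);
    if (event === null) {
      // Assumed exit code for the "running" row of the decision table.
      return { exitCode: 0, summary: "cloud run still running; retry later or cancel" };
    }
    // Agent.getRun has no per-turn data, so counters start at zero (field name hypothetical).
    await deps.appendEvent(runId, { ...event, counters: { turns: 0, inputTokens: 0, outputTokens: 0 } });
    return { exitCode: 0, summary: `rehydrated ${event.type}` };
  } catch (err) {
    // Never throw: every failure becomes exitCode 1 with a readable summary.
    const message = err instanceof Error ? err.message : String(err);
    return { exitCode: 1, summary: `rehydrate failed: ${message}` };
  }
}
```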
### Commit 2 — feat(experiments): B7.9 OFAT model-selection bake-off scaffolding
| File | LOC | Purpose |
|---|---|---|
| experiments/Invoke-DroneCell.ps1 | +424 | Runs one cell end-to-end (impl → review → address) |
| experiments/Run-AllCells.ps1 | +94 | Loop wrapper for the 7-cell sequence; -StartFromCell N for resume |
| experiments/README.md | +153 | Experiment design + cell matrix + --no-auto-review rationale |
| experiments/b79-3x3-scoring-rubric.md | +183 | 15-item AC checklist + reviewer/addresser dimensions |
| experiments/b79-grading-worksheets.md | +386 | Per-cell narrative grading (cells 1 + 2 done, cells 3-7 TBD) |
| experiments/b79-3x3-results.csv | +3 rows | 50-column results table (started/ended_at per phase, agent + run ids, reviewer-child ids, AC / recall / precision / addressed-rate) |
| experiments/cell1-baseline-findings.md | +271 | Saved aggregate reviewer fan-out markdown — gold set for cells 4 + 5 reviewer-recall scoring |
| experiments/cell1-baseline-inline-comments.json | (bin) | The 14 inline comments cell-1 Opus reviewer posted on PR #2879 |
| experiments/cell01-timestamps.txt | +5 | Per-phase ISO timestamps for cost backfill |
| scripts/dump-model-variants.mjs | +50 | SDK introspection helper — Cursor.models.list() path that returns full ModelListItem with aliases + variants. Used during the experiment to discover that /v0/models REST returns variant-flattened names while the SDK returns canonical ids. |
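The recall / precision / addressed-rate columns in `b79-3x3-results.csv` can be computed along these lines, assuming each reviewer finding has already been matched by hand to a stable label (for example `H1`) against the cell-1 gold set in `cell1-baseline-findings.md`. The function names, the label matching and the addressed-rate definition are assumptions for the sketch, not the worksheet's procedure.

```typescript
interface ReviewerScores {
  recall: number; // share of gold-set findings the reviewer also raised
  precision: number; // share of the reviewer's findings that are in the gold set
}

// Hypothetical: findings are compared by hand-assigned labels such as "H1".
function reviewerScores(found: readonly string[], gold: readonly string[]): ReviewerScores {
  const foundSet = new Set(found);
  const goldSet = new Set(gold);
  let hits = 0;
  foundSet.forEach((label) => {
    if (goldSet.has(label)) hits += 1;
  });
  return {
    recall: goldSet.size === 0 ? 0 : hits / goldSet.size,
    precision: foundSet.size === 0 ? 0 : hits / foundSet.size,
  };
}

// Assumed definition: share of review findings the addresser resolved.
function addressedRate(addressed: number, total: number): number {
  return total === 0 ? 0 : addressed / total;
}
```

Because the gold set is the cell-1 reviewer's own output, precision measured this way counts any genuine finding the baseline missed as a false positive, so it can understate true precision.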
Key design choice: each cell fires `drones run --no-auto-review`, so the addresser runs as a fresh standalone `Agent.create` rather than an `agent.send` on the implementer's agent. This is what lets cells 6 + 7 vary the addresser model independently of the implementer's choice.
The Invoke-DroneCell.ps1 orchestrator includes a Get-ModelParams helper that splices the right --model-params per non-Opus family (Codex needs reasoning=high,fast=false; Composer needs fast=false). We discovered mid-experiment that the cloud agents endpoint rejects bare {id} submissions for non-Opus families; these need the full param set.
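The per-family rule behind Get-ModelParams is small enough to express as a lookup table. Below is a TypeScript sketch of the same rule, not a port of the PowerShell helper: `modelParamsArgs` and the lowercase family labels are hypothetical, and serialising the params as comma-separated `key=value` pairs is an assumption based on how this PR writes `reasoning=high,fast=false`.

```typescript
// Hypothetical family labels; the real helper is PowerShell (Get-ModelParams).
type ModelFamily = "opus" | "codex" | "composer";

// Params each family needs, per this PR: Codex reasoning=high,fast=false;
// Composer fast=false; Opus is accepted with a bare model id.
const REQUIRED_PARAMS: Record<ModelFamily, Record<string, string>> = {
  opus: {},
  codex: { reasoning: "high", fast: "false" },
  composer: { fast: "false" },
};

// Returns the extra arguments to splice into the `drones run` invocation.
// The comma-separated key=value format is an assumption.
function modelParamsArgs(family: ModelFamily): string[] {
  const params = REQUIRED_PARAMS[family];
  const pairs = Object.keys(params).map((key) => `${key}=${params[key]}`);
  return pairs.length === 0 ? [] : ["--model-params", pairs.join(",")];
}
```

Keeping the rule in one table means a new model family needs one entry rather than another branch in the orchestrator.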
### Commit 3 — feat(tasks): B9.10 — migrate generate_gm_commentary + generate_cf_plan off spec.bu_mips
| File | LOC | Purpose |
|---|---|---|
| tasks/klair/b9-10-migrate-bu-mips-readers.md | +368 | Drone task spec (KLAIR-2772, fired → PR #2877 in Klair) |
### Commit 0 (carry-forward) — docs(klair-pr-review): tighten severity rubric
From May 22, post-PR #4 merge. Reviewer-skill rubric polish that never made it to main:
* Anti-inflation patterns to prevent over-grading
* Per-dimension High severity cap
* PR #2857 retro anchor + mandatory self-check tags
## Headline findings from the experiment (cells 1 + 2)
* H1 SSE-wiring bug is reproducibly a spec ambiguity, not a model differentiator. Both Opus (cell 1) and Codex (cell 2) implementers followed the spec's literal "fire wizardStep" instruction and missed the FE-side SSE bridge. Reviewer fan-out caught it on BOTH cells; addresser fixed it on BOTH cells. Cross-validates the v0.5 multi-agent pipeline's reason for being.
* At reasoning=high,fast=false, Codex 5.3 is functionally equivalent to Opus 4.7 on this task. Same AC score (14/15), same H1 gap, same reviewer/addresser cycle outcome, same merge-ready verdict. Codex is 25% slower at the implementer phase but produces a final PR indistinguishable from Opus's.
* Composer 2.5 with the default fast=true hallucinated in the one run we observed: the cell-3 orphan attempt invented a new tasks/b7-9-…md file the spec didn't ask for, and the Opus reviewer missed flagging it. Cell 3 was re-fired with fast=false to standardise the composer-axis comparison.
## Breaking changes
None. Purely additive — existing events/<run-id>.jsonl writers in runner.ts / reviewer.ts / addresser.ts are untouched. Existing schemas + commands behave identically.
## Test plan
* [x] pnpm tsc --noEmit clean
* [x] pnpm test --run — 43/43 pass (20 model-selection + 15 addresser + 8 rehydrate)
* [ ] Smoke test against today's orphan: pnpm drones rehydrate --run run-350eda6c-b0a0-4570-a150-874d84792480 should synthesise a pr_opened event pointing at the (now-closed) PR #2880.
* [ ] Re-running the smoke test should report "Local events JSONL already contains terminal event 'pr_opened' — nothing to rehydrate (idempotent)."
* [ ] Cells 3-7 of the experiment land via Run-AllCells.ps1 -StartFromCell 3 (on cell 4 as of this PR push).
## Verification artifact
Sample drones rehydrate invocation against today's orphan:
```console
$ pnpm drones rehydrate --run run-350eda6c-b0a0-4570-a150-874d84792480
[drones:rehydrate] Rehydrated pr_opened (PR https://github.com/AI-Builder-Team/Klair/pull/2880, 243.9s wall-clock). Stream counters set to zero — Agent.getRun doesn't expose per-turn data; backfill via 'drones enrich' once cost / token data lands in the team-usage CSV.
```
Re-run:
```console
$ pnpm drones rehydrate --run run-350eda6c-b0a0-4570-a150-874d84792480
[drones:rehydrate] Local events JSONL already contains terminal event 'pr_opened' — nothing to rehydrate (idempotent).
```
## Out of scope (deferred)
* Webhook-backed cloud event ingestion (Path A in AI-58). File as v0.7+ if rehydrate's polling pattern becomes painful.
* Auto-rehydrate on `list-active`. Ship the explicit command first.
* Reviewer / addresser run-kind detection in rehydrate. When Agent.getRun starts exposing per-turn metadata that maps cleanly to review_completed / address_completed, a future revision can dispatch on that instead of the current "finished-without-prUrl → run_failed" fallback.
* Per-cell cost backfill in the experiment — pending drones enrich invocation across cells 1-7.
* Statistical-significance follow-up — N=1 per cell; if the reviewer-axis or addresser-axis surfaces a surprising effect, do N=3 on that axis before drawing conclusions.
## Related
* [AI-56 (v0.6 epic)](https://linear.app/builder-team/issue/AI-56/v06-trace-eval-infrastructure-for-trilogy-drones)
* [AI-58 (D3 — this PR)](https://linear.app/builder-team/issue/AI-58/d3-local-telemetry-resilience-when-local-parent-process-dies)
* [AI-57 (D1 — next: per-turn trace persistence)](https://linear.app/builder-team/issue/AI-57/d1-per-turn-trace-persistence-via-runstream-subscription)
* Klair PR #2877 (B9.10 task spec → drone fire outcome)
* Klair PR #2880 (the orphan that motivated drones rehydrate; now closed)