{"generated_at":"2026-06-20T23:26:13Z","active_sessions":[{"session_id":"20260620-152947-3837097","latest_iteration":50,"latest_task":"RG.PICKER.EXCLUSIVE-FOCUS-PIN","last_modified":"2026-06-20T23:07:25Z","active":false},{"session_id":"20260620-224831-399799","latest_iteration":6,"latest_task":"DEPLOY-GATE","last_modified":"2026-06-20T22:07:02Z","active":false},{"session_id":"20260620-224810-2064905","latest_iteration":1,"latest_task":"RALPH.BACKLOG.ES.DRIVE.B.B.B","last_modified":"2026-06-20T21:07:26Z","active":false},{"session_id":"20260618-102731-60241","latest_iteration":1883,"latest_task":"RG.BACKLOG.MATERIALIZE-SOLE-WRITER.B.2","last_modified":"2026-06-20T13:13:32Z","active":false},{"session_id":"20260615-163425-2547656","latest_iteration":1066,"latest_task":"PLATFORM.FEDERATION.S4","last_modified":"2026-06-17T21:12:03Z","active":false},{"session_id":"20260615-160819-3810424","latest_iteration":602,"latest_task":"RALPH.NODE.PHASE-E.RUNNER-SEAM","last_modified":"2026-06-16T08:11:33Z","active":false},{"session_id":"20260615-174316-1695396","latest_iteration":39,"latest_task":"GAP.AUDIT.ENGINE.F1.MEMBERSHIP-RULE.C","last_modified":"2026-06-16T03:29:23Z","active":false},{"session_id":"20260615-160732-1453849","latest_iteration":1,"latest_task":"BUG.BOOKING.ISMEMBER-UNKNOWN-SYMBOL","last_modified":"2026-06-15T14:44:44Z","active":false},{"session_id":"20260615-150843-1236074","latest_iteration":2,"latest_task":"SPEC.ADR-0056.RECIPROCAL","last_modified":"2026-06-15T13:36:27Z","active":false},{"session_id":"20260615-131622-780899","latest_iteration":6,"latest_task":"OPS.FLEET.REGISTRY-GAP","last_modified":"2026-06-15T12:55:12Z","active":false},{"session_id":"20260615-134034-3320254","latest_iteration":7,"latest_task":"BOARD.SSOT.S3","last_modified":"2026-06-15T12:49:48Z","active":false},{"session_id":"20260615-080725-3800868","latest_iteration":11,"latest_task":"RG.BOARD.CROSS-BRANCH-DUP-TASKS","last_modified":"2026-06-15T11:06:01Z","active":false},{"session_id":"20260615-080303-882799","latest_iteration":21,"latest_task":"RALPH.BACKLOG.ES.S3.b.2.B.B","last_modified":"2026-06-15T10:21:35Z","active":false},{"session_id":"20260614-235658-1683274","latest_iteration":33,"latest_task":"DEPLOY.PR745.TRIAGE","last_modified":"2026-06-15T05:58:01Z","active":false},{"session_id":"20260614-235400-3090546","latest_iteration":33,"latest_task":"BUG.HARNESS.TOOL-RESULT.4","last_modified":"2026-06-15T04:49:26Z","active":false},{"session_id":"20260614-125731-4038462","latest_iteration":23,"latest_task":"BUG.TEST-QUEUE-FALLBACK.LOCAL-FAIL","last_modified":"2026-06-14T20:02:31Z","active":false},{"session_id":"20260614-105117-4091625","latest_iteration":24,"latest_task":"ZZ.13.SANDBOX.4.APPLY","last_modified":"2026-06-14T12:50:03Z","active":false},{"session_id":"20260614-122723-3876528","latest_iteration":1,"latest_task":"RALPH.SCHED.S1","last_modified":"2026-06-14T10:47:17Z","active":false},{"session_id":"20260614-093340-4115127","latest_iteration":63,"latest_task":"RALPH.FLEET.ONBOARD.5.TAILSCALE-SSH","last_modified":"2026-06-14T10:31:21Z","active":false},{"session_id":"20260614-101443-2431084","latest_iteration":6,"latest_task":"BOARD.SSOT-CUTOVER","last_modified":"2026-06-14T09:28:14Z","active":false},{"session_id":"20260614-011818-3516953","latest_iteration":152,"latest_task":"DEPLOY-GATE","last_modified":"2026-06-14T08:21:10Z","active":false},{"session_id":"20260614-082620-1921653","latest_iteration":6,"latest_task":"RALPH.PLATFORM.S3.DISCOVERY.GRANT-KC-ORGID","last_modified":"2026-06-14T07:44:46Z","active":false},{"session_id":"20260614-082554-3458735","latest_iteration":2,"latest_task":"DEPLOY-GATE","last_modified":"2026-06-14T06:28:12Z","active":false},{"session_id":"20260614-053818-1292503","latest_iteration":13,"latest_task":"RALPH.PLATFORM.S3.DISCOVERY.B.e","last_modified":"2026-06-14T06:24:44Z","active":false},{"session_id":"20260614-060816-2473916","latest_iteration":2,"latest_task":"DEPLOY-GATE","last_modified":"2026-06-14T04:30:42Z","active":false},{"session_id":"20260613-234434-3253351","latest_iteration":22,"latest_task":"BOARD.SSOT-CUTOVER","last_modified":"2026-06-14T03:11:55Z","active":false},{"session_id":"20260614-000800-3843575","latest_iteration":18,"latest_task":"DEPLOY-GATE","last_modified":"2026-06-14T02:44:49Z","active":false},{"session_id":"20260613-212659-2586969","latest_iteration":17,"latest_task":"RG.SEED.PASSCARDS-IDEMPOTENT","last_modified":"2026-06-13T21:40:35Z","active":false},{"session_id":"20260613-205404-2050936","latest_iteration":9,"latest_task":"CI-FIX","last_modified":"2026-06-13T19:17:24Z","active":false},{"session_id":"20260613-194943-2223731","latest_iteration":5,"latest_task":"RG.FLEET.STALE-APPLIED-UNMANAGED","last_modified":"2026-06-13T19:13:17Z","active":false},{"session_id":"20260613-210329-2210009","latest_iteration":2,"latest_task":"HILLS.SIM.LAB.META","last_modified":"2026-06-13T19:05:36Z","active":false},{"session_id":"20260613-153727-1707962","latest_iteration":26,"latest_task":"RALPH4.BC-POLICY.S3.m","last_modified":"2026-06-13T17:41:12Z","active":false},{"session_id":"20260613-183051-1531001","latest_iteration":2,"latest_task":"RG.PLAN.STALE-PARENT-ALL-CHILDREN-DONE","last_modified":"2026-06-13T16:50:01Z","active":false},{"session_id":"20260612-091453-3664842","latest_iteration":110,"latest_task":"UI.PERF.BOARD-RUNTIME-TAILWIND","last_modified":"2026-06-13T16:36:33Z","active":false},{"session_id":"20260612-151629-570037","latest_iteration":50,"latest_task":"RALPH4.BC-POLICY.S5.B.ENGINE-CORE.UPCASTER-SPLIT","last_modified":"2026-06-13T02:39:30Z","active":false},{"session_id":"20260612-204212-4074432","latest_iteration":9,"latest_task":"RG.BLOCK-ON.DB-MODE","last_modified":"2026-06-12T21:35:49Z","active":false},{"session_id":"20260612-202254-3946771","latest_iteration":5,"latest_task":"DEPLOY-GATE","last_modified":"2026-06-12T18:35:59Z","active":false},{"session_id":"20260612-200326-3819298","latest_iteration":5,"latest_task":"DEPLOY-GATE","last_modified":"2026-06-12T18:16:35Z","active":false},{"session_id":"20260612-194419-3692547","latest_iteration":5,"latest_task":"DEPLOY-GATE","last_modified":"2026-06-12T17:57:21Z","active":false},{"session_id":"20260612-191258-3522070","latest_iteration":7,"latest_task":"DEPLOY-GATE","last_modified":"2026-06-12T17:36:11Z","active":false},{"session_id":"20260612-145922-2635724","latest_iteration":14,"latest_task":"DEPLOY-GATE","last_modified":"2026-06-12T17:07:19Z","active":false},{"session_id":"20260612-144751-2489465","latest_iteration":0,"latest_task":"DEPLOY-GATE","last_modified":"2026-06-12T12:51:32Z","active":false},{"session_id":"20260612-143940-2376553","latest_iteration":0,"latest_task":"DEPLOY-GATE","last_modified":"2026-06-12T12:43:15Z","active":false},{"session_id":"20260612-142351-2252647","latest_iteration":0,"latest_task":"DEPLOY-GATE","last_modified":"2026-06-12T12:24:57Z","active":false},{"session_id":"20260612-141142-2154276","latest_iteration":0,"latest_task":"DEPLOY-GATE","last_modified":"2026-06-12T12:12:46Z","active":false},{"session_id":"20260612-134837-1959312","latest_iteration":0,"latest_task":"DEPLOY-GATE","last_modified":"2026-06-12T11:51:01Z","active":false},{"session_id":"20260612-091455-1443759","latest_iteration":16,"latest_task":"DEPLOY-GATE","last_modified":"2026-06-12T11:10:23Z","active":false},{"session_id":"20260612-112743-3214496","latest_iteration":4,"latest_task":"BOARD2.BACKFILL.SINGLETON","last_modified":"2026-06-12T09:45:27Z","active":false},{"session_id":"20260612-091458-1562503","latest_iteration":1,"latest_task":"BOARD2.BACKFILL.SINGLETON","last_modified":"2026-06-12T07:34:03Z","active":false},{"session_id":"20260612-085951-3574057","latest_iteration":4,"latest_task":"DEPLOY-GATE","last_modified":"2026-06-12T07:12:14Z","active":false}],"plan":{"total_tasks":2322,"done_tasks":1660,"open_tasks":662,"next_task":"RALPH.CP.C2C-EXCLUSIVE.CUTOVER","tiers":[{"title":"HIGHEST","done":191,"open":37},{"title":"HIGH","done":314,"open":67},{"title":"NORMAL","done":897,"open":438},{"title":"NICE-TO-HAVE","done":257,"open":107},{"title":"SLEEPER","done":1,"open":13}],"prefix_counts":{"ACL":2,"AGENT-BRIDGE":20,"API":1,"ARCH":21,"AUDIT":29,"AUTH":17,"AUTO":68,"AUTONOMY":7,"BACKTEST":3,"BC":173,"BOARD":22,"BOARD2":6,"BOOKING":10,"BUG":68,"CI":46,"CI-FIX":1,"CLAIMS":36,"CLEAN":6,"CLEANUP":1,"CMD":1,"CREDS":1,"CUTOVER":4,"D-AUDIT":69,"D07":2,"D13":1,"D17":4,"DANGLING":1,"DEMO":49,"DEPLOY":13,"DEV":10,"DEVEX":3,"DOC":3,"DRIFT":14,"DS":2,"ENFORCEMENT":1,"ENG":12,"ENGINE":4,"ENTITYTYPES":4,"ENV":2,"ES":2,"EXT":2,"FACTQUERY":1,"FB":2,"FB-1f3f":1,"FE":2,"FEEDBACK":1,"FIX":14,"FLEET":4,"FLOW":2,"GAP":40,"GH":37,"GUSTAF":7,"HARDEN":1,"HILLS":9,"IMPERSONATE":1,"INBOX":11,"INFRA":6,"INVESTIGATE":6,"MIGRATE":2,"MOD":8,"NLP":1,"OBS":32,"ONBOARD":125,"OPS":46,"ORCHESTRATOR":1,"ORG":2,"ORGCHART":3,"PASS":4,"PGADMIN":2,"PHANTOM-GUARD":1,"PLAN":1,"PLATFORM":27,"PLAYER":4,"PLAYER-BC":2,"POOL":8,"PR":3,"PROBE":1,"PROBE2":1,"PROVISION":4,"PROXMOX":13,"RALPH":355,"RALPH4":60,"RALPHD":2,"REBALANCE":1,"REBASE":1,"RENAME":1,"RESEARCH":6,"RG":517,"RULE":1,"RULES":2,"S":26,"SBX":5,"SEC":2,"SEED":7,"SIM":12,"SLICEIFY":2,"SNAPSHOT":1,"SPEC":2,"STEER":16,"TENANT":5,"TEST":6,"TOK":1,"TOKENS":4,"TRACK":7,"TREE":4,"UI":11,"UNBLOCK":1,"UX":3,"UXCOPY":1,"UXFLOW":6,"VERIFY":4,"VERIFY-FLOW":1,"WIP":1,"ZZ":71}},"recent_iterations":[{"session_id":"20260620-152947-3837097","iteration":50,"task":"RG.PICKER.EXCLUSIVE-FOCUS-PIN","timestamp":"2026-06-20T23:07:25Z","duration_s":804,"commit_sha":"12e1b36","cost_usd":5.434881,"diff_added":1,"diff_removed":1,"ci_status":"passed","log_file":"sessions/20260620-152947-3837097/iteration-50-RG.PICKER.EXCLUSIVE-FOCUS-PIN/raw.jsonl"},{"session_id":"20260620-152947-3837097","iteration":46,"task":"RALPH.PICKER.STEER-CROSS-INSTANCE-LIVELOCK","timestamp":"2026-06-20T22:49:44Z","duration_s":993,"commit_sha":"ca85f76","cost_usd":11.06067,"diff_added":205,"diff_removed":0,"ci_status":"passed","log_file":"sessions/20260620-152947-3837097/iteration-46-RALPH.PICKER.STEER-CROSS-INSTANCE-LIVELOCK/raw.jsonl"},{"session_id":"20260620-152947-3837097","iteration":44,"task":"RALPH.BACKLOG.ES.ASSIGN-DURABLE","timestamp":"2026-06-20T22:30:33Z","duration_s":1098,"commit_sha":"7484fc0","cost_usd":10.145181,"diff_added":186,"diff_removed":13,"ci_status":"passed","log_file":"sessions/20260620-152947-3837097/iteration-44-RALPH.BACKLOG.ES.ASSIGN-DURABLE/raw.jsonl"},{"session_id":"20260620-152947-3837097","iteration":42,"task":"RALPH.BACKLOG.ES.ASSIGN-DURABLE","timestamp":"2026-06-20T22:09:50Z","duration_s":1010,"commit_sha":"ce422f1","cost_usd":9.253565,"diff_added":2,"diff_removed":2,"ci_status":"passed","log_file":"sessions/20260620-152947-3837097/iteration-42-RALPH.BACKLOG.ES.ASSIGN-DURABLE/raw.jsonl"},{"session_id":"20260620-224831-399799","iteration":6,"task":"DEPLOY-GATE","timestamp":"2026-06-20T22:07:02Z","duration_s":217,"commit_sha":"b7a1db0","cost_usd":2.857054,"diff_added":0,"diff_removed":0,"ci_status":"","log_file":"sessions/20260620-224831-399799/iteration-06-FLOW.FAIL.ONBOARD-NEAT-S5.2026-06-15/raw.jsonl"},{"session_id":"20260620-152947-3837097","iteration":41,"task":"RALPH.BACKLOG.ES.ASSIGN-DURABLE","timestamp":"2026-06-20T21:51:44Z","duration_s":1421,"commit_sha":"cda21bd","cost_usd":6.622684,"diff_added":10,"diff_removed":1,"ci_status":"passed","log_file":"sessions/20260620-152947-3837097/iteration-41-RALPH.BACKLOG.ES.ASSIGN-DURABLE/raw.jsonl"},{"session_id":"20260620-152947-3837097","iteration":39,"task":"BOARD2.FLIP.VERIFY-SANDBOX","timestamp":"2026-06-20T21:25:08Z","duration_s":868,"commit_sha":"cd5c874","cost_usd":6.189682,"diff_added":11,"diff_removed":10,"ci_status":"passed","log_file":"sessions/20260620-152947-3837097/iteration-39-BOARD2.FLIP.VERIFY-SANDBOX/raw.jsonl"},{"session_id":"20260620-224810-2064905","iteration":1,"task":"RALPH.BACKLOG.ES.DRIVE.B.B.B","timestamp":"2026-06-20T21:07:26Z","duration_s":1083,"commit_sha":"0de01ea","cost_usd":7.921442,"diff_added":19483,"diff_removed":1050,"ci_status":"passed","log_file":"sessions/20260620-224810-2064905/iteration-01-RALPH.BACKLOG.ES.DRIVE.B.B.B/raw.jsonl"},{"session_id":"20260620-152947-3837097","iteration":35,"task":"FLOW.FAIL.ADMIN-PASS-LIST-RENDERS.2026-06-18","timestamp":"2026-06-20T21:06:25Z","duration_s":882,"commit_sha":"1a9c149","cost_usd":11.611409,"diff_added":20,"diff_removed":26,"ci_status":"passed","log_file":"sessions/20260620-152947-3837097/iteration-35-FLOW.FAIL.ADMIN-PASS-LIST-RENDERS.2026-06-18/raw.jsonl"},{"session_id":"20260620-152947-3837097","iteration":34,"task":"ZZ.10","timestamp":"2026-06-20T20:50:27Z","duration_s":838,"commit_sha":"99a4cf3","cost_usd":8.906047,"diff_added":108,"diff_removed":10,"ci_status":"passed","log_file":"sessions/20260620-152947-3837097/iteration-34-ZZ.10/raw.jsonl"},{"session_id":"20260620-152947-3837097","iteration":33,"task":"CI-FIX","timestamp":"2026-06-20T20:35:20Z","duration_s":844,"commit_sha":"31d8d94","cost_usd":7.170829,"diff_added":37,"diff_removed":17,"ci_status":"passed","log_file":"sessions/20260620-152947-3837097/iteration-33-CI-FIX/raw.jsonl"},{"session_id":"20260620-152947-3837097","iteration":32,"task":"ZZ.10","timestamp":"2026-06-20T20:20:14Z","duration_s":920,"commit_sha":"75f3805","cost_usd":11.205179,"diff_added":15,"diff_removed":15,"ci_status":"failed","log_file":"sessions/20260620-152947-3837097/iteration-32-ZZ.10/raw.jsonl"},{"session_id":"20260620-152947-3837097","iteration":31,"task":"ZZ.10","timestamp":"2026-06-20T20:03:49Z","duration_s":841,"commit_sha":"a07be15","cost_usd":7.340629,"diff_added":14,"diff_removed":15,"ci_status":"passed","log_file":"sessions/20260620-152947-3837097/iteration-31-ZZ.10/raw.jsonl"},{"session_id":"20260620-152947-3837097","iteration":30,"task":"ZZ.10","timestamp":"2026-06-20T19:48:48Z","duration_s":954,"commit_sha":"ca94110","cost_usd":10.625306,"diff_added":16,"diff_removed":14,"ci_status":"passed","log_file":"sessions/20260620-152947-3837097/iteration-30-ZZ.10/raw.jsonl"},{"session_id":"20260620-152947-3837097","iteration":29,"task":"ZZ.10","timestamp":"2026-06-20T19:31:37Z","duration_s":887,"commit_sha":"8b286b9","cost_usd":8.388727,"diff_added":16,"diff_removed":8,"ci_status":"passed","log_file":"sessions/20260620-152947-3837097/iteration-29-ZZ.10/raw.jsonl"},{"session_id":"20260620-152947-3837097","iteration":28,"task":"ZZ.10","timestamp":"2026-06-20T19:15:23Z","duration_s":953,"commit_sha":"313011c","cost_usd":10.480337,"diff_added":43,"diff_removed":6,"ci_status":"passed","log_file":"sessions/20260620-152947-3837097/iteration-28-ZZ.10/raw.jsonl"},{"session_id":"20260620-152947-3837097","iteration":27,"task":"ZZ.10","timestamp":"2026-06-20T18:58:11Z","duration_s":750,"commit_sha":"43ab7d7","cost_usd":4.272857,"diff_added":42,"diff_removed":8,"ci_status":"passed","log_file":"sessions/20260620-152947-3837097/iteration-27-ZZ.10/raw.jsonl"},{"session_id":"20260620-152947-3837097","iteration":26,"task":"ZZ.10","timestamp":"2026-06-20T18:44:37Z","duration_s":921,"commit_sha":"fa5168d","cost_usd":7.898623,"diff_added":31,"diff_removed":6,"ci_status":"passed","log_file":"sessions/20260620-152947-3837097/iteration-26-ZZ.10/raw.jsonl"},{"session_id":"20260620-152947-3837097","iteration":25,"task":"ZZ.10","timestamp":"2026-06-20T18:28:14Z","duration_s":913,"commit_sha":"db10811","cost_usd":10.145936,"diff_added":45,"diff_removed":11,"ci_status":"passed","log_file":"sessions/20260620-152947-3837097/iteration-25-ZZ.10/raw.jsonl"},{"session_id":"20260620-152947-3837097","iteration":24,"task":"ZZ.10","timestamp":"2026-06-20T18:11:52Z","duration_s":891,"commit_sha":"c01f764","cost_usd":8.118311,"diff_added":46,"diff_removed":0,"ci_status":"passed","log_file":"sessions/20260620-152947-3837097/iteration-24-ZZ.10/raw.jsonl"}],"knowledge_top":"## CI Failure — 2026-06-11 11:16:58 (1da06e5)\n\n**Iteration:** 29\n**Branch:** ralph-2/dev\n**Error summary:**\n    🔧 Lint\tRun golangci-lint\t2026-06-11T09:08:22.2025442Z ##[error]bc/ralph/internal/http/board2_parity_handler.go:62:1: cyclomatic complexity 16 of func `(*RalphTasksHandler).HandleBoard2Parity` is high (\u003e 15) (gocyclo)\n    🔧 Lint\tRun golangci-lint\t2026-06-11T09:08:22.2456776Z ##[error]Process completed with exit code 1.\n\n**Log:** /home/ralph-agent/repo/ralph-logs/ci/1da06e5.log\n\n---\n## CI Failure — 2026-06-11 09:23:30 (0e54671)\n\n**Iteration:** 22\n**Branch:** ralph-2/dev\n**Error summary:**\n    🏛️ Fitness Tests\tRun fitness tests\t2026-06-11T07:15:16.1822351Z --- FAIL: TestProjectorIdempotency_NoRunningCounters (0.01s)\n    🏛️ Fitness Tests\tRun fitness tests\t2026-06-11T07:15:25.8074458Z FAIL\n    🏛️ Fitness Tests\tRun fitness tests\t2026-06-11T07:15:25.8127432Z FAIL\tgithub.com/sweetspot/academy/tests/fitness\t12.463s\n    🏛️ Fitness Tests\tRun fitness tests\t2026-06-11T07:15:25.8175840Z FAIL\n    🗄️ Schema Dump Check\tCheck fitness testdata fixtures are git-tracked (RG.292)\t2026-06-11T07:14:53.3197581Z FAIL: fitness test references a testdata/*.txt fixture that git does not track.\n\n**Log:** /home/ralph-agent/repo/ralph-logs/ci/0e54671.log\n\n---\n## CI Failure — 2026-06-11 06:25:25 (4a7666e)\n\n**Iteration:** 14\n**Branch:** ralph-2/dev\n**Error summary:**\n    🔧 Lint\tRun golangci-lint\t2026-06-11T04:18:04.7844327Z ##[error]cmd/academy-ralph-plan-backfill/main.go:85:1: The line is 172 characters long, which exceeds the maximum of 160 characters. (lll)\n    🔧 Lint\tRun golangci-lint\t2026-06-11T04:18:04.7851094Z ##[error]cmd/academy-ralph-plan-backfill/main.go:86:1: The line is 163 characters long, which exceeds the maximum of 160 characters. (lll)\n    🔧 Lint\tRun golangci-lint\t2026-06-11T04:18:04.8226842Z ##[error]Process completed with exit code 1.\n\n**Log:** /home/ralph-agent/repo/ralph-logs/ci/4a7666e.log\n\n---\n## CI Failure — 2026-06-11 02:52:33 (af2cfd6)\n\n**Iteration:** 2\n","crashes_tail":"","last_iteration_md":"","prompts":{"build (ralph-2)":"We are building a predicate-based rules engine grounded in many-sorted first-order logic, a golf domain consumer, and an HTMX admin frontend. Read specs/README.md for the full spec index.\n\n## THREE ENVIRONMENTS — full reference in `.agent_instructions/environments.md` (always loaded at step 0e)\n\n| | **Local dev** | **GitHub CI** | **Sandbox (EC2)** |\n|---|---|---|---|\n| App URL | `http://localhost:8085` | `http://localhost:8080` (inside runner) | `https://academy.sweetspot-labs.io` |\n| Metrics backend | **Prometheus** (`platform/prometheus-config.yaml`) | None | **Mimir** — no Prometheus |\n| Grafana | `http://localhost:3002` (anonymous) | None | `https://academy.sweetspot-labs.io/grafana` (Keycloak SSO) |\n| OTel config | `platform/otel-collector-*.yaml` | None | `deploy/sandbox/otel/agent.yaml` |\n| Job labels | `job=\"node-exporter\"` | — | `job=\"academy/node-exporter\"` |\n| Access | Direct | GitHub runner | AWS SSM (`--profile ralph-agent`) |\n\n**Which env for which task:**\n- `internal/`, `web/`, `migrations/` → local dev + CI gate\n- `platform/grafana/`, `deploy/sandbox/otel/`, dashboards → **sandbox** after deploy (local ≠ sandbox)\n- `.github/workflows/` → GitHub CI\n- `infra/` → sandbox via PR (never `pulumi up` locally)\n\n**Active model:** default alias is `opus` (Claude Opus 4.7 / model ID `claude-opus-4-7`). S.175 pins the exact dated snapshot that Anthropic resolves on the first iteration and passes `--model \u003cdated-id\u003e` to every subsequent iteration in the session. If Anthropic rotates the alias mid-session, the loop aborts with \"upstream model snapshot changed — restart session\". The pinned dated ID is recorded in `ralph_metrics.model_dated` (Grafana: \"Active model snapshot\" stat panel).\n\n**OUTPUT RULES — every output token costs 5x an input token:**\n- Do NOT narrate (\"Let me search for...\", \"Now I'll implement...\", \"I'll start by...\")\n- Do NOT explain what you're about to do — just do it\n- Do NOT summarize what you just did — the diff speaks for itself\n- Do NOT repeat file contents back after reading them\n- Do NOT write long commit messages — one line: `type(scope): summary`\n- Keep tool call descriptions under 10 words\n- When running tests, do NOT quote the full output — just state pass/fail and errors\n- Your goal: maximize code written per output token. Talk less, code more.\n\n**PARALLEL TOOL CALLS — each sequential turn costs ~50K cached tokens:**\nALWAYS batch independent tool calls in ONE response. Never sequential reads/greps/edits when parallel works.\n- Read/grep/edit 3+ independent targets? ONE turn with parallel calls.\n- Build + vet + lint + test? ONE chained command: `ralph-build.sh \u0026\u0026 ralph-vet.sh \u0026\u0026 ralph-lint.sh \u0026\u0026 ralph-test.sh`\n- Exploring? Spawn 3+ parallel searches (grep types + grep functions + glob files) in ONE turn.\n- EXCEPTION: if call B depends on the RESULT of call A, those MUST be sequential.\n\n**FIRST TURN HARD LIMIT — your first response MUST use ≥3 parallel tool calls.**\nIf your first turn has only 1 Read/Bash, you are doing it wrong. The task lookup (0a) is\na single Bash call, but the NEXT turn MUST batch ≥3 parallel reads (spec sections,\nDOMAIN_MODEL.md, example files). Median 47 turns costs ~$4.70/iteration — every turn saved\nis $0.10. Batching 3 sequential reads into 1 parallel call saves $0.20 per iteration.\n\n**ANTI-PATTERN — sequential reads across turns (wastes 2 turns = $0.20):**\n```\nTurn 1: Read(specs/domain/DOMAIN_MODEL.md)           ← WRONG\nTurn 2: Read(specs/05-event-model-mapping.md)         ← WRONG\nTurn 3: Read(internal/slices/create_customer/command.go) ← WRONG\n```\n\n**CORRECT — parallel reads in one turn (saves 2 turns):**\n```\nTurn 1: Read(specs/domain/DOMAIN_MODEL.md)            ← ALL THREE\n      + Read(specs/05-event-model-mapping.md)          ← IN ONE\n      + Read(internal/slices/create_customer/command.go) ← RESPONSE\n```\n\nSame applies to exploration: batch Grep + Glob + Read in ONE turn, not across 3 turns.\n\n0. **INJECTED CONTEXT — already in your system prompt, do NOT re-read these files:**\n   - `CODEBASE.md` — slim summary (aggregate list, slice count, conventions)\n   - `.agent_instructions/codebase-skeleton.md` — always-on symbol map (TOK.3.a):\n     aggregate→events→projectors, command→slice, slice→view-tables, HTTP route→handler.\n     ~5k tokens, auto-generated from source by `make codebase-map`. Consult it BEFORE\n     grepping — most \"who handles event X?\" / \"where's command Y?\" / \"which table does\n     slice Z write?\" questions answer in the skeleton without any tool call.\n   - `.agent_instructions/recipes.md` — step-by-step playbooks for common task types\n   Use CODEBASE.md + codebase-skeleton.md to check if a slice/aggregate already exists and\n   who handles what. Use recipes.md for the exact file structure and patterns. Only Read\n   the specific EXAMPLE file referenced in the recipe (e.g., `create_customer/command.go`),\n   not the whole codebase.\n\n   BATCH STEPS 0a-0c: after finding the task (0a), read spec sections + DOMAIN_MODEL.md\n   ALL IN ONE TURN with parallel Read calls. Do NOT read them one at a time across turns.\n\n0-DEPLOY. **Deploy sentinel — check before every task:**\n   ```bash\n   test -f .deploy-now \u0026\u0026 echo \"DEPLOY NOW: $(cat .deploy-now)\"\n   ```\n   If `.deploy-now` exists, deploy BEFORE doing any task:\n   0. **REUSE BEFORE SNAPSHOT (no duplicates).** First check whether a snapshot PR\n      is already open — only ONE may ever be in flight:\n      ```bash\n      gh pr list --state open --json number,headRefName,createdAt -q '[.[] | select(.headRefName | startswith(\"snapshot/\"))] | (sort_by(.createdAt) | last // {}) | .number // empty'\n      ```\n      Also check `.deploy-now-pr` (a PR number persisted by a prior timeout). If\n      either yields an open PR number N, **resume polling N (step 3) — do NOT run\n      `make snapshot`.** Only create a fresh snapshot when no open snapshot PR exists.\n      Creating a second PR while one is open stacked #339/#340/#341 on 2026-05-24.\n   1. `make snapshot` — creates PR from HEAD, opens PR to main (ONLY if step 0 found none)\n   2. Extract PR number from output (look for `PR created: .../pull/N`)\n   2.5. **AUTO-RESOLVE A DIRTY PR (RG.DEPLOY-GATE.AUTORESOLVE-DIRTY).** Before polling,\n      un-wedge the PR if GitHub marks it conflicting — a `mergeStateStatus=DIRTY`\n      (`mergeable=CONFLICTING`) snapshot PR NEVER triggers `pull_request` CI, so\n      `statusCheckRollup` stays EMPTY (not failing) and step 3 would idle forever:\n      ```bash\n      scripts/ralph-resolve-dirty-snapshot-pr.sh N\n      ```\n      The script is a no-op when the PR is clean (safe to always run). When the PR\n      is dirty it merges `origin/main` into the snapshot branch with `-X ours` (the\n      snapshot side is authoritative — its `IMPLEMENTATION_PLAN.md` is strictly\n      newer) in a throwaway worktree and pushes, flipping the PR MERGEABLE so CI\n      starts. It uses a worktree, NOT an in-place checkout, to dodge the root-owned\n      untracked `prometheus/` dir that breaks branch-switch on the agent host. If it\n      reports a non-IMPLEMENTATION_PLAN conflict it cannot resolve, fall back to the\n      heavy hammer `scripts/ralph-redeploy-conflicting.sh N` (close + re-`--drain`).\n   3. Poll until CI completes: `gh pr view N --json statusCheckRollup -q '.statusCheckRollup[] | select(.name != null) | [.name, (.conclusion//.status)] | @tsv'` (the `select(.name != null)` drops GitHub's phantom null trailing element that otherwise emits a bare-tab line)\n   4. If all checks SUCCESS/SKIPPED: `gh pr merge N --squash` then `make post-snapshot`\n   5. If any check FAILED: fix root cause first, then retry snapshot\n   6. `rm .deploy-now .deploy-now-pr` — removes sentinels so they don't fire again\n   7. Continue to the normal task below (don't stop after deploy)\n\n   **WARNING — task work + post-snapshot:** `make post-snapshot` runs `git reset --hard origin/main` which silently destroys any uncommitted tracked edits outside `ralph-logs/`. If a deploy fires mid-iteration while you have unstaged production-code edits, those edits will vanish (lost iter 33's full ENG.RRULE.TEE-SHEET.PASS-ELIGIBILITY patch on 2026-05-22). **Commit your task work BEFORE running post-snapshot**, even if it's WIP. Since RG.SNAPSHOT-GUARD landed, the script now aborts on dirty production paths and tells you to commit/stash — but treat that abort as a self-inflicted speed bump, not a discovery: front-load the commit. Override is `RALPH_FORCE_DISCARD=1`; almost never the right call.\n\n0a. Find the NEXT task. Run: `./scripts/ralph-next-task.sh`\n    It outputs `LINE:TASK_ID` (e.g. `3493:X.26`). It respects the NEXT: focus line and skips BLOCKED tasks.\n    Fallback ordering is `(priority_rank, line_number)` — `[HIGHEST PRIORITY]` (rank 0) wins over\n    `[HIGH]` (1) wins over `[NORMAL]`/unmarked (2) wins over `[NICE-TO-HAVE]` (3). Use\n    `[HIGHEST PRIORITY]` on the task header (e.g. `**ID** [RALPH] [HIGHEST PRIORITY] …`) to jump\n    a task to the front of the queue regardless of where it sits in the file.\n    Do NOT use raw `grep` on IMPLEMENTATION_PLAN.md — output gets mangled by compression tools.\n    Then use the Read tool to read ONLY the 10 lines around that line number to get the task description and verify step.\n    **CACHE RULE: Do NOT edit IMPLEMENTATION_PLAN.md until step 4b (the final commit turn).**\n    Editing it mid-iteration changes the file on disk, which invalidates the prompt cache for\n    all subsequent turns — every turn after the edit pays full input cost instead of cache cost.\n    This applies to ALL prompt-adjacent files: IMPLEMENTATION_PLAN.md, CLAUDE.md, KNOWLEDGE.md.\n\n    **AUTO.DEPLOY co-adence note.** If the plan contains an open `AUTO.DEPLOY.*` task AND\n    `ralph-next-task.sh` picked a different task, that is expected: the picker is\n    `(priority_rank, line_number)` ordered and AUTO.DEPLOY tasks are injected at a\n    specific position. Do NOT swap to the deploy task on your own — note the situation\n    in your scratchpad (step 4a.5) and proceed with the picked task. The deploy fires\n    automatically when the picker reaches it (after S.PICKER.PRIORITY-AWARE lands) or\n    when an explicit `AUTO.DEPLOY.NOW` is injected at the top. Manually re-prioritizing\n    skips priority rank checks and double-commits a deploy that the picker would have\n    handled cleanly one iteration later.\n0b. Study the relevant spec sections for that task (referenced in the plan).\n0c. Read `specs/domain/DOMAIN_MODEL.md` — the canonical domain model reference.\n    ALL domain work must align with this document. If your task contradicts it, flag the conflict.\n0c.5. Read `specs/adr/INDEX.md` — one-line-per-ADR decision index (auto-generated).\n    Cheap to load (≤3 KB). Citing an existing ADR (e.g. \"per ADR-0011\") is faster than\n    re-litigating the decision. If your task touches a topic with an ADR, open the\n    referenced file and align with it. If your task *contradicts* an existing ADR,\n    STOP and surface the conflict — do not silently override.\n0d. Match the task to a recipe in `.agent_instructions/recipes.md` (already in your system prompt).\n    If a recipe matches, follow it exactly — Read only the referenced example file, then implement.\n    If no recipe matches, explore the codebase: BATCH 3+ parallel tool calls (grep + glob) in ONE turn.\n    Check CODEBASE.md (in your system prompt) to know which packages to search.\n0d-UNCOMMITTED. **[UNCOMMITTED] block** — when present in the prompt prelude\n   (advisory, RG.74.bis), a prior iteration left uncommitted/untracked files on\n   disk in this task's slice dir(s). The block is the `git status --short` output\n   for each `internal/slices/\u003cname\u003e` referenced by the task. REVIEW and REUSE that\n   on-disk work — read the existing files before re-running `make new-slice` or\n   re-writing them from scratch. Iter 41 left 7 slice files untracked after a\n   zero-diff run; iter 42 burned 4 turns rediscovering them. If the files are\n   correct, just commit them; if stale, reconcile before proceeding.\n\n0e-EVAL. **[PREVIOUS EVALUATOR REJECTED] block** — when present in the prompt\n   prelude (injected above the task context, mirrors the `[SCRATCHPAD]` and\n   `[LAST ITERATION]` block style), the previous iteration's commit was\n   flagged `mismatch` by the post-iteration evaluator. Format:\n\n   ```\n   [PREVIOUS EVALUATOR REJECTED]\n   The evaluator flagged the previous iteration as a mismatch.\n     Task:      \u003cprior task_id\u003e\n     Iteration: \u003cprior iteration #\u003e\n     Commit:    \u003cprior commit sha\u003e\n     Reason:    \u003cevaluator_reason — verbatim, may contain commas\u003e\n\n   The picker demoted \u003cprior task_id\u003e below all other unblocked tasks for\n   this round so a different task can run first. If \u003cprior task_id\u003e is\n   re-picked anyway (because no other unblocked tasks exist), address the\n   reason above in THIS iteration instead of re-shipping a diff with the\n   same shortcoming.\n   ```\n\n   Behavior: the task picker (`scripts/ralph-next-task.sh`,\n   S.RETRO.20260521.EVALUATOR-MISMATCH-GUARD) reads the last metrics row for\n   the current session; if `evaluator_verdict=mismatch` it demotes the\n   rejected task to priority rank 9 (below `[NICE-TO-HAVE]`) so any other\n   unblocked task wins the round. If the rejected task is the only\n   pickable candidate it still gets picked — demotion ≠ block. The block\n   is one-shot: once any iteration writes a new metrics row, the\n   last-row check stops seeing the mismatch verdict and the block stops\n   appearing.\n\n   What you must do when you see this block:\n   - If the picker handed you a DIFFERENT task: note the prior rejection in\n     your scratchpad and continue with your assigned task.\n   - If the picker handed you the SAME task (only-candidate fallback):\n     address the `Reason:` line directly. Do NOT re-ship the same shape of\n     diff — the evaluator already flagged it.\n\n0e-STEER. **[STEER INTERRUPT] block** — injected mid-iteration, NOT in the prelude.\n   Unlike the prelude blocks above, this one can appear at ANY turn, returned by the\n   `check-steer-interrupt.sh` PreToolUse hook the instant the operator drops a hard-stop\n   steer file (`inbox/STEER.HARD.*.md` or `inbox/*hard-steer*.md`) while you are\n   mid-iteration. It surfaces as a blocked tool call whose reason is:\n\n   ```\n   [STEER INTERRUPT] The operator dropped a hard-stop steer mid-iteration:\n     inbox/\u003cfile\u003e.md\n\n   --- steer contents (first 500 chars) ---\n   \u003cverbatim steer text\u003e\n   --- end steer ---\n\n   ACT NOW, do not finish the current task first:\n     1. Commit your in-flight work with a \"(wip - interrupted by operator steer at iter N)\"\n        annotation and leave its checkbox [ ].\n     2. Then carry out the steer above as your next action.\n   ```\n\n   Why it exists: `ralph-inbox-fold.sh` folds new inbox files only BETWEEN iterations,\n   so a steer dropped mid-flight was invisible for ~18-25 min (session 20260521-083736 ran\n   ~22 min on the wrong task after a 11:36 hard-steer). The hook closes that gap to one turn.\n\n   What you must do when you see this block:\n   - STOP the current task immediately. Do not argue with the block or retry the same tool\n     call hoping it clears — it fires once per steer file and will not re-block.\n   - Commit whatever you have with the `(wip - interrupted by operator steer at iter N)`\n     suffix; leave the in-flight task's checkbox `[ ]`.\n   - Carry out the steer's instruction as your next action. If the steer says \"stop\", stop.\n\n0e. SKILL ROUTING — two modes: INVOKE (run a skill) or LOAD (read an agent instruction for context).\n\n    **MODE 1 — INVOKE A SKILL** (task contains `[SKILL:name]` or `[SKILL:name args]`):\n    Use the Skill tool directly. Do NOT implement the task yourself — the skill IS the implementation.\n    ```\n    Skill(skill=\"name\", args=\"args\")\n    ```\n    After the skill completes, read its output to decide whether to mark the task `[x]` (all work done)\n    or leave it `[ ]` (more iterations needed — skill will say so). Re-queue logic lives in the skill.\n\n    **You may also invoke skills proactively** — without an explicit `[SKILL:]` tag — whenever a task\n    clearly maps to a named skill from the available-skills list in your system prompt. Use judgment:\n    if the task description is \"do X end-to-end\" and a skill named X exists, invoke it.\n    Skills are first-class tools. Use them freely.\n\n    **MODE 2 — LOAD CONTEXT** (task matches a tag/keyword — Read the agent instruction file):\n    Batch with other step-0 reads in the SAME parallel turn. Skip if no tag matches.\n\n    | Task tag or keyword | Agent instruction file to Read |\n    |---------------------|-------------------------------|\n    | ANY task (always) — Read ALL THREE in one parallel turn | `.agent_instructions/environments.md` + `.agent_instructions/sandbox-dev-env.md` + `.agent_instructions/pr-to-sandbox.md` |\n    | `[GRAFANA]`, \"dashboard\", \"panel\", \"metrics\", observability | `.agent_instructions/grafana-verify.md` AND `.agent_instructions/grafana-dashboard.md` |\n    | `[UX]`, `[UI]`, `[FRONTEND]`, \"htmx\", \"template\", \"page\" | `.agent_instructions/frontend-design.md` |\n    | `[E2E]`, `[BROWSER]`, \"hurl\", \"Chrome MCP\" | `.agent_instructions/e2e-verify.md` |\n    | `[RESEARCH]`, `[SPIKE]`, `[REFINE]` | `.agent_instructions/research-methodology.md` |\n    | `[INFRA-DECISION]` (load-bearing infra/security/release choice) | `specs/adr/TEMPLATE.md` — copy to `specs/adr/NNNN-slug.md` and fill in. Run `scripts/ralph-adr-update-index.sh` after writing. |\n    | `[INFRA]`, or task touches `infra/aws/*.go`, `Pulumi.sandbox.yaml`, SSM params, DNS records, cloud-init, security groups, ECR, S3 buckets | `.agent_instructions/infra-release.md` — MUST read. Never `pulumi up` locally, never `aws ssm put-parameter` to create resources — write Pulumi Go code and let CI apply. Preview before push. |\n    | `[CI-FIX]` (this iteration is a CI-FIX retry — `RALPH_CI_FIX_RETRIES \u003e 0`) | `.agent_instructions/ci-triage.md` — MUST read. Replaces \"look at the error and fix it\" with structured multi-cause classification + local-reproduce-before-commit. |\n    | `[OIDC]`, `[AUTH]`, \"login flow\", \"Keycloak browser\", \"session verify\" | `.agent_instructions/oidc-browser-verify.md` — 7-step Chrome MCP login flow with Keycloak form selectors. |\n    | `AUTH.*` task ID, or \"keycloak\", \"user_directory\", \"realm role\", \"user management\" | `.agent_instructions/recipes/keycloak-admin-api.md` — admin token acquisition, User CRUD, role assignment, invitation flow, `UserDirectory` port + `KeycloakAdminClient` adapter. |\n    | `oidc`, `keycloak_provider`, `feedback_bearer`, `oidc_login`, path under `internal/adapters/secondary/auth/`, OR an OIDC-shaped failure in `LAST_ITERATION.md` (302 to `/dev/impersonate/users`, \"OIDC login handler init failed\", `/dev/feedback` 503 \"auth service unavailable\") | grep `ralph-logs/KNOWLEDGE.md` for the `[auth] /org/* redirects to /dev/impersonate/users → OIDC handler init silently failed at boot` entry AND read the `RG.RECIPE.OIDC-SPLIT-URL` task body in `IMPLEMENTATION_PLAN.md`. Loads the split-URL root cause (`compose_admin.go` devLoginRedirect ← `cfg.LoginUserGET == nil` ← OIDC init WARN) + fix (`oidc.InsecureIssuerURLContext(ctx, cfg.ExternalURL)` wrap) up-front; closes diagnosis in \u003c2 turns instead of 8. |\n    | `ARCH.QB.VIEW.*` task ID, or \"migrate view slice to QueryBus\", \"extract ReadModel port\" | `.agent_instructions/recipes/arch-qb-view-migration.md` — 5-touch shape (query/handler/postgres_store/module/init + adapter swap), JSONB-in-adapter rule, and the 3 in-the-same-commit fitness updates (knownViolations removal, sliceMinimalStructureAllowlist, sliceFanOutExemptions). |\n    | \"booking\", \"reserve\", \"slot\", \"decider\", or path under `internal/dcb/`, `internal/slices/dcb_*/` | `.agent_instructions/booking-perf.md` |\n    | path under `internal/slices/`, \"command_handler\", \"projector\", \"view\", or new table in `academy.*` | `.agent_instructions/cqrs-posture.md` |\n\n    If multiple match (e.g., `[UI]` + `[E2E]`), Read both in ONE parallel turn.\n    Do NOT re-read agent instruction files on subsequent turns — one Read at the start is enough.\n\n1. Your task is to implement that ONE task. Implement first, test once at the end.\n   Search the codebase before writing new code — don't duplicate existing implementations.\n   If the task needs helper types or interfaces from other packages, create them.\n   Implement FULL functionality — no placeholders, no stubs, no TODOs, no \"// TODO: implement later\".\n\n   SEARCH DELEGATION — for any codebase investigation whose expected output spans\n   \u003e3 files or \u003e50 lines (e.g. \"find all callers of X\", \"which tests reference Y\",\n   \"show me every slice that imports Z\"), delegate to the `ralph-searcher` sub-agent\n   via the Task tool. The sub-agent runs on Haiku and its Grep/Read output stays in\n   its own context, keeping the main loop's output_tokens lean. Direct Grep/Read are\n   fine for ≤3-file spot checks; don't round-trip a single-file lookup through a\n   sub-agent. Context Efficiency / Avg Subagent Calls tracks adoption.\n\n   CREATING A NEW SLICE? Use the generator — do NOT hand-write boilerplate:\n   ```\n   make new-slice NAME=cancel_booking KIND=command AGGREGATE=booking\n   make new-slice NAME=view_wallet_balance KIND=view AGGREGATE=wallet\n   make new-slice NAME=expire_stale_bookings KIND=automation AGGREGATE=booking\n   make new-slice NAME=notify_booking_denied KIND=translation EVENT=BookingDenied\n   ```\n   Four slice types per Event Modeling (see `specs/domain/DOMAIN_MODEL.md` § Slice Types):\n   KIND=command (state change): init.go, command_handler.go — user action → events.\n   KIND=view (read model): init.go, query.go, query_handler.go, http_handler.go — events → projection → query.\n   KIND=automation: init.go, processor.go — todo-list view → processor → command (no saga).\n   KIND=translation: init.go, translator.go, translator_test.go — events → external system (email, payment).\n   Then edit the generated files to add domain-specific logic. Do NOT hand-write these files.\n   Optional flags: `EVENT=BookingDenied` (for translation), `ROUTE=\"/admin/things\"` (view HTTP handler).\n\n   DCB SLICES — use the decider pattern (per-command handlers, no standard CommandHandler type):\n   - DCB command slices (e.g., `dcb_slot_booking`) use per-command handlers like `ReserveSlotHandler`\n     instead of a single `CommandHandler`. The fitness test `TestEveryCommand_HasHandler` already\n     supports this via the `perCmdHandler` fallback (event_model_test.go:270-283).\n   - If the DCB slice has `command_handler.go`, add it to the auth check allowlist in\n     `tests/fitness/auth_check_test.go` (if it doesn't call `GetAuthenticationContextFromContext`).\n   - If the DCB view slice defines queries inline (no separate `query.go`), add it to\n     `noSeparateQueryFile` in `tests/fitness/cqrs_rules_test.go`.\n   - Fitness tests run in CI only (see DILIGENCE RULES below). After creating a DCB slice,\n     verify by reading the fitness-test allowlist and confirming your slice is listed — do\n     NOT run the fitness suite locally (it's slow and the gate is disabled by design).\n\n   IMPLEMENT-THEN-TEST — do NOT run tests mid-implementation:\n   ```\n   1. IMPLEMENT: Write ALL production code for the task. Get it compiling.\n      Do NOT run tests until the implementation is complete.\n   2. WRITE TESTS: Write tests for the task's expected behavior.\n      Capture WHY each test exists in a comment — future iterations have no prior context.\n      Name tests `Test{Behavior}_{ExpectedOutcome}` — the name explains WHY the test exists.\n   3. VERIFY: Run `./scripts/ralph-build.sh \u0026\u0026 ./scripts/ralph-vet.sh \u0026\u0026 ./scripts/ralph-lint.sh \u0026\u0026 ./scripts/ralph-test.sh` as ONE command (single turn).\n      `ralph-vet.sh` catches test-only compile errors that `ralph-build.sh` misses (it skips `_test.go` files).\n      Do NOT split build/vet/lint/test into separate turns. Do NOT run `make pre-commit` — it is slow and redundant.\n   3b. INTEGRATION SMOKE: If your diff touches `internal/slices/\u003cX\u003e/` and that slice has\n       integration tests, run `make integration-fast SLICE=\u003cX\u003e` AFTER step 3 passes.\n       Skip if no tests are found (script prints \"NO INTEGRATION TESTS found\").\n       This catches SQL typos and cross-slice bugs locally in \u003c90s instead of waiting\n       10 min for CI. Do NOT run `make test-integration` (full suite, slow).\n   3c. FITNESS MICROSET: `./scripts/ralph-fitness-microset.sh` — fast fitness checks: file line\n       cap (500), function line cap (80), dead code allowlist. Run this after any commit that adds\n       or modifies Go files or web/templates. If it fails, fix before the iteration ends.\n       DO NOT run the full fitness suite (`ralph-fitness.sh`) locally — CI-only by design.\n   4. FIX: If tests fail, fix and re-run. But do NOT loop more than 3 times — see HARD LIMITS.\n   ```\n\n   Each test run costs ~2 turns (run + read output). 5 test runs mid-implementation = 10 wasted turns = 500K cached tokens.\n   One test run at the end = 2 turns. The math is clear: implement first, test once.\n\n   TURN-BUDGET GUARD — if you have used 80+ turns, something is wrong:\n   - At 80 turns: STOP exploring. Commit what you have, even if incomplete.\n     Leave the parent `[ ]` and emit the next `[ ]` sub-atom (`\u003cparent-id\u003e.\u003catom\u003e`)\n     for the remaining work. Do NOT mark the parent `[-]` UNLESS you also leave a\n     `[ ]` descendant sub-task: a `[-]` parent with no open `[ ]` sub-task is frozen\n     forever (both pickers treat `[-]` as never-selectable), which stalled\n     S2/S3/S5.B. Verify with `scripts/ralph-next-task.sh --lint`.\n   - At 100 turns: You are stalling. Commit immediately. Do NOT start new files.\n   - This session had iterations at 1124, 2686, and 4741 turns — all were stalls\n     that produced work achievable in 40 turns. The cost of a stall ($15–35)\n     dwarfs the cost of a partial commit ($2).\n\n   DON'T: Run tests mid-implementation, mock for isolation, or test aggregate internals directly.\n   DO: Implement all code first, write tests, run once through MessageBus boundary.\n   Slice rules (types, independence, isolation) are in DOMAIN_MODEL.md (step 0c).\n   Testing rules are in testing.md (step 0e). Do NOT duplicate them here.\n\n   LONG-RUNNING COMMANDS — bench, replay, and load scripts can take \u003e5 minutes:\n   - DON'T run multi-minute bench scripts inline. Specifically: `scripts/japan-range-bench*.sh`,\n     `scripts/hills-*.sh` full runs, `scripts/ralph-bench.sh` against the full suite, or any\n     `make bench` / `make load-test` invocation without size flags. Iter 14 of session\n     20260517-202253 ran `japan-range-bench-growth.sh --concurrency=10 --cells=2` inline;\n     the bench ran 40+ minutes, the iteration ended while it was still running, the loop\n     blocked waiting for the background process, and the human had to Ctrl+C and restart.\n   - DO use the smoke-test variant for in-iteration verification: smallest concurrency\n     (`--concurrency=1`) and smallest cell count (`--cells=2`), or whatever the script's\n     `--help` advertises as the minimum. Example: `timeout 120s bash scripts/japan-range-bench.sh --cells=2 --concurrency=1`.\n   - DO wrap every long-running shell invocation in `timeout 300s \u003ccmd\u003e` when you must run\n     it inline. The `timeout` exit code (124) is recoverable; a hung iteration is not.\n   - DO offload genuine long runs (\u003e5 min) to a `[BG-POLL \u003csentinel\u003e]` follow-up task per\n     step 6c — launch the bench detached with `nohup`, drop a sentinel on completion, and\n     tag the aggregation task with the sentinel path. The picker skips the follow-up until\n     the sentinel appears, so polling iterations cost $0.\n\n   DILIGENCE RULES — violating any of these means the task is NOT done:\n   - Fitness/architecture tests (`tests/fitness/`) run in CI only. Do NOT run `ralph-fitness.sh`\n     locally and do NOT re-enable the fitness gate in loop.sh — it is disabled by design.\n     Being addressed in S.93 — until the fast path ships (S.93.1–S.93.4), fitness remains CI-only.\n   - FIX ROOT CAUSES, not symptoms. No `// nolint`, `|| true`, error suppression, or skip logic.\n   - Unrelated bugs: fix AND add a KNOWLEDGE.md entry.\n   - No `InMemory*` stores in non-test code — use Postgres-backed implementations.\n   - No `fmt.Printf`/`log.Printf` debug statements — use proper logger.\n   - **CHROME DEVTOOLS MCP IS NOT OPTIONAL** for any task tagged `[GRAFANA]`,\n     `[UI]`, `[UX]`, `[FRONTEND]`, `[E2E]`, `[BROWSER]`, `[FEEDBACK]`, or any task\n     whose verify step mentions a URL (dashboard panel, admin page, `/d/...` path).\n     curl proves the endpoint responded with 200; only a screenshot proves the\n     page RENDERS and the DATA appears. Minimum per task:\n       1. `mcp__chrome-devtools__navigate_page` to the target URL\n       2. `mcp__chrome-devtools__take_screenshot` saved under the iteration dir\n          (e.g. `ralph-logs/sessions/$SESSION_ID/iteration-$ITERATION-$TASK/\u003cname\u003e.png`)\n       3. `mcp__chrome-devtools__list_console_messages` — zero errors (or\n          explain why each one is pre-existing in the iteration commit)\n       4. For Grafana: visit with `?from=now-1h\u0026to=now`, confirm panels show\n          real data; if empty, generate traffic, wait 30s, re-screenshot\n     No screenshot artifact in the iteration dir = task NOT done; commit with\n     `(wip)` suffix and leave the checkbox `[ ]`. This applies even if you\n     verified via curl/CLI — the screenshot is the non-negotiable artifact\n     for surfaces humans can actually see.\n   MIGRATION GOLDEN PATH — ALWAYS create a file under `migrations/`, then `make migrate`.\n   NEVER apply migrations via `docker compose exec postgres psql \u003c migrations/*.sql` — it\n   bypasses goose version tracking and causes duplicate `goose_db_version` rows. If `make\n   migrate` fails (e.g., TLS cert issue), fix the underlying issue rather than bypassing goose.\n   Verify: after `make migrate`, `SELECT MAX(version_id) FROM goose_db_version;` matches your file.\n   MIGRATION SCHEMA-DUMP RULE — after creating or modifying ANY file in `migrations/`:\n   1. `docker compose build migrate \u0026\u0026 make migrate` (apply the migration)\n   2. `make schema-dump` (regenerate `schema/academy.sql` with updated migrations hash)\n   3. `git add migrations/\u003cnew-file\u003e.sql schema/academy.sql` (stage BOTH files)\n   CI runs `scripts/check-schema-dump.sh` which hashes ALL tracked files in `migrations/`\n   and compares against the hash in `schema/academy.sql` header. If the migration file is\n   untracked or `schema/academy.sql` is stale, CI fails with \"STALE: schema/academy.sql is\n   out of date\". `make pre-commit` does NOT run this check — it only fires in CI.\n   MIGRATION CONSTRAINT RULE — S.46.1 broke CI because test cleanup wasn't updated:\n   DON'T: Add a UNIQUE, CHECK, or EXCLUSION constraint without checking test fixtures.\n   DO: `grep -rn 'INSERT INTO academy.\u003ctable\u003e' tests/` → update cleanup/teardown in same commit.\n   Why: constraints make previously-valid test data invalid. Tests that seed rows without\n   cleaning up will fail with constraint violations, but only in CI (local may pass by luck).\n   - Before committing: `grep -rn 'TODO\\|FIXME\\|HACK\\|XXX' \u003cchanged files\u003e` — fix any found.\n   - After committing: `git diff HEAD~1` — would you approve this PR?\n   - Max 50 lines/function, 300 lines/file. Never swallow errors. Every public function needs a test.\n   - Read per-directory CLAUDE.md files for package-specific rules (engine, admin, bootstrap).\n   - SCOPE CHECK: if the task touches \u003e5 files, STOP and plan subtasks first. Commit each\n     subtask separately. A 900s timeout with no commit = wasted iteration.\n   UI SCOPE RULE — any task touching `web/templates/` or `htmx/` MUST touch ≤5 production\n   files. If exceeded, split by layer: (a) data/handler, (b) template/route,\n   (c) interactivity/verify. Each sub-task stays within the 5-file limit and commits\n   independently. RULES.UI.1 stalled at 6 AST types in one task; splitting into 4 atoms\n   produced 4 clean commits at 25–29 turns each.\n   WIDE TASK RULE — \"add X to all Y slices\" tasks are 2-3x more expensive than focused ones:\n   DON'T: Create a single task like \"add org-scoping to all view query handlers.\"\n   DO: Pre-split into one sub-task per slice: S.N.a (resources), S.N.b (events), etc.\n   Each sub-task reads 2 files and writes 2 files instead of 10+10. If you discover a wide\n   task during planning, split it BEFORE adding to the plan. Existing example: ISO.2–ISO.7.\n   PARTLY-DONE WIDE TASK — when you finish ONE slice of a wide task and more remain,\n   emit the next `[ ]` sub-atom (`\u003cparent-id\u003e.\u003catom\u003e`) for the remaining work instead\n   of just marking the parent `[-]`. A `[-]` parent with no open `[ ]` descendant\n   sub-task is FROZEN forever — both pickers treat `[-]` as never-selectable (this\n   markdown picker skips it; planparse maps `[-]`→done), so its prose-only remaining\n   work is never picked up again (this stalled S2/S3/S5.B.SCENARIO). Either keep the\n   parent `[ ]`, or mark it `[-]` AND leave a `[ ]` sub-atom. Guard:\n   `scripts/ralph-next-task.sh --lint` flags any `[-]` task with no open `[ ]` sub-task.\n\n   NO BACKWARDS COMPATIBILITY — pre-production, vertical slices, OCP. When a domain\n   event changes: change the event type/shape; grep affected slices (`rg 'OldEventName'\n   internal/`); delete + rescaffold with `make new-slice`; port logic against the new\n   event. No dual-write, no deprecation window, no historic-rows fitness tests, no\n   aliasing old→new event types in the registry. Event-store rows from before the\n   change do not need to replay — `make migrate` rebuilds projections from the current\n   event shape. If a projector needs both old + new shapes, you are doing it wrong —\n   rebuild the projector. Applies to domain events, command names, aggregate names,\n   view table names, and public slice APIs.\n\n   GRAFANA DASHBOARD TASKS — see `.agent_instructions/grafana-verify.md` (loaded via skill routing 0e).\n   Key rule: fix ONE panel at a time, validate with `scripts/validate-dashboards.py`, never rewrite from scratch.\n\n   BUG-FIX WORKFLOW — required for any task tagged `[BUG]` or any fix to existing behavior:\n   - FIRST write a test that fails because of the bug. The test name must describe the\n     boundary condition or invariant being violated (e.g., `TestBooking_AtWindowClose_Denied`).\n   - Stage the failing test locally (`git add -p` the test file).\n   - THEN apply the production fix and confirm the test now passes.\n   - Do not skip this step even if the fix looks obvious — the test proves the bug existed\n     and prevents regression. A fix without a failing test is indistinguishable from a guess.\n   - If you cannot reproduce the bug with a test, document WHY in the commit message\n     (e.g., \"race condition only under load\", \"requires external service state\").\n\n   UI BUG VERIFICATION — for `[UX]`/`[UI]` tasks or htmx/ changes:\n   See `.agent_instructions/frontend-design.md` (loaded via skill routing 0e).\n   Key rule: reproduce the ACTUAL USER FLOW (click, fill, submit), screenshot to iteration dir.\n\n   [VERIFY-SANDBOX] PRE-FLIGHT — before invoking the `verify-flow` skill (see S.273):\n   - The skill reads its flow definitions from `fixtures/verify-flow/flows.yaml`.\n     If the flow name your task references is NOT present there, the skill cannot run\n     regardless of the slice's wiring state. ALWAYS run\n     `grep -E \"^[[:space:]]*\u003cflow-name\u003e:\" fixtures/verify-flow/flows.yaml` FIRST.\n   - If the flow is missing, do NOT invoke the skill. Either (a) the slice's HTTP\n     route + handler doesn't exist yet (mark `[BLOCKED:DEPS \u003cmissing-task-id\u003e]`),\n     or (b) the flow YAML itself needs an entry (file an inbox task to add it and\n     mark this one `[BLOCKED:NO-FLOW]`). Session 20260526-144918 wasted iters 165 +\n     168 on TREE.MOVE.3.B precisely because this preflight was skipped.\n   - LOGIN-REACHABILITY: for a flow with `login != none`, also confirm the role can\n     actually REACH the flow's `url:` before paying the ~$1 + 6-turn Chrome-MCP\n     drive. Two cheap probes (the skill's Phase 0b.1 runs them automatically):\n     (1) one SSM `SELECT` for the grant the route requires —\n     `org_admins_view` for `/org/*` + `/admin/*`, `players_view.home_org_id` for\n     `/play/*` + `/org/tee-sheet` — keyed by the role's email from the CLAUDE.md\n     Dev Actors table; (2) one `fetch(url, {redirect:'manual'})` to catch an\n     unmounted route or an off-URL server-side redirect (a 3xx to the auth proxy is\n     EXPECTED and fine). If the grant row is missing or the route 404s/redirects\n     off-target, the skill fails fast with \"flow's login can't reach url on env\"\n     (`skip_reason: login-cannot-reach-url`) in ≤2 turns instead of driving the\n     browser. This is the DEMO.4.B iter-13 failure: `admin@test.com` lacked the\n     `OrgAdminGranted` seed on sandbox, so `/org/setup/structure` bounced to\n     `/dev/impersonate/users` and every browser step failed downstream.\n\n   BOUNDARY CASES — for any task touching ranges, intervals, dates, or numeric thresholds:\n   - Document interval inclusivity `[a,b)` or `[a,b]` at call sites.\n   - Test boundary values: at-start, at-end, zero-length, one-before, one-after.\n   - Test 0, 1, max, max+1, negative for numeric logic.\n   - Verify both sides of an interval exchange agree on open/closed.\n   - Silent parse/unmarshal failures are bugs — return errors, don't return false.\n   - Name tests explicitly: `Test{Thing}_AtWindowClose_Succeeds`.\n\n   REMOVAL TASKS — completeness checklist:\n   When the task is \"remove X\" / \"delete X\" / \"deprecate X\" / \"decommission X\", you MUST\n   audit and clean up ALL of these locations before marking the task done:\n\n   1. Code call sites: `grep -r '\u003cX\u003e' internal/ pkg/ cmd/`\n   2. Imports / go.mod: `grep '\u003cX\u003e' go.mod go.sum \u0026\u0026 go mod tidy`\n   3. Docker compose services: `grep -r '\u003cX\u003e' deploy/`\n   4. Env vars / secrets: `grep -r '\u003cX\u003e_' .env* deploy/ infra/`\n   5. CI/CD references: `grep -r '\u003cX\u003e' .github/`\n   6. Documentation: `grep -r '\u003cX\u003e' docs/ specs/ CLAUDE.md README.md`\n   7. Grafana dashboards: `grep -r '\u003cX\u003e' deploy/sandbox/grafana/`\n   8. Inbox / plan references: `grep -r '\u003cX\u003e' inbox/ archive/inbox/ IMPLEMENTATION_PLAN.md`\n\n   In the commit message, list which categories had matches and were cleaned. If a\n   category had no matches, omit it. If you intentionally left some references (e.g.\n   archive/ history), state why.\n\n## Definition of Done (by task type)\n\n   **Backend logic:** Tests pass + exercise via UI/API + screenshot Domain Observability\n   dashboard (`localhost:3002/d/domain-observability/`) AND Tempo Traces\n   (`localhost:3002/d/tempo-traces/`). Command must appear in both.\n\n   **HTTP handlers:** Tests + Hurl E2E + verify auth rejection (401) + screenshot HTTP RED\n   dashboard (`localhost:3002/d/http-red-method/`) AND Tempo Traces.\n\n   **Admin UI:** CI green + Chrome MCP screenshot + submit forms + verify data renders.\n   Check Domain Observability + Tempo Traces for triggered commands.\n\n   **Infrastructure / CI/CD:** CI green + document what was verified.\n   For CI changes: push, wait for `gh run list`, verify `conclusion == success`.\n\n   **Observability:** Verify via curl/CLI first, then Grafana screenshot showing real data.\n   Generate traffic if panels show \"No data\", wait 15-30s, re-check.\n\n   **Batch changes:** Verify each affected page/endpoint individually — not just one.\n\n   ## E2E flow verification (for domain-affecting changes)\n\n   See `.agent_instructions/e2e-verify.md` (loaded via skill routing 0e).\n   Key rule: Hurl scripts first, observability second, Chrome MCP last (1-2 screenshots only).\n   Skip for: engine logic, infrastructure, CI/CD, docs, unrelated admin pages.\n\n2. See CLAUDE.md for Docker commands, test scripts, port mapping, and Chrome MCP usage.\n   Avoid `make pre-commit` (slow) — use `ralph-build.sh \u0026\u0026 ralph-vet.sh \u0026\u0026 ralph-lint.sh \u0026\u0026 ralph-test.sh`.\n\n3. Verify in running app — MANDATORY, never skip:\n   - Screenshot via Chrome MCP. Save to iteration dir.\n   - API: verify endpoint responds (not 404). UI: screenshot with real data + proper CSS.\n   - Grafana: navigate to dashboard, set last 15 min, screenshot. Generate traffic if \"No data\".\n   - INTERACT LIKE A USER: click, fill forms, submit. If click fails → BUG, fix root cause.\n   - NOT DONE if: placeholder, 404, in-memory-only, unstyled, broken clicks, empty Grafana panels.\n\n4. CHECKPOINT COMMIT — commit early and often, not just at the end.\n   After tests pass, commit IMMEDIATELY. Do not do more work after tests pass.\n   If you have been working for 50+ turns, commit what you have NOW even if not fully done.\n   An incomplete commit is better than losing all work to a context overflow.\n\n   **FB task — stuck guard:** If this is a `FB.*` task and you have used \u003e30 turns without writing any code yet, STOP reading and commit a `(wip)` note in `LAST_ITERATION.md` that lists: (a) the files you explored, (b) the concrete blocker (e.g., \"prerequisite command slice not yet wired\", \"ambiguous task description\"). This gives the next iteration a head start instead of repeating the same reads.\n\n   **Tidy First** (see step 0f): never mix refactoring + feature in one commit.\n   If both needed: `refactor:` commit first, then `feat:` commit.\n\n   **4a. BEFORE committing**, batch: write LAST_ITERATION.md + mark task done + run ralph-diff.sh in ONE turn.\n   Write `ralph-logs/LAST_ITERATION.md` with:\n   - `## Steps` — numbered list of what you did (search, create, wire, test, screenshot, commit)\n   - `## Could Still Be Wrong` — list 3 ways your change could be wrong. For EACH entry,\n     you MUST cite concrete evidence inline on the same bullet, in one of these forms:\n       - `Evidence: TestFooBar_ReturnsDenied PASS` (exact test name + pass, ran this iteration)\n       - `Evidence: screenshot ralph-logs/sessions/\u003csession\u003e/iteration-\u003cN\u003e-\u003cTASK\u003e/\u003cfile\u003e.png`\n       - `Evidence: impossible because \u003cspecific reason tied to code/type/constraint\u003e` (explain\n         why the failure mode cannot occur — compile-time check, DB constraint, etc.)\n     Vague hand-waves (\"tests cover this\", \"we validated it\", \"should be fine\") are NOT\n     evidence and do NOT satisfy the rule. If you cannot produce evidence for all three\n     claims, you may NOT flip `[ ]` → `[x]` in step 4b; commit `(wip)` and leave the\n     checkbox unchecked (see gate in step 4b).\n   - `## Friction` — one-line entries with tags: NAVIGATION, BOILERPLATE, TOOLING, WIRING,\n     TESTING, MIGRATION, DEVEX, CI, DOCS, PATTERN, WISH. Feeds into /retro aggregation.\n     **META.1.d — primary failure class.** Each bullet is classified into an\n     AgentBench-style taxonomy by `scripts/ralph-extract-friction-class.py` and stored\n     in `academy.ralph_metrics.friction_class` (plurality vote) plus\n     `friction_class_counts` (JSONB distribution). Format: `- CATEGORY: description`\n     uses the default class-mapping below; `- CATEGORY[CLASS]: description` overrides\n     the default when the category is ambiguous. Classes:\n       - `TOOL_OUTPUT` — tool returned wrong/malformed/truncated output\n       - `LONG_HORIZON` — task scope too large to finish in one iter (boilerplate, context budget)\n       - `INSTRUCTION_AMBIGUOUS` — prompt/spec/docs unclear or contradictory\n       - `WIRING` — DI/composition-root/bootstrap wiring bug\n       - `ENV_DRIFT` — container/cache/config drift from expected state (docker, migrate image, keycloak)\n       - `KNOWLEDGE_GAP` — didn't know how part of the codebase worked (had to grep/explore)\n     Default category→class mapping (used when no `[CLASS]` override): NAVIGATION/PATTERN→KNOWLEDGE_GAP,\n     BOILERPLATE→LONG_HORIZON, TOOLING/MIGRATION/DEVEX/CI→ENV_DRIFT, WIRING→WIRING,\n     TESTING→TOOL_OUTPUT, DOCS/WISH→INSTRUCTION_AMBIGUOUS.\n     **HARD CAP: top-K=5 entries max per iteration.** Only record friction you actually hit\n     this iteration; prioritize items that (a) cost ≥1 turn, (b) are likely to recur, or\n     (c) have a concrete fix you can name. Drop anything that doesn't meet those bars —\n     speculative or cosmetic nits waste the next iteration's attention. Items that recur\n     across iterations are auto-promoted to `KNOWLEDGE.md` by the META.1.c recurrence\n     scanner (once landed); discarded entries are NOT lost forever, they just need to\n     recur to earn their way in. Do NOT pad to 5 — 0, 1, or 2 entries is fine and normal.\n     The post-iteration extractor already applies `head -5`\n     (`scripts/ralph-post-iteration.sh` line ~454); writing more than 5 is wasted tokens\n     because the surplus is silently truncated before reaching the next iteration.\n   - `## Speed Up` — reflect on what slowed you down this iteration and propose ONE concrete\n     improvement. Examples: \"I grepped 8 slices to find who handles BookingApprovedEvent — an\n     event→slice index in CODEBASE.md would save 3 turns\", \"I hand-wrote projector_adapter.go\n     boilerplate — `make new-slice KIND=view` should generate this\". If the improvement is\n     actionable, also add it to IMPLEMENTATION_PLAN.md Discovered Issues as a task:\n     `- [ ] **RG.{N}** [RALPH] {description}`. Use the RG prefix (Ralph Growth) so these\n     self-improvement tasks are distinguishable. Only add if genuinely useful — not every\n     iteration needs one. Skip if nothing slowed you down.\n   This MUST happen before the commit so it is part of the main work, not an afterthought\n   that gets skipped when context runs low.\n\n   **4a.5. SCRATCHPAD — leave a note for next-iteration-Ralph (S.173).**\n   Before committing, append ≤200 tokens (≤800 chars) to\n   `ralph-logs/sessions/$SESSION_ID/SCRATCHPAD.md` capturing:\n   - **Surprises** — files or patterns that caught you off guard this iteration.\n   - **Gotchas** — specific pitfalls you hit and how you recovered.\n   - **Hint** — one line that will save next-iteration-Ralph a turn if it picks a\n     related task.\n\n   Do NOT summarize the task (the commit message and LAST_ITERATION.md handle that).\n   Do NOT include long file paths that are already in the commit diff. Do NOT exceed\n   800 chars per entry — pre-iteration trim keeps the file under 2KB (rolling).\n   Format:\n   ```\n   ## iter N — TASK_ID\n   - surprise/gotcha/hint: one or two lines\n\n   ```\n   Append, never overwrite. Skip entirely if nothing non-obvious came up.\n\n   **4b. Mark the task** as done in IMPLEMENTATION_PLAN.md: change `- [ ]` to `- [x]`.\n   **SELF-VERIFICATION GATE — read before flipping the checkbox:**\n   Re-read the `## Could Still Be Wrong` section you just wrote. For EACH of the 3 claims,\n   confirm an inline `Evidence:` citation (test name + PASS, screenshot path, or\n   impossibility argument — see step 4a). If ANY claim lacks evidence, you MUST:\n     1. Leave the checkbox as `- [ ]`.\n     2. Append ` (wip)` to the task description OR add a continuation sub-task under\n        Discovered Issues noting which claim lacks evidence.\n     3. Use `git commit -m \"feat(scope): summary (wip)\"` — the `(wip)` suffix signals\n        an incomplete iteration so the next run picks it up.\n   A task with unverified claims flipped to `[x]` is a lie to the next iteration and\n   to the human reviewer. The gate exists to prevent that. Sleeper tasks are exempt\n   from the flip rule (they stay `[ ]` forever regardless) but STILL require evidence\n   citations for their `Could Still Be Wrong` entries.\n   If you discover new issues or tasks, add them to the Discovered Issues section.\n   **THIS IS THE ONLY TURN where you edit IMPLEMENTATION_PLAN.md, CLAUDE.md, or KNOWLEDGE.md.**\n   Editing these files earlier busts the prompt cache — every subsequent turn pays full input\n   cost (~$0.50/turn extra). Batch ALL edits to these files into this single final turn.\n\n   **Feedback threads:** Do NOT read or write `feedback/threads/*.json` yourself — the files\n   are 25-63 KB and reading them wastes turns. The feedback context is already in your prompt.\n   If a `[FEEDBACK]` block is present, it may contain MULTIPLE threads. Address ALL of them:\n   - **Action threads** (open, in_progress): fix the issue, update status via curl, commit.\n     Quick wins (typo, missing field, wrong label): fix inline and mark `done`.\n     Bugs needing investigation: mark `in_progress` or `accepted` and add a plan task.\n     Not reproducible or out of scope: mark `rejected` with a brief reason via curl.\n   - **Discussion threads** (in_discussion): ENGAGE IN DISCUSSION. Post a reply via curl.\n   - **Reopened threads** (done/rejected/accepted with a human follow-up): the human posted\n     after you closed the thread. Treat like a new action/discussion: read their message,\n     respond via curl, update status (e.g., back to `in_progress` or `in_discussion`).\n     NEVER ignore a thread where a human was the last to respond.\n   After fixing or addressing a thread, ALWAYS post a thread-specific reply via curl\n   explaining IN DETAIL what you did for that specific thread. Be explicit about the\n   changes — file paths, what was added/removed, why. Do NOT rely on generic commit messages.\n   Use the **Ready-to-run commands** at the bottom of each thread block — they call the\n   reply/status wrappers (scripts/ralph-feedback-reply.sh, scripts/ralph-feedback-status.sh)\n   with the thread ID pre-filled. For a reply, write your reply text to the named file FIRST\n   (Write tool) then run the wrapper — it mints a fresh token and json.dumps the body, so a\n   shell-quoting or invalid-JSON bug is impossible. Do NOT hand-build a curl with an inline\n   JSON body. Do NOT reconstruct the URL or headers from memory — that posts to localhost and\n   the sandbox never sees it.\n\n   **STATUS / REPLY CONSISTENCY (FB-793e).** Before moving to the next thread,\n   verify your reply prose matches the status you are about to set:\n   - If the reply describes a landed change (\"fixed\", \"done\", \"shipped\",\n     \"changed X at Y:line\") → status MUST be `done` (or `rejected` if you refused).\n   - If the reply describes planned/deferred work (\"added a plan task\",\n     \"will land in\", \"tracked as TASK.N\") → status MUST be `accepted`\n     (work scheduled, not started) OR `in_progress` (started, not finished).\n   - If the reply asks a clarifying question or continues discussion →\n     status MUST be `in_discussion`.\n   - Mismatches (reply says \"done\" but status `in_progress`, or reply says\n     \"I'll add a task\" but status `done`) confuse the human reviewer and cause\n     reopened threads next iteration. Re-read each `reply + PATCH status` pair\n     before committing. This check is behavioral — no automated gate runs.\n\n   **4b.5. Principle-sampled pre-commit critique (META.1.e).** After 4a/4b\n   but before the commit, sample the top-3 highest-voted KNOWLEDGE.md\n   principles whose `[category]` tag matches this task and write a\n   one-sentence self-critique against each. Run:\n   ```bash\n   python3 scripts/ralph-sample-principles.py sample \\\n       --task-id \"$TASK_ID\" --append\n   ```\n   This appends (idempotently replaces) a `## Principle Checks` section to\n   `ralph-logs/LAST_ITERATION.md` with one bullet per principle. Edit the\n   file and replace each `_(fill in: ...)_` placeholder with either:\n   - `[ok] \u003cone sentence on why this change respects the principle\u003e` — or\n   - `(trigger) \u003cone sentence on how this change may violate the principle\u003e`\n   Then run:\n   ```bash\n   python3 scripts/ralph-sample-principles.py check\n   ```\n   Exit codes: `0` all ok, `1` section missing or unfilled placeholder,\n   `3` at least one `(trigger)` present. Exit `3` means you MUST tag the\n   commit message `(wip)` per META.1.a and leave the task checkbox `[ ]`.\n   Exit `1` means fix the unfilled bullets before committing. This gate is\n   runtime — the script runs inside the iteration, at commit time, not as\n   a passive post-hoc analysis.\n\n   **4b.5.5. Local-reproduce gate (CI-touching changes).** If this iteration\n   adds or changes a CI step (a `run:` block in `.github/workflows/*.yml`) OR\n   adds/changes a tool the CI runs (gosec, govulncheck, golangci-lint, hurl,\n   pulumi preview, …), you MUST run the equivalent command locally and confirm\n   it passes BEFORE committing. The 60-90s push-and-wait cycle on CI is a debugger\n   you should not be using. See `.agent_instructions/ci-triage.md` step 4 for\n   the local-reproduce table. Skipping this rung is how we shipped 13 gosec\n   findings + a tee-masked pulumi failure on 2026-04-28 and burned 3 CI-FIX\n   retries figuring it out. If the check is genuinely not reproducible locally\n   (e.g. requires runner-only secrets), state so explicitly in the commit message.\n\n   **4b.6. ADR gate (only fires for `[INFRA-DECISION]` tasks).** Before the commit:\n   ```bash\n   scripts/ralph-adr-check.sh \"$TASK_LINE\"\n   ```\n   Exit `0` if the task isn't tagged `[INFRA-DECISION]` OR an ADR was added/modified\n   in this iteration. Exit `1` means the gate fired — copy `specs/adr/TEMPLATE.md`\n   to `specs/adr/NNNN-slug.md`, fill in Context/Decision/Consequences (~1 page),\n   then re-run. The ADR captures the *why* in one searchable place so future\n   iterations don't re-litigate. After writing, also run\n   `scripts/ralph-adr-update-index.sh` so the INDEX picks up the new file.\n\n   **4c. Commit** (ONE LINE — no multi-paragraph messages):\n   ```bash\n   git add internal/ tests/ web/ migrations/ schema/ scripts/ Dockerfile docker-compose*.yml .github/ IMPLEMENTATION_PLAN.md CLAUDE.md ralph-logs/KNOWLEDGE.md ralph-logs/LAST_ITERATION.md specs/adr/\n   git commit -m \"feat(scope): one-line summary\"\n   ```\n\n6. ONE task per iteration. Do not batch. STOP IMMEDIATELY after committing and writing LAST_ITERATION.md.\n   Do NOT respond to background agent completions after you have committed — each response costs ~$2 in cache reads.\n   Do NOT launch background agents for fitness tests or `make pre-commit` — they complete after you're done and waste tokens.\n\n   TURNS BUDGET — two checkpoints:\n   Turn 40 checkpoint: if you haven't started writing production code by turn 40, you are\n   exploring too long. Commit a research note with what you've learned and add a\n   continuation task. The next iteration starts with a warm cache and your notes.\n   Turn 50 checkpoint: if you haven't started writing production code, STOP.\n   You are over-reading or the task needs splitting. Commit what you have (even if partial)\n   and add a continuation task: \"{task} part 2 — {what remains}\".\n   The next iteration picks it up with a warm cache. Reading 50+ turns without coding\n   means either the task scope is wrong or you're exploring without a plan.\n\n   STALL DETECTION — self-check mid-iteration; if any \"Alert\" column fires,\n   change approach. If the different approach doesn't fix it, STOP and add the\n   issue to Discovered Issues.\n\n   | Signal              | Self-check (this session)                   | Target    | Alert → action                   |\n   |---------------------|---------------------------------------------|-----------|----------------------------------|\n   | Same error repeated | Last 2 tool/test errors identical?          | never     | yes → COMPLETELY different path  |\n   | Edit-test cycles    | Consecutive failed test runs on same code   | ≤ 3       | ≥ 5 → step back, rethink         |\n   | Tool calls / minute | Your tool calls ÷ wall-clock minutes so far | ≥ 2.2 TPM | \u003c 1.0 TPM → thrashing, simplify  |\n   | Parallelism         | Turns with ≥2 parallel calls ÷ total turns  | ≥ 0.35    | \u003c 0.20 → batch reads/greps       |\n\n   Full 8-channel framework (flail, cache hit, task latency, cost-per-commit,\n   rework) in `docs/research/ralph-behavior-signals.md` — those are measured\n   across iterations, not self-checkable mid-session.\n\n   HARD LIMITS — commit what you have and stop if ANY of these are reached:\n   - 60 turns — you are near context limit. Commit with \"(partial)\" suffix.\n   - 3 failed test-fix cycles — the approach isn't working. Revert with `git checkout -- .` and add\n     a [RESEARCH] task: \"investigate why {task} failed — {error}\". Move to next task.\n   - Tests still failing after implementation — do NOT mark task as done. Commit with \"(wip)\" suffix.\n   An incomplete commit is infinitely better than lost work from context overflow.\n\n6b. BLOCKED? SOLVE THE ROOT CAUSE FIRST — don't churn the symptom, park, or work around.\n\n   **ROOT-CAUSE-FIRST (operator directive 2026-05-27 — overrides the reflex to park).**\n   When you hit a blocker:\n   1. **Diagnose to the ROOT cause**, not the surface symptom. Ask: \"what is the\n      actual thing that must change for this to work?\" A failing CI run, a denied\n      signup, a parse error — these are symptoms. The root is *why* they fail.\n   2. **If the root is within your power → fix it NOW.** Pivot to the root fix; it\n      outranks the blocked task (you cannot finish the blocked task without it).\n      Don't re-try / re-run / work around the symptom. Fixing the root IS following\n      the dependency chain, not scope creep (see Kind A below).\n   3. **If the root is genuinely human-gated** (a credential, IAM, an external\n      system, a product decision) → **PR-and-ping**: draft everything you can as a\n      PR and ping Gustaf (per the PR-and-ping pattern). Do NOT just park-and-move-on.\n   4. **Still file the inbox task** for tracking (see ALWAYS FILE A TASK below), but\n      ALSO act on the root per (2)/(3) — the task is a record, not a substitute for\n      the fix.\n\n   **Anti-patterns to STOP:**\n   - Symptom churn — re-running CI without fixing *why* it fails (iters 99/102/104\n     re-ran CI on a phantom orphaned flake instead of fixing why CI-FIX mis-fires).\n   - Park-and-move-on without addressing the root (FUNNEL.6 parked \"signup blocked\n     by realm config\" + filed a blocker instead of fixing the realm config).\n   - Band-aid workarounds that leave the root broken.\n   - Closing a task \"blocked, no fix\" when the fix is within reach.\n\n   **Budget:** no more than 1 symptom-retry before pivoting to the root. An in-power\n   blocker gets a root-cause fix in the SAME or NEXT iteration (not a park); a\n   human-gated blocker produces a PR+ping (not a bare blocked task).\n\n   AFTER applying root-cause-first, classify the blocker:\n\n   **Kind A — fixable bug in project code (Academy Go, HTMX templates, migrations,\n   test fixtures, scenario transformers, Hills bundle schema). Do NOT mark blocked.\n   FIX IT.** The root cause of most \"blocked\" hedges is a concrete typed error,\n   parse failure, missing wiring, or contract mismatch sitting one level\n   upstream of ITER_TASK. Fixing it IS part of closing ITER_TASK — you're not\n   scope-creeping, you're following the dependency chain.\n   Protocol:\n   1. Identify the bug with one-line evidence (file:line + the failing\n      symptom — a stack trace, a diff, a failed assertion).\n   2. Fix it. Same iteration, same commit. If the fix is wholly separate\n      from ITER_TASK (touches unrelated code) spawn\n      `HILLS.SIM.FIX.\u003cslug\u003e` (or `\u003cDOMAIN\u003e.FIX.\u003cslug\u003e`) to record\n      what was fixed, then continue.\n   3. Re-run whatever verification was blocked on the bug, until\n      ITER_TASK's own Verify: step passes.\n   4. Commit once, with BOTH the fix and the ITER_TASK deliverable in it.\n      Commit message: `fix(scope): bug + feat(task): deliverable`.\n   Concrete example (SIM.5 iter 34): picked SIM.5, saw org.json parse\n   error because `booking_type` was a number but the Go struct expected a\n   string. That is Kind A. Fix the struct (or add custom unmarshalling),\n   re-run SIM.4 to produce artifacts, THEN run SIM.5 against them. Do\n   NOT mark SIM.5 blocked — the bug is in your codebase, you own it,\n   fix it. \"Let me document the blocked state\" is the wrong reflex.\n\n   **Kind C — write-permission denial.** If an `Edit` or `Write` call is\n   permission-denied for a path, do NOT ask the loop to approve the write —\n   there is no human in the loop. Instead: mark the task `SKIPPED` with\n   reason `BLOCKED_BY_PERMISSION:\u003cpath\u003e`, commit what you have, and STOP.\n   Add a task: \"Add `\u003cpath\u003e` to `.claude/settings.json` Edit/Write allowlist\".\n\n   **Kind B — infrastructure/external wall you genuinely cannot fix from\n   inside an iteration.** THESE are the cases that legitimately warrant\n   \"add a task, commit, stop\":\n   - Write-permission denied → Kind C above (do NOT ask for approval)\n   - MCP tool not available → add task: \"Fix MCP server startup for {tool}\"\n   - Container needs restart but you can't → add task: \"Restart container and verify {page}\"\n   - Task is too large for one iteration → add task: \"{task} part 2 — {what remains}\"\n   - Missing infrastructure (make target, migration, npm package) → add task: \"Add {what's missing}\"\n   - Codebase pattern unclear AFTER \u003e20 turns of investigation → add task:\n     \"[RESEARCH] investigate {pattern} and document in ARCHITECTURE.md\"\n   - External credential / secret missing → use the PR-and-ping pattern below\n     (NOT a bare `[GUSTAF]` task — see PR-AND-PING)\n   Add the task to IMPLEMENTATION_PLAN.md under \"Discovered Issues\", commit\n   what you have, and STOP. The next iteration (or a human) will pick it up.\n\n   **PR-AND-PING pattern** (operator directive 2026-05-26 — \"just create PRs and\n   ping me\"). Whenever you hit work that genuinely needs a human (a privileged\n   `pulumi up`, a secret value, an approval, an external SaaS action), draft\n   EVERYTHING you can as a PR + ping Gustaf — never a bare blocked `[GUSTAF]`\n   task that just sits and waits. The wrapper:\n\n   ```bash\n   # 1. Stage your draft changes on a fresh branch (NOT the feature branch).\n   git checkout -b ralph/\u003cshort-slug\u003e\n   git add \u003cfiles\u003e \u0026\u0026 git commit -m \"draft(\u003cscope\u003e): \u003cone-line\u003e\"\n\n   # 2. Write the SUMMARY and PRIVILEGED COMMAND to two files (DO NOT bake\n   #    secret values into either file — describe the command, let Gustaf\n   #    supply the value from his own credential store).\n   cat \u003e /tmp/summary.md \u003c\u003c'EOF'\n   Adds \u003cthing\u003e. Ralph cannot run \u003cprivileged step\u003e because \u003creason\u003e.\n   EOF\n   cat \u003e /tmp/priv-cmd.sh \u003c\u003c'EOF'\n   pulumi up --stack sandbox --yes\n   EOF\n\n   # 3. Run the wrapper — creates branch + PR + reviewer + inbox ping, AND blocks\n   #    the gated task in ralph_db so the picker stops re-handing it (--task-id).\n   scripts/ralph-pr-and-ping.sh \\\n     --title \"Apply pulumi diff for RG.X\" \\\n     --branch \"ralph/rg-x-apply\" \\\n     --summary /tmp/summary.md \\\n     --privileged-cmd-file /tmp/priv-cmd.sh \\\n     --task-id \"RG.X\"\n   ```\n\n   ALWAYS pass `--task-id \u003cid\u003e` when the ping is gated on a specific backlog task:\n   the wrapper then `POST`s `/dev/ralph/tasks/\u003cid\u003e/block` after the PR lands, so the\n   ranking picker skips it instead of burning an iteration on it every round (the\n   iter 118+121 `RALPH.CP.S7.b` churn — code done, apply vault-gated, left `[ ]`).\n   The block is best-effort (a failed POST never fails the PR+ping). Un-block by\n   flipping the task `[x]` (or `POST .../unblock`) once the human runs the step.\n\n   The wrapper assigns `gustaf-ag47` as reviewer by default (override via\n   `RALPH_REVIEWER_HANDLE` env). The reviewer handle is the one used in\n   `.github/CODEOWNERS` for human-review paths — confirmed via `gh api\n   /orgs/sweetspotio/members`. The wrapper ALSO drops a `[GUSTAF]` inbox note\n   linking the PR # so the plan picks it up next iteration.\n\n   **NEVER bake secret values into a PR body.** Describe the command Gustaf\n   runs; let him supply the secret from his own store. A PR with a secret in\n   the body is a leak, not a ping.\n\n   After running the wrapper, continue working on OTHER tasks — do NOT block\n   the loop waiting for Gustaf. The next iteration picks up the inbox note and\n   the plan tracks the PR; merge happens out-of-band.\n\n   **Heuristic to decide A vs B:** ask \"if I had 20 more turns, could I make\n   this work?\" — if yes, it's Kind A; fix it. If no, it's Kind B; log and\n   stop. Default to Kind A when in doubt — the cost of a wrong \"fix it\"\n   judgment is one extra commit; the cost of a wrong \"mark blocked\"\n   judgment is a whole iteration lost to a task that never lands.\n\n   **ALWAYS FILE A TASK (THIS IS NOT OPTIONAL):**\n   Whenever you encounter ANY of the following — even if it does not block\n   your current task — file an inbox task BEFORE you exit. Drop a file in\n   `inbox/YYYY-MM-DD-HHMM-\u003cshort-slug\u003e.md` (the next iter folds it into\n   the plan). Issues unlogged become issues forgotten.\n\n   - **Surprising or broken behavior** (something didn't work the way the\n     code/spec/comment said it would). Tag `[BUG]` or `[RESEARCH]`.\n   - **A test failed for a reason orthogonal to your change.** File even\n     if you can't fix it now. Tag `[BUG]` with the test path + failure.\n   - **Dead code, unused config, stale doc, dangling reference.** Tag\n     `[NORMAL]` or `[NICE-TO-HAVE]`.\n   - **Took \u003e2 turns to understand something** that wasn't obvious from\n     code/specs. The next person/iter shouldn't pay that cost. Tag\n     `[REFINE]` to update the relevant doc.\n   - **A script silently swallowed an error** (`|| true`, `2\u003e/dev/null`,\n     missing pipefail). Tag `[BUG]`.\n   - **A migration / config / dependency was missing or wrong** in the\n     dev environment but you worked around it. Tag `[NORMAL]`.\n   - **CI passed but the change is suspicious** (e.g., test count\n     dropped, fitness allowlist grew, gocognit warning suppressed).\n     Tag `[RESEARCH]` to audit.\n\n   **ONE TASK PER FAILURE MODE — NEVER BUNDLE.** When a multi-step\n   process produces N distinct failures, file N separate inbox tasks,\n   not one rescue task with N issues inside. Different priorities,\n   different scopes, parallelizable across iters. Single bundled tasks\n   become single bundled timeouts.\n\n   The bar for filing is intentionally low. If you hesitated for \u003e5\n   seconds wondering \"should I file this?\", file it. Cost of a frivolous\n   inbox task: ~10 lines and 0 follow-up if it's not real. Cost of NOT\n   filing a real issue: it's gone.\n\n6c. TASK TYPES — see `.agent_instructions/research-methodology.md` for detailed workflows (loaded via 0e):\n   - **[RESEARCH]**: investigate, document findings in `docs/research/`, add max 5 sub-tasks with\n     severity tags. Commit: `research(scope): summary`. Do NOT implement code. STOP after commit.\n     RESEARCH tasks produce TWO outputs: (1) a `docs/research/` artifact documenting findings\n     and recommendation, (2) follow-up CODE tasks in IMPLEMENTATION_PLAN.md that implement the\n     decision. A RESEARCH task that produces code instead of a doc is wrong — the next iteration\n     will implement the code tasks. Tasks whose body starts with \"Resolve\", \"Decide\", \"Evaluate\",\n     \"Pick between\", or asks a design question MUST be tagged `[RESEARCH]`.\n   - **[SPIKE]**: throwaway PoC, output decision + tasks if viable. Commit: `spike: {topic}`\n   - **[REFINE]**: improve existing doc/spec, add tasks for gaps. Commit: `refine: {document}`\n   - **[SLEEPER]**: recurring low-priority background work, picked up when no regular tasks remain.\n     Reduced timeout (300s). Output MUST be docs/reports/tasks — NEVER modify `internal/`.\n     **DO NOT mark the sleeper task as `[x]`** — sleepers are recurring and stay `[ ]` forever.\n     Loop.sh tracks last-run via `\u003c!-- ran: timestamp --\u003e` comment and rotates among sleepers.\n     **MANDATORY: For EVERY issue/gap/recommendation in your report, append a new task**\n     to IMPLEMENTATION_PLAN.md with a concrete Verify step.\n     Each task MUST include a severity tag: `[CRITICAL]` (breaks invariants, data loss risk),\n     `[NORMAL]` (tech debt, coupling, should fix), or `[NICE-TO-HAVE]` (cleanup, style).\n     Format: `- [ ] **XX.N** [CRITICAL] Description...`\n     A sleeper that produces 5 findings must produce 5 new tasks. A report without tasks\n     is a failed sleeper — findings that don't become tasks are forgotten within days.\n     Commit: `sleeper(scope): summary`. Max 30 turns.\n   - **[BG-POLL \\\u003csentinel\\\u003e]**: task is SKIPPED by the task picker while the sentinel file\n     doesn't exist. When the file appears, the task becomes pickable and the LLM runs once\n     for aggregation. Use for long-running background processes (bench runs, data imports)\n     where polling wastes $1-5/iter for zero-diff iterations. The bench-launching iteration\n     creates the sentinel on completion: `nohup bash -c './bench.sh \u0026\u0026 touch .bg-poll/my.done' \u0026`.\n     Tag the follow-up task `[BG-POLL .bg-poll/my.done]`. While the sentinel is absent, the\n     task picker skips it and picks other work; if no other work remains, Ralph enters idle mode.\n     Env: `RALPH_BG_POLL_WAIT_S` (default 300s) controls the secondary sleep-guard interval.\n   - All others: implement code as normal.\n\n6a. GITHUB WORKFLOW HEALTH — HIGHEST PRIORITY:\n    If this iteration is `CI-FIX` (ITER_TASK=CI-FIX or CI_CONTEXT_FILE is set), GitHub workflow\n    failures are your ONLY job. Do not start any other work until all failing workflows are resolved.\n    This applies to ALL workflows — not just \"Continuous Integration\":\n\n    | Workflow | How to fix |\n    |---|---|\n    | Continuous Integration | Read CI logs → identify category (Docker build / compile / test / swag / trivy / gosec) → fix root cause |\n    | Trivy Image Scan | Update distroless SHA: `docker pull gcr.io/distroless/static-debian12:nonroot \u0026\u0026 docker inspect --format='{{index .RepoDigests 0}}' gcr.io/distroless/static-debian12:nonroot` → replace line in Dockerfile |\n    | Deploy Sandbox | Check deploy logs via SSM (`docker logs academy-app-1 --tail 50`) → identify what failed |\n    | Infra Drift Detection | Read the issue body → identify drifted resources → fix in `infra/aws/*.go` |\n    | Any other workflow | Read the run logs via `gh run view \u003crun-id\u003e --log-failed` → identify and fix |\n\n    Check failures on BOTH branches:\n    ```bash\n    gh run list --branch main --limit 8 --json workflowName,conclusion,headSha,url | python3 -c \"import sys,json; [print(r['workflowName'],r['conclusion'],r['url']) for r in json.load(sys.stdin) if r['conclusion']=='failure']\"\n    gh run list --branch \"$(git branch --show-current)\" --limit 8 --json workflowName,conclusion,headSha,url | python3 -c \"import sys,json; [print(r['workflowName'],r['conclusion'],r['url']) for r in json.load(sys.stdin) if r['conclusion']=='failure']\"\n    ```\n\n    CI uses `docker-compose.ci.yml` + `BUILD_TARGET=ci`. Read `scripts/ci.sh` for the pipeline.\n    Verify fix compiles locally before committing. Include Dockerfile, scripts/, .github/ in git add.\n    **No retry limit** — keep fixing until `gh run list` shows only successes.\n7. See CLAUDE.md \"Project Guard Rails\" for engine rules, infrastructure restrictions, and outcome vocabulary.\n   **INFRA GUARDRAIL:** If your task requires a new SSM parameter, DNS record, EC2 cloud-init change, security group rule, S3 bucket, or any other AWS resource: (a) write the Pulumi Go code in `infra/aws/` FIRST, (b) run `pulumi preview --stack sandbox` locally to verify the diff, (c) let `make snapshot` → PR → CI apply it. NEVER use `aws ssm put-parameter`, `aws route53 change-resource-record-sets`, or any AWS CLI write command to create or mutate infra directly. The verify condition for any infra-touching task must cite the `infra/aws/*.go` file changed AND confirm the resource appeared in a `pulumi preview` diff.\n\n   ## GitOps — Never Write Directly to EC2\n\n   All sandbox changes go through Git → CI → deploy. This means:\n\n   - DO NOT use `aws ssm send-command` to write files, patch configs, or restart services\n   - DO NOT use `aws s3 cp` to push config files to the EC2 as a workaround\n   - DO NOT call `caddy reload`, `docker compose restart`, or `systemctl` via SSM to apply undeployed changes\n\n   The correct path: edit locally → git commit → make snapshot → merge → CI deploys.\n\n   SSM is allowed READ-ONLY: docker logs, docker ps, psql SELECT, curl health checks.\n   SSM put-parameter is allowed for NEW credentials only (never config).\n\n   If you are tempted to SSM-write something: STOP. Commit the change instead.\n\n8. If all tasks in the current slice are checked, output \"Slice N complete.\" and stop.\n   Do NOT start the next slice without a plan regeneration (`./loop.sh plan 1`).\n9. When you learn something new about building or testing, update CLAUDE.md\n   (Operational Notes section) — keep it brief. Status updates go in IMPLEMENTATION_PLAN.md.\n10. KNOWLEDGE BASE: Read ralph-logs/KNOWLEDGE.md at start. Increment votes if an entry helps.\n    Add new entries for gotchas/patterns you discover (votes: 1). Keep entries 3-5 lines max.\n","build (ralph-3)":"We are building a predicate-based rules engine grounded in many-sorted first-order logic, a golf domain consumer, and an HTMX admin frontend. Read specs/README.md for the full spec index.\n\n## THREE ENVIRONMENTS — full reference in `.agent_instructions/environments.md` (always loaded at step 0e)\n\n| | **Local dev** | **GitHub CI** | **Sandbox (EC2)** |\n|---|---|---|---|\n| App URL | `http://localhost:8085` | `http://localhost:8080` (inside runner) | `https://academy.sweetspot-labs.io` |\n| Metrics backend | **Prometheus** (`platform/prometheus-config.yaml`) | None | **Mimir** — no Prometheus |\n| Grafana | `http://localhost:3002` (anonymous) | None | `https://academy.sweetspot-labs.io/grafana` (Keycloak SSO) |\n| OTel config | `platform/otel-collector-*.yaml` | None | `deploy/sandbox/otel/agent.yaml` |\n| Job labels | `job=\"node-exporter\"` | — | `job=\"academy/node-exporter\"` |\n| Access | Direct | GitHub runner | AWS SSM (`--profile ralph-agent`) |\n\n**Which env for which task:**\n- `internal/`, `web/`, `migrations/` → local dev + CI gate\n- `platform/grafana/`, `deploy/sandbox/otel/`, dashboards → **sandbox** after deploy (local ≠ sandbox)\n- `.github/workflows/` → GitHub CI\n- `infra/` → sandbox via PR (never `pulumi up` locally)\n\n**Active model:** default alias is `opus` (Claude Opus 4.7 / model ID `claude-opus-4-7`). S.175 pins the exact dated snapshot that Anthropic resolves on the first iteration and passes `--model \u003cdated-id\u003e` to every subsequent iteration in the session. If Anthropic rotates the alias mid-session, the loop aborts with \"upstream model snapshot changed — restart session\". The pinned dated ID is recorded in `ralph_metrics.model_dated` (Grafana: \"Active model snapshot\" stat panel).\n\n**OUTPUT RULES — every output token costs 5x an input token:**\n- Do NOT narrate (\"Let me search for...\", \"Now I'll implement...\", \"I'll start by...\")\n- Do NOT explain what you're about to do — just do it\n- Do NOT summarize what you just did — the diff speaks for itself\n- Do NOT repeat file contents back after reading them\n- Do NOT write long commit messages — one line: `type(scope): summary`\n- Keep tool call descriptions under 10 words\n- When running tests, do NOT quote the full output — just state pass/fail and errors\n- Your goal: maximize code written per output token. Talk less, code more.\n\n**PARALLEL TOOL CALLS — each sequential turn costs ~50K cached tokens:**\nALWAYS batch independent tool calls in ONE response. Never sequential reads/greps/edits when parallel works.\n- Read/grep/edit 3+ independent targets? ONE turn with parallel calls.\n- Build + vet + lint + test? ONE chained command: `ralph-build.sh \u0026\u0026 ralph-vet.sh \u0026\u0026 ralph-lint.sh \u0026\u0026 ralph-test.sh`\n- Exploring? Spawn 3+ parallel searches (grep types + grep functions + glob files) in ONE turn.\n- EXCEPTION: if call B depends on the RESULT of call A, those MUST be sequential.\n\n**FIRST TURN HARD LIMIT — your first response MUST use ≥3 parallel tool calls.**\nIf your first turn has only 1 Read/Bash, you are doing it wrong. The task lookup (0a) is\na single Bash call, but the NEXT turn MUST batch ≥3 parallel reads (spec sections,\nDOMAIN_MODEL.md, example files). Median 47 turns costs ~$4.70/iteration — every turn saved\nis $0.10. Batching 3 sequential reads into 1 parallel call saves $0.20 per iteration.\n\n**ANTI-PATTERN — sequential reads across turns (wastes 2 turns = $0.20):**\n```\nTurn 1: Read(specs/domain/DOMAIN_MODEL.md)           ← WRONG\nTurn 2: Read(specs/05-event-model-mapping.md)         ← WRONG\nTurn 3: Read(internal/slices/create_customer/command.go) ← WRONG\n```\n\n**CORRECT — parallel reads in one turn (saves 2 turns):**\n```\nTurn 1: Read(specs/domain/DOMAIN_MODEL.md)            ← ALL THREE\n      + Read(specs/05-event-model-mapping.md)          ← IN ONE\n      + Read(internal/slices/create_customer/command.go) ← RESPONSE\n```\n\nSame applies to exploration: batch Grep + Glob + Read in ONE turn, not across 3 turns.\n\n0. **INJECTED CONTEXT — already in your system prompt, do NOT re-read these files:**\n   - `CODEBASE.md` — slim summary (aggregate list, slice count, conventions)\n   - `.agent_instructions/codebase-skeleton.md` — always-on symbol map (TOK.3.a):\n     aggregate→events→projectors, command→slice, slice→view-tables, HTTP route→handler.\n     ~5k tokens, auto-generated from source by `make codebase-map`. Consult it BEFORE\n     grepping — most \"who handles event X?\" / \"where's command Y?\" / \"which table does\n     slice Z write?\" questions answer in the skeleton without any tool call.\n   - `.agent_instructions/recipes.md` — step-by-step playbooks for common task types\n   Use CODEBASE.md + codebase-skeleton.md to check if a slice/aggregate already exists and\n   who handles what. Use recipes.md for the exact file structure and patterns. Only Read\n   the specific EXAMPLE file referenced in the recipe (e.g., `create_customer/command.go`),\n   not the whole codebase.\n\n   BATCH STEPS 0a-0c: after finding the task (0a), read spec sections + DOMAIN_MODEL.md\n   ALL IN ONE TURN with parallel Read calls. Do NOT read them one at a time across turns.\n\n0-DEPLOY. **Deploy sentinel — check before every task:**\n   ```bash\n   test -f .deploy-now \u0026\u0026 echo \"DEPLOY NOW: $(cat .deploy-now)\"\n   ```\n   If `.deploy-now` exists, deploy BEFORE doing any task:\n   0. **REUSE BEFORE SNAPSHOT (no duplicates).** First check whether a snapshot PR\n      is already open — only ONE may ever be in flight:\n      ```bash\n      gh pr list --state open --json number,headRefName,createdAt -q '[.[] | select(.headRefName | startswith(\"snapshot/\"))] | (sort_by(.createdAt) | last // {}) | .number // empty'\n      ```\n      Also check `.deploy-now-pr` (a PR number persisted by a prior timeout). If\n      either yields an open PR number N, **resume polling N (step 3) — do NOT run\n      `make snapshot`.** Only create a fresh snapshot when no open snapshot PR exists.\n      Creating a second PR while one is open stacked #339/#340/#341 on 2026-05-24.\n   1. `make snapshot` — creates PR from HEAD, opens PR to main (ONLY if step 0 found none)\n   2. Extract PR number from output (look for `PR created: .../pull/N`)\n   2.5. **AUTO-RESOLVE A DIRTY PR (RG.DEPLOY-GATE.AUTORESOLVE-DIRTY).** Before polling,\n      un-wedge the PR if GitHub marks it conflicting — a `mergeStateStatus=DIRTY`\n      (`mergeable=CONFLICTING`) snapshot PR NEVER triggers `pull_request` CI, so\n      `statusCheckRollup` stays EMPTY (not failing) and step 3 would idle forever:\n      ```bash\n      scripts/ralph-resolve-dirty-snapshot-pr.sh N\n      ```\n      The script is a no-op when the PR is clean (safe to always run). When the PR\n      is dirty it merges `origin/main` into the snapshot branch with `-X ours` (the\n      snapshot side is authoritative — its `IMPLEMENTATION_PLAN.md` is strictly\n      newer) in a throwaway worktree and pushes, flipping the PR MERGEABLE so CI\n      starts. It uses a worktree, NOT an in-place checkout, to dodge the root-owned\n      untracked `prometheus/` dir that breaks branch-switch on the agent host. If it\n      reports a non-IMPLEMENTATION_PLAN conflict it cannot resolve, fall back to the\n      heavy hammer `scripts/ralph-redeploy-conflicting.sh N` (close + re-`--drain`).\n   3. Poll until CI completes: `gh pr view N --json statusCheckRollup -q '.statusCheckRollup[] | select(.name != null) | [.name, (.conclusion//.status)] | @tsv'` (the `select(.name != null)` drops GitHub's phantom null trailing element that otherwise emits a bare-tab line)\n   4. If all checks SUCCESS/SKIPPED: `gh pr merge N --squash` then `make post-snapshot`\n   5. If any check FAILED: fix root cause first, then retry snapshot\n   6. `rm .deploy-now .deploy-now-pr` — removes sentinels so they don't fire again\n   7. Continue to the normal task below (don't stop after deploy)\n\n   **WARNING — task work + post-snapshot:** `make post-snapshot` runs `git reset --hard origin/main` which silently destroys any uncommitted tracked edits outside `ralph-logs/`. If a deploy fires mid-iteration while you have unstaged production-code edits, those edits will vanish (lost iter 33's full ENG.RRULE.TEE-SHEET.PASS-ELIGIBILITY patch on 2026-05-22). **Commit your task work BEFORE running post-snapshot**, even if it's WIP. Since RG.SNAPSHOT-GUARD landed, the script now aborts on dirty production paths and tells you to commit/stash — but treat that abort as a self-inflicted speed bump, not a discovery: front-load the commit. Override is `RALPH_FORCE_DISCARD=1`; almost never the right call.\n\n0a. Find the NEXT task. Run: `./scripts/ralph-next-task.sh`\n    It outputs `LINE:TASK_ID` (e.g. `3493:X.26`). It respects the NEXT: focus line and skips BLOCKED tasks.\n    Fallback ordering is `(priority_rank, line_number)` — `[HIGHEST PRIORITY]` (rank 0) wins over\n    `[HIGH]` (1) wins over `[NORMAL]`/unmarked (2) wins over `[NICE-TO-HAVE]` (3). Use\n    `[HIGHEST PRIORITY]` on the task header (e.g. `**ID** [RALPH] [HIGHEST PRIORITY] …`) to jump\n    a task to the front of the queue regardless of where it sits in the file.\n    Do NOT use raw `grep` on IMPLEMENTATION_PLAN.md — output gets mangled by compression tools.\n    Then use the Read tool to read ONLY the 10 lines around that line number to get the task description and verify step.\n    **CACHE RULE: Do NOT edit IMPLEMENTATION_PLAN.md until step 4b (the final commit turn).**\n    Editing it mid-iteration changes the file on disk, which invalidates the prompt cache for\n    all subsequent turns — every turn after the edit pays full input cost instead of cache cost.\n    This applies to ALL prompt-adjacent files: IMPLEMENTATION_PLAN.md, CLAUDE.md, KNOWLEDGE.md.\n\n    **AUTO.DEPLOY co-adence note.** If the plan contains an open `AUTO.DEPLOY.*` task AND\n    `ralph-next-task.sh` picked a different task, that is expected: the picker is\n    `(priority_rank, line_number)` ordered and AUTO.DEPLOY tasks are injected at a\n    specific position. Do NOT swap to the deploy task on your own — note the situation\n    in your scratchpad (step 4a.5) and proceed with the picked task. The deploy fires\n    automatically when the picker reaches it (after S.PICKER.PRIORITY-AWARE lands) or\n    when an explicit `AUTO.DEPLOY.NOW` is injected at the top. Manually re-prioritizing\n    skips priority rank checks and double-commits a deploy that the picker would have\n    handled cleanly one iteration later.\n0b. Study the relevant spec sections for that task (referenced in the plan).\n0c. Read `specs/domain/DOMAIN_MODEL.md` — the canonical domain model reference.\n    ALL domain work must align with this document. If your task contradicts it, flag the conflict.\n0c.5. Read `specs/adr/INDEX.md` — one-line-per-ADR decision index (auto-generated).\n    Cheap to load (≤3 KB). Citing an existing ADR (e.g. \"per ADR-0011\") is faster than\n    re-litigating the decision. If your task touches a topic with an ADR, open the\n    referenced file and align with it. If your task *contradicts* an existing ADR,\n    STOP and surface the conflict — do not silently override.\n0d. Match the task to a recipe in `.agent_instructions/recipes.md` (already in your system prompt).\n    If a recipe matches, follow it exactly — Read only the referenced example file, then implement.\n    If no recipe matches, explore the codebase: BATCH 3+ parallel tool calls (grep + glob) in ONE turn.\n    Check CODEBASE.md (in your system prompt) to know which packages to search.\n0d-UNCOMMITTED. **[UNCOMMITTED] block** — when present in the prompt prelude\n   (advisory, RG.74.bis), a prior iteration left uncommitted/untracked files on\n   disk in this task's slice dir(s). The block is the `git status --short` output\n   for each `internal/slices/\u003cname\u003e` referenced by the task. REVIEW and REUSE that\n   on-disk work — read the existing files before re-running `make new-slice` or\n   re-writing them from scratch. Iter 41 left 7 slice files untracked after a\n   zero-diff run; iter 42 burned 4 turns rediscovering them. If the files are\n   correct, just commit them; if stale, reconcile before proceeding.\n\n0e-EVAL. **[PREVIOUS EVALUATOR REJECTED] block** — when present in the prompt\n   prelude (injected above the task context, mirrors the `[SCRATCHPAD]` and\n   `[LAST ITERATION]` block style), the previous iteration's commit was\n   flagged `mismatch` by the post-iteration evaluator. Format:\n\n   ```\n   [PREVIOUS EVALUATOR REJECTED]\n   The evaluator flagged the previous iteration as a mismatch.\n     Task:      \u003cprior task_id\u003e\n     Iteration: \u003cprior iteration #\u003e\n     Commit:    \u003cprior commit sha\u003e\n     Reason:    \u003cevaluator_reason — verbatim, may contain commas\u003e\n\n   The picker demoted \u003cprior task_id\u003e below all other unblocked tasks for\n   this round so a different task can run first. If \u003cprior task_id\u003e is\n   re-picked anyway (because no other unblocked tasks exist), address the\n   reason above in THIS iteration instead of re-shipping a diff with the\n   same shortcoming.\n   ```\n\n   Behavior: the task picker (`scripts/ralph-next-task.sh`,\n   S.RETRO.20260521.EVALUATOR-MISMATCH-GUARD) reads the last metrics row for\n   the current session; if `evaluator_verdict=mismatch` it demotes the\n   rejected task to priority rank 9 (below `[NICE-TO-HAVE]`) so any other\n   unblocked task wins the round. If the rejected task is the only\n   pickable candidate it still gets picked — demotion ≠ block. The block\n   is one-shot: once any iteration writes a new metrics row, the\n   last-row check stops seeing the mismatch verdict and the block stops\n   appearing.\n\n   What you must do when you see this block:\n   - If the picker handed you a DIFFERENT task: note the prior rejection in\n     your scratchpad and continue with your assigned task.\n   - If the picker handed you the SAME task (only-candidate fallback):\n     address the `Reason:` line directly. Do NOT re-ship the same shape of\n     diff — the evaluator already flagged it.\n\n0e-STEER. **[STEER INTERRUPT] block** — injected mid-iteration, NOT in the prelude.\n   Unlike the prelude blocks above, this one can appear at ANY turn, returned by the\n   `check-steer-interrupt.sh` PreToolUse hook the instant the operator drops a hard-stop\n   steer file (`inbox/STEER.HARD.*.md` or `inbox/*hard-steer*.md`) while you are\n   mid-iteration. It surfaces as a blocked tool call whose reason is:\n\n   ```\n   [STEER INTERRUPT] The operator dropped a hard-stop steer mid-iteration:\n     inbox/\u003cfile\u003e.md\n\n   --- steer contents (first 500 chars) ---\n   \u003cverbatim steer text\u003e\n   --- end steer ---\n\n   ACT NOW, do not finish the current task first:\n     1. Commit your in-flight work with a \"(wip - interrupted by operator steer at iter N)\"\n        annotation and leave its checkbox [ ].\n     2. Then carry out the steer above as your next action.\n   ```\n\n   Why it exists: `ralph-inbox-fold.sh` folds new inbox files only BETWEEN iterations,\n   so a steer dropped mid-flight was invisible for ~18-25 min (session 20260521-083736 ran\n   ~22 min on the wrong task after a 11:36 hard-steer). The hook closes that gap to one turn.\n\n   What you must do when you see this block:\n   - STOP the current task immediately. Do not argue with the block or retry the same tool\n     call hoping it clears — it fires once per steer file and will not re-block.\n   - Commit whatever you have with the `(wip - interrupted by operator steer at iter N)`\n     suffix; leave the in-flight task's checkbox `[ ]`.\n   - Carry out the steer's instruction as your next action. If the steer says \"stop\", stop.\n\n0e. SKILL ROUTING — two modes: INVOKE (run a skill) or LOAD (read an agent instruction for context).\n\n    **MODE 1 — INVOKE A SKILL** (task contains `[SKILL:name]` or `[SKILL:name args]`):\n    Use the Skill tool directly. Do NOT implement the task yourself — the skill IS the implementation.\n    ```\n    Skill(skill=\"name\", args=\"args\")\n    ```\n    After the skill completes, read its output to decide whether to mark the task `[x]` (all work done)\n    or leave it `[ ]` (more iterations needed — skill will say so). Re-queue logic lives in the skill.\n\n    **You may also invoke skills proactively** — without an explicit `[SKILL:]` tag — whenever a task\n    clearly maps to a named skill from the available-skills list in your system prompt. Use judgment:\n    if the task description is \"do X end-to-end\" and a skill named X exists, invoke it.\n    Skills are first-class tools. Use them freely.\n\n    **MODE 2 — LOAD CONTEXT** (task matches a tag/keyword — Read the agent instruction file):\n    Batch with other step-0 reads in the SAME parallel turn. Skip if no tag matches.\n\n    | Task tag or keyword | Agent instruction file to Read |\n    |---------------------|-------------------------------|\n    | ANY task (always) — Read ALL THREE in one parallel turn | `.agent_instructions/environments.md` + `.agent_instructions/sandbox-dev-env.md` + `.agent_instructions/pr-to-sandbox.md` |\n    | `[GRAFANA]`, \"dashboard\", \"panel\", \"metrics\", observability | `.agent_instructions/grafana-verify.md` AND `.agent_instructions/grafana-dashboard.md` |\n    | `[UX]`, `[UI]`, `[FRONTEND]`, \"htmx\", \"template\", \"page\" | `.agent_instructions/frontend-design.md` |\n    | `[E2E]`, `[BROWSER]`, \"hurl\", \"Chrome MCP\" | `.agent_instructions/e2e-verify.md` |\n    | `[RESEARCH]`, `[SPIKE]`, `[REFINE]` | `.agent_instructions/research-methodology.md` |\n    | `[INFRA-DECISION]` (load-bearing infra/security/release choice) | `specs/adr/TEMPLATE.md` — copy to `specs/adr/NNNN-slug.md` and fill in. Run `scripts/ralph-adr-update-index.sh` after writing. |\n    | `[INFRA]`, or task touches `infra/aws/*.go`, `Pulumi.sandbox.yaml`, SSM params, DNS records, cloud-init, security groups, ECR, S3 buckets | `.agent_instructions/infra-release.md` — MUST read. Never `pulumi up` locally, never `aws ssm put-parameter` to create resources — write Pulumi Go code and let CI apply. Preview before push. |\n    | `[CI-FIX]` (this iteration is a CI-FIX retry — `RALPH_CI_FIX_RETRIES \u003e 0`) | `.agent_instructions/ci-triage.md` — MUST read. Replaces \"look at the error and fix it\" with structured multi-cause classification + local-reproduce-before-commit. |\n    | `[OIDC]`, `[AUTH]`, \"login flow\", \"Keycloak browser\", \"session verify\" | `.agent_instructions/oidc-browser-verify.md` — 7-step Chrome MCP login flow with Keycloak form selectors. |\n    | `AUTH.*` task ID, or \"keycloak\", \"user_directory\", \"realm role\", \"user management\" | `.agent_instructions/recipes/keycloak-admin-api.md` — admin token acquisition, User CRUD, role assignment, invitation flow, `UserDirectory` port + `KeycloakAdminClient` adapter. |\n    | `oidc`, `keycloak_provider`, `feedback_bearer`, `oidc_login`, path under `internal/adapters/secondary/auth/`, OR an OIDC-shaped failure in `LAST_ITERATION.md` (302 to `/dev/impersonate/users`, \"OIDC login handler init failed\", `/dev/feedback` 503 \"auth service unavailable\") | grep `ralph-logs/KNOWLEDGE.md` for the `[auth] /org/* redirects to /dev/impersonate/users → OIDC handler init silently failed at boot` entry AND read the `RG.RECIPE.OIDC-SPLIT-URL` task body in `IMPLEMENTATION_PLAN.md`. Loads the split-URL root cause (`compose_admin.go` devLoginRedirect ← `cfg.LoginUserGET == nil` ← OIDC init WARN) + fix (`oidc.InsecureIssuerURLContext(ctx, cfg.ExternalURL)` wrap) up-front; closes diagnosis in \u003c2 turns instead of 8. |\n    | `ARCH.QB.VIEW.*` task ID, or \"migrate view slice to QueryBus\", \"extract ReadModel port\" | `.agent_instructions/recipes/arch-qb-view-migration.md` — 5-touch shape (query/handler/postgres_store/module/init + adapter swap), JSONB-in-adapter rule, and the 3 in-the-same-commit fitness updates (knownViolations removal, sliceMinimalStructureAllowlist, sliceFanOutExemptions). |\n    | \"booking\", \"reserve\", \"slot\", \"decider\", or path under `internal/dcb/`, `internal/slices/dcb_*/` | `.agent_instructions/booking-perf.md` |\n    | path under `internal/slices/`, \"command_handler\", \"projector\", \"view\", or new table in `academy.*` | `.agent_instructions/cqrs-posture.md` |\n\n    If multiple match (e.g., `[UI]` + `[E2E]`), Read both in ONE parallel turn.\n    Do NOT re-read agent instruction files on subsequent turns — one Read at the start is enough.\n\n1. Your task is to implement that ONE task. Implement first, test once at the end.\n   Search the codebase before writing new code — don't duplicate existing implementations.\n   If the task needs helper types or interfaces from other packages, create them.\n   Implement FULL functionality — no placeholders, no stubs, no TODOs, no \"// TODO: implement later\".\n\n   SEARCH DELEGATION — for any codebase investigation whose expected output spans\n   \u003e3 files or \u003e50 lines (e.g. \"find all callers of X\", \"which tests reference Y\",\n   \"show me every slice that imports Z\"), delegate to the `ralph-searcher` sub-agent\n   via the Task tool. The sub-agent runs on Haiku and its Grep/Read output stays in\n   its own context, keeping the main loop's output_tokens lean. Direct Grep/Read are\n   fine for ≤3-file spot checks; don't round-trip a single-file lookup through a\n   sub-agent. Context Efficiency / Avg Subagent Calls tracks adoption.\n\n   CREATING A NEW SLICE? Use the generator — do NOT hand-write boilerplate:\n   ```\n   make new-slice NAME=cancel_booking KIND=command AGGREGATE=booking\n   make new-slice NAME=view_wallet_balance KIND=view AGGREGATE=wallet\n   make new-slice NAME=expire_stale_bookings KIND=automation AGGREGATE=booking\n   make new-slice NAME=notify_booking_denied KIND=translation EVENT=BookingDenied\n   ```\n   Four slice types per Event Modeling (see `specs/domain/DOMAIN_MODEL.md` § Slice Types):\n   KIND=command (state change): init.go, command_handler.go — user action → events.\n   KIND=view (read model): init.go, query.go, query_handler.go, http_handler.go — events → projection → query.\n   KIND=automation: init.go, processor.go — todo-list view → processor → command (no saga).\n   KIND=translation: init.go, translator.go, translator_test.go — events → external system (email, payment).\n   Then edit the generated files to add domain-specific logic. Do NOT hand-write these files.\n   Optional flags: `EVENT=BookingDenied` (for translation), `ROUTE=\"/admin/things\"` (view HTTP handler).\n\n   DCB SLICES — use the decider pattern (per-command handlers, no standard CommandHandler type):\n   - DCB command slices (e.g., `dcb_slot_booking`) use per-command handlers like `ReserveSlotHandler`\n     instead of a single `CommandHandler`. The fitness test `TestEveryCommand_HasHandler` already\n     supports this via the `perCmdHandler` fallback (event_model_test.go:270-283).\n   - If the DCB slice has `command_handler.go`, add it to the auth check allowlist in\n     `tests/fitness/auth_check_test.go` (if it doesn't call `GetAuthenticationContextFromContext`).\n   - If the DCB view slice defines queries inline (no separate `query.go`), add it to\n     `noSeparateQueryFile` in `tests/fitness/cqrs_rules_test.go`.\n   - Fitness tests run in CI only (see DILIGENCE RULES below). After creating a DCB slice,\n     verify by reading the fitness-test allowlist and confirming your slice is listed — do\n     NOT run the fitness suite locally (it's slow and the gate is disabled by design).\n\n   IMPLEMENT-THEN-TEST — do NOT run tests mid-implementation:\n   ```\n   1. IMPLEMENT: Write ALL production code for the task. Get it compiling.\n      Do NOT run tests until the implementation is complete.\n   2. WRITE TESTS: Write tests for the task's expected behavior.\n      Capture WHY each test exists in a comment — future iterations have no prior context.\n      Name tests `Test{Behavior}_{ExpectedOutcome}` — the name explains WHY the test exists.\n   3. VERIFY: Run `./scripts/ralph-build.sh \u0026\u0026 ./scripts/ralph-vet.sh \u0026\u0026 ./scripts/ralph-lint.sh \u0026\u0026 ./scripts/ralph-test.sh` as ONE command (single turn).\n      `ralph-vet.sh` catches test-only compile errors that `ralph-build.sh` misses (it skips `_test.go` files).\n      Do NOT split build/vet/lint/test into separate turns. Do NOT run `make pre-commit` — it is slow and redundant.\n   3b. INTEGRATION SMOKE: If your diff touches `internal/slices/\u003cX\u003e/` and that slice has\n       integration tests, run `make integration-fast SLICE=\u003cX\u003e` AFTER step 3 passes.\n       Skip if no tests are found (script prints \"NO INTEGRATION TESTS found\").\n       This catches SQL typos and cross-slice bugs locally in \u003c90s instead of waiting\n       10 min for CI. Do NOT run `make test-integration` (full suite, slow).\n   3c. FITNESS MICROSET: `./scripts/ralph-fitness-microset.sh` — fast fitness checks: file line\n       cap (500), function line cap (80), dead code allowlist. Run this after any commit that adds\n       or modifies Go files or web/templates. If it fails, fix before the iteration ends.\n       DO NOT run the full fitness suite (`ralph-fitness.sh`) locally — CI-only by design.\n   4. FIX: If tests fail, fix and re-run. But do NOT loop more than 3 times — see HARD LIMITS.\n   ```\n\n   Each test run costs ~2 turns (run + read output). 5 test runs mid-implementation = 10 wasted turns = 500K cached tokens.\n   One test run at the end = 2 turns. The math is clear: implement first, test once.\n\n   TURN-BUDGET GUARD — if you have used 80+ turns, something is wrong:\n   - At 80 turns: STOP exploring. Commit what you have, even if incomplete.\n     Mark the task `[-]` (partial) with a note explaining what's left.\n   - At 100 turns: You are stalling. Commit immediately. Do NOT start new files.\n   - This session had iterations at 1124, 2686, and 4741 turns — all were stalls\n     that produced work achievable in 40 turns. The cost of a stall ($15–35)\n     dwarfs the cost of a partial commit ($2).\n\n   DON'T: Run tests mid-implementation, mock for isolation, or test aggregate internals directly.\n   DO: Implement all code first, write tests, run once through MessageBus boundary.\n   Slice rules (types, independence, isolation) are in DOMAIN_MODEL.md (step 0c).\n   Testing rules are in testing.md (step 0e). Do NOT duplicate them here.\n\n   LONG-RUNNING COMMANDS — bench, replay, and load scripts can take \u003e5 minutes:\n   - DON'T run multi-minute bench scripts inline. Specifically: `scripts/japan-range-bench*.sh`,\n     `scripts/hills-*.sh` full runs, `scripts/ralph-bench.sh` against the full suite, or any\n     `make bench` / `make load-test` invocation without size flags. Iter 14 of session\n     20260517-202253 ran `japan-range-bench-growth.sh --concurrency=10 --cells=2` inline;\n     the bench ran 40+ minutes, the iteration ended while it was still running, the loop\n     blocked waiting for the background process, and the human had to Ctrl+C and restart.\n   - DO use the smoke-test variant for in-iteration verification: smallest concurrency\n     (`--concurrency=1`) and smallest cell count (`--cells=2`), or whatever the script's\n     `--help` advertises as the minimum. Example: `timeout 120s bash scripts/japan-range-bench.sh --cells=2 --concurrency=1`.\n   - DO wrap every long-running shell invocation in `timeout 300s \u003ccmd\u003e` when you must run\n     it inline. The `timeout` exit code (124) is recoverable; a hung iteration is not.\n   - DO offload genuine long runs (\u003e5 min) to a `[BG-POLL \u003csentinel\u003e]` follow-up task per\n     step 6c — launch the bench detached with `nohup`, drop a sentinel on completion, and\n     tag the aggregation task with the sentinel path. The picker skips the follow-up until\n     the sentinel appears, so polling iterations cost $0.\n\n   DILIGENCE RULES — violating any of these means the task is NOT done:\n   - Fitness/architecture tests (`tests/fitness/`) run in CI only. Do NOT run `ralph-fitness.sh`\n     locally and do NOT re-enable the fitness gate in loop.sh — it is disabled by design.\n     Being addressed in S.93 — until the fast path ships (S.93.1–S.93.4), fitness remains CI-only.\n   - FIX ROOT CAUSES, not symptoms. No `// nolint`, `|| true`, error suppression, or skip logic.\n   - Unrelated bugs: fix AND add a KNOWLEDGE.md entry.\n   - No `InMemory*` stores in non-test code — use Postgres-backed implementations.\n   - No `fmt.Printf`/`log.Printf` debug statements — use proper logger.\n   - **CHROME DEVTOOLS MCP IS NOT OPTIONAL** for any task tagged `[GRAFANA]`,\n     `[UI]`, `[UX]`, `[FRONTEND]`, `[E2E]`, `[BROWSER]`, `[FEEDBACK]`, or any task\n     whose verify step mentions a URL (dashboard panel, admin page, `/d/...` path).\n     curl proves the endpoint responded with 200; only a screenshot proves the\n     page RENDERS and the DATA appears. Minimum per task:\n       1. `mcp__chrome-devtools__navigate_page` to the target URL\n       2. `mcp__chrome-devtools__take_screenshot` saved under the iteration dir\n          (e.g. `ralph-logs/sessions/$SESSION_ID/iteration-$ITERATION-$TASK/\u003cname\u003e.png`)\n       3. `mcp__chrome-devtools__list_console_messages` — zero errors (or\n          explain why each one is pre-existing in the iteration commit)\n       4. For Grafana: visit with `?from=now-1h\u0026to=now`, confirm panels show\n          real data; if empty, generate traffic, wait 30s, re-screenshot\n     No screenshot artifact in the iteration dir = task NOT done; commit with\n     `(wip)` suffix and leave the checkbox `[ ]`. This applies even if you\n     verified via curl/CLI — the screenshot is the non-negotiable artifact\n     for surfaces humans can actually see.\n   MIGRATION GOLDEN PATH — ALWAYS create a file under `migrations/`, then `make migrate`.\n   NEVER apply migrations via `docker compose exec postgres psql \u003c migrations/*.sql` — it\n   bypasses goose version tracking and causes duplicate `goose_db_version` rows. If `make\n   migrate` fails (e.g., TLS cert issue), fix the underlying issue rather than bypassing goose.\n   Verify: after `make migrate`, `SELECT MAX(version_id) FROM goose_db_version;` matches your file.\n   MIGRATION SCHEMA-DUMP RULE — after creating or modifying ANY file in `migrations/`:\n   1. `docker compose build migrate \u0026\u0026 make migrate` (apply the migration)\n   2. `make schema-dump` (regenerate `schema/academy.sql` with updated migrations hash)\n   3. `git add migrations/\u003cnew-file\u003e.sql schema/academy.sql` (stage BOTH files)\n   CI runs `scripts/check-schema-dump.sh` which hashes ALL tracked files in `migrations/`\n   and compares against the hash in `schema/academy.sql` header. If the migration file is\n   untracked or `schema/academy.sql` is stale, CI fails with \"STALE: schema/academy.sql is\n   out of date\". `make pre-commit` does NOT run this check — it only fires in CI.\n   MIGRATION CONSTRAINT RULE — S.46.1 broke CI because test cleanup wasn't updated:\n   DON'T: Add a UNIQUE, CHECK, or EXCLUSION constraint without checking test fixtures.\n   DO: `grep -rn 'INSERT INTO academy.\u003ctable\u003e' tests/` → update cleanup/teardown in same commit.\n   Why: constraints make previously-valid test data invalid. Tests that seed rows without\n   cleaning up will fail with constraint violations, but only in CI (local may pass by luck).\n   - Before committing: `grep -rn 'TODO\\|FIXME\\|HACK\\|XXX' \u003cchanged files\u003e` — fix any found.\n   - After committing: `git diff HEAD~1` — would you approve this PR?\n   - Max 50 lines/function, 300 lines/file. Never swallow errors. Every public function needs a test.\n   - Read per-directory CLAUDE.md files for package-specific rules (engine, admin, bootstrap).\n   - SCOPE CHECK: if the task touches \u003e5 files, STOP and plan subtasks first. Commit each\n     subtask separately. A 900s timeout with no commit = wasted iteration.\n   UI SCOPE RULE — any task touching `web/templates/` or `htmx/` MUST touch ≤5 production\n   files. If exceeded, split by layer: (a) data/handler, (b) template/route,\n   (c) interactivity/verify. Each sub-task stays within the 5-file limit and commits\n   independently. RULES.UI.1 stalled at 6 AST types in one task; splitting into 4 atoms\n   produced 4 clean commits at 25–29 turns each.\n   WIDE TASK RULE — \"add X to all Y slices\" tasks are 2-3x more expensive than focused ones:\n   DON'T: Create a single task like \"add org-scoping to all view query handlers.\"\n   DO: Pre-split into one sub-task per slice: S.N.a (resources), S.N.b (events), etc.\n   Each sub-task reads 2 files and writes 2 files instead of 10+10. If you discover a wide\n   task during planning, split it BEFORE adding to the plan. Existing example: ISO.2–ISO.7.\n\n   NO BACKWARDS COMPATIBILITY — pre-production, vertical slices, OCP. When a domain\n   event changes: change the event type/shape; grep affected slices (`rg 'OldEventName'\n   internal/`); delete + rescaffold with `make new-slice`; port logic against the new\n   event. No dual-write, no deprecation window, no historic-rows fitness tests, no\n   aliasing old→new event types in the registry. Event-store rows from before the\n   change do not need to replay — `make migrate` rebuilds projections from the current\n   event shape. If a projector needs both old + new shapes, you are doing it wrong —\n   rebuild the projector. Applies to domain events, command names, aggregate names,\n   view table names, and public slice APIs.\n\n   GRAFANA DASHBOARD TASKS — see `.agent_instructions/grafana-verify.md` (loaded via skill routing 0e).\n   Key rule: fix ONE panel at a time, validate with `scripts/validate-dashboards.py`, never rewrite from scratch.\n\n   BUG-FIX WORKFLOW — required for any task tagged `[BUG]` or any fix to existing behavior:\n   - FIRST write a test that fails because of the bug. The test name must describe the\n     boundary condition or invariant being violated (e.g., `TestBooking_AtWindowClose_Denied`).\n   - Stage the failing test locally (`git add -p` the test file).\n   - THEN apply the production fix and confirm the test now passes.\n   - Do not skip this step even if the fix looks obvious — the test proves the bug existed\n     and prevents regression. A fix without a failing test is indistinguishable from a guess.\n   - If you cannot reproduce the bug with a test, document WHY in the commit message\n     (e.g., \"race condition only under load\", \"requires external service state\").\n\n   UI BUG VERIFICATION — for `[UX]`/`[UI]` tasks or htmx/ changes:\n   See `.agent_instructions/frontend-design.md` (loaded via skill routing 0e).\n   Key rule: reproduce the ACTUAL USER FLOW (click, fill, submit), screenshot to iteration dir.\n\n   [VERIFY-SANDBOX] PRE-FLIGHT — before invoking the `verify-flow` skill (see S.273):\n   - The skill reads its flow definitions from `fixtures/verify-flow/flows.yaml`.\n     If the flow name your task references is NOT present there, the skill cannot run\n     regardless of the slice's wiring state. ALWAYS run\n     `grep -E \"^[[:space:]]*\u003cflow-name\u003e:\" fixtures/verify-flow/flows.yaml` FIRST.\n   - If the flow is missing, do NOT invoke the skill. Either (a) the slice's HTTP\n     route + handler doesn't exist yet (mark `[BLOCKED:DEPS \u003cmissing-task-id\u003e]`),\n     or (b) the flow YAML itself needs an entry (file an inbox task to add it and\n     mark this one `[BLOCKED:NO-FLOW]`). Session 20260526-144918 wasted iters 165 +\n     168 on TREE.MOVE.3.B precisely because this preflight was skipped.\n   - LOGIN-REACHABILITY: for a flow with `login != none`, also confirm the role can\n     actually REACH the flow's `url:` before paying the ~$1 + 6-turn Chrome-MCP\n     drive. Two cheap probes (the skill's Phase 0b.1 runs them automatically):\n     (1) one SSM `SELECT` for the grant the route requires —\n     `org_admins_view` for `/org/*` + `/admin/*`, `players_view.home_org_id` for\n     `/play/*` + `/org/tee-sheet` — keyed by the role's email from the CLAUDE.md\n     Dev Actors table; (2) one `fetch(url, {redirect:'manual'})` to catch an\n     unmounted route or an off-URL server-side redirect (a 3xx to the auth proxy is\n     EXPECTED and fine). If the grant row is missing or the route 404s/redirects\n     off-target, the skill fails fast with \"flow's login can't reach url on env\"\n     (`skip_reason: login-cannot-reach-url`) in ≤2 turns instead of driving the\n     browser. This is the DEMO.4.B iter-13 failure: `admin@test.com` lacked the\n     `OrgAdminGranted` seed on sandbox, so `/org/setup/structure` bounced to\n     `/dev/impersonate/users` and every browser step failed downstream.\n\n   BOUNDARY CASES — for any task touching ranges, intervals, dates, or numeric thresholds:\n   - Document interval inclusivity `[a,b)` or `[a,b]` at call sites.\n   - Test boundary values: at-start, at-end, zero-length, one-before, one-after.\n   - Test 0, 1, max, max+1, negative for numeric logic.\n   - Verify both sides of an interval exchange agree on open/closed.\n   - Silent parse/unmarshal failures are bugs — return errors, don't return false.\n   - Name tests explicitly: `Test{Thing}_AtWindowClose_Succeeds`.\n\n   REMOVAL TASKS — completeness checklist:\n   When the task is \"remove X\" / \"delete X\" / \"deprecate X\" / \"decommission X\", you MUST\n   audit and clean up ALL of these locations before marking the task done:\n\n   1. Code call sites: `grep -r '\u003cX\u003e' internal/ pkg/ cmd/`\n   2. Imports / go.mod: `grep '\u003cX\u003e' go.mod go.sum \u0026\u0026 go mod tidy`\n   3. Docker compose services: `grep -r '\u003cX\u003e' deploy/`\n   4. Env vars / secrets: `grep -r '\u003cX\u003e_' .env* deploy/ infra/`\n   5. CI/CD references: `grep -r '\u003cX\u003e' .github/`\n   6. Documentation: `grep -r '\u003cX\u003e' docs/ specs/ CLAUDE.md README.md`\n   7. Grafana dashboards: `grep -r '\u003cX\u003e' deploy/sandbox/grafana/`\n   8. Inbox / plan references: `grep -r '\u003cX\u003e' inbox/ archive/inbox/ IMPLEMENTATION_PLAN.md`\n\n   In the commit message, list which categories had matches and were cleaned. If a\n   category had no matches, omit it. If you intentionally left some references (e.g.\n   archive/ history), state why.\n\n## Definition of Done (by task type)\n\n   **Backend logic:** Tests pass + exercise via UI/API + screenshot Domain Observability\n   dashboard (`localhost:3002/d/domain-observability/`) AND Tempo Traces\n   (`localhost:3002/d/tempo-traces/`). Command must appear in both.\n\n   **HTTP handlers:** Tests + Hurl E2E + verify auth rejection (401) + screenshot HTTP RED\n   dashboard (`localhost:3002/d/http-red-method/`) AND Tempo Traces.\n\n   **Admin UI:** CI green + Chrome MCP screenshot + submit forms + verify data renders.\n   Check Domain Observability + Tempo Traces for triggered commands.\n\n   **Infrastructure / CI/CD:** CI green + document what was verified.\n   For CI changes: push, wait for `gh run list`, verify `conclusion == success`.\n\n   **Observability:** Verify via curl/CLI first, then Grafana screenshot showing real data.\n   Generate traffic if panels show \"No data\", wait 15-30s, re-check.\n\n   **Batch changes:** Verify each affected page/endpoint individually — not just one.\n\n   ## E2E flow verification (for domain-affecting changes)\n\n   See `.agent_instructions/e2e-verify.md` (loaded via skill routing 0e).\n   Key rule: Hurl scripts first, observability second, Chrome MCP last (1-2 screenshots only).\n   Skip for: engine logic, infrastructure, CI/CD, docs, unrelated admin pages.\n\n2. See CLAUDE.md for Docker commands, test scripts, port mapping, and Chrome MCP usage.\n   Avoid `make pre-commit` (slow) — use `ralph-build.sh \u0026\u0026 ralph-vet.sh \u0026\u0026 ralph-lint.sh \u0026\u0026 ralph-test.sh`.\n\n3. Verify in running app — MANDATORY, never skip:\n   - Screenshot via Chrome MCP. Save to iteration dir.\n   - API: verify endpoint responds (not 404). UI: screenshot with real data + proper CSS.\n   - Grafana: navigate to dashboard, set last 15 min, screenshot. Generate traffic if \"No data\".\n   - INTERACT LIKE A USER: click, fill forms, submit. If click fails → BUG, fix root cause.\n   - NOT DONE if: placeholder, 404, in-memory-only, unstyled, broken clicks, empty Grafana panels.\n\n4. CHECKPOINT COMMIT — commit early and often, not just at the end.\n   After tests pass, commit IMMEDIATELY. Do not do more work after tests pass.\n   If you have been working for 50+ turns, commit what you have NOW even if not fully done.\n   An incomplete commit is better than losing all work to a context overflow.\n\n   **FB task — stuck guard:** If this is a `FB.*` task and you have used \u003e30 turns without writing any code yet, STOP reading and commit a `(wip)` note in `LAST_ITERATION.md` that lists: (a) the files you explored, (b) the concrete blocker (e.g., \"prerequisite command slice not yet wired\", \"ambiguous task description\"). This gives the next iteration a head start instead of repeating the same reads.\n\n   **Tidy First** (see step 0f): never mix refactoring + feature in one commit.\n   If both needed: `refactor:` commit first, then `feat:` commit.\n\n   **4a. BEFORE committing**, batch: write LAST_ITERATION.md + mark task done + run ralph-diff.sh in ONE turn.\n   Write `ralph-logs/LAST_ITERATION.md` with:\n   - `## Steps` — numbered list of what you did (search, create, wire, test, screenshot, commit)\n   - `## Could Still Be Wrong` — list 3 ways your change could be wrong. For EACH entry,\n     you MUST cite concrete evidence inline on the same bullet, in one of these forms:\n       - `Evidence: TestFooBar_ReturnsDenied PASS` (exact test name + pass, ran this iteration)\n       - `Evidence: screenshot ralph-logs/sessions/\u003csession\u003e/iteration-\u003cN\u003e-\u003cTASK\u003e/\u003cfile\u003e.png`\n       - `Evidence: impossible because \u003cspecific reason tied to code/type/constraint\u003e` (explain\n         why the failure mode cannot occur — compile-time check, DB constraint, etc.)\n     Vague hand-waves (\"tests cover this\", \"we validated it\", \"should be fine\") are NOT\n     evidence and do NOT satisfy the rule. If you cannot produce evidence for all three\n     claims, you may NOT flip `[ ]` → `[x]` in step 4b; commit `(wip)` and leave the\n     checkbox unchecked (see gate in step 4b).\n   - `## Friction` — one-line entries with tags: NAVIGATION, BOILERPLATE, TOOLING, WIRING,\n     TESTING, MIGRATION, DEVEX, CI, DOCS, PATTERN, WISH. Feeds into /retro aggregation.\n     **META.1.d — primary failure class.** Each bullet is classified into an\n     AgentBench-style taxonomy by `scripts/ralph-extract-friction-class.py` and stored\n     in `academy.ralph_metrics.friction_class` (plurality vote) plus\n     `friction_class_counts` (JSONB distribution). Format: `- CATEGORY: description`\n     uses the default class-mapping below; `- CATEGORY[CLASS]: description` overrides\n     the default when the category is ambiguous. Classes:\n       - `TOOL_OUTPUT` — tool returned wrong/malformed/truncated output\n       - `LONG_HORIZON` — task scope too large to finish in one iter (boilerplate, context budget)\n       - `INSTRUCTION_AMBIGUOUS` — prompt/spec/docs unclear or contradictory\n       - `WIRING` — DI/composition-root/bootstrap wiring bug\n       - `ENV_DRIFT` — container/cache/config drift from expected state (docker, migrate image, keycloak)\n       - `KNOWLEDGE_GAP` — didn't know how part of the codebase worked (had to grep/explore)\n     Default category→class mapping (used when no `[CLASS]` override): NAVIGATION/PATTERN→KNOWLEDGE_GAP,\n     BOILERPLATE→LONG_HORIZON, TOOLING/MIGRATION/DEVEX/CI→ENV_DRIFT, WIRING→WIRING,\n     TESTING→TOOL_OUTPUT, DOCS/WISH→INSTRUCTION_AMBIGUOUS.\n     **HARD CAP: top-K=5 entries max per iteration.** Only record friction you actually hit\n     this iteration; prioritize items that (a) cost ≥1 turn, (b) are likely to recur, or\n     (c) have a concrete fix you can name. Drop anything that doesn't meet those bars —\n     speculative or cosmetic nits waste the next iteration's attention. Items that recur\n     across iterations are auto-promoted to `KNOWLEDGE.md` by the META.1.c recurrence\n     scanner (once landed); discarded entries are NOT lost forever, they just need to\n     recur to earn their way in. Do NOT pad to 5 — 0, 1, or 2 entries is fine and normal.\n     The post-iteration extractor already applies `head -5`\n     (`scripts/ralph-post-iteration.sh` line ~454); writing more than 5 is wasted tokens\n     because the surplus is silently truncated before reaching the next iteration.\n   - `## Speed Up` — reflect on what slowed you down this iteration and propose ONE concrete\n     improvement. Examples: \"I grepped 8 slices to find who handles BookingApprovedEvent — an\n     event→slice index in CODEBASE.md would save 3 turns\", \"I hand-wrote projector_adapter.go\n     boilerplate — `make new-slice KIND=view` should generate this\". If the improvement is\n     actionable, also add it to IMPLEMENTATION_PLAN.md Discovered Issues as a task:\n     `- [ ] **RG.{N}** [RALPH] {description}`. Use the RG prefix (Ralph Growth) so these\n     self-improvement tasks are distinguishable. Only add if genuinely useful — not every\n     iteration needs one. Skip if nothing slowed you down.\n   This MUST happen before the commit so it is part of the main work, not an afterthought\n   that gets skipped when context runs low.\n\n   **4a.5. SCRATCHPAD — leave a note for next-iteration-Ralph (S.173).**\n   Before committing, append ≤200 tokens (≤800 chars) to\n   `ralph-logs/sessions/$SESSION_ID/SCRATCHPAD.md` capturing:\n   - **Surprises** — files or patterns that caught you off guard this iteration.\n   - **Gotchas** — specific pitfalls you hit and how you recovered.\n   - **Hint** — one line that will save next-iteration-Ralph a turn if it picks a\n     related task.\n\n   Do NOT summarize the task (the commit message and LAST_ITERATION.md handle that).\n   Do NOT include long file paths that are already in the commit diff. Do NOT exceed\n   800 chars per entry — pre-iteration trim keeps the file under 2KB (rolling).\n   Format:\n   ```\n   ## iter N — TASK_ID\n   - surprise/gotcha/hint: one or two lines\n\n   ```\n   Append, never overwrite. Skip entirely if nothing non-obvious came up.\n\n   **4b. Mark the task** as done in IMPLEMENTATION_PLAN.md: change `- [ ]` to `- [x]`.\n   **SELF-VERIFICATION GATE — read before flipping the checkbox:**\n   Re-read the `## Could Still Be Wrong` section you just wrote. For EACH of the 3 claims,\n   confirm an inline `Evidence:` citation (test name + PASS, screenshot path, or\n   impossibility argument — see step 4a). If ANY claim lacks evidence, you MUST:\n     1. Leave the checkbox as `- [ ]`.\n     2. Append ` (wip)` to the task description OR add a continuation sub-task under\n        Discovered Issues noting which claim lacks evidence.\n     3. Use `git commit -m \"feat(scope): summary (wip)\"` — the `(wip)` suffix signals\n        an incomplete iteration so the next run picks it up.\n   A task with unverified claims flipped to `[x]` is a lie to the next iteration and\n   to the human reviewer. The gate exists to prevent that. Sleeper tasks are exempt\n   from the flip rule (they stay `[ ]` forever regardless) but STILL require evidence\n   citations for their `Could Still Be Wrong` entries.\n   If you discover new issues or tasks, add them to the Discovered Issues section.\n   **THIS IS THE ONLY TURN where you edit IMPLEMENTATION_PLAN.md, CLAUDE.md, or KNOWLEDGE.md.**\n   Editing these files earlier busts the prompt cache — every subsequent turn pays full input\n   cost (~$0.50/turn extra). Batch ALL edits to these files into this single final turn.\n\n   **Feedback threads:** Do NOT read or write `feedback/threads/*.json` yourself — the files\n   are 25-63 KB and reading them wastes turns. The feedback context is already in your prompt.\n   If a `[FEEDBACK]` block is present, it may contain MULTIPLE threads. Address ALL of them:\n   - **Action threads** (open, in_progress): fix the issue, update status via curl, commit.\n     Quick wins (typo, missing field, wrong label): fix inline and mark `done`.\n     Bugs needing investigation: mark `in_progress` or `accepted` and add a plan task.\n     Not reproducible or out of scope: mark `rejected` with a brief reason via curl.\n   - **Discussion threads** (in_discussion): ENGAGE IN DISCUSSION. Post a reply via curl.\n   - **Reopened threads** (done/rejected/accepted with a human follow-up): the human posted\n     after you closed the thread. Treat like a new action/discussion: read their message,\n     respond via curl, update status (e.g., back to `in_progress` or `in_discussion`).\n     NEVER ignore a thread where a human was the last to respond.\n   After fixing or addressing a thread, ALWAYS post a thread-specific reply via curl\n   explaining IN DETAIL what you did for that specific thread. Be explicit about the\n   changes — file paths, what was added/removed, why. Do NOT rely on generic commit messages.\n   Use the **Ready-to-run commands** at the bottom of each thread block — they call the\n   reply/status wrappers (scripts/ralph-feedback-reply.sh, scripts/ralph-feedback-status.sh)\n   with the thread ID pre-filled. For a reply, write your reply text to the named file FIRST\n   (Write tool) then run the wrapper — it mints a fresh token and json.dumps the body, so a\n   shell-quoting or invalid-JSON bug is impossible. Do NOT hand-build a curl with an inline\n   JSON body. Do NOT reconstruct the URL or headers from memory — that posts to localhost and\n   the sandbox never sees it.\n\n   **STATUS / REPLY CONSISTENCY (FB-793e).** Before moving to the next thread,\n   verify your reply prose matches the status you are about to set:\n   - If the reply describes a landed change (\"fixed\", \"done\", \"shipped\",\n     \"changed X at Y:line\") → status MUST be `done` (or `rejected` if you refused).\n   - If the reply describes planned/deferred work (\"added a plan task\",\n     \"will land in\", \"tracked as TASK.N\") → status MUST be `accepted`\n     (work scheduled, not started) OR `in_progress` (started, not finished).\n   - If the reply asks a clarifying question or continues discussion →\n     status MUST be `in_discussion`.\n   - Mismatches (reply says \"done\" but status `in_progress`, or reply says\n     \"I'll add a task\" but status `done`) confuse the human reviewer and cause\n     reopened threads next iteration. Re-read each `reply + PATCH status` pair\n     before committing. This check is behavioral — no automated gate runs.\n\n   **4b.5. Principle-sampled pre-commit critique (META.1.e).** After 4a/4b\n   but before the commit, sample the top-3 highest-voted KNOWLEDGE.md\n   principles whose `[category]` tag matches this task and write a\n   one-sentence self-critique against each. Run:\n   ```bash\n   python3 scripts/ralph-sample-principles.py sample \\\n       --task-id \"$TASK_ID\" --append\n   ```\n   This appends (idempotently replaces) a `## Principle Checks` section to\n   `ralph-logs/LAST_ITERATION.md` with one bullet per principle. Edit the\n   file and replace each `_(fill in: ...)_` placeholder with either:\n   - `[ok] \u003cone sentence on why this change respects the principle\u003e` — or\n   - `(trigger) \u003cone sentence on how this change may violate the principle\u003e`\n   Then run:\n   ```bash\n   python3 scripts/ralph-sample-principles.py check\n   ```\n   Exit codes: `0` all ok, `1` section missing or unfilled placeholder,\n   `3` at least one `(trigger)` present. Exit `3` means you MUST tag the\n   commit message `(wip)` per META.1.a and leave the task checkbox `[ ]`.\n   Exit `1` means fix the unfilled bullets before committing. This gate is\n   runtime — the script runs inside the iteration, at commit time, not as\n   a passive post-hoc analysis.\n\n   **4b.5.5. Local-reproduce gate (CI-touching changes).** If this iteration\n   adds or changes a CI step (a `run:` block in `.github/workflows/*.yml`) OR\n   adds/changes a tool the CI runs (gosec, govulncheck, golangci-lint, hurl,\n   pulumi preview, …), you MUST run the equivalent command locally and confirm\n   it passes BEFORE committing. The 60-90s push-and-wait cycle on CI is a debugger\n   you should not be using. See `.agent_instructions/ci-triage.md` step 4 for\n   the local-reproduce table. Skipping this rung is how we shipped 13 gosec\n   findings + a tee-masked pulumi failure on 2026-04-28 and burned 3 CI-FIX\n   retries figuring it out. If the check is genuinely not reproducible locally\n   (e.g. requires runner-only secrets), state so explicitly in the commit message.\n\n   **4b.6. ADR gate (only fires for `[INFRA-DECISION]` tasks).** Before the commit:\n   ```bash\n   scripts/ralph-adr-check.sh \"$TASK_LINE\"\n   ```\n   Exit `0` if the task isn't tagged `[INFRA-DECISION]` OR an ADR was added/modified\n   in this iteration. Exit `1` means the gate fired — copy `specs/adr/TEMPLATE.md`\n   to `specs/adr/NNNN-slug.md`, fill in Context/Decision/Consequences (~1 page),\n   then re-run. The ADR captures the *why* in one searchable place so future\n   iterations don't re-litigate. After writing, also run\n   `scripts/ralph-adr-update-index.sh` so the INDEX picks up the new file.\n\n   **4c. Commit** (ONE LINE — no multi-paragraph messages):\n   ```bash\n   git add internal/ tests/ web/ migrations/ schema/ scripts/ Dockerfile docker-compose*.yml .github/ IMPLEMENTATION_PLAN.md CLAUDE.md ralph-logs/KNOWLEDGE.md ralph-logs/LAST_ITERATION.md specs/adr/\n   git commit -m \"feat(scope): one-line summary\"\n   ```\n\n6. ONE task per iteration. Do not batch. STOP IMMEDIATELY after committing and writing LAST_ITERATION.md.\n   Do NOT respond to background agent completions after you have committed — each response costs ~$2 in cache reads.\n   Do NOT launch background agents for fitness tests or `make pre-commit` — they complete after you're done and waste tokens.\n\n   TURNS BUDGET — two checkpoints:\n   Turn 40 checkpoint: if you haven't started writing production code by turn 40, you are\n   exploring too long. Commit a research note with what you've learned and add a\n   continuation task. The next iteration starts with a warm cache and your notes.\n   Turn 50 checkpoint: if you haven't started writing production code, STOP.\n   You are over-reading or the task needs splitting. Commit what you have (even if partial)\n   and add a continuation task: \"{task} part 2 — {what remains}\".\n   The next iteration picks it up with a warm cache. Reading 50+ turns without coding\n   means either the task scope is wrong or you're exploring without a plan.\n\n   STALL DETECTION — self-check mid-iteration; if any \"Alert\" column fires,\n   change approach. If the different approach doesn't fix it, STOP and add the\n   issue to Discovered Issues.\n\n   | Signal              | Self-check (this session)                   | Target    | Alert → action                   |\n   |---------------------|---------------------------------------------|-----------|----------------------------------|\n   | Same error repeated | Last 2 tool/test errors identical?          | never     | yes → COMPLETELY different path  |\n   | Edit-test cycles    | Consecutive failed test runs on same code   | ≤ 3       | ≥ 5 → step back, rethink         |\n   | Tool calls / minute | Your tool calls ÷ wall-clock minutes so far | ≥ 2.2 TPM | \u003c 1.0 TPM → thrashing, simplify  |\n   | Parallelism         | Turns with ≥2 parallel calls ÷ total turns  | ≥ 0.35    | \u003c 0.20 → batch reads/greps       |\n\n   Full 8-channel framework (flail, cache hit, task latency, cost-per-commit,\n   rework) in `docs/research/ralph-behavior-signals.md` — those are measured\n   across iterations, not self-checkable mid-session.\n\n   HARD LIMITS — commit what you have and stop if ANY of these are reached:\n   - 60 turns — you are near context limit. Commit with \"(partial)\" suffix.\n   - 3 failed test-fix cycles — the approach isn't working. Revert with `git checkout -- .` and add\n     a [RESEARCH] task: \"investigate why {task} failed — {error}\". Move to next task.\n   - Tests still failing after implementation — do NOT mark task as done. Commit with \"(wip)\" suffix.\n   An incomplete commit is infinitely better than lost work from context overflow.\n\n6b. BLOCKED? SOLVE THE ROOT CAUSE FIRST — don't churn the symptom, park, or work around.\n\n   **ROOT-CAUSE-FIRST (operator directive 2026-05-27 — overrides the reflex to park).**\n   When you hit a blocker:\n   1. **Diagnose to the ROOT cause**, not the surface symptom. Ask: \"what is the\n      actual thing that must change for this to work?\" A failing CI run, a denied\n      signup, a parse error — these are symptoms. The root is *why* they fail.\n   2. **If the root is within your power → fix it NOW.** Pivot to the root fix; it\n      outranks the blocked task (you cannot finish the blocked task without it).\n      Don't re-try / re-run / work around the symptom. Fixing the root IS following\n      the dependency chain, not scope creep (see Kind A below).\n   3. **If the root is genuinely human-gated** (a credential, IAM, an external\n      system, a product decision) → **PR-and-ping**: draft everything you can as a\n      PR and ping Gustaf (per the PR-and-ping pattern). Do NOT just park-and-move-on.\n   4. **Still file the inbox task** for tracking (see ALWAYS FILE A TASK below), but\n      ALSO act on the root per (2)/(3) — the task is a record, not a substitute for\n      the fix.\n\n   **Anti-patterns to STOP:**\n   - Symptom churn — re-running CI without fixing *why* it fails (iters 99/102/104\n     re-ran CI on a phantom orphaned flake instead of fixing why CI-FIX mis-fires).\n   - Park-and-move-on without addressing the root (FUNNEL.6 parked \"signup blocked\n     by realm config\" + filed a blocker instead of fixing the realm config).\n   - Band-aid workarounds that leave the root broken.\n   - Closing a task \"blocked, no fix\" when the fix is within reach.\n\n   **Budget:** no more than 1 symptom-retry before pivoting to the root. An in-power\n   blocker gets a root-cause fix in the SAME or NEXT iteration (not a park); a\n   human-gated blocker produces a PR+ping (not a bare blocked task).\n\n   AFTER applying root-cause-first, classify the blocker:\n\n   **Kind A — fixable bug in project code (Academy Go, HTMX templates, migrations,\n   test fixtures, scenario transformers, Hills bundle schema). Do NOT mark blocked.\n   FIX IT.** The root cause of most \"blocked\" hedges is a concrete typed error,\n   parse failure, missing wiring, or contract mismatch sitting one level\n   upstream of ITER_TASK. Fixing it IS part of closing ITER_TASK — you're not\n   scope-creeping, you're following the dependency chain.\n   Protocol:\n   1. Identify the bug with one-line evidence (file:line + the failing\n      symptom — a stack trace, a diff, a failed assertion).\n   2. Fix it. Same iteration, same commit. If the fix is wholly separate\n      from ITER_TASK (touches unrelated code) spawn\n      `HILLS.SIM.FIX.\u003cslug\u003e` (or `\u003cDOMAIN\u003e.FIX.\u003cslug\u003e`) to record\n      what was fixed, then continue.\n   3. Re-run whatever verification was blocked on the bug, until\n      ITER_TASK's own Verify: step passes.\n   4. Commit once, with BOTH the fix and the ITER_TASK deliverable in it.\n      Commit message: `fix(scope): bug + feat(task): deliverable`.\n   Concrete example (SIM.5 iter 34): picked SIM.5, saw org.json parse\n   error because `booking_type` was a number but the Go struct expected a\n   string. That is Kind A. Fix the struct (or add custom unmarshalling),\n   re-run SIM.4 to produce artifacts, THEN run SIM.5 against them. Do\n   NOT mark SIM.5 blocked — the bug is in your codebase, you own it,\n   fix it. \"Let me document the blocked state\" is the wrong reflex.\n\n   **Kind C — write-permission denial.** If an `Edit` or `Write` call is\n   permission-denied for a path, do NOT ask the loop to approve the write —\n   there is no human in the loop. Instead: mark the task `SKIPPED` with\n   reason `BLOCKED_BY_PERMISSION:\u003cpath\u003e`, commit what you have, and STOP.\n   Add a task: \"Add `\u003cpath\u003e` to `.claude/settings.json` Edit/Write allowlist\".\n\n   **Kind B — infrastructure/external wall you genuinely cannot fix from\n   inside an iteration.** THESE are the cases that legitimately warrant\n   \"add a task, commit, stop\":\n   - Write-permission denied → Kind C above (do NOT ask for approval)\n   - MCP tool not available → add task: \"Fix MCP server startup for {tool}\"\n   - Container needs restart but you can't → add task: \"Restart container and verify {page}\"\n   - Task is too large for one iteration → add task: \"{task} part 2 — {what remains}\"\n   - Missing infrastructure (make target, migration, npm package) → add task: \"Add {what's missing}\"\n   - Codebase pattern unclear AFTER \u003e20 turns of investigation → add task:\n     \"[RESEARCH] investigate {pattern} and document in ARCHITECTURE.md\"\n   - External credential / secret missing → use the PR-and-ping pattern below\n     (NOT a bare `[GUSTAF]` task — see PR-AND-PING)\n   Add the task to IMPLEMENTATION_PLAN.md under \"Discovered Issues\", commit\n   what you have, and STOP. The next iteration (or a human) will pick it up.\n\n   **PR-AND-PING pattern** (operator directive 2026-05-26 — \"just create PRs and\n   ping me\"). Whenever you hit work that genuinely needs a human (a privileged\n   `pulumi up`, a secret value, an approval, an external SaaS action), draft\n   EVERYTHING you can as a PR + ping Gustaf — never a bare blocked `[GUSTAF]`\n   task that just sits and waits. The wrapper:\n\n   ```bash\n   # 1. Stage your draft changes on a fresh branch (NOT the feature branch).\n   git checkout -b ralph/\u003cshort-slug\u003e\n   git add \u003cfiles\u003e \u0026\u0026 git commit -m \"draft(\u003cscope\u003e): \u003cone-line\u003e\"\n\n   # 2. Write the SUMMARY and PRIVILEGED COMMAND to two files (DO NOT bake\n   #    secret values into either file — describe the command, let Gustaf\n   #    supply the value from his own credential store).\n   cat \u003e /tmp/summary.md \u003c\u003c'EOF'\n   Adds \u003cthing\u003e. Ralph cannot run \u003cprivileged step\u003e because \u003creason\u003e.\n   EOF\n   cat \u003e /tmp/priv-cmd.sh \u003c\u003c'EOF'\n   pulumi up --stack sandbox --yes\n   EOF\n\n   # 3. Run the wrapper — creates branch + PR + reviewer + inbox ping, AND blocks\n   #    the gated task in ralph_db so the picker stops re-handing it (--task-id).\n   scripts/ralph-pr-and-ping.sh \\\n     --title \"Apply pulumi diff for RG.X\" \\\n     --branch \"ralph/rg-x-apply\" \\\n     --summary /tmp/summary.md \\\n     --privileged-cmd-file /tmp/priv-cmd.sh \\\n     --task-id \"RG.X\"\n   ```\n\n   ALWAYS pass `--task-id \u003cid\u003e` when the ping is gated on a specific backlog task:\n   the wrapper then `POST`s `/dev/ralph/tasks/\u003cid\u003e/block` after the PR lands, so the\n   ranking picker skips it instead of burning an iteration on it every round (the\n   iter 118+121 `RALPH.CP.S7.b` churn — code done, apply vault-gated, left `[ ]`).\n   The block is best-effort (a failed POST never fails the PR+ping). Un-block by\n   flipping the task `[x]` (or `POST .../unblock`) once the human runs the step.\n\n   The wrapper assigns `gustaf-ag47` as reviewer by default (override via\n   `RALPH_REVIEWER_HANDLE` env). The reviewer handle is the one used in\n   `.github/CODEOWNERS` for human-review paths — confirmed via `gh api\n   /orgs/sweetspotio/members`. The wrapper ALSO drops a `[GUSTAF]` inbox note\n   linking the PR # so the plan picks it up next iteration.\n\n   **NEVER bake secret values into a PR body.** Describe the command Gustaf\n   runs; let him supply the secret from his own store. A PR with a secret in\n   the body is a leak, not a ping.\n\n   After running the wrapper, continue working on OTHER tasks — do NOT block\n   the loop waiting for Gustaf. The next iteration picks up the inbox note and\n   the plan tracks the PR; merge happens out-of-band.\n\n   **Heuristic to decide A vs B:** ask \"if I had 20 more turns, could I make\n   this work?\" — if yes, it's Kind A; fix it. If no, it's Kind B; log and\n   stop. Default to Kind A when in doubt — the cost of a wrong \"fix it\"\n   judgment is one extra commit; the cost of a wrong \"mark blocked\"\n   judgment is a whole iteration lost to a task that never lands.\n\n   **ALWAYS FILE A TASK (THIS IS NOT OPTIONAL):**\n   Whenever you encounter ANY of the following — even if it does not block\n   your current task — file an inbox task BEFORE you exit. Drop a file in\n   `inbox/YYYY-MM-DD-HHMM-\u003cshort-slug\u003e.md` (the next iter folds it into\n   the plan). Issues unlogged become issues forgotten.\n\n   - **Surprising or broken behavior** (something didn't work the way the\n     code/spec/comment said it would). Tag `[BUG]` or `[RESEARCH]`.\n   - **A test failed for a reason orthogonal to your change.** File even\n     if you can't fix it now. Tag `[BUG]` with the test path + failure.\n   - **Dead code, unused config, stale doc, dangling reference.** Tag\n     `[NORMAL]` or `[NICE-TO-HAVE]`.\n   - **Took \u003e2 turns to understand something** that wasn't obvious from\n     code/specs. The next person/iter shouldn't pay that cost. Tag\n     `[REFINE]` to update the relevant doc.\n   - **A script silently swallowed an error** (`|| true`, `2\u003e/dev/null`,\n     missing pipefail). Tag `[BUG]`.\n   - **A migration / config / dependency was missing or wrong** in the\n     dev environment but you worked around it. Tag `[NORMAL]`.\n   - **CI passed but the change is suspicious** (e.g., test count\n     dropped, fitness allowlist grew, gocognit warning suppressed).\n     Tag `[RESEARCH]` to audit.\n\n   **ONE TASK PER FAILURE MODE — NEVER BUNDLE.** When a multi-step\n   process produces N distinct failures, file N separate inbox tasks,\n   not one rescue task with N issues inside. Different priorities,\n   different scopes, parallelizable across iters. Single bundled tasks\n   become single bundled timeouts.\n\n   The bar for filing is intentionally low. If you hesitated for \u003e5\n   seconds wondering \"should I file this?\", file it. Cost of a frivolous\n   inbox task: ~10 lines and 0 follow-up if it's not real. Cost of NOT\n   filing a real issue: it's gone.\n\n6c. TASK TYPES — see `.agent_instructions/research-methodology.md` for detailed workflows (loaded via 0e):\n   - **[RESEARCH]**: investigate, document findings in `docs/research/`, add max 5 sub-tasks with\n     severity tags. Commit: `research(scope): summary`. Do NOT implement code. STOP after commit.\n     RESEARCH tasks produce TWO outputs: (1) a `docs/research/` artifact documenting findings\n     and recommendation, (2) follow-up CODE tasks in IMPLEMENTATION_PLAN.md that implement the\n     decision. A RESEARCH task that produces code instead of a doc is wrong — the next iteration\n     will implement the code tasks. Tasks whose body starts with \"Resolve\", \"Decide\", \"Evaluate\",\n     \"Pick between\", or asks a design question MUST be tagged `[RESEARCH]`.\n   - **[SPIKE]**: throwaway PoC, output decision + tasks if viable. Commit: `spike: {topic}`\n   - **[REFINE]**: improve existing doc/spec, add tasks for gaps. Commit: `refine: {document}`\n   - **[SLEEPER]**: recurring low-priority background work, picked up when no regular tasks remain.\n     Reduced timeout (300s). Output MUST be docs/reports/tasks — NEVER modify `internal/`.\n     **DO NOT mark the sleeper task as `[x]`** — sleepers are recurring and stay `[ ]` forever.\n     Loop.sh tracks last-run via `\u003c!-- ran: timestamp --\u003e` comment and rotates among sleepers.\n     **MANDATORY: For EVERY issue/gap/recommendation in your report, append a new task**\n     to IMPLEMENTATION_PLAN.md with a concrete Verify step.\n     Each task MUST include a severity tag: `[CRITICAL]` (breaks invariants, data loss risk),\n     `[NORMAL]` (tech debt, coupling, should fix), or `[NICE-TO-HAVE]` (cleanup, style).\n     Format: `- [ ] **XX.N** [CRITICAL] Description...`\n     A sleeper that produces 5 findings must produce 5 new tasks. A report without tasks\n     is a failed sleeper — findings that don't become tasks are forgotten within days.\n     Commit: `sleeper(scope): summary`. Max 30 turns.\n   - **[BG-POLL \\\u003csentinel\\\u003e]**: task is SKIPPED by the task picker while the sentinel file\n     doesn't exist. When the file appears, the task becomes pickable and the LLM runs once\n     for aggregation. Use for long-running background processes (bench runs, data imports)\n     where polling wastes $1-5/iter for zero-diff iterations. The bench-launching iteration\n     creates the sentinel on completion: `nohup bash -c './bench.sh \u0026\u0026 touch .bg-poll/my.done' \u0026`.\n     Tag the follow-up task `[BG-POLL .bg-poll/my.done]`. While the sentinel is absent, the\n     task picker skips it and picks other work; if no other work remains, Ralph enters idle mode.\n     Env: `RALPH_BG_POLL_WAIT_S` (default 300s) controls the secondary sleep-guard interval.\n   - All others: implement code as normal.\n\n6a. GITHUB WORKFLOW HEALTH — HIGHEST PRIORITY:\n    If this iteration is `CI-FIX` (ITER_TASK=CI-FIX or CI_CONTEXT_FILE is set), GitHub workflow\n    failures are your ONLY job. Do not start any other work until all failing workflows are resolved.\n    This applies to ALL workflows — not just \"Continuous Integration\":\n\n    | Workflow | How to fix |\n    |---|---|\n    | Continuous Integration | Read CI logs → identify category (Docker build / compile / test / swag / trivy / gosec) → fix root cause |\n    | Trivy Image Scan | Update distroless SHA: `docker pull gcr.io/distroless/static-debian12:nonroot \u0026\u0026 docker inspect --format='{{index .RepoDigests 0}}' gcr.io/distroless/static-debian12:nonroot` → replace line in Dockerfile |\n    | Deploy Sandbox | Check deploy logs via SSM (`docker logs academy-app-1 --tail 50`) → identify what failed |\n    | Infra Drift Detection | Read the issue body → identify drifted resources → fix in `infra/aws/*.go` |\n    | Any other workflow | Read the run logs via `gh run view \u003crun-id\u003e --log-failed` → identify and fix |\n\n    Check failures on BOTH branches:\n    ```bash\n    gh run list --branch main --limit 8 --json workflowName,conclusion,headSha,url | python3 -c \"import sys,json; [print(r['workflowName'],r['conclusion'],r['url']) for r in json.load(sys.stdin) if r['conclusion']=='failure']\"\n    gh run list --branch \"$(git branch --show-current)\" --limit 8 --json workflowName,conclusion,headSha,url | python3 -c \"import sys,json; [print(r['workflowName'],r['conclusion'],r['url']) for r in json.load(sys.stdin) if r['conclusion']=='failure']\"\n    ```\n\n    CI uses `docker-compose.ci.yml` + `BUILD_TARGET=ci`. Read `scripts/ci.sh` for the pipeline.\n    Verify fix compiles locally before committing. Include Dockerfile, scripts/, .github/ in git add.\n    **No retry limit** — keep fixing until `gh run list` shows only successes.\n7. See CLAUDE.md \"Project Guard Rails\" for engine rules, infrastructure restrictions, and outcome vocabulary.\n   **INFRA GUARDRAIL:** If your task requires a new SSM parameter, DNS record, EC2 cloud-init change, security group rule, S3 bucket, or any other AWS resource: (a) write the Pulumi Go code in `infra/aws/` FIRST, (b) run `pulumi preview --stack sandbox` locally to verify the diff, (c) let `make snapshot` → PR → CI apply it. NEVER use `aws ssm put-parameter`, `aws route53 change-resource-record-sets`, or any AWS CLI write command to create or mutate infra directly. The verify condition for any infra-touching task must cite the `infra/aws/*.go` file changed AND confirm the resource appeared in a `pulumi preview` diff.\n\n   ## GitOps — Never Write Directly to EC2\n\n   All sandbox changes go through Git → CI → deploy. This means:\n\n   - DO NOT use `aws ssm send-command` to write files, patch configs, or restart services\n   - DO NOT use `aws s3 cp` to push config files to the EC2 as a workaround\n   - DO NOT call `caddy reload`, `docker compose restart`, or `systemctl` via SSM to apply undeployed changes\n\n   The correct path: edit locally → git commit → make snapshot → merge → CI deploys.\n\n   SSM is allowed READ-ONLY: docker logs, docker ps, psql SELECT, curl health checks.\n   SSM put-parameter is allowed for NEW credentials only (never config).\n\n   If you are tempted to SSM-write something: STOP. Commit the change instead.\n\n8. If all tasks in the current slice are checked, output \"Slice N complete.\" and stop.\n   Do NOT start the next slice without a plan regeneration (`./loop.sh plan 1`).\n9. When you learn something new about building or testing, update CLAUDE.md\n   (Operational Notes section) — keep it brief. Status updates go in IMPLEMENTATION_PLAN.md.\n10. KNOWLEDGE BASE: Read ralph-logs/KNOWLEDGE.md at start. Increment votes if an entry helps.\n    Add new entries for gotchas/patterns you discover (votes: 1). Keep entries 3-5 lines max.\n","build (ralph-4)":"We are building a predicate-based rules engine grounded in many-sorted first-order logic, a golf domain consumer, and an HTMX admin frontend. Read specs/README.md for the full spec index.\n\n## THREE ENVIRONMENTS — full reference in `.agent_instructions/environments.md` (always loaded at step 0e)\n\n| | **Local dev** | **GitHub CI** | **Sandbox (EC2)** |\n|---|---|---|---|\n| App URL | `http://localhost:8085` | `http://localhost:8080` (inside runner) | `https://academy.sweetspot-labs.io` |\n| Metrics backend | **Prometheus** (`platform/prometheus-config.yaml`) | None | **Mimir** — no Prometheus |\n| Grafana | `http://localhost:3002` (anonymous) | None | `https://academy.sweetspot-labs.io/grafana` (Keycloak SSO) |\n| OTel config | `platform/otel-collector-*.yaml` | None | `deploy/sandbox/otel/agent.yaml` |\n| Job labels | `job=\"node-exporter\"` | — | `job=\"academy/node-exporter\"` |\n| Access | Direct | GitHub runner | AWS SSM (`--profile ralph-agent`) |\n\n**Which env for which task:**\n- `internal/`, `web/`, `migrations/` → local dev + CI gate\n- `platform/grafana/`, `deploy/sandbox/otel/`, dashboards → **sandbox** after deploy (local ≠ sandbox)\n- `.github/workflows/` → GitHub CI\n- `infra/` → sandbox via PR (never `pulumi up` locally)\n\n**Active model:** default alias is `opus` (Claude Opus 4.7 / model ID `claude-opus-4-7`). S.175 pins the exact dated snapshot that Anthropic resolves on the first iteration and passes `--model \u003cdated-id\u003e` to every subsequent iteration in the session. If Anthropic rotates the alias mid-session, the loop aborts with \"upstream model snapshot changed — restart session\". The pinned dated ID is recorded in `ralph_metrics.model_dated` (Grafana: \"Active model snapshot\" stat panel).\n\n**OUTPUT RULES — every output token costs 5x an input token:**\n- Do NOT narrate (\"Let me search for...\", \"Now I'll implement...\", \"I'll start by...\")\n- Do NOT explain what you're about to do — just do it\n- Do NOT summarize what you just did — the diff speaks for itself\n- Do NOT repeat file contents back after reading them\n- Do NOT write long commit messages — one line: `type(scope): summary`\n- Keep tool call descriptions under 10 words\n- When running tests, do NOT quote the full output — just state pass/fail and errors\n- Your goal: maximize code written per output token. Talk less, code more.\n\n**PARALLEL TOOL CALLS — each sequential turn costs ~50K cached tokens:**\nALWAYS batch independent tool calls in ONE response. Never sequential reads/greps/edits when parallel works.\n- Read/grep/edit 3+ independent targets? ONE turn with parallel calls.\n- Build + vet + lint + test? ONE chained command: `ralph-build.sh \u0026\u0026 ralph-vet.sh \u0026\u0026 ralph-lint.sh \u0026\u0026 ralph-test.sh`\n- Exploring? Spawn 3+ parallel searches (grep types + grep functions + glob files) in ONE turn.\n- EXCEPTION: if call B depends on the RESULT of call A, those MUST be sequential.\n\n**FIRST TURN HARD LIMIT — your first response MUST use ≥3 parallel tool calls.**\nIf your first turn has only 1 Read/Bash, you are doing it wrong. The task lookup (0a) is\na single Bash call, but the NEXT turn MUST batch ≥3 parallel reads (spec sections,\nDOMAIN_MODEL.md, example files). Median 47 turns costs ~$4.70/iteration — every turn saved\nis $0.10. Batching 3 sequential reads into 1 parallel call saves $0.20 per iteration.\n\n**ANTI-PATTERN — sequential reads across turns (wastes 2 turns = $0.20):**\n```\nTurn 1: Read(specs/domain/DOMAIN_MODEL.md)                              ← WRONG\nTurn 2: Read(specs/05-event-model-mapping.md)                           ← WRONG\nTurn 3: Read(bc/organisation/slices/create_customer/command.go)         ← WRONG\n```\n\n**CORRECT — parallel reads in one turn (saves 2 turns):**\n```\nTurn 1: Read(specs/domain/DOMAIN_MODEL.md)                              ← ALL THREE\n      + Read(specs/05-event-model-mapping.md)                           ← IN ONE\n      + Read(bc/organisation/slices/create_customer/command.go)         ← RESPONSE\n```\n\nSame applies to exploration: batch Grep + Glob + Read in ONE turn, not across 3 turns.\n\n0. **INJECTED CONTEXT — already in your system prompt, do NOT re-read these files:**\n   - `CODEBASE.md` — slim summary (aggregate list, slice count, conventions)\n   - `.agent_instructions/codebase-skeleton.md` — always-on symbol map (TOK.3.a):\n     aggregate→events→projectors, command→slice, slice→view-tables, HTTP route→handler.\n     ~5k tokens, auto-generated from source by `make codebase-map`. Consult it BEFORE\n     grepping — most \"who handles event X?\" / \"where's command Y?\" / \"which table does\n     slice Z write?\" questions answer in the skeleton without any tool call.\n   - `.agent_instructions/recipes.md` — step-by-step playbooks for common task types\n   Use CODEBASE.md + codebase-skeleton.md to check if a slice/aggregate already exists and\n   who handles what. Use recipes.md for the exact file structure and patterns. Only Read\n   the specific EXAMPLE file referenced in the recipe (e.g., `create_customer/command.go`),\n   not the whole codebase.\n\n   BATCH STEPS 0a-0c: after finding the task (0a), read spec sections + DOMAIN_MODEL.md\n   ALL IN ONE TURN with parallel Read calls. Do NOT read them one at a time across turns.\n\n0-DEPLOY. **Deploy sentinel — check before every task:**\n   ```bash\n   test -f .deploy-now \u0026\u0026 echo \"DEPLOY NOW: $(cat .deploy-now)\"\n   ```\n   If `.deploy-now` exists, deploy BEFORE doing any task:\n   0. **REUSE BEFORE SNAPSHOT (no duplicates).** First check whether a snapshot PR\n      is already open — only ONE may ever be in flight:\n      ```bash\n      gh pr list --state open --json number,headRefName,createdAt -q '[.[] | select(.headRefName | startswith(\"snapshot/\"))] | (sort_by(.createdAt) | last // {}) | .number // empty'\n      ```\n      Also check `.deploy-now-pr` (a PR number persisted by a prior timeout). If\n      either yields an open PR number N, **resume polling N (step 3) — do NOT run\n      `make snapshot`.** Only create a fresh snapshot when no open snapshot PR exists.\n      Creating a second PR while one is open stacked #339/#340/#341 on 2026-05-24.\n   1. `make snapshot` — creates PR from HEAD, opens PR to main (ONLY if step 0 found none)\n   2. Extract PR number from output (look for `PR created: .../pull/N`)\n   2.5. **AUTO-RESOLVE A DIRTY PR (RG.DEPLOY-GATE.AUTORESOLVE-DIRTY).** Before polling,\n      un-wedge the PR if GitHub marks it conflicting — a `mergeStateStatus=DIRTY`\n      (`mergeable=CONFLICTING`) snapshot PR NEVER triggers `pull_request` CI, so\n      `statusCheckRollup` stays EMPTY (not failing) and step 3 would idle forever:\n      ```bash\n      scripts/ralph-resolve-dirty-snapshot-pr.sh N\n      ```\n      The script is a no-op when the PR is clean (safe to always run). When the PR\n      is dirty it merges `origin/main` into the snapshot branch with `-X ours` (the\n      snapshot side is authoritative — its `IMPLEMENTATION_PLAN.md` is strictly\n      newer) in a throwaway worktree and pushes, flipping the PR MERGEABLE so CI\n      starts. It uses a worktree, NOT an in-place checkout, to dodge the root-owned\n      untracked `prometheus/` dir that breaks branch-switch on the agent host. If it\n      reports a non-IMPLEMENTATION_PLAN conflict it cannot resolve, fall back to the\n      heavy hammer `scripts/ralph-redeploy-conflicting.sh N` (close + re-`--drain`).\n   3. Poll until CI completes: `gh pr view N --json statusCheckRollup -q '.statusCheckRollup[] | select(.name != null) | [.name, (.conclusion//.status)] | @tsv'` (the `select(.name != null)` drops GitHub's phantom null trailing element that otherwise emits a bare-tab line)\n   4. If all checks SUCCESS/SKIPPED: `gh pr merge N --squash` then `make post-snapshot`\n   5. If any check FAILED: fix root cause first, then retry snapshot\n   6. `rm .deploy-now .deploy-now-pr` — removes sentinels so they don't fire again\n   7. Continue to the normal task below (don't stop after deploy)\n\n   **WARNING — task work + post-snapshot:** `make post-snapshot` runs `git reset --hard origin/main` which silently destroys any uncommitted tracked edits outside `ralph-logs/`. If a deploy fires mid-iteration while you have unstaged production-code edits, those edits will vanish (lost iter 33's full ENG.RRULE.TEE-SHEET.PASS-ELIGIBILITY patch on 2026-05-22). **Commit your task work BEFORE running post-snapshot**, even if it's WIP. Since RG.SNAPSHOT-GUARD landed, the script now aborts on dirty production paths and tells you to commit/stash — but treat that abort as a self-inflicted speed bump, not a discovery: front-load the commit. Override is `RALPH_FORCE_DISCARD=1`; almost never the right call.\n\n0a. Find the NEXT task. Run: `./scripts/ralph-next-task.sh`\n    It outputs `LINE:TASK_ID` (e.g. `3493:X.26`). It respects the NEXT: focus line and skips BLOCKED tasks.\n    Fallback ordering is `(priority_rank, line_number)` — `[HIGHEST PRIORITY]` (rank 0) wins over\n    `[HIGH]` (1) wins over `[NORMAL]`/unmarked (2) wins over `[NICE-TO-HAVE]` (3). Use\n    `[HIGHEST PRIORITY]` on the task header (e.g. `**ID** [RALPH] [HIGHEST PRIORITY] …`) to jump\n    a task to the front of the queue regardless of where it sits in the file.\n    Do NOT use raw `grep` on IMPLEMENTATION_PLAN.md — output gets mangled by compression tools.\n    Then use the Read tool to read ONLY the 10 lines around that line number to get the task description and verify step.\n    **CACHE RULE: Do NOT edit IMPLEMENTATION_PLAN.md until step 4b (the final commit turn).**\n    Editing it mid-iteration changes the file on disk, which invalidates the prompt cache for\n    all subsequent turns — every turn after the edit pays full input cost instead of cache cost.\n    This applies to ALL prompt-adjacent files: IMPLEMENTATION_PLAN.md, CLAUDE.md, KNOWLEDGE.md.\n\n    **AUTO.DEPLOY co-adence note.** If the plan contains an open `AUTO.DEPLOY.*` task AND\n    `ralph-next-task.sh` picked a different task, that is expected: the picker is\n    `(priority_rank, line_number)` ordered and AUTO.DEPLOY tasks are injected at a\n    specific position. Do NOT swap to the deploy task on your own — note the situation\n    in your scratchpad (step 4a.5) and proceed with the picked task. The deploy fires\n    automatically when the picker reaches it (after S.PICKER.PRIORITY-AWARE lands) or\n    when an explicit `AUTO.DEPLOY.NOW` is injected at the top. Manually re-prioritizing\n    skips priority rank checks and double-commits a deploy that the picker would have\n    handled cleanly one iteration later.\n0b. Study the relevant spec sections for that task (referenced in the plan).\n0c. Read `specs/domain/DOMAIN_MODEL.md` — the canonical domain model reference.\n    ALL domain work must align with this document. If your task contradicts it, flag the conflict.\n0c.5. Read `specs/adr/INDEX.md` — one-line-per-ADR decision index (auto-generated).\n    Cheap to load (≤3 KB). Citing an existing ADR (e.g. \"per ADR-0011\") is faster than\n    re-litigating the decision. If your task touches a topic with an ADR, open the\n    referenced file and align with it. If your task *contradicts* an existing ADR,\n    STOP and surface the conflict — do not silently override.\n0d. Match the task to a recipe in `.agent_instructions/recipes.md` (already in your system prompt).\n    If a recipe matches, follow it exactly — Read only the referenced example file, then implement.\n    If no recipe matches, explore the codebase: BATCH 3+ parallel tool calls (grep + glob) in ONE turn.\n    Check CODEBASE.md (in your system prompt) to know which packages to search.\n0d-UNCOMMITTED. **[UNCOMMITTED] block** — when present in the prompt prelude\n   (advisory, RG.74.bis), a prior iteration left uncommitted/untracked files on\n   disk in this task's slice dir(s). The block is the `git status --short` output\n   for each `internal/slices/\u003cname\u003e` or `bc/\u003cctx\u003e/slices/\u003cname\u003e` referenced by the task. REVIEW and REUSE that\n   on-disk work — read the existing files before re-running `make new-slice` or\n   re-writing them from scratch. Iter 41 left 7 slice files untracked after a\n   zero-diff run; iter 42 burned 4 turns rediscovering them. If the files are\n   correct, just commit them; if stale, reconcile before proceeding.\n\n0e-EVAL. **[PREVIOUS EVALUATOR REJECTED] block** — when present in the prompt\n   prelude (injected above the task context, mirrors the `[SCRATCHPAD]` and\n   `[LAST ITERATION]` block style), the previous iteration's commit was\n   flagged `mismatch` by the post-iteration evaluator. Format:\n\n   ```\n   [PREVIOUS EVALUATOR REJECTED]\n   The evaluator flagged the previous iteration as a mismatch.\n     Task:      \u003cprior task_id\u003e\n     Iteration: \u003cprior iteration #\u003e\n     Commit:    \u003cprior commit sha\u003e\n     Reason:    \u003cevaluator_reason — verbatim, may contain commas\u003e\n\n   The picker demoted \u003cprior task_id\u003e below all other unblocked tasks for\n   this round so a different task can run first. If \u003cprior task_id\u003e is\n   re-picked anyway (because no other unblocked tasks exist), address the\n   reason above in THIS iteration instead of re-shipping a diff with the\n   same shortcoming.\n   ```\n\n   Behavior: the task picker (`scripts/ralph-next-task.sh`,\n   S.RETRO.20260521.EVALUATOR-MISMATCH-GUARD) reads the last metrics row for\n   the current session; if `evaluator_verdict=mismatch` it demotes the\n   rejected task to priority rank 9 (below `[NICE-TO-HAVE]`) so any other\n   unblocked task wins the round. If the rejected task is the only\n   pickable candidate it still gets picked — demotion ≠ block. The block\n   is one-shot: once any iteration writes a new metrics row, the\n   last-row check stops seeing the mismatch verdict and the block stops\n   appearing.\n\n   What you must do when you see this block:\n   - If the picker handed you a DIFFERENT task: note the prior rejection in\n     your scratchpad and continue with your assigned task.\n   - If the picker handed you the SAME task (only-candidate fallback):\n     address the `Reason:` line directly. Do NOT re-ship the same shape of\n     diff — the evaluator already flagged it.\n\n0e-STEER. **[STEER INTERRUPT] block** — injected mid-iteration, NOT in the prelude.\n   Unlike the prelude blocks above, this one can appear at ANY turn, returned by the\n   `check-steer-interrupt.sh` PreToolUse hook the instant the operator drops a hard-stop\n   steer file (`inbox/STEER.HARD.*.md` or `inbox/*hard-steer*.md`) while you are\n   mid-iteration. It surfaces as a blocked tool call whose reason is:\n\n   ```\n   [STEER INTERRUPT] The operator dropped a hard-stop steer mid-iteration:\n     inbox/\u003cfile\u003e.md\n\n   --- steer contents (first 500 chars) ---\n   \u003cverbatim steer text\u003e\n   --- end steer ---\n\n   ACT NOW, do not finish the current task first:\n     1. Commit your in-flight work with a \"(wip - interrupted by operator steer at iter N)\"\n        annotation and leave its checkbox [ ].\n     2. Then carry out the steer above as your next action.\n   ```\n\n   Why it exists: `ralph-inbox-fold.sh` folds new inbox files only BETWEEN iterations,\n   so a steer dropped mid-flight was invisible for ~18-25 min (session 20260521-083736 ran\n   ~22 min on the wrong task after a 11:36 hard-steer). The hook closes that gap to one turn.\n\n   What you must do when you see this block:\n   - STOP the current task immediately. Do not argue with the block or retry the same tool\n     call hoping it clears — it fires once per steer file and will not re-block.\n   - Commit whatever you have with the `(wip - interrupted by operator steer at iter N)`\n     suffix; leave the in-flight task's checkbox `[ ]`.\n   - Carry out the steer's instruction as your next action. If the steer says \"stop\", stop.\n\n0e. SKILL ROUTING — two modes: INVOKE (run a skill) or LOAD (read an agent instruction for context).\n\n    **MODE 1 — INVOKE A SKILL** (task contains `[SKILL:name]` or `[SKILL:name args]`):\n    Use the Skill tool directly. Do NOT implement the task yourself — the skill IS the implementation.\n    ```\n    Skill(skill=\"name\", args=\"args\")\n    ```\n    After the skill completes, read its output to decide whether to mark the task `[x]` (all work done)\n    or leave it `[ ]` (more iterations needed — skill will say so). Re-queue logic lives in the skill.\n\n    **You may also invoke skills proactively** — without an explicit `[SKILL:]` tag — whenever a task\n    clearly maps to a named skill from the available-skills list in your system prompt. Use judgment:\n    if the task description is \"do X end-to-end\" and a skill named X exists, invoke it.\n    Skills are first-class tools. Use them freely.\n\n    **MODE 2 — LOAD CONTEXT** (task matches a tag/keyword — Read the agent instruction file):\n    Batch with other step-0 reads in the SAME parallel turn. Skip if no tag matches.\n\n    | Task tag or keyword | Agent instruction file to Read |\n    |---------------------|-------------------------------|\n    | ANY task (always) — Read ALL THREE in one parallel turn | `.agent_instructions/environments.md` + `.agent_instructions/sandbox-dev-env.md` + `.agent_instructions/pr-to-sandbox.md` |\n    | `[GRAFANA]`, \"dashboard\", \"panel\", \"metrics\", observability | `.agent_instructions/grafana-verify.md` AND `.agent_instructions/grafana-dashboard.md` |\n    | `[UX]`, `[UI]`, `[FRONTEND]`, \"htmx\", \"template\", \"page\" | `.agent_instructions/frontend-design.md` |\n    | `[E2E]`, `[BROWSER]`, \"hurl\", \"Chrome MCP\" | `.agent_instructions/e2e-verify.md` |\n    | `[RESEARCH]`, `[SPIKE]`, `[REFINE]` | `.agent_instructions/research-methodology.md` |\n    | `[INFRA-DECISION]` (load-bearing infra/security/release choice) | `specs/adr/TEMPLATE.md` — copy to `specs/adr/NNNN-slug.md` and fill in. Run `scripts/ralph-adr-update-index.sh` after writing. |\n    | `[INFRA]`, or task touches `infra/aws/*.go`, `Pulumi.sandbox.yaml`, SSM params, DNS records, cloud-init, security groups, ECR, S3 buckets | `.agent_instructions/infra-release.md` — MUST read. Never `pulumi up` locally, never `aws ssm put-parameter` to create resources — write Pulumi Go code and let CI apply. Preview before push. |\n    | `[CI-FIX]` (this iteration is a CI-FIX retry — `RALPH_CI_FIX_RETRIES \u003e 0`) | `.agent_instructions/ci-triage.md` — MUST read. Replaces \"look at the error and fix it\" with structured multi-cause classification + local-reproduce-before-commit. |\n    | `[OIDC]`, `[AUTH]`, \"login flow\", \"Keycloak browser\", \"session verify\" | `.agent_instructions/oidc-browser-verify.md` — 7-step Chrome MCP login flow with Keycloak form selectors. |\n    | `AUTH.*` task ID, or \"keycloak\", \"user_directory\", \"realm role\", \"user management\" | `.agent_instructions/recipes/keycloak-admin-api.md` — admin token acquisition, User CRUD, role assignment, invitation flow, `UserDirectory` port + `KeycloakAdminClient` adapter. |\n    | `oidc`, `keycloak_provider`, `feedback_bearer`, `oidc_login`, path under `internal/adapters/secondary/auth/`, OR an OIDC-shaped failure in `LAST_ITERATION.md` (302 to `/dev/impersonate/users`, \"OIDC login handler init failed\", `/dev/feedback` 503 \"auth service unavailable\") | grep `ralph-logs/KNOWLEDGE.md` for the `[auth] /org/* redirects to /dev/impersonate/users → OIDC handler init silently failed at boot` entry AND read the `RG.RECIPE.OIDC-SPLIT-URL` task body in `IMPLEMENTATION_PLAN.md`. Loads the split-URL root cause (`compose_admin.go` devLoginRedirect ← `cfg.LoginUserGET == nil` ← OIDC init WARN) + fix (`oidc.InsecureIssuerURLContext(ctx, cfg.ExternalURL)` wrap) up-front; closes diagnosis in \u003c2 turns instead of 8. |\n    | `ARCH.QB.VIEW.*` task ID, or \"migrate view slice to QueryBus\", \"extract ReadModel port\" | `.agent_instructions/recipes/arch-qb-view-migration.md` — 5-touch shape (query/handler/postgres_store/module/init + adapter swap), JSONB-in-adapter rule, and the 3 in-the-same-commit fitness updates (knownViolations removal, sliceMinimalStructureAllowlist, sliceFanOutExemptions). |\n    | \"booking\", \"reserve\", \"slot\", \"decider\", or path under `bc/booking/`, `bc/allocation/` | `.agent_instructions/booking-perf.md` |\n    | path under `internal/slices/`, `bc/*/slices/`, \"command_handler\", \"projector\", \"view\", or new table in `academy.*` | `.agent_instructions/cqrs-posture.md` |\n\n    If multiple match (e.g., `[UI]` + `[E2E]`), Read both in ONE parallel turn.\n    Do NOT re-read agent instruction files on subsequent turns — one Read at the start is enough.\n\n1. Your task is to implement that ONE task. Implement first, test once at the end.\n   Search the codebase before writing new code — don't duplicate existing implementations.\n   If the task needs helper types or interfaces from other packages, create them.\n   Implement FULL functionality — no placeholders, no stubs, no TODOs, no \"// TODO: implement later\".\n\n   SEARCH DELEGATION — for any codebase investigation whose expected output spans\n   \u003e3 files or \u003e50 lines (e.g. \"find all callers of X\", \"which tests reference Y\",\n   \"show me every slice that imports Z\"), delegate to the `ralph-searcher` sub-agent\n   via the Task tool. The sub-agent runs on Haiku and its Grep/Read output stays in\n   its own context, keeping the main loop's output_tokens lean. Direct Grep/Read are\n   fine for ≤3-file spot checks; don't round-trip a single-file lookup through a\n   sub-agent. Context Efficiency / Avg Subagent Calls tracks adoption.\n\n   CREATING A NEW SLICE? Use the generator — do NOT hand-write boilerplate:\n   ```\n   make new-slice NAME=cancel_booking KIND=command AGGREGATE=booking\n   make new-slice NAME=view_wallet_balance KIND=view AGGREGATE=wallet\n   make new-slice NAME=expire_stale_bookings KIND=automation AGGREGATE=booking\n   make new-slice NAME=notify_booking_denied KIND=translation EVENT=BookingDenied\n   ```\n   Four slice types per Event Modeling (see `specs/domain/DOMAIN_MODEL.md` § Slice Types):\n   KIND=command (state change): init.go, command_handler.go — user action → events.\n   KIND=view (read model): init.go, query.go, query_handler.go, http_handler.go — events → projection → query.\n   KIND=automation: init.go, processor.go — todo-list view → processor → command (no saga).\n   KIND=translation: init.go, translator.go, translator_test.go — events → external system (email, payment).\n   Then edit the generated files to add domain-specific logic. Do NOT hand-write these files.\n   Optional flags: `EVENT=BookingDenied` (for translation), `ROUTE=\"/admin/things\"` (view HTTP handler).\n\n   BOOKING BC SLICES (`bc/booking/slices/`) — use the per-command handler pattern (no standard CommandHandler type):\n   - Booking command slices use per-command handlers like `CancelBookingHandler`\n     instead of a single `CommandHandler`. The fitness test `TestEveryCommand_HasHandler` already\n     supports this via the `perCmdHandler` fallback (event_model_test.go line ~398).\n   - If a booking slice has `command_handler.go`, add it to the auth check allowlist in\n     `tests/fitness/auth_check_test.go` (if it doesn't call `GetAuthenticationContextFromContext`).\n   - If a booking view slice defines queries inline (no separate `query.go`), add it to\n     `noSeparateQueryFile` in `tests/fitness/cqrs_rules_test.go`.\n   - Fitness tests run in CI only (see DILIGENCE RULES below). After creating a booking slice,\n     verify by reading the fitness-test allowlist and confirming your slice is listed — do\n     NOT run the fitness suite locally (it's slow and the gate is disabled by design).\n   - Note: the old `internal/dcb/` and `internal/slices/dcb_*/` paths were a proof-of-concept\n     that was dropped (tables dropped in migration 20260531002). Use `bc/booking/slices/` instead.\n\n   IMPLEMENT-THEN-TEST — do NOT run tests mid-implementation:\n   ```\n   1. IMPLEMENT: Write ALL production code for the task. Get it compiling.\n      Do NOT run tests until the implementation is complete.\n   2. WRITE TESTS: Write tests for the task's expected behavior.\n      Capture WHY each test exists in a comment — future iterations have no prior context.\n      Name tests `Test{Behavior}_{ExpectedOutcome}` — the name explains WHY the test exists.\n   3. VERIFY: Run `./scripts/ralph-build.sh \u0026\u0026 ./scripts/ralph-vet.sh \u0026\u0026 ./scripts/ralph-lint.sh \u0026\u0026 ./scripts/ralph-test.sh` as ONE command (single turn).\n      `ralph-vet.sh` catches test-only compile errors that `ralph-build.sh` misses (it skips `_test.go` files).\n      Do NOT split build/vet/lint/test into separate turns. Do NOT run `make pre-commit` — it is slow and redundant.\n   3b. INTEGRATION SMOKE: If your diff touches `internal/slices/\u003cX\u003e/` or `bc/\u003cctx\u003e/slices/\u003cX\u003e/` and that slice has\n       integration tests, run `make integration-fast SLICE=\u003cX\u003e` AFTER step 3 passes.\n       Skip if no tests are found (script prints \"NO INTEGRATION TESTS found\").\n       This catches SQL typos and cross-slice bugs locally in \u003c90s instead of waiting\n       10 min for CI. Do NOT run `make test-integration` (full suite, slow).\n   3c. FITNESS MICROSET: `./scripts/ralph-fitness-microset.sh` — fast fitness checks: file line\n       cap (500), function line cap (80), dead code allowlist. Run this after any commit that adds\n       or modifies Go files or web/templates. If it fails, fix before the iteration ends.\n       DO NOT run the full fitness suite (`ralph-fitness.sh`) locally — CI-only by design.\n   4. FIX: If tests fail, fix and re-run. But do NOT loop more than 3 times — see HARD LIMITS.\n   ```\n\n   Each test run costs ~2 turns (run + read output). 5 test runs mid-implementation = 10 wasted turns = 500K cached tokens.\n   One test run at the end = 2 turns. The math is clear: implement first, test once.\n\n   TURN-BUDGET GUARD — if you have used 80+ turns, something is wrong:\n   - At 80 turns: STOP exploring. Commit what you have, even if incomplete.\n     Leave the parent `[ ]` and emit the next `[ ]` sub-atom (`\u003cparent-id\u003e.\u003catom\u003e`)\n     for the remaining work. Do NOT mark the parent `[-]` UNLESS you also leave a\n     `[ ]` descendant sub-task: a `[-]` parent with no open `[ ]` sub-task is frozen\n     forever (both pickers treat `[-]` as never-selectable), which stalled\n     S2/S3/S5.B. Verify with `scripts/ralph-next-task.sh --lint`.\n   - At 100 turns: You are stalling. Commit immediately. Do NOT start new files.\n   - This session had iterations at 1124, 2686, and 4741 turns — all were stalls\n     that produced work achievable in 40 turns. The cost of a stall ($15–35)\n     dwarfs the cost of a partial commit ($2).\n\n   DON'T: Run tests mid-implementation, mock for isolation, or test aggregate internals directly.\n   DO: Implement all code first, write tests, run once through MessageBus boundary.\n   Slice rules (types, independence, isolation) are in DOMAIN_MODEL.md (step 0c).\n   Testing rules are in testing.md (step 0e). Do NOT duplicate them here.\n\n   LONG-RUNNING COMMANDS — bench, replay, and load scripts can take \u003e5 minutes:\n   - DON'T run multi-minute bench scripts inline. Specifically: `scripts/japan-range-bench*.sh`,\n     `scripts/hills-*.sh` full runs, `scripts/ralph-bench.sh` against the full suite, or any\n     `make bench` / `make load-test` invocation without size flags. Iter 14 of session\n     20260517-202253 ran `japan-range-bench-growth.sh --concurrency=10 --cells=2` inline;\n     the bench ran 40+ minutes, the iteration ended while it was still running, the loop\n     blocked waiting for the background process, and the human had to Ctrl+C and restart.\n   - DO use the smoke-test variant for in-iteration verification: smallest concurrency\n     (`--concurrency=1`) and smallest cell count (`--cells=2`), or whatever the script's\n     `--help` advertises as the minimum. Example: `timeout 120s bash scripts/japan-range-bench.sh --cells=2 --concurrency=1`.\n   - DO wrap every long-running shell invocation in `timeout 300s \u003ccmd\u003e` when you must run\n     it inline. The `timeout` exit code (124) is recoverable; a hung iteration is not.\n   - DO offload genuine long runs (\u003e5 min) to a `[BG-POLL \u003csentinel\u003e]` follow-up task per\n     step 6c — launch the bench detached with `nohup`, drop a sentinel on completion, and\n     tag the aggregation task with the sentinel path. The picker skips the follow-up until\n     the sentinel appears, so polling iterations cost $0.\n\n   DILIGENCE RULES — violating any of these means the task is NOT done:\n   - Fitness/architecture tests (`tests/fitness/`) run in CI only. Do NOT run `ralph-fitness.sh`\n     locally and do NOT re-enable the fitness gate in loop.sh — it is disabled by design.\n     Being addressed in S.93 — until the fast path ships (S.93.1–S.93.4), fitness remains CI-only.\n   - FIX ROOT CAUSES, not symptoms. No `// nolint`, `|| true`, error suppression, or skip logic.\n   - Unrelated bugs: fix AND add a KNOWLEDGE.md entry.\n   - No `InMemory*` stores in non-test code — use Postgres-backed implementations.\n   - No `fmt.Printf`/`log.Printf` debug statements — use proper logger.\n   - **CHROME DEVTOOLS MCP IS NOT OPTIONAL** for any task tagged `[GRAFANA]`,\n     `[UI]`, `[UX]`, `[FRONTEND]`, `[E2E]`, `[BROWSER]`, `[FEEDBACK]`, or any task\n     whose verify step mentions a URL (dashboard panel, admin page, `/d/...` path).\n     curl proves the endpoint responded with 200; only a screenshot proves the\n     page RENDERS and the DATA appears. Minimum per task:\n       1. `mcp__chrome-devtools__navigate_page` to the target URL\n       2. `mcp__chrome-devtools__take_screenshot` saved under the iteration dir\n          (e.g. `ralph-logs/sessions/$SESSION_ID/iteration-$ITERATION-$TASK/\u003cname\u003e.png`)\n       3. `mcp__chrome-devtools__list_console_messages` — zero errors (or\n          explain why each one is pre-existing in the iteration commit)\n       4. For Grafana: visit with `?from=now-1h\u0026to=now`, confirm panels show\n          real data; if empty, generate traffic, wait 30s, re-screenshot\n     No screenshot artifact in the iteration dir = task NOT done; commit with\n     `(wip)` suffix and leave the checkbox `[ ]`. This applies even if you\n     verified via curl/CLI — the screenshot is the non-negotiable artifact\n     for surfaces humans can actually see.\n   MIGRATION GOLDEN PATH — ALWAYS create a file under `migrations/`, then `make migrate`.\n   NEVER apply migrations via `docker compose exec postgres psql \u003c migrations/*.sql` — it\n   bypasses goose version tracking and causes duplicate `goose_db_version` rows. If `make\n   migrate` fails (e.g., TLS cert issue), fix the underlying issue rather than bypassing goose.\n   Verify: after `make migrate`, `SELECT MAX(version_id) FROM goose_db_version;` matches your file.\n   MIGRATION SCHEMA-DUMP RULE — after creating or modifying ANY file in `migrations/`:\n   1. `docker compose build migrate \u0026\u0026 make migrate` (apply the migration)\n   2. `make schema-dump` (regenerate `schema/academy.sql` with updated migrations hash)\n   3. `git add migrations/\u003cnew-file\u003e.sql schema/academy.sql` (stage BOTH files)\n   CI runs `scripts/check-schema-dump.sh` which hashes ALL tracked files in `migrations/`\n   and compares against the hash in `schema/academy.sql` header. If the migration file is\n   untracked or `schema/academy.sql` is stale, CI fails with \"STALE: schema/academy.sql is\n   out of date\". `make pre-commit` does NOT run this check — it only fires in CI.\n   MIGRATION CONSTRAINT RULE — S.46.1 broke CI because test cleanup wasn't updated:\n   DON'T: Add a UNIQUE, CHECK, or EXCLUSION constraint without checking test fixtures.\n   DO: `grep -rn 'INSERT INTO academy.\u003ctable\u003e' tests/` → update cleanup/teardown in same commit.\n   Why: constraints make previously-valid test data invalid. Tests that seed rows without\n   cleaning up will fail with constraint violations, but only in CI (local may pass by luck).\n   - Before committing: `grep -rn 'TODO\\|FIXME\\|HACK\\|XXX' \u003cchanged files\u003e` — fix any found.\n   - After committing: `git diff HEAD~1` — would you approve this PR?\n   - Max 50 lines/function, 300 lines/file. Never swallow errors. Every public function needs a test.\n   - Read per-directory CLAUDE.md files for package-specific rules (engine, admin, bootstrap).\n   - SCOPE CHECK: if the task touches \u003e5 files, STOP and plan subtasks first. Commit each\n     subtask separately. A 900s timeout with no commit = wasted iteration.\n   UI SCOPE RULE — any task touching `web/templates/` or `htmx/` MUST touch ≤5 production\n   files. If exceeded, split by layer: (a) data/handler, (b) template/route,\n   (c) interactivity/verify. Each sub-task stays within the 5-file limit and commits\n   independently. RULES.UI.1 stalled at 6 AST types in one task; splitting into 4 atoms\n   produced 4 clean commits at 25–29 turns each.\n   WIDE TASK RULE — \"add X to all Y slices\" tasks are 2-3x more expensive than focused ones:\n   DON'T: Create a single task like \"add org-scoping to all view query handlers.\"\n   DO: Pre-split into one sub-task per slice: S.N.a (resources), S.N.b (events), etc.\n   Each sub-task reads 2 files and writes 2 files instead of 10+10. If you discover a wide\n   task during planning, split it BEFORE adding to the plan. Existing example: ISO.2–ISO.7.\n   PARTLY-DONE WIDE TASK — when you finish ONE slice of a wide task and more remain,\n   emit the next `[ ]` sub-atom (`\u003cparent-id\u003e.\u003catom\u003e`) for the remaining work instead\n   of just marking the parent `[-]`. A `[-]` parent with no open `[ ]` descendant\n   sub-task is FROZEN forever — both pickers treat `[-]` as never-selectable (this\n   markdown picker skips it; planparse maps `[-]`→done), so its prose-only remaining\n   work is never picked up again (this stalled S2/S3/S5.B.SCENARIO). Either keep the\n   parent `[ ]`, or mark it `[-]` AND leave a `[ ]` sub-atom. Guard:\n   `scripts/ralph-next-task.sh --lint` flags any `[-]` task with no open `[ ]` sub-task.\n\n   NO BACKWARDS COMPATIBILITY — pre-production, vertical slices, OCP. When a domain\n   event changes: change the event type/shape; grep affected slices (`rg 'OldEventName'\n   bc/ internal/`); delete + rescaffold with `make new-slice`; port logic against the new\n   event. No dual-write, no deprecation window, no historic-rows fitness tests, no\n   aliasing old→new event types in the registry. Event-store rows from before the\n   change do not need to replay — `make migrate` rebuilds projections from the current\n   event shape. If a projector needs both old + new shapes, you are doing it wrong —\n   rebuild the projector. Applies to domain events, command names, aggregate names,\n   view table names, and public slice APIs.\n\n   GRAFANA DASHBOARD TASKS — see `.agent_instructions/grafana-verify.md` (loaded via skill routing 0e).\n   Key rule: fix ONE panel at a time, validate with `scripts/validate-dashboards.py`, never rewrite from scratch.\n\n   BUG-FIX WORKFLOW — required for any task tagged `[BUG]` or any fix to existing behavior:\n   - FIRST write a test that fails because of the bug. The test name must describe the\n     boundary condition or invariant being violated (e.g., `TestBooking_AtWindowClose_Denied`).\n   - Stage the failing test locally (`git add -p` the test file).\n   - THEN apply the production fix and confirm the test now passes.\n   - Do not skip this step even if the fix looks obvious — the test proves the bug existed\n     and prevents regression. A fix without a failing test is indistinguishable from a guess.\n   - If you cannot reproduce the bug with a test, document WHY in the commit message\n     (e.g., \"race condition only under load\", \"requires external service state\").\n\n   UI BUG VERIFICATION — for `[UX]`/`[UI]` tasks or htmx/ changes:\n   See `.agent_instructions/frontend-design.md` (loaded via skill routing 0e).\n   Key rule: reproduce the ACTUAL USER FLOW (click, fill, submit), screenshot to iteration dir.\n\n   [VERIFY-SANDBOX] PRE-FLIGHT — before invoking the `verify-flow` skill (see S.273):\n   - The skill reads its flow definitions from `fixtures/verify-flow/flows.yaml`.\n     If the flow name your task references is NOT present there, the skill cannot run\n     regardless of the slice's wiring state. ALWAYS run\n     `grep -E \"^[[:space:]]*\u003cflow-name\u003e:\" fixtures/verify-flow/flows.yaml` FIRST.\n   - If the flow is missing, do NOT invoke the skill. Either (a) the slice's HTTP\n     route + handler doesn't exist yet (mark `[BLOCKED:DEPS \u003cmissing-task-id\u003e]`),\n     or (b) the flow YAML itself needs an entry (file an inbox task to add it and\n     mark this one `[BLOCKED:NO-FLOW]`). Session 20260526-144918 wasted iters 165 +\n     168 on TREE.MOVE.3.B precisely because this preflight was skipped.\n   - LOGIN-REACHABILITY: for a flow with `login != none`, also confirm the role can\n     actually REACH the flow's `url:` before paying the ~$1 + 6-turn Chrome-MCP\n     drive. Two cheap probes (the skill's Phase 0b.1 runs them automatically):\n     (1) one SSM `SELECT` for the grant the route requires —\n     `org_admins_view` for `/org/*` + `/admin/*`, `players_view.home_org_id` for\n     `/play/*` + `/org/tee-sheet` — keyed by the role's email from the CLAUDE.md\n     Dev Actors table; (2) one `fetch(url, {redirect:'manual'})` to catch an\n     unmounted route or an off-URL server-side redirect (a 3xx to the auth proxy is\n     EXPECTED and fine). If the grant row is missing or the route 404s/redirects\n     off-target, the skill fails fast with \"flow's login can't reach url on env\"\n     (`skip_reason: login-cannot-reach-url`) in ≤2 turns instead of driving the\n     browser. This is the DEMO.4.B iter-13 failure: `admin@test.com` lacked the\n     `OrgAdminGranted` seed on sandbox, so `/org/setup/structure` bounced to\n     `/dev/impersonate/users` and every browser step failed downstream.\n\n   BOUNDARY CASES — for any task touching ranges, intervals, dates, or numeric thresholds:\n   - Document interval inclusivity `[a,b)` or `[a,b]` at call sites.\n   - Test boundary values: at-start, at-end, zero-length, one-before, one-after.\n   - Test 0, 1, max, max+1, negative for numeric logic.\n   - Verify both sides of an interval exchange agree on open/closed.\n   - Silent parse/unmarshal failures are bugs — return errors, don't return false.\n   - Name tests explicitly: `Test{Thing}_AtWindowClose_Succeeds`.\n\n   REMOVAL TASKS — completeness checklist:\n   When the task is \"remove X\" / \"delete X\" / \"deprecate X\" / \"decommission X\", you MUST\n   audit and clean up ALL of these locations before marking the task done:\n\n   1. Code call sites: `grep -r '\u003cX\u003e' bc/ internal/ pkg/ cmd/`\n   2. Imports / go.mod: `grep '\u003cX\u003e' go.mod go.sum \u0026\u0026 go mod tidy`\n   3. Docker compose services: `grep -r '\u003cX\u003e' deploy/`\n   4. Env vars / secrets: `grep -r '\u003cX\u003e_' .env* deploy/ infra/`\n   5. CI/CD references: `grep -r '\u003cX\u003e' .github/`\n   6. Documentation: `grep -r '\u003cX\u003e' docs/ specs/ CLAUDE.md README.md`\n   7. Grafana dashboards: `grep -r '\u003cX\u003e' deploy/sandbox/grafana/`\n   8. Inbox / plan references: `grep -r '\u003cX\u003e' inbox/ archive/inbox/ IMPLEMENTATION_PLAN.md`\n\n   In the commit message, list which categories had matches and were cleaned. If a\n   category had no matches, omit it. If you intentionally left some references (e.g.\n   archive/ history), state why.\n\n## Definition of Done (by task type)\n\n   **Backend logic:** Tests pass + exercise via UI/API + screenshot Domain Observability\n   dashboard (`localhost:3002/d/domain-observability/`) AND Tempo Traces\n   (`localhost:3002/d/tempo-traces/`). Command must appear in both.\n\n   **HTTP handlers:** Tests + Hurl E2E + verify auth rejection (401) + screenshot HTTP RED\n   dashboard (`localhost:3002/d/http-red-method/`) AND Tempo Traces.\n\n   **Admin UI:** CI green + Chrome MCP screenshot + submit forms + verify data renders.\n   Check Domain Observability + Tempo Traces for triggered commands.\n\n   **Infrastructure / CI/CD:** CI green + document what was verified.\n   For CI changes: push, wait for `gh run list`, verify `conclusion == success`.\n\n   **Observability:** Verify via curl/CLI first, then Grafana screenshot showing real data.\n   Generate traffic if panels show \"No data\", wait 15-30s, re-check.\n\n   **Batch changes:** Verify each affected page/endpoint individually — not just one.\n\n   ## E2E flow verification (for domain-affecting changes)\n\n   See `.agent_instructions/e2e-verify.md` (loaded via skill routing 0e).\n   Key rule: Hurl scripts first, observability second, Chrome MCP last (1-2 screenshots only).\n   Skip for: engine logic, infrastructure, CI/CD, docs, unrelated admin pages.\n\n2. See CLAUDE.md for Docker commands, test scripts, port mapping, and Chrome MCP usage.\n   Avoid `make pre-commit` (slow) — use `ralph-build.sh \u0026\u0026 ralph-vet.sh \u0026\u0026 ralph-lint.sh \u0026\u0026 ralph-test.sh`.\n\n3. Verify in running app — MANDATORY, never skip:\n   - Screenshot via Chrome MCP. Save to iteration dir.\n   - API: verify endpoint responds (not 404). UI: screenshot with real data + proper CSS.\n   - Grafana: navigate to dashboard, set last 15 min, screenshot. Generate traffic if \"No data\".\n   - INTERACT LIKE A USER: click, fill forms, submit. If click fails → BUG, fix root cause.\n   - NOT DONE if: placeholder, 404, in-memory-only, unstyled, broken clicks, empty Grafana panels.\n\n4. CHECKPOINT COMMIT — commit early and often, not just at the end.\n   After tests pass, commit IMMEDIATELY. Do not do more work after tests pass.\n   If you have been working for 50+ turns, commit what you have NOW even if not fully done.\n   An incomplete commit is better than losing all work to a context overflow.\n\n   **FB task — stuck guard:** If this is a `FB.*` task and you have used \u003e30 turns without writing any code yet, STOP reading and commit a `(wip)` note in `LAST_ITERATION.md` that lists: (a) the files you explored, (b) the concrete blocker (e.g., \"prerequisite command slice not yet wired\", \"ambiguous task description\"). This gives the next iteration a head start instead of repeating the same reads.\n\n   **Tidy First** (see step 0f): never mix refactoring + feature in one commit.\n   If both needed: `refactor:` commit first, then `feat:` commit.\n\n   **4a. BEFORE committing**, batch: write LAST_ITERATION.md + mark task done + run ralph-diff.sh in ONE turn.\n   Write `ralph-logs/LAST_ITERATION.md` with:\n   - `## Steps` — numbered list of what you did (search, create, wire, test, screenshot, commit)\n   - `## Could Still Be Wrong` — list 3 ways your change could be wrong. For EACH entry,\n     you MUST cite concrete evidence inline on the same bullet, in one of these forms:\n       - `Evidence: TestFooBar_ReturnsDenied PASS` (exact test name + pass, ran this iteration)\n       - `Evidence: screenshot ralph-logs/sessions/\u003csession\u003e/iteration-\u003cN\u003e-\u003cTASK\u003e/\u003cfile\u003e.png`\n       - `Evidence: impossible because \u003cspecific reason tied to code/type/constraint\u003e` (explain\n         why the failure mode cannot occur — compile-time check, DB constraint, etc.)\n     Vague hand-waves (\"tests cover this\", \"we validated it\", \"should be fine\") are NOT\n     evidence and do NOT satisfy the rule. If you cannot produce evidence for all three\n     claims, you may NOT flip `[ ]` → `[x]` in step 4b; commit `(wip)` and leave the\n     checkbox unchecked (see gate in step 4b).\n   - `## Friction` — one-line entries with tags: NAVIGATION, BOILERPLATE, TOOLING, WIRING,\n     TESTING, MIGRATION, DEVEX, CI, DOCS, PATTERN, WISH. Feeds into /retro aggregation.\n     **META.1.d — primary failure class.** Each bullet is classified into an\n     AgentBench-style taxonomy by `scripts/ralph-extract-friction-class.py` and stored\n     in `academy.ralph_metrics.friction_class` (plurality vote) plus\n     `friction_class_counts` (JSONB distribution). Format: `- CATEGORY: description`\n     uses the default class-mapping below; `- CATEGORY[CLASS]: description` overrides\n     the default when the category is ambiguous. Classes:\n       - `TOOL_OUTPUT` — tool returned wrong/malformed/truncated output\n       - `LONG_HORIZON` — task scope too large to finish in one iter (boilerplate, context budget)\n       - `INSTRUCTION_AMBIGUOUS` — prompt/spec/docs unclear or contradictory\n       - `WIRING` — DI/composition-root/bootstrap wiring bug\n       - `ENV_DRIFT` — container/cache/config drift from expected state (docker, migrate image, keycloak)\n       - `KNOWLEDGE_GAP` — didn't know how part of the codebase worked (had to grep/explore)\n     Default category→class mapping (used when no `[CLASS]` override): NAVIGATION/PATTERN→KNOWLEDGE_GAP,\n     BOILERPLATE→LONG_HORIZON, TOOLING/MIGRATION/DEVEX/CI→ENV_DRIFT, WIRING→WIRING,\n     TESTING→TOOL_OUTPUT, DOCS/WISH→INSTRUCTION_AMBIGUOUS.\n     **HARD CAP: top-K=5 entries max per iteration.** Only record friction you actually hit\n     this iteration; prioritize items that (a) cost ≥1 turn, (b) are likely to recur, or\n     (c) have a concrete fix you can name. Drop anything that doesn't meet those bars —\n     speculative or cosmetic nits waste the next iteration's attention. Items that recur\n     across iterations are auto-promoted to `KNOWLEDGE.md` by the META.1.c recurrence\n     scanner (once landed); discarded entries are NOT lost forever, they just need to\n     recur to earn their way in. Do NOT pad to 5 — 0, 1, or 2 entries is fine and normal.\n     The post-iteration extractor already applies `head -5`\n     (`scripts/ralph-post-iteration.sh` line ~454); writing more than 5 is wasted tokens\n     because the surplus is silently truncated before reaching the next iteration.\n   - `## Speed Up` — reflect on what slowed you down this iteration and propose ONE concrete\n     improvement. Examples: \"I grepped 8 slices to find who handles BookingApprovedEvent — an\n     event→slice index in CODEBASE.md would save 3 turns\", \"I hand-wrote projector_adapter.go\n     boilerplate — `make new-slice KIND=view` should generate this\". If the improvement is\n     actionable, also add it to IMPLEMENTATION_PLAN.md Discovered Issues as a task:\n     `- [ ] **RG.{N}** [RALPH] {description}`. Use the RG prefix (Ralph Growth) so these\n     self-improvement tasks are distinguishable. Only add if genuinely useful — not every\n     iteration needs one. Skip if nothing slowed you down.\n   This MUST happen before the commit so it is part of the main work, not an afterthought\n   that gets skipped when context runs low.\n\n   **4a.5. SCRATCHPAD — leave a note for next-iteration-Ralph (S.173).**\n   Before committing, append ≤200 tokens (≤800 chars) to\n   `ralph-logs/sessions/$SESSION_ID/SCRATCHPAD.md` capturing:\n   - **Surprises** — files or patterns that caught you off guard this iteration.\n   - **Gotchas** — specific pitfalls you hit and how you recovered.\n   - **Hint** — one line that will save next-iteration-Ralph a turn if it picks a\n     related task.\n\n   Do NOT summarize the task (the commit message and LAST_ITERATION.md handle that).\n   Do NOT include long file paths that are already in the commit diff. Do NOT exceed\n   800 chars per entry — pre-iteration trim keeps the file under 2KB (rolling).\n   Format:\n   ```\n   ## iter N — TASK_ID\n   - surprise/gotcha/hint: one or two lines\n\n   ```\n   Append, never overwrite. Skip entirely if nothing non-obvious came up.\n\n   **4b. Mark the task** as done in IMPLEMENTATION_PLAN.md: change `- [ ]` to `- [x]`.\n   **SELF-VERIFICATION GATE — read before flipping the checkbox:**\n   Re-read the `## Could Still Be Wrong` section you just wrote. For EACH of the 3 claims,\n   confirm an inline `Evidence:` citation (test name + PASS, screenshot path, or\n   impossibility argument — see step 4a). If ANY claim lacks evidence, you MUST:\n     1. Leave the checkbox as `- [ ]`.\n     2. Append ` (wip)` to the task description OR add a continuation sub-task under\n        Discovered Issues noting which claim lacks evidence.\n     3. Use `git commit -m \"feat(scope): summary (wip)\"` — the `(wip)` suffix signals\n        an incomplete iteration so the next run picks it up.\n   A task with unverified claims flipped to `[x]` is a lie to the next iteration and\n   to the human reviewer. The gate exists to prevent that. Sleeper tasks are exempt\n   from the flip rule (they stay `[ ]` forever regardless) but STILL require evidence\n   citations for their `Could Still Be Wrong` entries.\n   If you discover new issues or tasks, add them to the Discovered Issues section.\n   **THIS IS THE ONLY TURN where you edit IMPLEMENTATION_PLAN.md, CLAUDE.md, or KNOWLEDGE.md.**\n   Editing these files earlier busts the prompt cache — every subsequent turn pays full input\n   cost (~$0.50/turn extra). Batch ALL edits to these files into this single final turn.\n\n   **Feedback threads:** Do NOT read or write `feedback/threads/*.json` yourself — the files\n   are 25-63 KB and reading them wastes turns. The feedback context is already in your prompt.\n   If a `[FEEDBACK]` block is present, it may contain MULTIPLE threads. Address ALL of them:\n   - **Action threads** (open, in_progress): fix the issue, update status via curl, commit.\n     Quick wins (typo, missing field, wrong label): fix inline and mark `done`.\n     Bugs needing investigation: mark `in_progress` or `accepted` and add a plan task.\n     Not reproducible or out of scope: mark `rejected` with a brief reason via curl.\n   - **Discussion threads** (in_discussion): ENGAGE IN DISCUSSION. Post a reply via curl.\n   - **Reopened threads** (done/rejected/accepted with a human follow-up): the human posted\n     after you closed the thread. Treat like a new action/discussion: read their message,\n     respond via curl, update status (e.g., back to `in_progress` or `in_discussion`).\n     NEVER ignore a thread where a human was the last to respond.\n   After fixing or addressing a thread, ALWAYS post a thread-specific reply via curl\n   explaining IN DETAIL what you did for that specific thread. Be explicit about the\n   changes — file paths, what was added/removed, why. Do NOT rely on generic commit messages.\n   Use the **Ready-to-run commands** at the bottom of each thread block — they call the\n   reply/status wrappers (scripts/ralph-feedback-reply.sh, scripts/ralph-feedback-status.sh)\n   with the thread ID pre-filled. For a reply, write your reply text to the named file FIRST\n   (Write tool) then run the wrapper — it mints a fresh token and json.dumps the body, so a\n   shell-quoting or invalid-JSON bug is impossible. Do NOT hand-build a curl with an inline\n   JSON body. Do NOT reconstruct the URL or headers from memory — that posts to localhost and\n   the sandbox never sees it.\n\n   **STATUS / REPLY CONSISTENCY (FB-793e).** Before moving to the next thread,\n   verify your reply prose matches the status you are about to set:\n   - If the reply describes a landed change (\"fixed\", \"done\", \"shipped\",\n     \"changed X at Y:line\") → status MUST be `done` (or `rejected` if you refused).\n   - If the reply describes planned/deferred work (\"added a plan task\",\n     \"will land in\", \"tracked as TASK.N\") → status MUST be `accepted`\n     (work scheduled, not started) OR `in_progress` (started, not finished).\n   - If the reply asks a clarifying question or continues discussion →\n     status MUST be `in_discussion`.\n   - Mismatches (reply says \"done\" but status `in_progress`, or reply says\n     \"I'll add a task\" but status `done`) confuse the human reviewer and cause\n     reopened threads next iteration. Re-read each `reply + PATCH status` pair\n     before committing. This check is behavioral — no automated gate runs.\n\n   **4b.5. Principle-sampled pre-commit critique (META.1.e).** After 4a/4b\n   but before the commit, sample the top-3 highest-voted KNOWLEDGE.md\n   principles whose `[category]` tag matches this task and write a\n   one-sentence self-critique against each. Run:\n   ```bash\n   python3 scripts/ralph-sample-principles.py sample \\\n       --task-id \"$TASK_ID\" --append\n   ```\n   This appends (idempotently replaces) a `## Principle Checks` section to\n   `ralph-logs/LAST_ITERATION.md` with one bullet per principle. Edit the\n   file and replace each `_(fill in: ...)_` placeholder with either:\n   - `[ok] \u003cone sentence on why this change respects the principle\u003e` — or\n   - `(trigger) \u003cone sentence on how this change may violate the principle\u003e`\n   Then run:\n   ```bash\n   python3 scripts/ralph-sample-principles.py check\n   ```\n   Exit codes: `0` all ok, `1` section missing or unfilled placeholder,\n   `3` at least one `(trigger)` present. Exit `3` means you MUST tag the\n   commit message `(wip)` per META.1.a and leave the task checkbox `[ ]`.\n   Exit `1` means fix the unfilled bullets before committing. This gate is\n   runtime — the script runs inside the iteration, at commit time, not as\n   a passive post-hoc analysis.\n\n   **4b.5.5. Local-reproduce gate (CI-touching changes).** If this iteration\n   adds or changes a CI step (a `run:` block in `.github/workflows/*.yml`) OR\n   adds/changes a tool the CI runs (gosec, govulncheck, golangci-lint, hurl,\n   pulumi preview, …), you MUST run the equivalent command locally and confirm\n   it passes BEFORE committing. The 60-90s push-and-wait cycle on CI is a debugger\n   you should not be using. See `.agent_instructions/ci-triage.md` step 4 for\n   the local-reproduce table. Skipping this rung is how we shipped 13 gosec\n   findings + a tee-masked pulumi failure on 2026-04-28 and burned 3 CI-FIX\n   retries figuring it out. If the check is genuinely not reproducible locally\n   (e.g. requires runner-only secrets), state so explicitly in the commit message.\n\n   **4b.6. ADR gate (only fires for `[INFRA-DECISION]` tasks).** Before the commit:\n   ```bash\n   scripts/ralph-adr-check.sh \"$TASK_LINE\"\n   ```\n   Exit `0` if the task isn't tagged `[INFRA-DECISION]` OR an ADR was added/modified\n   in this iteration. Exit `1` means the gate fired — copy `specs/adr/TEMPLATE.md`\n   to `specs/adr/NNNN-slug.md`, fill in Context/Decision/Consequences (~1 page),\n   then re-run. The ADR captures the *why* in one searchable place so future\n   iterations don't re-litigate. After writing, also run\n   `scripts/ralph-adr-update-index.sh` so the INDEX picks up the new file.\n\n   **4c. Commit** (ONE LINE — no multi-paragraph messages):\n   ```bash\n   git add bc/ internal/ tests/ web/ migrations/ schema/ scripts/ Dockerfile docker-compose*.yml .github/ CLAUDE.md ralph-logs/KNOWLEDGE.md ralph-logs/LAST_ITERATION.md specs/adr/\n   git commit -m \"feat(scope): one-line summary\"\n   ```\n\n6. ONE task per iteration. Do not batch. STOP IMMEDIATELY after committing and writing LAST_ITERATION.md.\n   Do NOT respond to background agent completions after you have committed — each response costs ~$2 in cache reads.\n   Do NOT launch background agents for fitness tests or `make pre-commit` — they complete after you're done and waste tokens.\n\n   TURNS BUDGET — two checkpoints:\n   Turn 40 checkpoint: if you haven't started writing production code by turn 40, you are\n   exploring too long. Commit a research note with what you've learned and add a\n   continuation task. The next iteration starts with a warm cache and your notes.\n   Turn 50 checkpoint: if you haven't started writing production code, STOP.\n   You are over-reading or the task needs splitting. Commit what you have (even if partial)\n   and add a continuation task: \"{task} part 2 — {what remains}\".\n   The next iteration picks it up with a warm cache. Reading 50+ turns without coding\n   means either the task scope is wrong or you're exploring without a plan.\n\n   STALL DETECTION — self-check mid-iteration; if any \"Alert\" column fires,\n   change approach. If the different approach doesn't fix it, STOP and add the\n   issue to Discovered Issues.\n\n   | Signal              | Self-check (this session)                   | Target    | Alert → action                   |\n   |---------------------|---------------------------------------------|-----------|----------------------------------|\n   | Same error repeated | Last 2 tool/test errors identical?          | never     | yes → COMPLETELY different path  |\n   | Edit-test cycles    | Consecutive failed test runs on same code   | ≤ 3       | ≥ 5 → step back, rethink         |\n   | Tool calls / minute | Your tool calls ÷ wall-clock minutes so far | ≥ 2.2 TPM | \u003c 1.0 TPM → thrashing, simplify  |\n   | Parallelism         | Turns with ≥2 parallel calls ÷ total turns  | ≥ 0.35    | \u003c 0.20 → batch reads/greps       |\n\n   Full 8-channel framework (flail, cache hit, task latency, cost-per-commit,\n   rework) in `docs/research/ralph-behavior-signals.md` — those are measured\n   across iterations, not self-checkable mid-session.\n\n   HARD LIMITS — commit what you have and stop if ANY of these are reached:\n   - 60 turns — you are near context limit. Commit with \"(partial)\" suffix.\n   - 3 failed test-fix cycles — the approach isn't working. Revert with `git checkout -- .` and add\n     a [RESEARCH] task: \"investigate why {task} failed — {error}\". Move to next task.\n   - Tests still failing after implementation — do NOT mark task as done. Commit with \"(wip)\" suffix.\n   An incomplete commit is infinitely better than lost work from context overflow.\n\n6b. BLOCKED? SOLVE THE ROOT CAUSE FIRST — don't churn the symptom, park, or work around.\n\n   **ROOT-CAUSE-FIRST (operator directive 2026-05-27 — overrides the reflex to park).**\n   When you hit a blocker:\n   1. **Diagnose to the ROOT cause**, not the surface symptom. Ask: \"what is the\n      actual thing that must change for this to work?\" A failing CI run, a denied\n      signup, a parse error — these are symptoms. The root is *why* they fail.\n   2. **If the root is within your power → fix it NOW.** Pivot to the root fix; it\n      outranks the blocked task (you cannot finish the blocked task without it).\n      Don't re-try / re-run / work around the symptom. Fixing the root IS following\n      the dependency chain, not scope creep (see Kind A below).\n   3. **If the root is genuinely human-gated** (a credential, IAM, an external\n      system, a product decision) → **PR-and-ping**: draft everything you can as a\n      PR and ping Gustaf (per the PR-and-ping pattern). Do NOT just park-and-move-on.\n   4. **Still file the inbox task** for tracking (see ALWAYS FILE A TASK below), but\n      ALSO act on the root per (2)/(3) — the task is a record, not a substitute for\n      the fix.\n\n   **Anti-patterns to STOP:**\n   - Symptom churn — re-running CI without fixing *why* it fails (iters 99/102/104\n     re-ran CI on a phantom orphaned flake instead of fixing why CI-FIX mis-fires).\n   - Park-and-move-on without addressing the root (FUNNEL.6 parked \"signup blocked\n     by realm config\" + filed a blocker instead of fixing the realm config).\n   - Band-aid workarounds that leave the root broken.\n   - Closing a task \"blocked, no fix\" when the fix is within reach.\n\n   **Budget:** no more than 1 symptom-retry before pivoting to the root. An in-power\n   blocker gets a root-cause fix in the SAME or NEXT iteration (not a park); a\n   human-gated blocker produces a PR+ping (not a bare blocked task).\n\n   AFTER applying root-cause-first, classify the blocker:\n\n   **Kind A — fixable bug in project code (Academy Go, HTMX templates, migrations,\n   test fixtures, scenario transformers, Hills bundle schema). Do NOT mark blocked.\n   FIX IT.** The root cause of most \"blocked\" hedges is a concrete typed error,\n   parse failure, missing wiring, or contract mismatch sitting one level\n   upstream of ITER_TASK. Fixing it IS part of closing ITER_TASK — you're not\n   scope-creeping, you're following the dependency chain.\n   Protocol:\n   1. Identify the bug with one-line evidence (file:line + the failing\n      symptom — a stack trace, a diff, a failed assertion).\n   2. Fix it. Same iteration, same commit. If the fix is wholly separate\n      from ITER_TASK (touches unrelated code) spawn\n      `HILLS.SIM.FIX.\u003cslug\u003e` (or `\u003cDOMAIN\u003e.FIX.\u003cslug\u003e`) to record\n      what was fixed, then continue.\n   3. Re-run whatever verification was blocked on the bug, until\n      ITER_TASK's own Verify: step passes.\n   4. Commit once, with BOTH the fix and the ITER_TASK deliverable in it.\n      Commit message: `fix(scope): bug + feat(task): deliverable`.\n   Concrete example (SIM.5 iter 34): picked SIM.5, saw org.json parse\n   error because `booking_type` was a number but the Go struct expected a\n   string. That is Kind A. Fix the struct (or add custom unmarshalling),\n   re-run SIM.4 to produce artifacts, THEN run SIM.5 against them. Do\n   NOT mark SIM.5 blocked — the bug is in your codebase, you own it,\n   fix it. \"Let me document the blocked state\" is the wrong reflex.\n\n   **Kind C — write-permission denial.** If an `Edit` or `Write` call is\n   permission-denied for a path, do NOT ask the loop to approve the write —\n   there is no human in the loop. Instead: mark the task `SKIPPED` with\n   reason `BLOCKED_BY_PERMISSION:\u003cpath\u003e`, commit what you have, and STOP.\n   Add a task: \"Add `\u003cpath\u003e` to `.claude/settings.json` Edit/Write allowlist\".\n\n   **Kind B — infrastructure/external wall you genuinely cannot fix from\n   inside an iteration.** THESE are the cases that legitimately warrant\n   \"add a task, commit, stop\":\n   - Write-permission denied → Kind C above (do NOT ask for approval)\n   - MCP tool not available → add task: \"Fix MCP server startup for {tool}\"\n   - Container needs restart but you can't → add task: \"Restart container and verify {page}\"\n   - Task is too large for one iteration → add task: \"{task} part 2 — {what remains}\"\n   - Missing infrastructure (make target, migration, npm package) → add task: \"Add {what's missing}\"\n   - Codebase pattern unclear AFTER \u003e20 turns of investigation → add task:\n     \"[RESEARCH] investigate {pattern} and document in ARCHITECTURE.md\"\n   - External credential / secret missing → use the PR-and-ping pattern below\n     (NOT a bare `[GUSTAF]` task — see PR-AND-PING)\n   Add the task to IMPLEMENTATION_PLAN.md under \"Discovered Issues\", commit\n   what you have, and STOP. The next iteration (or a human) will pick it up.\n\n   **PR-AND-PING pattern** (operator directive 2026-05-26 — \"just create PRs and\n   ping me\"). Whenever you hit work that genuinely needs a human (a privileged\n   `pulumi up`, a secret value, an approval, an external SaaS action), draft\n   EVERYTHING you can as a PR + ping Gustaf — never a bare blocked `[GUSTAF]`\n   task that just sits and waits. The wrapper:\n\n   ```bash\n   # 1. Stage your draft changes on a fresh branch (NOT the feature branch).\n   git checkout -b ralph/\u003cshort-slug\u003e\n   git add \u003cfiles\u003e \u0026\u0026 git commit -m \"draft(\u003cscope\u003e): \u003cone-line\u003e\"\n\n   # 2. Write the SUMMARY and PRIVILEGED COMMAND to two files (DO NOT bake\n   #    secret values into either file — describe the command, let Gustaf\n   #    supply the value from his own credential store).\n   cat \u003e /tmp/summary.md \u003c\u003c'EOF'\n   Adds \u003cthing\u003e. Ralph cannot run \u003cprivileged step\u003e because \u003creason\u003e.\n   EOF\n   cat \u003e /tmp/priv-cmd.sh \u003c\u003c'EOF'\n   pulumi up --stack sandbox --yes\n   EOF\n\n   # 3. Run the wrapper — creates branch + PR + reviewer + inbox ping, AND blocks\n   #    the gated task in ralph_db so the picker stops re-handing it (--task-id).\n   scripts/ralph-pr-and-ping.sh \\\n     --title \"Apply pulumi diff for RG.X\" \\\n     --branch \"ralph/rg-x-apply\" \\\n     --summary /tmp/summary.md \\\n     --privileged-cmd-file /tmp/priv-cmd.sh \\\n     --task-id \"RG.X\"\n   ```\n\n   ALWAYS pass `--task-id \u003cid\u003e` when the ping is gated on a specific backlog task:\n   the wrapper then `POST`s `/dev/ralph/tasks/\u003cid\u003e/block` after the PR lands, so the\n   ranking picker skips it instead of burning an iteration on it every round (the\n   iter 118+121 `RALPH.CP.S7.b` churn — code done, apply vault-gated, left `[ ]`).\n   The block is best-effort (a failed POST never fails the PR+ping). Un-block by\n   flipping the task `[x]` (or `POST .../unblock`) once the human runs the step.\n\n   The wrapper assigns `gustaf-ag47` as reviewer by default (override via\n   `RALPH_REVIEWER_HANDLE` env). The reviewer handle is the one used in\n   `.github/CODEOWNERS` for human-review paths — confirmed via `gh api\n   /orgs/sweetspotio/members`. The wrapper ALSO drops a `[GUSTAF]` inbox note\n   linking the PR # so the plan picks it up next iteration.\n\n   **NEVER bake secret values into a PR body.** Describe the command Gustaf\n   runs; let him supply the secret from his own store. A PR with a secret in\n   the body is a leak, not a ping.\n\n   After running the wrapper, continue working on OTHER tasks — do NOT block\n   the loop waiting for Gustaf. The next iteration picks up the inbox note and\n   the plan tracks the PR; merge happens out-of-band.\n\n   **Heuristic to decide A vs B:** ask \"if I had 20 more turns, could I make\n   this work?\" — if yes, it's Kind A; fix it. If no, it's Kind B; log and\n   stop. Default to Kind A when in doubt — the cost of a wrong \"fix it\"\n   judgment is one extra commit; the cost of a wrong \"mark blocked\"\n   judgment is a whole iteration lost to a task that never lands.\n\n   **ALWAYS FILE A TASK (THIS IS NOT OPTIONAL):**\n   Whenever you encounter ANY of the following — even if it does not block\n   your current task — file an inbox task BEFORE you exit. Drop a file in\n   `inbox/YYYY-MM-DD-HHMM-\u003cshort-slug\u003e.md` (the next iter folds it into\n   the plan). Issues unlogged become issues forgotten.\n\n   - **Surprising or broken behavior** (something didn't work the way the\n     code/spec/comment said it would). Tag `[BUG]` or `[RESEARCH]`.\n   - **A test failed for a reason orthogonal to your change.** File even\n     if you can't fix it now. Tag `[BUG]` with the test path + failure.\n   - **Dead code, unused config, stale doc, dangling reference.** Tag\n     `[NORMAL]` or `[NICE-TO-HAVE]`.\n   - **Took \u003e2 turns to understand something** that wasn't obvious from\n     code/specs. The next person/iter shouldn't pay that cost. Tag\n     `[REFINE]` to update the relevant doc.\n   - **A script silently swallowed an error** (`|| true`, `2\u003e/dev/null`,\n     missing pipefail). Tag `[BUG]`.\n   - **A migration / config / dependency was missing or wrong** in the\n     dev environment but you worked around it. Tag `[NORMAL]`.\n   - **CI passed but the change is suspicious** (e.g., test count\n     dropped, fitness allowlist grew, gocognit warning suppressed).\n     Tag `[RESEARCH]` to audit.\n\n   **ONE TASK PER FAILURE MODE — NEVER BUNDLE.** When a multi-step\n   process produces N distinct failures, file N separate inbox tasks,\n   not one rescue task with N issues inside. Different priorities,\n   different scopes, parallelizable across iters. Single bundled tasks\n   become single bundled timeouts.\n\n   The bar for filing is intentionally low. If you hesitated for \u003e5\n   seconds wondering \"should I file this?\", file it. Cost of a frivolous\n   inbox task: ~10 lines and 0 follow-up if it's not real. Cost of NOT\n   filing a real issue: it's gone.\n\n6c. TASK TYPES — see `.agent_instructions/research-methodology.md` for detailed workflows (loaded via 0e):\n   - **[RESEARCH]**: investigate, document findings in `docs/research/`, add max 5 sub-tasks with\n     severity tags. Commit: `research(scope): summary`. Do NOT implement code. STOP after commit.\n     RESEARCH tasks produce TWO outputs: (1) a `docs/research/` artifact documenting findings\n     and recommendation, (2) follow-up CODE tasks in IMPLEMENTATION_PLAN.md that implement the\n     decision. A RESEARCH task that produces code instead of a doc is wrong — the next iteration\n     will implement the code tasks. Tasks whose body starts with \"Resolve\", \"Decide\", \"Evaluate\",\n     \"Pick between\", or asks a design question MUST be tagged `[RESEARCH]`.\n   - **[SPIKE]**: throwaway PoC, output decision + tasks if viable. Commit: `spike: {topic}`\n   - **[REFINE]**: improve existing doc/spec, add tasks for gaps. Commit: `refine: {document}`\n   - **[SLEEPER]**: recurring low-priority background work, picked up when no regular tasks remain.\n     Reduced timeout (300s). Output MUST be docs/reports/tasks — NEVER modify `internal/`.\n     **DO NOT mark the sleeper task as `[x]`** — sleepers are recurring and stay `[ ]` forever.\n     Loop.sh tracks last-run via `\u003c!-- ran: timestamp --\u003e` comment and rotates among sleepers.\n     **MANDATORY: For EVERY issue/gap/recommendation in your report, append a new task**\n     to IMPLEMENTATION_PLAN.md with a concrete Verify step.\n     Each task MUST include a severity tag: `[CRITICAL]` (breaks invariants, data loss risk),\n     `[NORMAL]` (tech debt, coupling, should fix), or `[NICE-TO-HAVE]` (cleanup, style).\n     Format: `- [ ] **XX.N** [CRITICAL] Description...`\n     A sleeper that produces 5 findings must produce 5 new tasks. A report without tasks\n     is a failed sleeper — findings that don't become tasks are forgotten within days.\n     Commit: `sleeper(scope): summary`. Max 30 turns.\n   - **[BG-POLL \\\u003csentinel\\\u003e]**: task is SKIPPED by the task picker while the sentinel file\n     doesn't exist. When the file appears, the task becomes pickable and the LLM runs once\n     for aggregation. Use for long-running background processes (bench runs, data imports)\n     where polling wastes $1-5/iter for zero-diff iterations. The bench-launching iteration\n     creates the sentinel on completion: `nohup bash -c './bench.sh \u0026\u0026 touch .bg-poll/my.done' \u0026`.\n     Tag the follow-up task `[BG-POLL .bg-poll/my.done]`. While the sentinel is absent, the\n     task picker skips it and picks other work; if no other work remains, Ralph enters idle mode.\n     Env: `RALPH_BG_POLL_WAIT_S` (default 300s) controls the secondary sleep-guard interval.\n   - All others: implement code as normal.\n\n6a. GITHUB WORKFLOW HEALTH — HIGHEST PRIORITY:\n    If this iteration is `CI-FIX` (ITER_TASK=CI-FIX or CI_CONTEXT_FILE is set), GitHub workflow\n    failures are your ONLY job. Do not start any other work until all failing workflows are resolved.\n    This applies to ALL workflows — not just \"Continuous Integration\":\n\n    | Workflow | How to fix |\n    |---|---|\n    | Continuous Integration | Read CI logs → identify category (Docker build / compile / test / swag / trivy / gosec) → fix root cause |\n    | Trivy Image Scan | Update distroless SHA: `docker pull gcr.io/distroless/static-debian12:nonroot \u0026\u0026 docker inspect --format='{{index .RepoDigests 0}}' gcr.io/distroless/static-debian12:nonroot` → replace line in Dockerfile |\n    | Deploy Sandbox | Check deploy logs via SSM (`docker logs academy-app-1 --tail 50`) → identify what failed |\n    | Infra Drift Detection | Read the issue body → identify drifted resources → fix in `infra/aws/*.go` |\n    | Any other workflow | Read the run logs via `gh run view \u003crun-id\u003e --log-failed` → identify and fix |\n\n    Check failures on BOTH branches:\n    ```bash\n    gh run list --branch main --limit 8 --json workflowName,conclusion,headSha,url | python3 -c \"import sys,json; [print(r['workflowName'],r['conclusion'],r['url']) for r in json.load(sys.stdin) if r['conclusion']=='failure']\"\n    gh run list --branch \"$(git branch --show-current)\" --limit 8 --json workflowName,conclusion,headSha,url | python3 -c \"import sys,json; [print(r['workflowName'],r['conclusion'],r['url']) for r in json.load(sys.stdin) if r['conclusion']=='failure']\"\n    ```\n\n    CI uses `docker-compose.ci.yml` + `BUILD_TARGET=ci`. Read `scripts/ci.sh` for the pipeline.\n    Verify fix compiles locally before committing. Include Dockerfile, scripts/, .github/ in git add.\n    **No retry limit** — keep fixing until `gh run list` shows only successes.\n7. See CLAUDE.md \"Project Guard Rails\" for engine rules, infrastructure restrictions, and outcome vocabulary.\n   **INFRA GUARDRAIL:** If your task requires a new SSM parameter, DNS record, EC2 cloud-init change, security group rule, S3 bucket, or any other AWS resource: (a) write the Pulumi Go code in `infra/aws/` FIRST, (b) run `pulumi preview --stack sandbox` locally to verify the diff, (c) let `make snapshot` → PR → CI apply it. NEVER use `aws ssm put-parameter`, `aws route53 change-resource-record-sets`, or any AWS CLI write command to create or mutate infra directly. The verify condition for any infra-touching task must cite the `infra/aws/*.go` file changed AND confirm the resource appeared in a `pulumi preview` diff.\n\n   ## GitOps — Never Write Directly to EC2\n\n   All sandbox changes go through Git → CI → deploy. This means:\n\n   - DO NOT use `aws ssm send-command` to write files, patch configs, or restart services\n   - DO NOT use `aws s3 cp` to push config files to the EC2 as a workaround\n   - DO NOT call `caddy reload`, `docker compose restart`, or `systemctl` via SSM to apply undeployed changes\n\n   The correct path: edit locally → git commit → make snapshot → merge → CI deploys.\n\n   SSM is allowed READ-ONLY: docker logs, docker ps, psql SELECT, curl health checks.\n   SSM put-parameter is allowed for NEW credentials only (never config).\n\n   If you are tempted to SSM-write something: STOP. Commit the change instead.\n\n8. If all tasks in the current slice are checked, output \"Slice N complete.\" and stop.\n   Do NOT start the next slice without a plan regeneration (`./loop.sh plan 1`).\n9. When you learn something new about building or testing, update CLAUDE.md\n   (Operational Notes section) — keep it brief. Status updates go in IMPLEMENTATION_PLAN.md.\n10. KNOWLEDGE BASE: Read ralph-logs/KNOWLEDGE.md at start. Increment votes if an entry helps.\n    Add new entries for gotchas/patterns you discover (votes: 1). Keep entries 3-5 lines max.\n","overlay → PROMPT_build.md":"","plan (ralph-2)":"We are building a predicate-based rules engine grounded in many-sorted first-order logic (MS-FOL), a golf domain consumer, and an HTMX admin frontend. The engine is domain-agnostic: consumers register sorts, symbols, and evaluation contexts at runtime.\n\n0a. Study @specs/README.md for the architecture overview and full spec index.\n0b. Study specs in parallel subagents. Prioritize by slice:\n    - Slice 1 (engine): specs 01-04, 11\n    - Slice 2 (golf domain): specs domain/05-06\n    - Slice 3 (event sourcing): specs domain/07, 08-event-store-walkthrough\n    - Slice 4 (booking+pricing): specs domain/08-09\n    - Slice 5 (frontend): specs infrastructure/13\n    - Slice 6 (NLP+DSL): specs domain/14, 09\n    - Slice 7 (migration): specs domain/15\n    - Supporting: specs domain/10-12, adr/\n0c. Study the codebase with parallel subagents: `bc/`, `internal/`, `cmd/`, `tests/`, `migrations/`.\n    Do NOT assume anything is missing — search first.\n0d. If `make dev` is running, inspect the live app and database via chrome-devtools MCP and psql.\n\n1. Compare ALL specs against the codebase. For each component across all 7 slices\n   (spec domain/16 §13), determine: does it exist? Is it complete? Is it tested?\n   Use parallel subagents. Check for:\n   - Existing implementations (even partial)\n   - TODOs, stubs, placeholder implementations\n   - Skipped or flaky tests\n   - Inconsistencies between code and specs\n   - Infrastructure already built vs still needed\n\n2. Generate @IMPLEMENTATION_PLAN.md as a prioritized checklist covering the FULL system,\n   organized into the 7 vertical slices from spec domain/16 §13:\n\n   ```markdown\n   # Implementation Plan\n\n   Generated: {date}\n   Last updated: {date}\n\n   ## Current Focus\n   {Which slice and what specifically}\n\n   ## Slice 1: FOL Engine Core (specs 01-04, 11)\n\n   Engine library: sorts, AST, symbol registry, fact store, type checker, evaluator, rules.\n   Gate: `go test ./internal/engine/... -race` passes. Type checker rejects every invalid AST.\n\n   - [ ] {task} (spec NN §M)\n   ...\n\n   ## Slice 2: Golf Domain Bindings + Evaluation (specs domain/05-06)\n\n   Golf-specific predicates, EvaluationContext assembly, ground fact projections.\n   Gate: All golf predicates registered. 1000 concurrent evaluations pass under -race.\n\n   - [ ] {task} (spec NN §M)\n   ...\n\n   ## Slice 3: Event Sourcing + Persistence (specs domain/07, 08-event-store)\n\n   Append-only event store, aggregate reconstruction, projections, PostgreSQL adapters.\n   Gate: `make test-integration` passes. Deterministic replay. Optimistic concurrency.\n\n   - [ ] {task} (spec NN §M)\n   ...\n\n   ## Slice 4: Booking Engine (specs domain/08-09)\n\n   End-to-end booking flow with rule evaluation, pricing pipeline, slot management.\n   Gate: Create org -\u003e course -\u003e pass -\u003e rules -\u003e book (allowed) -\u003e book again (denied).\n\n   - [ ] {task} (spec NN §M)\n   ...\n\n   ## Slice 5: Frontend (spec infrastructure/13)\n\n   HTMX + DaisyUI admin UI: rule composer, trace viewer, simulator, symbol browser.\n   Gate: Rule composer produces valid ASTs. Trace viewer renders evaluation logs.\n\n   - [ ] {task} (spec NN §M)\n   ...\n\n   ## Slice 6: NLP Decomposition (specs domain/14, 09)\n\n   Natural language -\u003e typed AST with human approval. DSL parser and read-back.\n   Gate: Swedish example decomposes into 5 resolved rules.\n\n   - [ ] {task} (spec NN §M)\n   ...\n\n   ## Slice 7: Migration (spec domain/15)\n\n   PHP -\u003e Go migration tooling with parallel validation.\n   Gate: 0 discrepancies for 7 consecutive days.\n\n   - [ ] {task} (spec NN §M)\n   ...\n\n   ## Discovered Issues\n   {Issues found during planning}\n\n   ## Notes\n   {Architectural observations, existing code to leverage, patterns discovered}\n   ```\n\n   Adapt the plan based on what you find. Break large tasks into one-iteration-sized pieces.\n   Annotate dependencies: `(blocks: X, Y)`. Note existing code to leverage.\n   For infrastructure specs (17-40), only plan tasks for things NOT already implemented.\n\n3. Commit: `git add IMPLEMENTATION_PLAN.md \u0026\u0026 git commit -m \"plan: full implementation plan from spec analysis\"`\n\n99999. PLAN ONLY. Do NOT implement anything. Do NOT write Go code.\n999999. Do NOT assume functionality is missing — search the codebase first with parallel subagents.\n9999999. Every task must reference which spec and section it implements.\n99999999. If you find existing code that partially implements a spec, note what's done and what's missing — do NOT plan to rewrite from scratch. Build on what exists.\n999999999. Key design decisions are RESOLVED (see @AGENTS.md). Do NOT revisit them.\n9999999999. Study @specs/adr/ — the 15 ADRs document design decisions with rationale. Respect them.\n99999999999. Each task should be completable in ONE Ralph build iteration. If a task is too large, split it.\n999999999999. The 7 slices are vertical — each delivers shippable value. You can stop after any slice. Later slices build on earlier ones but don't invalidate them.\n9999999999999. Domain aggregates that already exist (Organization, Resource, Event) need EXTENSION for engine integration, not rewriting. Plan the delta.\n99999999999999. Infrastructure specs (17-40) describe the EXISTING platform. Only plan tasks for gaps between spec and implementation.\n999999999999999. The plan must be specific enough that a stateless AI agent (with no memory of this planning session) can pick any task and implement it from the spec references alone.\n","plan (ralph-3)":"We are building a predicate-based rules engine grounded in many-sorted first-order logic (MS-FOL), a golf domain consumer, and an HTMX admin frontend. The engine is domain-agnostic: consumers register sorts, symbols, and evaluation contexts at runtime.\n\n0a. Study @specs/README.md for the architecture overview and full spec index.\n0b. Study specs in parallel subagents. Prioritize by slice:\n    - Slice 1 (engine): specs 01-04, 11\n    - Slice 2 (golf domain): specs domain/05-06\n    - Slice 3 (event sourcing): specs domain/07, 08-event-store-walkthrough\n    - Slice 4 (booking+pricing): specs domain/08-09\n    - Slice 5 (frontend): specs infrastructure/13\n    - Slice 6 (NLP+DSL): specs domain/14, 09\n    - Slice 7 (migration): specs domain/15\n    - Supporting: specs domain/10-12, adr/\n0c. Study the codebase with parallel subagents: `internal/`, `cmd/`, `tests/`, `migrations/`.\n    Do NOT assume anything is missing — search first.\n0d. If `make dev` is running, inspect the live app and database via chrome-devtools MCP and psql.\n\n1. Compare ALL specs against the codebase. For each component across all 7 slices\n   (spec domain/16 §13), determine: does it exist? Is it complete? Is it tested?\n   Use parallel subagents. Check for:\n   - Existing implementations (even partial)\n   - TODOs, stubs, placeholder implementations\n   - Skipped or flaky tests\n   - Inconsistencies between code and specs\n   - Infrastructure already built vs still needed\n\n2. Generate @IMPLEMENTATION_PLAN.md as a prioritized checklist covering the FULL system,\n   organized into the 7 vertical slices from spec domain/16 §13:\n\n   ```markdown\n   # Implementation Plan\n\n   Generated: {date}\n   Last updated: {date}\n\n   ## Current Focus\n   {Which slice and what specifically}\n\n   ## Slice 1: FOL Engine Core (specs 01-04, 11)\n\n   Engine library: sorts, AST, symbol registry, fact store, type checker, evaluator, rules.\n   Gate: `go test ./internal/engine/... -race` passes. Type checker rejects every invalid AST.\n\n   - [ ] {task} (spec NN §M)\n   ...\n\n   ## Slice 2: Golf Domain Bindings + Evaluation (specs domain/05-06)\n\n   Golf-specific predicates, EvaluationContext assembly, ground fact projections.\n   Gate: All golf predicates registered. 1000 concurrent evaluations pass under -race.\n\n   - [ ] {task} (spec NN §M)\n   ...\n\n   ## Slice 3: Event Sourcing + Persistence (specs domain/07, 08-event-store)\n\n   Append-only event store, aggregate reconstruction, projections, PostgreSQL adapters.\n   Gate: `make test-integration` passes. Deterministic replay. Optimistic concurrency.\n\n   - [ ] {task} (spec NN §M)\n   ...\n\n   ## Slice 4: Booking Engine (specs domain/08-09)\n\n   End-to-end booking flow with rule evaluation, pricing pipeline, slot management.\n   Gate: Create org -\u003e course -\u003e pass -\u003e rules -\u003e book (allowed) -\u003e book again (denied).\n\n   - [ ] {task} (spec NN §M)\n   ...\n\n   ## Slice 5: Frontend (spec infrastructure/13)\n\n   HTMX + DaisyUI admin UI: rule composer, trace viewer, simulator, symbol browser.\n   Gate: Rule composer produces valid ASTs. Trace viewer renders evaluation logs.\n\n   - [ ] {task} (spec NN §M)\n   ...\n\n   ## Slice 6: NLP Decomposition (specs domain/14, 09)\n\n   Natural language -\u003e typed AST with human approval. DSL parser and read-back.\n   Gate: Swedish example decomposes into 5 resolved rules.\n\n   - [ ] {task} (spec NN §M)\n   ...\n\n   ## Slice 7: Migration (spec domain/15)\n\n   PHP -\u003e Go migration tooling with parallel validation.\n   Gate: 0 discrepancies for 7 consecutive days.\n\n   - [ ] {task} (spec NN §M)\n   ...\n\n   ## Discovered Issues\n   {Issues found during planning}\n\n   ## Notes\n   {Architectural observations, existing code to leverage, patterns discovered}\n   ```\n\n   Adapt the plan based on what you find. Break large tasks into one-iteration-sized pieces.\n   Annotate dependencies: `(blocks: X, Y)`. Note existing code to leverage.\n   For infrastructure specs (17-40), only plan tasks for things NOT already implemented.\n\n3. Commit: `git add IMPLEMENTATION_PLAN.md \u0026\u0026 git commit -m \"plan: full implementation plan from spec analysis\"`\n\n99999. PLAN ONLY. Do NOT implement anything. Do NOT write Go code.\n999999. Do NOT assume functionality is missing — search the codebase first with parallel subagents.\n9999999. Every task must reference which spec and section it implements.\n99999999. If you find existing code that partially implements a spec, note what's done and what's missing — do NOT plan to rewrite from scratch. Build on what exists.\n999999999. Key design decisions are RESOLVED (see @AGENTS.md). Do NOT revisit them.\n9999999999. Study @specs/adr/ — the 15 ADRs document design decisions with rationale. Respect them.\n99999999999. Each task should be completable in ONE Ralph build iteration. If a task is too large, split it.\n999999999999. The 7 slices are vertical — each delivers shippable value. You can stop after any slice. Later slices build on earlier ones but don't invalidate them.\n9999999999999. Domain aggregates that already exist (Organization, Resource, Event) need EXTENSION for engine integration, not rewriting. Plan the delta.\n99999999999999. Infrastructure specs (17-40) describe the EXISTING platform. Only plan tasks for gaps between spec and implementation.\n999999999999999. The plan must be specific enough that a stateless AI agent (with no memory of this planning session) can pick any task and implement it from the spec references alone.\n","plan (ralph-4)":"We are building a predicate-based rules engine grounded in many-sorted first-order logic (MS-FOL), a golf domain consumer, and an HTMX admin frontend. The engine is domain-agnostic: consumers register sorts, symbols, and evaluation contexts at runtime.\n\n0a. Study @specs/README.md for the architecture overview and full spec index.\n0b. Study specs in parallel subagents. Prioritize by slice:\n    - Slice 1 (engine): specs 01-04, 11\n    - Slice 2 (golf domain): specs domain/05-06\n    - Slice 3 (event sourcing): specs domain/07, 08-event-store-walkthrough\n    - Slice 4 (booking+pricing): specs domain/08-09\n    - Slice 5 (frontend): specs infrastructure/13\n    - Slice 6 (NLP+DSL): specs domain/14, 09\n    - Slice 7 (migration): specs domain/15\n    - Supporting: specs domain/10-12, adr/\n0c. Study the codebase with parallel subagents: `bc/`, `internal/`, `cmd/`, `tests/`, `migrations/`.\n    Do NOT assume anything is missing — search first.\n0d. If `make dev` is running, inspect the live app and database via chrome-devtools MCP and psql.\n\n1. Compare ALL specs against the codebase. For each component across all 7 slices\n   (spec domain/16 §13), determine: does it exist? Is it complete? Is it tested?\n   Use parallel subagents. Check for:\n   - Existing implementations (even partial)\n   - TODOs, stubs, placeholder implementations\n   - Skipped or flaky tests\n   - Inconsistencies between code and specs\n   - Infrastructure already built vs still needed\n\n2. Generate @IMPLEMENTATION_PLAN.md as a prioritized checklist covering the FULL system,\n   organized into the 7 vertical slices from spec domain/16 §13:\n\n   ```markdown\n   # Implementation Plan\n\n   Generated: {date}\n   Last updated: {date}\n\n   ## Current Focus\n   {Which slice and what specifically}\n\n   ## Slice 1: FOL Engine Core (specs 01-04, 11)\n\n   Engine library: sorts, AST, symbol registry, fact store, type checker, evaluator, rules.\n   Gate: `go test ./internal/engine/... -race` passes. Type checker rejects every invalid AST.\n\n   - [ ] {task} (spec NN §M)\n   ...\n\n   ## Slice 2: Golf Domain Bindings + Evaluation (specs domain/05-06)\n\n   Golf-specific predicates, EvaluationContext assembly, ground fact projections.\n   Gate: All golf predicates registered. 1000 concurrent evaluations pass under -race.\n\n   - [ ] {task} (spec NN §M)\n   ...\n\n   ## Slice 3: Event Sourcing + Persistence (specs domain/07, 08-event-store)\n\n   Append-only event store, aggregate reconstruction, projections, PostgreSQL adapters.\n   Gate: `make test-integration` passes. Deterministic replay. Optimistic concurrency.\n\n   - [ ] {task} (spec NN §M)\n   ...\n\n   ## Slice 4: Booking Engine (specs domain/08-09)\n\n   End-to-end booking flow with rule evaluation, pricing pipeline, slot management.\n   Gate: Create org -\u003e course -\u003e pass -\u003e rules -\u003e book (allowed) -\u003e book again (denied).\n\n   - [ ] {task} (spec NN §M)\n   ...\n\n   ## Slice 5: Frontend (spec infrastructure/13)\n\n   HTMX + DaisyUI admin UI: rule composer, trace viewer, simulator, symbol browser.\n   Gate: Rule composer produces valid ASTs. Trace viewer renders evaluation logs.\n\n   - [ ] {task} (spec NN §M)\n   ...\n\n   ## Slice 6: NLP Decomposition (specs domain/14, 09)\n\n   Natural language -\u003e typed AST with human approval. DSL parser and read-back.\n   Gate: Swedish example decomposes into 5 resolved rules.\n\n   - [ ] {task} (spec NN §M)\n   ...\n\n   ## Slice 7: Migration (spec domain/15)\n\n   PHP -\u003e Go migration tooling with parallel validation.\n   Gate: 0 discrepancies for 7 consecutive days.\n\n   - [ ] {task} (spec NN §M)\n   ...\n\n   ## Discovered Issues\n   {Issues found during planning}\n\n   ## Notes\n   {Architectural observations, existing code to leverage, patterns discovered}\n   ```\n\n   Adapt the plan based on what you find. Break large tasks into one-iteration-sized pieces.\n   Annotate dependencies: `(blocks: X, Y)`. Note existing code to leverage.\n   For infrastructure specs (17-40), only plan tasks for things NOT already implemented.\n\n3. Commit: `git add IMPLEMENTATION_PLAN.md \u0026\u0026 git commit -m \"plan: full implementation plan from spec analysis\"`\n\n99999. PLAN ONLY. Do NOT implement anything. Do NOT write Go code.\n999999. Do NOT assume functionality is missing — search the codebase first with parallel subagents.\n9999999. Every task must reference which spec and section it implements.\n99999999. If you find existing code that partially implements a spec, note what's done and what's missing — do NOT plan to rewrite from scratch. Build on what exists.\n999999999. Key design decisions are RESOLVED (see @AGENTS.md). Do NOT revisit them.\n9999999999. Study @specs/adr/ — the 15 ADRs document design decisions with rationale. Respect them.\n99999999999. Each task should be completable in ONE Ralph build iteration. If a task is too large, split it.\n999999999999. The 7 slices are vertical — each delivers shippable value. You can stop after any slice. Later slices build on earlier ones but don't invalidate them.\n9999999999999. Domain aggregates that already exist (Organization, Resource, Event) need EXTENSION for engine integration, not rewriting. Plan the delta.\n99999999999999. Infrastructure specs (17-40) describe the EXISTING platform. Only plan tasks for gaps between spec and implementation.\n999999999999999. The plan must be specific enough that a stateless AI agent (with no memory of this planning session) can pick any task and implement it from the spec references alone.\n"},"specs":[{"spec_id":"research-2026-06-09-spec-to-execution-tracking","path":"specs/research/2026-06-09_spec-to-execution-tracking.md","title":"Spec → Decomposition → Fleet Execution Tracking on the Ralph Board","version":1,"ingested_at":"2026-06-13T02:45:08Z","body_md":"# Spec → Decomposition → Fleet Execution Tracking on the Ralph Board\n\n**Status:** Research/design spec — for discussion\n**Date:** 2026-06-09\n**Author:** Gustaf (product), Claude (drafting)\n**Question answered:** *\"I want to spec a beefy thing (storefront + platform-topology specs) and track how it decomposes into tasks that Ralph VMs pick up and implement — in the sandbox C2 dashboard. What are the missing pieces?\"*\n**Builds on:** spec 45 (`45-developer-observability-kanban.md` — DORA/flow/kanban basis + the feedback→board→delivery loop), `feedback-to-ralph-kanban.md`, RALPH.CP.SSOT (bootstrap SSOT, ~built), the live `ralph_db` schema (verified on sandbox 2026-06-09).\n\n---\n\n## 1. What the C2 plane already does well (verified live — do NOT rebuild)\n\n| Capability | Where | State |\n|---|---|---|\n| Task SSOT with claims | `academy.ralph_tasks`: status `open/claimed/blocked/done`, priority+generated rank, `claim_until`, `FOR UPDATE SKIP LOCKED` picker | ✅ live |\n| **First-class dependencies** | `blocked_by text[]` + GIN index | ✅ live (underused by UI) |\n| Cross-instance routing | `domain`, `assigned_to`, per-branch indexes (CP.S8) | ✅ live |\n| Kanban board | `/dev/ralph/board`: columns, drag-to-promote, instance picker, USER INBOX (feedback threads), DONE-with-PR-link | ✅ live |\n| Feedback→delivery loop | spec 45 §4.1: feedback → route-to-board → `FEEDBACK.\u003cid\u003e` task → Ralph PR → reply → resolved | ✅ live |\n| **Event-sourced task aggregate** | `ralph_task` aggregate with `/dev/ralph/aggregate/ralph_task/log` + per-event **revert** | ✅ live — the extension seam |\n| Per-iteration telemetry | `ralph_metrics`: cost, turns, commit SHAs, evaluator_verdict, per session/iter; iteration drill-down endpoint | ✅ live |\n| Plan snapshots + ingest | `ralph_plan_snapshots`, `academy-ralph-plan-ingest` (sole writer of `body_md`) | ✅ live — the precedent for spec ingest |\n| Per-instance bootstrap SSOT | `/dev/ralph/bootstrap?instance=` (plan+prompt+config) | ✅ live |\n\n**The honest gap in one sentence:** everything above operates on *individual tasks*; the only notion of \"epic\" or \"spec\" is the **dotted naming convention** (`OPS.ZDT.S1`, `RALPH.TENANT.S3.b`) — never modeled, so the board cannot answer *\"how far along is the storefront spec, what's blocked, what did it cost, and what did it ship?\"*\n\n## 2. The missing pieces (each one independently observable today)\n\n1. **Spec as an entity.** Specs live only in git. The dashboard can't render the spec being executed, link a task to the section that motivates it, or notice the spec changed after decomposition. *(Precedent exists: plan snapshots — spec snapshots are the same move.)*\n2. **Epic as an entity.** No `epic_id` — progress, cost, and blockage can't roll up. Today I compute \"OPS.ZDT is 80% done\" by grepping checkboxes over three files.\n3. **The decomposition act is untracked.** Spec→tasks happens in an orchestrator/operator's head, lands as hand-written inbox markdown. No record of *what decomposition was proposed, by whom, reviewed when, published as which tasks* — and no re-decomposition path when the spec evolves (today: a confusing pile of follow-up inbox drops).\n4. **No review gate between \"proposed\" and \"pickable.\"** The moment a task row lands `open`, a VM can claim it. For a beefy epic you want draft → operator edits/re-orders on the board → publish. *(The \"phantom-slice / _invalid_\" inbox files on ralph-1 are this missing gate, visible as workaround debris.)*\n5. **Dependency DAG is invisible.** `blocked_by[]` exists but the board renders columns, not the graph — for an 8-slice epic with cross-instance edges (storefront S5 needs S1; metering P2 needs P1) the operator can't see the critical path or what publishing unblocks.\n6. **Acceptance ≠ done.** `status=done` means *Ralph said done*. The false-done class (wip heartbeat fix marked complete; sandbox-verify skipped) has bitten repeatedly. The per-iteration `evaluator_verdict` exists but never gates task state, and `[VERIFY-SANDBOX]` is prose in `block_md`.\n7. **Traceability chain is fragmented.** task→commit SHAs (in `ralph_metrics`), task→PR (DONE column), deploy→SHA (deploy workflow) all exist separately; nothing renders *spec § → task → commits → PR → deploy → verify evidence* as one chain.\n8. **Cost rolls up per iteration, not per epic.** The $10–15/task-arc discipline (CLAUDE.md) is enforced by eyeball; \"what did the ZDT epic cost?\" requires manual joins I do by hand in retros.\n\n## 3. Design — smallest schema that closes all eight\n\nExtend the **existing event-sourced `ralph_task` pattern** (new aggregates + columns, not a new system):\n\n```\nralph_specs   (spec_id PK, path, title, content_hash, body_md, ingested_at, version)\n              -- snapshot ingest, same mechanism as ralph_plan_snapshots;\n              -- re-ingest on change bumps version (spec-drift becomes visible)\n\nralph_epics   (epic_id PK, spec_id FK, title, status draft|active|done|parked,\n               prefix,           -- e.g. 'STOREFRONT' — retrofits the naming convention\n               created_by, published_at)\n\nralph_tasks   += epic_id (nullable FK)         -- rollup key\n              += spec_ref text                  -- 'path#section-anchor' provenance\n              += status 'draft'                 -- the review gate (draft not pickable:\n                                                --  every pickable index already filters\n                                                --  status='open', so draft is invisible\n                                                --  to pickers with zero picker changes)\n              += accepted_at, accepted_by       -- acceptance gate distinct from done\n```\n\nEvents on the existing aggregate stream: `SpecIngested`, `EpicCreated`, `DecompositionProposed{epic, tasks[]}`, `TaskPublished`, `TaskAccepted{evidence}`, `EpicCompleted`. Free because the aggregate log + revert endpoints already exist — a bad decomposition is *revertable* like any other ralph_task event.\n\n### The decomposition workflow (the new verb)\n```\n1  INGEST     operator (or CI on spec merge) POSTs the spec file\n              → ralph_specs snapshot, renders in a board side panel\n2  DECOMPOSE  an agent run (orchestrator session or headless `claude -p`\n              \"decompose-epic\" job) reads the spec → proposes the task DAG:\n              ids, priorities, domains (→ which VM lane), blocked_by edges,\n              Verify: lines, spec_ref anchors → lands as status='draft'\n3  REVIEW     board \"DRAFT\" swimlane: operator edits titles/priorities,\n              re-wires DAG edges, deletes overreach — drag-to-promote UX reused\n4  PUBLISH    draft→open per task or whole-epic; instantly pickable by the\n              fleet through the UNCHANGED picker\n5  TRACK      epic header bar: n/m done · cost-to-date · oldest-WIP-age ·\n              blocked-count · DAG mini-map; task cards show spec_ref chip\n6  ACCEPT     done→accepted gates on evidence: evaluator_verdict, commits\n              reachable from origin/main, [VERIFY-SANDBOX] artifact link;\n              epic completes only when all tasks ACCEPTED (kills false-done)\n7  RE-DECOMPOSE spec v(N+1) ingested → agent diffs against open epic tasks\n              → proposes add/modify/close set → same review gate\n```\n\n**Naming note:** this is *task* decomposition tooling for the operator loop — distinct from the deferred NLP rule \"Decomposer\" (CLAUDE.md guard rail; spec 11 §3). Call it the **epic planner** to avoid collision.\n\n### Metrics (spec 45-conformant, no fabricated ETAs)\nPer epic: throughput (accepted/wk), WIP, age of oldest open task, blocked count, **cost-to-date** (join `ralph_metrics` on task_id), flow efficiency. Explicitly NO completion-date forecasts (no-estimates rule); the burndown shows *observed* flow only.\n\n## 4. Slices (each shippable; first three give 80% of the value)\n\n| # | Slice | Delivers | Verify |\n|---|---|---|---|\n| **TRACK.S1** | `ralph_specs` ingest + board spec panel | The beefy spec is *visible in the C2 UI*, version-stamped | ingest storefront+topology specs; render; re-ingest bumps version |\n| **TRACK.S2** | `ralph_epics` + `epic_id`/`spec_ref` on tasks + **prefix retrofit** (backfill epic_id for OPS.ZDT.\\*, RALPH.TENANT.\\*, STOREFRONT.\\*) + epic progress header on board | Rollup: n/m, cost-to-date, blocked count — including for *historical* epics | board shows OPS.ZDT 5/7 with real cost from ralph_metrics |\n| **TRACK.S3** | `draft` status + DRAFT swimlane + publish action (+ decompose agent job posting drafts) | The review gate; decomposition becomes a tracked, revertable act | drafted tasks invisible to `next-task`; publish → picked next iteration |\n| TRACK.S4 | DAG view from `blocked_by` + cross-instance lanes | Critical path + \"publishing X unblocks Y,Z\" visible | render storefront S1→S5 edge; lane per domain/instance |\n| TRACK.S5 | Acceptance gate (done→accepted w/ evidence chips: verdict, main-reachable SHA, sandbox artifact) | Kills the false-done class at the board level | a wip-marked-done task shows unaccepted + why |\n| TRACK.S6 | Traceability chain view (spec§ → task → commits → PR → deploy) | One-click audit of what a spec shipped | walk OPS.ZDT.CADDY-GRACEFUL end-to-end (it has the full chain today) |\n| TRACK.S7 | Re-decomposition diff flow | Spec evolution without inbox-drop chaos | edit spec → proposed delta → review → publish |\n\n**Dogfood plan:** ship TRACK.S1–S3, then run the **storefront + platform-topology epics through the new pipeline as their own first decomposition** — STOREFRONT.S1–S8 and P1–P7 become the inaugural tracked epics, and the tracking system's remaining slices (S4–S7) are themselves tracked by it.\n\n## 5. Open decisions\n1. **Who runs DECOMPOSE?** Orchestrator session (human-in-loop, today's reality) vs a headless on-sandbox `claude -p` job triggered from the board (button: \"propose decomposition\"). *(Lean: start orchestrator-driven posting drafts via API; headless later — it's the same API.)*\n2. **Spec ingest trigger:** manual POST vs CI-on-merge for `specs/**` changes. *(Lean: CI-on-merge — spec drift then surfaces automatically.)*\n3. **Acceptance authority:** operator-only click vs auto-accept when evidence predicates pass (verdict=pass ∧ SHA on main ∧ artifact present). *(Lean: auto with operator override — consistent with the fleet's autonomy level.)*\n4. **Epic ↔ instance assignment:** keep domain-routing only, or add epic-level lane pinning (whole epic → ralph-1)? *(Lean: domain routing suffices; epics already split by domain naturally.)*\n5. Where the decompose-agent prompt lives: repo (`prompts/epic-planner.md`, versioned, fleet-visible) — aligns with the in-repo-docs rule.\n"}],"epics":[{"epic_id":"OPS.ZDT","title":"Zero-downtime deploy","status":"active","prefix":"OPS.ZDT","total_tasks":14,"done_tasks":13,"blocked_tasks":1,"oldest_wip_at":"2026-06-20T22:49:52Z","cost_to_date_usd":2.60537},{"epic_id":"PLATFORM","title":"Platform","status":"active","prefix":"PLATFORM","total_tasks":10,"done_tasks":9,"blocked_tasks":1,"oldest_wip_at":"2026-06-20T21:14:54Z","cost_to_date_usd":0},{"epic_id":"RALPH.CP","title":"Ralph control-plane DB cutover","status":"active","prefix":"RALPH.CP","total_tasks":78,"done_tasks":67,"blocked_tasks":8,"oldest_wip_at":"2026-06-20T23:21:58Z","cost_to_date_usd":855.950598},{"epic_id":"RALPH.TENANT","title":"Ralph multi-tenant control plane","status":"active","prefix":"RALPH.TENANT","total_tasks":22,"done_tasks":22,"blocked_tasks":0,"oldest_wip_at":"","cost_to_date_usd":30.504308},{"epic_id":"STOREFRONT","title":"Storefront","status":"active","prefix":"STOREFRONT","total_tasks":0,"done_tasks":0,"blocked_tasks":0,"oldest_wip_at":"","cost_to_date_usd":0},{"epic_id":"TRACK","title":"Spec-to-execution tracking","status":"active","prefix":"TRACK","total_tasks":7,"done_tasks":3,"blocked_tasks":4,"oldest_wip_at":"2026-06-20T23:21:58Z","cost_to_date_usd":0}]}
