From 0704a7a65bc82fc93d7c25546079e69bb8a44784 Mon Sep 17 00:00:00 2001
From: Snider
Date: Wed, 25 Mar 2026 13:35:14 +0000
Subject: [PATCH] feat: session continuity plans — RFC.plan.md + plan.1 + plan.2
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

RFC.plan.md: master context document for future sessions
- 5 root causes, 3 critical bugs, key decisions, what NOT to do
- Session context that won't survive compact
- Cross-references to existing RFCs that solve problems

RFC.plan.1.md: first session priorities
- Fix 3 critical bugs (one-line changes)
- AX-7 rename for core/go
- Start Registry[T]

RFC.plan.2.md: subsequent session goals
- Registry + migration
- Action system
- core/agent cascade fix
- c.Process() + go-process v0.7.0

Future sessions: read RFC.plan.md first, then the numbered plan for that session's scope.

Co-Authored-By: Virgil
---
 docs/RFC.plan.1.md | 85 ++++++++++++++++++++++++++++++++++++++++++
 docs/RFC.plan.2.md | 43 +++++++++++++++++++++
 docs/RFC.plan.md   | 93 ++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 221 insertions(+)
 create mode 100644 docs/RFC.plan.1.md
 create mode 100644 docs/RFC.plan.2.md
 create mode 100644 docs/RFC.plan.md

diff --git a/docs/RFC.plan.1.md b/docs/RFC.plan.1.md
new file mode 100644
index 0000000..62fac62
--- /dev/null
+++ b/docs/RFC.plan.1.md
@@ -0,0 +1,85 @@
# RFC Plan 1 — First Session Priorities

> Read RFC.plan.md first. This is what to do in the FIRST session after compact.

## Priority 1: Fix the 3 Critical Bugs (Plan 1)

These are one-line to five-line changes. Ship as v0.7.1. (P6-1, the cascade bug listed among the critical bugs in RFC.plan.md, needs the Action system and is handled in the RFC.plan.2.md sessions.)

### Bug 1: ACTION stops on !OK (ipc.go line ~33)

```go
// CURRENT (broken — handler 3 failing silences handlers 4 and 5):
for _, h := range handlers {
	if r := h(c, msg); !r.OK {
		return r
	}
}

// FIX: call every handler; a failure or a panic in one no longer stops the rest.
for _, h := range handlers {
	func() {
		// Recover per handler so a panic is logged instead of unwinding ACTION.
		defer func() {
			if r := recover(); r != nil {
				Error("handler panic", "err", r)
			}
		}()
		h(c, msg) // Result deliberately unchecked: a !OK handler no longer short-circuits
	}()
}
```

This also fixes P7-3 (no panic recovery) in the same change.

### Bug 2: Run() leaks on startup failure (core.go Run method)

A `defer` alone is not enough. `Run()` exits through `os.Exit(1)`, and `os.Exit` terminates the process without running deferred calls. Move the body into `RunE()` (additive, per the design decisions in RFC.plan.md), defer the shutdown there, and keep `Run()` as a thin wrapper:

```go
func (c *Core) RunE() error {
	defer c.ServiceShutdown(context.Background()) // ADD THIS: runs on every return path
	// ... rest of the old Run() body, returning errors instead of calling os.Exit(1)
}

func (c *Core) Run() {
	if err := c.RunE(); err != nil { os.Exit(1) } // compat wrapper: shutdown already ran in RunE
}
```

### Bug 3: Remove stale Embed() and fix comment

Delete `func (c *Core) Embed() Result` from core.go.
Fix the `New()` doc comment so it shows the `*Core` return type.

### Test the fixes with AX-7 naming:
```
TestIpc_Action_Ugly_HandlerFailsChainContinues
TestIpc_Action_Ugly_HandlerPanicsChainContinues
TestCore_Run_Ugly_StartupFailureCallsShutdown
```

## Priority 2: AX-7 Rename for core/go

Run the same Python rename script used on core/agent:

```python
# Same script from core/agent session — applies to any Go package
# Changes TestFoo_Good to TestFile_Foo_Good
```

This is mechanical. No logic changes. Just naming.

Then run gap analysis:
```bash
python3 -c "... same gap analysis script ..."
```

## Priority 3: Start Registry[T] (Plan 2)

Create `registry.go` with the type. Write tests FIRST (AX-7 complete from day one):

```
TestRegistry_Set_Good
TestRegistry_Set_Bad
TestRegistry_Set_Ugly
TestRegistry_Get_Good
...
```

Then migrate `serviceRegistry` (most tested, most used) before any other registry.
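
A minimal sketch of the `Registry[T]` type, assuming a mutex-guarded map keyed by name. `Set` and `Get` follow the test names above and the design note that Registry has explicit Get/Set. The `error` and `(T, bool)` return shapes are assumptions, not core's actual API. So are the empty-name rejection (one possible `Set_Bad` case), `Names()`, and the placeholder `Service` type. `ServiceRegistry` shows the embedding that RFC.plan.2.md plans for the `serviceRegistry` migration.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
	"sync"
)

// Registry is a hypothetical concurrency-safe name -> T store. Because T is a
// type parameter, Get hands back a T directly instead of an `any` to assert.
// The zero value is ready to use, so it can be embedded without a constructor.
type Registry[T any] struct {
	mu    sync.RWMutex
	items map[string]T
}

// Set stores value under name, replacing any existing entry.
// Rejecting an empty name is an assumed "Bad" case, not a confirmed rule.
func (r *Registry[T]) Set(name string, value T) error {
	if name == "" {
		return errors.New("registry: empty name")
	}
	r.mu.Lock()
	defer r.mu.Unlock()
	if r.items == nil {
		r.items = make(map[string]T)
	}
	r.items[name] = value
	return nil
}

// Get returns the value registered under name and whether it exists.
func (r *Registry[T]) Get(name string) (T, bool) {
	r.mu.RLock()
	defer r.mu.RUnlock()
	v, ok := r.items[name]
	return v, ok
}

// Names returns every registered name in sorted order.
func (r *Registry[T]) Names() []string {
	r.mu.RLock()
	defer r.mu.RUnlock()
	names := make([]string, 0, len(r.items))
	for name := range r.items {
		names = append(names, name)
	}
	sort.Strings(names)
	return names
}

// Service is a placeholder for core's service type.
type Service struct{ Name string }

// ServiceRegistry embeds Registry[*Service]; service-specific helpers
// would be added here during the migration.
type ServiceRegistry struct {
	Registry[*Service]
}

func main() {
	var services ServiceRegistry
	if err := services.Set("", &Service{}); err != nil {
		fmt.Println(err) // prints "registry: empty name"
	}
	_ = services.Set("example", &Service{Name: "example"})
	if s, ok := services.Get("example"); ok {
		fmt.Println(s.Name, services.Names()) // prints "example [example]"
	}
}
```

Returning `T` instead of `any` is the kind of typed method that root cause 1 (type erasure) in RFC.plan.md calls for. A Core-facing accessor could still wrap these results in `Result`, since `Result` belongs at the Core contract boundary.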

## What To Skip In First Session

- Plan 3 (Actions) — needs Registry first
- Plan 4 (Process) — needs Actions first
- Plan 6 (ecosystem sweep) — needs everything first
- Any breaking changes — v0.7.1 is additive only

diff --git a/docs/RFC.plan.2.md b/docs/RFC.plan.2.md
new file mode 100644
index 0000000..dc4f9f0
--- /dev/null
+++ b/docs/RFC.plan.2.md
@@ -0,0 +1,43 @@
# RFC Plan 2 — Registry + Actions Sessions

> After the Plan 1 bugs are fixed and the AX-7 rename is done.

## Session Goal: Registry[T] + First Migration

1. Build `registry.go` with full AX-7 tests
2. Migrate `serviceRegistry` → `ServiceRegistry` embedding `Registry[*Service]`
3. Verify all existing tests still pass
4. Commit + push

## Session Goal: Action System

1. Rename `task.go` → `action.go`
2. Move `RegisterAction`/`RegisterActions`/`RegisterTask` to `ipc.go`
3. Build `ActionDef` type with `Run()`, `Exists()`, `Def()`
4. Wire `c.Action("name")` as a dual-purpose accessor
5. Full AX-7 tests
6. Commit + push

## Session Goal: Migrate core/agent Handlers

1. Register named Actions in `agentic.Register()`
2. Replace the nested `c.ACTION()` cascade with a Task pipeline
3. Test that the queue drains after an agent completes

This is the P6-1 fix for the queue starvation bug.

## Session Goal: c.Process() + go-process v0.7.0

1. Update go-process factory to return `core.Result`
2. Add `process.Register` direct factory
3. Remove `agentic.ProcessRegister` bridge
4. Add `Process` primitive to core/go (sugar over Actions)
5. Migrate core/agent `proc.go` → `s.core.Process()` calls
6. Delete `proc.go` and `ensureProcess()`

## Between Sessions

Each session should produce:
- Working code (all tests pass)
- A commit with a conventional-commit message
- Updated coverage numbers
- Any new findings, added to the RFC.md passes

diff --git a/docs/RFC.plan.md b/docs/RFC.plan.md
new file mode 100644
index 0000000..84d6f98
--- /dev/null
+++ b/docs/RFC.plan.md
@@ -0,0 +1,93 @@
# RFC Plan — How to Work With This Spec

> For future Claude sessions. Read this FIRST before touching code.

## What Exists

- `docs/RFC.md` — 3,845-line API spec with 108 findings across 13 passes
- `docs/RFC.implementation.{1-6}.md` — ordered implementation plans
- `llm.txt` — agent entry point
- `CLAUDE.md` — session-specific instructions

## The 108 Findings Reduce to 5 Root Causes

1. **Type erasure** (16 findings) — `Result{Value: any}` loses compile-time safety. Mitigate with typed methods + AX-7 tests. Not fixable without abandoning Result.

2. **No internal boundaries** (14 findings) — `*Core` grants God Mode. Solved by porting RFC-004 (Entitlements) from CorePHP. v0.9.0 work.

3. **Synchronous everything** (12 findings) — IPC dispatch blocks. The ACTION cascade in core/agent blocks the queue for minutes. Fixed by the Action/Task system (Plan 3).

4. **No recovery path** (10 findings) — `os.Exit` bypasses deferred calls. No cleanup on failure. Fixed by Plan 1 (defer + RunE + panic recovery).

5. **Missing primitives** (8 findings) — No ID, validation, health, or atomic-write primitives. Fixed by Plan 5.

## Implementation Order

```
Plan 1 → v0.7.1 (ship immediately, zero breakage)
Plan 2 → Registry[T] (foundation — Plans 3-4 depend on this)
Plan 3 → Action/Task (execution primitive — Plan 4 depends on this)
Plan 4 → c.Process() (needs go-process v0.7.0 update first)
Plan 5 → Missing primitives + AX-7 (independent, do alongside 2-4)
Plan 6 → Ecosystem sweep (after 1-5, dispatched via Codex)
```

## 3 Critical Bugs — Fix First

1. **P4-3:** `ipc.go` — An ACTION handler that returns `!OK` stops the entire broadcast chain, so later handlers never fire. Fix: call all handlers and don't stop on failure.

2. **P6-1:** core/agent `handlers.go` — Nested `c.ACTION()` calls create a synchronous cascade four levels deep. QA → PR → Verify → Merge blocks the Poke handler for minutes, so the queue doesn't drain. Fix: replace with a Task pipeline (needs Plan 3).

3. **P7-2:** `core.go` — `Run()` calls `os.Exit(1)` on startup failure without calling `ServiceShutdown()`, so running services leak. Fix: replace `os.Exit` with an error return (`RunE()`) and add `defer c.ServiceShutdown(context.Background())`. The defer alone has no effect while `os.Exit` is still called, because `os.Exit` skips deferred functions.

## Key Design Decisions Already Made

- **CamelCase = primitive** (brick), **UPPERCASE = convenience** (sugar)
- **Core is Lego bricks** — export the bricks, hide the safety mechanisms
- **Fs.root is the ONE exception** — security boundaries stay unexported
- **Registration IS permission** — no handler = no capability
- **`error` at Go interface boundary, `Result` at Core contract boundary**
- **Dual-purpose methods** (Service, Command, Action) — keep as sugar, Registry has explicit Get/Set
- **Array[T] and ConfigVar[T] are guardrail primitives** — model-proof, not speculative
- **ServiceRuntime[T] and manual `.core = c` are both valid** — document both
- **Startable V2 returns Result** — add alongside V1 for backwards compat
- **`RunE()` alongside `Run()`** — no breakage

## Existing RFCs That Solve Open Problems

| Problem | RFC | Core Provides | Consumer Implements |
|---------|-----|---------------|-------------------|
| Permissions | RFC-004 Entitlements | `c.Entitlement()` interface | go-entitlements package |
| Config context | RFC-003 Config Channels | `c.Config()` with channel | config channel service |
| Secrets | RFC-012 SMSG | `c.Secret()` interface | go-smsg / env fallback |
| Validation | RFC-009 Sigil | Transform chain interface | validator implementations |
| Containers | RFC-014 TIM | `c.Fs()` sandbox | TIM = OS isolation |
| In-memory fs | RFC-013 DataNode | `c.Data()` mounts fs.FS | DataNode / Borg |
| Lazy startup | RFC-002 Event Modules | Event declaration | Lazy instantiation |

Core adds no dependencies for any of these. It exposes the interface, and consumers bring implementations via `WithService`.

## What NOT to Do

- Don't add dependencies to core/go (it is stdlib + go-io + go-log only)
- Don't use `os/exec` — go-process is the only allowed user (P9-1: core/go itself violates this in app.go; fix it)
- Don't use `unsafe.Pointer` on Core types — add legitimate APIs instead
- Don't call `os.Exit` inside Core — return errors, let `main()` exit
- Don't create global mutable state — use Core's Registry
- Don't auto-discover via reflect — use explicit registration (`HandleIPCEvents` is the last magic method)

## AX-7 Status

- core/agent: 92% (840 tests, 79.9% coverage)
- core/go: 14% (83.6% coverage but wrong naming — needs rename + gap fill)
- Rename script exists (Python, used on core/agent — same script works)
- 212 functions × 3 categories = 636 target for core/go

## Session Context That Won't Be In Memory

- The ACTION cascade (P6-1) is the root cause of "agents finish but queue doesn't drain"
- `status.json` has 51 unprotected read-modify-write sites (P4-9), a real race condition
- The Fs sandbox is bypassed by 2 files using `unsafe.Pointer` (P11-2)
- `core.Env("DIR_HOME")` is cached at init, so `t.Setenv` doesn't override it (P2-5)
- go-process `NewService` returns `(any, error)`, not `core.Result`; needs the v0.7.0 update
- Multiple Core instances share global state (`assetGroups`, `systemInfo`, `defaultLog`)