feat: session continuity plans — RFC.plan.md + plan.1 + plan.2
RFC.plan.md: master context document for future sessions
- 5 root causes, 3 critical bugs, key decisions, what NOT to do
- Session context that won't survive compact
- Cross-references to existing RFCs that solve problems

RFC.plan.1.md: first session priorities
- Fix 3 critical bugs (one-line changes)
- AX-7 rename for core/go
- Start Registry[T]

RFC.plan.2.md: subsequent session goals
- Registry + migration
- Action system
- core/agent cascade fix
- c.Process() + go-process v0.7.0

Future sessions: read RFC.plan.md first, then the numbered plan for that session's scope.

Co-Authored-By: Virgil <virgil@lethean.io>
parent 9cd83daaae
commit 0704a7a65b
3 changed files with 221 additions and 0 deletions
85
docs/RFC.plan.1.md
Normal file
@@ -0,0 +1,85 @@
# RFC Plan 1 — First Session Priorities

> Read RFC.plan.md first. This is what to do in the FIRST session after compact.

## Priority 1: Fix the 3 Critical Bugs (Plan 1)

These are one-line to five-line changes. Ship as v0.7.1.

### Bug 1: ACTION stops on !OK (ipc.go line ~33)

```go
// CURRENT (broken: handler 3 failing silences handlers 4 and 5):
for _, h := range handlers {
	if r := h(c, msg); !r.OK {
		return r
	}
}

// FIX: call every handler; a failed or panicking handler no longer stops the chain.
for _, h := range handlers {
	func() {
		// Recover per handler so one panic cannot abort the broadcast.
		defer func() {
			if r := recover(); r != nil {
				Error("handler panic", "err", r)
			}
		}()
		h(c, msg) // result deliberately ignored
	}()
}
```
This also fixes P7-3 (no panic recovery) in the same change.
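
The behaviour the fix is meant to guarantee can be checked in isolation. The following is a minimal sketch with stand-in types: `Result`, `Handler`, and `broadcast` are simplified assumptions for illustration, not the real signatures in `ipc.go`.

```go
package main

import "fmt"

// Result and Handler are simplified stand-ins for the Core types (assumption).
type Result struct{ OK bool }
type Handler func(msg string) Result

// broadcast calls every handler, isolates panics, ignores !OK results,
// and returns how many handlers finished without panicking.
func broadcast(handlers []Handler, msg string) (completed int) {
	for _, h := range handlers {
		func() {
			defer func() {
				if r := recover(); r != nil {
					fmt.Println("handler panic:", r)
				}
			}()
			h(msg)
			completed++
		}()
	}
	return completed
}

func main() {
	handlers := []Handler{
		func(string) Result { return Result{OK: false} }, // fails
		func(string) Result { panic("boom") },            // panics
		func(string) Result { return Result{OK: true} },  // still runs
	}
	fmt.Println("completed:", broadcast(handlers, "ping"))
}
```

This is roughly what the two `TestIpc_Action_Ugly_*` tests would assert: every handler is invoked regardless of what earlier handlers did.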
### Bug 2: Run() leaks on startup failure (core.go Run method)
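
Add one line. Deferred calls do not run when the process leaves through `os.Exit`, so the startup-failure path must also return instead of exiting (see P7-2 in RFC.plan.md):

```go
func (c *Core) Run() {
	defer c.ServiceShutdown(context.Background()) // ADD THIS
	// ... rest unchanged, except that the startup-failure path
	// must return rather than call os.Exit (defers are skipped on exit)
}
```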
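
A minimal sketch of the `RunE()` shape that RFC.plan.md pairs with this fix, assuming a stand-in `Core`. `ServiceStartup`, `startErr`, `stopped`, and `errStartup` are hypothetical names used only to simulate a failed startup; only `ServiceShutdown` and `RunE` come from the plan.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// errStartup simulates a startup failure (hypothetical).
var errStartup = errors.New("startup failed")

// Core is a stand-in with just enough state to show the control flow.
type Core struct {
	startErr error // hypothetical: error returned by startup
	stopped  bool  // set when ServiceShutdown has run
}

func (c *Core) ServiceStartup(ctx context.Context) error { return c.startErr }
func (c *Core) ServiceShutdown(ctx context.Context)      { c.stopped = true }

// RunE returns instead of exiting, so the deferred shutdown always runs.
func (c *Core) RunE() error {
	defer c.ServiceShutdown(context.Background())
	if err := c.ServiceStartup(context.Background()); err != nil {
		return fmt.Errorf("startup: %w", err)
	}
	return nil
}

func main() {
	c := &Core{startErr: errStartup}
	err := c.RunE()
	fmt.Println(err, c.stopped)
	// main alone decides the exit code, e.g. os.Exit(1) when err != nil.
}
```

With this shape, `TestCore_Run_Ugly_StartupFailureCallsShutdown` can assert on the returned error and on shutdown having run, without the test process exiting.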

### Bug 3: Remove stale Embed() and fix comment

Delete `func (c *Core) Embed() Result` from core.go.
Fix the doc comment on `New()` so it shows the `*Core` return type.

### Test all 3 with AX-7 naming:
```
TestIpc_Action_Ugly_HandlerFailsChainContinues
TestIpc_Action_Ugly_HandlerPanicsChainContinues
TestCore_Run_Ugly_StartupFailureCallsShutdown
```

## Priority 2: AX-7 Rename for core/go

Run the same Python rename script used on core/agent:

```python
# Same script from core/agent session; applies to any Go package
# Changes TestFoo_Good to TestFile_Foo_Good
```

This is mechanical. No logic changes. Just naming.
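
The mapping itself is small. The sketch below shows it in Go; it is not the Python script from the core/agent session, and `ax7Name` is a hypothetical helper. It assumes single-word test file names such as `ipc_test.go`.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// ax7Name maps an old test name to the Test<File>_<Func>_<Category> form.
// Names that are not tests, or are already prefixed, are returned unchanged.
func ax7Name(file, name string) string {
	base := filepath.Base(file)
	if !strings.HasSuffix(base, "_test.go") || !strings.HasPrefix(name, "Test") {
		return name
	}
	stem := strings.TrimSuffix(base, "_test.go")
	if stem == "" {
		return name
	}
	// Capitalise the file stem: "ipc" becomes "Ipc" (ASCII names assumed).
	prefix := "Test" + strings.ToUpper(stem[:1]) + stem[1:] + "_"
	if strings.HasPrefix(name, prefix) {
		return name // already renamed, so the mapping is idempotent
	}
	return prefix + strings.TrimPrefix(name, "Test")
}

func main() {
	fmt.Println(ax7Name("ipc_test.go", "TestAction_Good"))
}
```

Running the mapping twice gives the same result as running it once, which makes a repeated pass over a half-renamed package safe.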

Then run gap analysis:
```bash
python3 -c "... same gap analysis script ..."
```

## Priority 3: Start Registry[T] (Plan 2)

Create `registry.go` with the type. Write tests FIRST (AX-7 complete from day one):

```
TestRegistry_Set_Good
TestRegistry_Set_Bad
TestRegistry_Set_Ugly
TestRegistry_Get_Good
...
```

Then migrate `serviceRegistry` first (most tested, most used).
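
One possible shape for the type and the first migration, as a sketch only. The fields, the `bool` return from `Set`, and the `Service` struct are assumptions; the plan fixes only the names `Registry[T]`, `ServiceRegistry`, and the explicit Get/Set pair.

```go
package main

import (
	"fmt"
	"sync"
)

// Registry is a hypothetical generic name-to-value store guarded by a mutex.
type Registry[T any] struct {
	mu    sync.RWMutex
	items map[string]T
}

// Set stores v under name and reports false for an empty name.
func (r *Registry[T]) Set(name string, v T) bool {
	if name == "" {
		return false
	}
	r.mu.Lock()
	defer r.mu.Unlock()
	if r.items == nil {
		r.items = make(map[string]T)
	}
	r.items[name] = v
	return true
}

// Get returns the value stored under name and whether it was present.
func (r *Registry[T]) Get(name string) (T, bool) {
	r.mu.RLock()
	defer r.mu.RUnlock()
	v, ok := r.items[name]
	return v, ok
}

// Service is a stand-in for the real service type.
type Service struct{ Name string }

// ServiceRegistry embeds the generic registry, so Get and Set are promoted.
type ServiceRegistry struct {
	Registry[*Service]
}

func main() {
	var services ServiceRegistry
	services.Set("log", &Service{Name: "log"})
	svc, ok := services.Get("log")
	fmt.Println(svc.Name, ok)
}
```

Embedding keeps existing call sites on `ServiceRegistry` working while the storage logic moves into one tested place.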
## What To Skip In First Session

- Plan 3 (Actions): needs Registry first
- Plan 4 (Process): needs Actions first
- Plan 6 (ecosystem sweep): needs everything first
- Any breaking changes: v0.7.1 is additive only
43
docs/RFC.plan.2.md
Normal file
@@ -0,0 +1,43 @@
# RFC Plan 2 — Registry + Actions Sessions

> After Plan 1 bugs are fixed and AX-7 rename is done.

## Session Goal: Registry[T] + First Migration

1. Build `registry.go` with full AX-7 tests
2. Migrate `serviceRegistry` → `ServiceRegistry` embedding `Registry[*Service]`
3. Verify all existing tests still pass
4. Commit + push

## Session Goal: Action System

1. Rename `task.go` → `action.go`
2. Move `RegisterAction`/`RegisterActions`/`RegisterTask` to `ipc.go`
3. Build `ActionDef` type with `Run()`, `Exists()`, `Def()`
4. Wire `c.Action("name")` dual-purpose accessor
5. Full AX-7 tests
6. Commit + push
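
A sketch of how steps 3 and 4 could fit together. Everything here is an assumption made for illustration: the handler signature, the variadic registration form, and the stand-in `Core`. `Def()` is left out because the plan does not say what it returns.

```go
package main

import "fmt"

// Result mirrors the Value/OK shape used elsewhere in the plan.
type Result struct {
	Value any
	OK    bool
}

// ActionHandler is a hypothetical handler signature.
type ActionHandler func(input any) Result

// ActionDef is a named action; a nil handler means "not registered".
type ActionDef struct {
	Name    string
	handler ActionHandler
}

// Exists reports whether a handler was registered for this action.
func (a *ActionDef) Exists() bool { return a.handler != nil }

// Run invokes the handler, or returns a failed Result when none exists.
func (a *ActionDef) Run(input any) Result {
	if !a.Exists() {
		return Result{OK: false}
	}
	return a.handler(input)
}

// Core is a stand-in holding only the action table (not safe for concurrent use).
type Core struct{ actions map[string]*ActionDef }

// Action registers when a handler is passed and looks up otherwise.
func (c *Core) Action(name string, h ...ActionHandler) *ActionDef {
	if c.actions == nil {
		c.actions = make(map[string]*ActionDef)
	}
	if len(h) > 0 {
		c.actions[name] = &ActionDef{Name: name, handler: h[0]}
	}
	if def, ok := c.actions[name]; ok {
		return def
	}
	return &ActionDef{Name: name} // unregistered: Exists() is false
}

func main() {
	c := &Core{}
	c.Action("double", func(in any) Result { return Result{Value: in.(int) * 2, OK: true} })
	r := c.Action("double").Run(21)
	fmt.Println(r.Value, r.OK, c.Action("missing").Exists())
}
```

Returning a non-nil `ActionDef` for unknown names keeps call chains such as `c.Action("x").Run(...)` safe, and matches the "no handler = no capability" decision in RFC.plan.md: running an unregistered action simply fails.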
## Session Goal: Migrate core/agent Handlers

1. Register named Actions in `agentic.Register()`
2. Replace nested `c.ACTION()` cascade with Task pipeline
3. Test that queue drains properly after agent completion
4. This is the P6-1 fix (the queue starvation bug)
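
The idea behind step 2 can be sketched without any Core types: run the stages as an ordered list on their own goroutine, so the handler that starts them returns at once. `Step`, `runPipeline`, `dispatch`, `namedStep`, and `errStepFailed` are hypothetical names; the real Task API is defined by Plan 3, not here.

```go
package main

import (
	"errors"
	"fmt"
)

// errStepFailed is a sample failure used for demonstration (hypothetical).
var errStepFailed = errors.New("step failed")

// Step is one named stage of a pipeline.
type Step struct {
	Name string
	Run  func() error
}

// runPipeline executes steps in order and stops at the first failure.
func runPipeline(steps []Step) error {
	for _, s := range steps {
		if err := s.Run(); err != nil {
			return fmt.Errorf("%s: %w", s.Name, err)
		}
	}
	return nil
}

// dispatch starts the pipeline on its own goroutine and returns immediately,
// so the calling handler does not block while the steps run.
func dispatch(steps []Step) <-chan error {
	done := make(chan error, 1)
	go func() { done <- runPipeline(steps) }()
	return done
}

// namedStep appends its name to log when it runs, then returns err.
func namedStep(name string, log *[]string, err error) Step {
	return Step{Name: name, Run: func() error {
		*log = append(*log, name)
		return err
	}}
}

func main() {
	var log []string
	done := dispatch([]Step{
		namedStep("qa", &log, nil),
		namedStep("pr", &log, nil),
		namedStep("verify", &log, nil),
		namedStep("merge", &log, nil),
	})
	err := <-done // receive first; the receive orders the read of log after the writes
	fmt.Println(err, log)
}
```

Compared with nested `c.ACTION()` calls, the stages are a flat list rather than a call stack four levels deep, and the dispatching handler is free to return while they run.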
## Session Goal: c.Process() + go-process v0.7.0

1. Update go-process factory to return `core.Result`
2. Add `process.Register` direct factory
3. Remove `agentic.ProcessRegister` bridge
4. Add `Process` primitive to core/go (sugar over Actions)
5. Migrate core/agent `proc.go` → `s.core.Process()` calls
6. Delete `proc.go` and `ensureProcess()`

## Between Sessions

Each session should produce:
- Working code (all tests pass)
- A commit with conventional message
- Updated coverage numbers
- Any new findings added to the passes in RFC.md
93
docs/RFC.plan.md
Normal file
@@ -0,0 +1,93 @@
# RFC Plan — How to Work With This Spec

> For future Claude sessions. Read this FIRST before touching code.

## What Exists

- `docs/RFC.md`: 3,845-line API spec with 108 findings across 13 passes
- `docs/RFC.implementation.{1-6}.md`: ordered implementation plans
- `llm.txt`: agent entry point
- `CLAUDE.md`: session-specific instructions

## The 108 Findings Reduce to 5 Root Causes

1. **Type erasure** (16 findings): `Result{Value: any}` loses compile-time safety. Mitigate with typed methods + AX-7 tests. Not fixable without abandoning Result.

2. **No internal boundaries** (14 findings): `*Core` grants God Mode. Solved by porting RFC-004 (Entitlements) from CorePHP. v0.9.0 work.

3. **Synchronous everything** (12 findings): IPC dispatch blocks. ACTION cascade in core/agent blocks queue for minutes. Fixed by Action/Task system (Plan 3).

4. **No recovery path** (10 findings): `os.Exit` bypasses defer. No cleanup on failure. Fixed by Plan 1 (defer + RunE + panic recovery).

5. **Missing primitives** (8 findings): No ID, validation, health, atomic writes. Fixed by Plan 5.
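
For root cause 1, the "typed methods" mitigation can be as small as one generic accessor. This is a sketch: `ValueAs` is a hypothetical helper name, and `Result` is reduced to the two fields the plan mentions.

```go
package main

import "fmt"

// Result carries an untyped payload, as in the plan's Result{Value: any}.
type Result struct {
	Value any
	OK    bool
}

// ValueAs returns the payload as T, or the zero value and false when the
// Result failed or holds a different type. It never panics.
func ValueAs[T any](r Result) (T, bool) {
	v, ok := r.Value.(T)
	if !r.OK || !ok {
		var zero T
		return zero, false
	}
	return v, true
}

func main() {
	r := Result{Value: 42, OK: true}
	n, ok := ValueAs[int](r)
	fmt.Println(n, ok)
	s, ok := ValueAs[string](r)
	fmt.Printf("%q %v\n", s, ok)
}
```

The type mismatch is still found at run time, not compile time, which is why the plan pairs typed accessors with AX-7 Bad and Ugly tests.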
## Implementation Order

```
Plan 1 → v0.7.1 (ship immediately, zero breakage)
Plan 2 → Registry[T] (foundation; Plans 3-4 depend on this)
Plan 3 → Action/Task (execution primitive; Plan 4 depends on this)
Plan 4 → c.Process() (needs go-process v0.7.0 update first)
Plan 5 → Missing primitives + AX-7 (independent, do alongside 2-4)
Plan 6 → Ecosystem sweep (after 1-5, dispatched via Codex)
```

## 3 Critical Bugs — Fix First

1. **P4-3:** `ipc.go`: an ACTION handler returning `!OK` stops the entire broadcast chain. Other handlers never fire. Fix: call all handlers, don't stop on failure.

2. **P6-1:** core/agent `handlers.go`: nested `c.ACTION()` calls create a synchronous cascade 4 levels deep. QA → PR → Verify → Merge blocks the Poke handler for minutes. Queue doesn't drain. Fix: replace with Task pipeline (needs Plan 3).

3. **P7-2:** `core.go`: `Run()` calls `os.Exit(1)` on startup failure without calling `ServiceShutdown()`. Running services leak. Fix: add `defer c.ServiceShutdown()` + replace `os.Exit` with error return.

## Key Design Decisions Already Made

- **CamelCase = primitive** (brick), **UPPERCASE = convenience** (sugar)
- **Core is Lego bricks**: export the bricks, hide the safety mechanisms
- **Fs.root is the ONE exception**: security boundaries stay unexported
- **Registration IS permission**: no handler = no capability
- **`error` at Go interface boundary, `Result` at Core contract boundary**
- **Dual-purpose methods** (Service, Command, Action): keep as sugar, Registry has explicit Get/Set
- **Array[T] and ConfigVar[T] are guardrail primitives**: model-proof, not speculative
- **ServiceRuntime[T] and manual `.core = c` are both valid**: document both
- **Startable V2 returns Result**: add alongside V1 for backwards compat
- **`RunE()` alongside `Run()`**: no breakage

## Existing RFCs That Solve Open Problems

| Problem | RFC | Core Provides | Consumer Implements |
|---------|-----|---------------|---------------------|
| Permissions | RFC-004 Entitlements | `c.Entitlement()` interface | go-entitlements package |
| Config context | RFC-003 Config Channels | `c.Config()` with channel | config channel service |
| Secrets | RFC-012 SMSG | `c.Secret()` interface | go-smsg / env fallback |
| Validation | RFC-009 Sigil | Transform chain interface | validator implementations |
| Containers | RFC-014 TIM | `c.Fs()` sandbox | TIM = OS isolation |
| In-memory fs | RFC-013 DataNode | `c.Data()` mounts fs.FS | DataNode / Borg |
| Lazy startup | RFC-002 Event Modules | Event declaration | Lazy instantiation |

Core adds no third-party dependencies (stdlib plus go-io and go-log only). Consumers bring implementations via `WithService`.

## What NOT to Do

- Don't add dependencies to core/go (it's stdlib + go-io + go-log only)
- Don't use `os/exec`: go-process is the only allowed user (P9-1: core/go itself violates this in app.go and needs fixing)
- Don't use `unsafe.Pointer` on Core types; add legitimate APIs instead
- Don't call `os.Exit` inside Core; return errors, let main() exit
- Don't create global mutable state; use Core's Registry
- Don't auto-discover via reflect; use explicit registration (HandleIPCEvents is the last magic method)

## AX-7 Status

- core/agent: 92% (840 tests, 79.9% coverage)
- core/go: 14% (83.6% coverage but wrong naming; needs rename + gap fill)
- Rename script exists (Python, used on core/agent; the same script works)
- 212 functions × 3 categories = 636 target for core/go

## Session Context That Won't Be In Memory

- The ACTION cascade (P6-1) is the root cause of "agents finish but queue doesn't drain"
- status.json has 51 unprotected read-modify-write sites (P4-9), a real race condition
- The Fs sandbox is bypassed by 2 files using unsafe.Pointer (P11-2)
- `core.Env("DIR_HOME")` is cached at init, so `t.Setenv` doesn't override it (P2-5)
- go-process `NewService` returns `(any, error)` not `core.Result`; needs v0.7.0 update
- Multiple Core instances share global state (assetGroups, systemInfo, defaultLog)
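
For the status.json race (P4-9), one way to close it is to funnel every read-modify-write through a single guarded helper that also writes atomically. The sketch below is an assumption about shape, not the Plan 5 design: `StatusStore`, `Update`, and `countTo` are hypothetical names, and the mutex only protects writers inside one process.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
	"sync"
)

// StatusStore serialises every read-modify-write of one JSON file.
type StatusStore struct {
	mu   sync.Mutex
	path string
}

// Update loads the file, applies fn, and writes the result back through a
// temp file plus rename, so readers never observe a half-written file.
func (s *StatusStore) Update(fn func(state map[string]int)) error {
	s.mu.Lock()
	defer s.mu.Unlock()

	state := map[string]int{}
	data, err := os.ReadFile(s.path)
	switch {
	case err == nil:
		if err := json.Unmarshal(data, &state); err != nil {
			return err
		}
	case !errors.Is(err, fs.ErrNotExist):
		return err // a missing file just means "empty state"
	}
	fn(state)

	out, err := json.Marshal(state)
	if err != nil {
		return err
	}
	tmp, err := os.CreateTemp(filepath.Dir(s.path), "status-*.tmp")
	if err != nil {
		return err
	}
	if _, err := tmp.Write(out); err != nil {
		tmp.Close()
		os.Remove(tmp.Name())
		return err
	}
	if err := tmp.Close(); err != nil {
		os.Remove(tmp.Name())
		return err
	}
	return os.Rename(tmp.Name(), s.path)
}

// countTo runs n concurrent increments and returns the final counter.
func countTo(n int) (int, error) {
	dir, err := os.MkdirTemp("", "status")
	if err != nil {
		return 0, err
	}
	defer os.RemoveAll(dir)
	store := &StatusStore{path: filepath.Join(dir, "status.json")}

	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			_ = store.Update(func(s map[string]int) { s["done"]++ })
		}()
	}
	wg.Wait()

	final := 0
	err = store.Update(func(s map[string]int) { final = s["done"] })
	return final, err
}

func main() {
	fmt.Println(countTo(20))
}
```

Without the lock, two goroutines can read the same counter and both write back the same incremented value, losing one update; with it, the final count equals the number of increments.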