agent/docs/RFC-AGENT-PIPELINE.md


# Agentic Pipeline v2 — Autonomous Dispatch→Verify→Merge
> The full autonomous pipeline: issue → dispatch → implement → verify → PR → merge.
> CodeRabbit findings = 0 is the KPI.
---
## Pipeline Flow
```
Issue created (Forge/GitHub)
→ core-agent picks up event
→ Selects flow YAML based on event type + repo
→ Prepares sandboxed workspace (CODEX.md, .core/reference/)
→ Dispatches agent (codex/gemini/claude)
→ Agent implements in workspace
→ QA flow runs (build, test, vet, lint)
→ If QA passes → create PR to dev
→ CodeRabbit reviews PR
→ If findings = 0 → auto-merge
→ If findings > 0 → dispatch fix agent → repeat
→ PR merged → training data captured
→ Issue closed
```
## Key Design Decisions
### Sandboxing
Agents MUST be sandboxed to their assigned repo. Unsandboxed writes caused the CLI mess
(an agent wrote files to the wrong repo). Workspace isolation is non-negotiable.
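A minimal sketch of the write-guard this implies (function and path names are hypothetical): every path an agent asks to touch must resolve inside its assigned workspace root, with `../` escapes refused.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// insideWorkspace reports whether target resolves inside root, rejecting
// "../" escapes. A write outside the assigned workspace is refused.
func insideWorkspace(root, target string) bool {
	rel, err := filepath.Rel(filepath.Clean(root), filepath.Clean(target))
	if err != nil {
		return false
	}
	return rel != ".." && !strings.HasPrefix(rel, ".."+string(filepath.Separator))
}

func main() {
	fmt.Println(insideWorkspace("/ws/core-agent", "/ws/core-agent/cmd/main.go")) // true
	fmt.Println(insideWorkspace("/ws/core-agent", "/ws/other-repo/file.go"))     // false
}
```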
### CodeRabbit KPI
CodeRabbit findings = 0 is the target. Every finding means one of:
- Template didn't prevent it → fix the template
- Model didn't catch it → add to training data
- Convention wasn't documented → add to RFC
Zero findings = complete convention coverage.
### Checkin API
Agents check in with status via api.lthn.sh. Current blocker: Forge webhooks
need to fire to lthn.sh so the orchestrator knows when to start the pipeline.
### Security Model (from Charon flows)
Orchestrator uses STRUCTURAL signals only (labels, PR state, review counts).
Never parses comment CONTENT — immune to prompt injection via issue comments.
## Agent Pool Configuration
See `code/core/go/agent/RFC.md` §Dispatch & Pool Routing for the full `agent.yaml` schema (concurrency, rates, model variants, agent identities).
Concurrency is enforced by the runner service (core/agent). Slot reservation prevents a
TOCTOU race between parallel dispatches.
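The reservation can be sketched as a check-and-increment under one lock, so the free-slot check and the claim are a single atomic step (a sketch, not the runner's actual code):

```go
package main

import (
	"fmt"
	"sync"
)

// slotPool reserves a concurrency slot atomically: the capacity check and
// the increment happen under one lock, so two parallel dispatches cannot
// both see "1 slot free" and both claim it (the TOCTOU race).
type slotPool struct {
	mu    sync.Mutex
	used  int
	limit int
}

func (p *slotPool) Reserve() bool {
	p.mu.Lock()
	defer p.mu.Unlock()
	if p.used >= p.limit {
		return false
	}
	p.used++
	return true
}

func (p *slotPool) Release() {
	p.mu.Lock()
	p.used--
	p.mu.Unlock()
}

func main() {
	pool := &slotPool{limit: 2}
	fmt.Println(pool.Reserve(), pool.Reserve(), pool.Reserve()) // true true false
}
```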
## go-process Improvements Needed
- `Timeout` — kill after N minutes (currently agents can run forever)
- `GracePeriod` — SIGTERM before SIGKILL
- `KillGroup` — kill process group, not just PID (prevents orphaned subprocesses)
## Metrics
- 25 repos auto-merged in recent sweep
- 74 findings on core/agent alone (70+ fixed)
- Zero-finding rate improving as templates capture conventions
## `core pipeline` Command Tree (Go Implementation)
```
core pipeline
├── audit <repo>                # Stage 1: audit issues → implementation issues
├── epic
│   ├── create <repo>           # Stage 2: group issues into epics
│   ├── run <epic-number>       # Stage 3: dispatch + monitor an epic
│   ├── status [epic-number]    # Show epic progress
│   └── sync <epic-number>      # Tick parent checklist from closed children
├── monitor [repo]              # Watch all open PRs, auto-intervene
├── fix
│   ├── reviews <pr-number>     # "Can you fix the code reviews?"
│   ├── conflicts <pr-number>   # "Can you fix the merge conflict?"
│   ├── format <pr-number>      # gofmt, commit, push (no AI)
│   └── threads <pr-number>     # Resolve all threads after fix
├── onboard <repo>              # Full: audit → epic → dispatch
├── budget                      # Daily usage vs pool
│   ├── plan                    # Optimal dispatch for today
│   └── log                     # Append dispatch event
└── training
    ├── capture <pr-number>     # Journal entry for merged PR
    ├── stats                   # Summary across journals
    └── export                  # Clean export for LEM training
```
## MetaReader — Structural Signals Only
The core abstraction. Every pipeline decision comes through this interface. **NEVER reads comment bodies, commit messages, PR descriptions, or review content.**
```go
type MetaReader interface {
	GetPRMeta(repo string, pr int) (*PRMeta, error)
	GetEpicMeta(repo string, issue int) (*EpicMeta, error)
	GetIssueState(repo string, issue int) (string, error)
	GetCommentReactions(repo string, commentID int64) ([]ReactionMeta, error)
}
```
### PRMeta
```go
type PRMeta struct {
	Number          int
	State           string // OPEN, MERGED, CLOSED
	Mergeable       string // MERGEABLE, CONFLICTING, UNKNOWN
	HeadSHA         string
	HeadDate        time.Time
	AutoMerge       bool
	BaseBranch      string
	HeadBranch      string
	Checks          []CheckMeta
	ThreadsTotal    int
	ThreadsResolved int
	HasEyesReaction bool // 👀 = agent acknowledged
}

type CheckMeta struct {
	Name       string // "qa", "build", "org-gate"
	Conclusion string // "SUCCESS", "FAILURE", ""
	Status     string // "COMPLETED", "QUEUED", "IN_PROGRESS"
}
```
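A structural merge gate over these structs might look like the following sketch (trimmed copies of the types above; the exact gates the orchestrator applies are illustrative):

```go
package main

import "fmt"

// Trimmed copies of the structs above — just the fields the gate reads.
type CheckMeta struct{ Name, Conclusion, Status string }

type PRMeta struct {
	State, Mergeable              string
	ThreadsTotal, ThreadsResolved int
	Checks                        []CheckMeta
}

// readyToMerge is the structural gate: open, no conflicts, every thread
// resolved, every check green. It never reads comment or review content.
func readyToMerge(pr PRMeta) bool {
	if pr.State != "OPEN" || pr.Mergeable != "MERGEABLE" {
		return false
	}
	if pr.ThreadsResolved < pr.ThreadsTotal {
		return false
	}
	for _, c := range pr.Checks {
		if c.Status != "COMPLETED" || c.Conclusion != "SUCCESS" {
			return false
		}
	}
	return true
}

func main() {
	pr := PRMeta{
		State: "OPEN", Mergeable: "MERGEABLE",
		ThreadsTotal: 2, ThreadsResolved: 2,
		Checks: []CheckMeta{{"qa", "SUCCESS", "COMPLETED"}},
	}
	fmt.Println(readyToMerge(pr)) // true
}
```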
### EpicMeta
```go
type EpicMeta struct {
	Number   int
	State    string
	Children []ChildMeta
}

type ChildMeta struct {
	Number  int
	Checked bool   // [x] vs [ ]
	State   string // OPEN, CLOSED
	PRs     []int
}
```
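Syncing the parent checklist reduces to a projection over `Children`: closed-but-unchecked children are the ones to tick. A sketch over trimmed copies of the types above:

```go
package main

import "fmt"

// Trimmed copies of the structs above.
type ChildMeta struct {
	Number  int
	Checked bool
	State   string
}

type EpicMeta struct{ Children []ChildMeta }

// needsTick lists children that are CLOSED but still unchecked in the
// parent checklist — the set `core pipeline epic sync` would tick.
func needsTick(e EpicMeta) []int {
	var out []int
	for _, c := range e.Children {
		if c.State == "CLOSED" && !c.Checked {
			out = append(out, c.Number)
		}
	}
	return out
}

func main() {
	e := EpicMeta{Children: []ChildMeta{
		{Number: 101, Checked: false, State: "CLOSED"},
		{Number: 102, Checked: true, State: "CLOSED"},
		{Number: 103, Checked: false, State: "OPEN"},
	}}
	fmt.Println(needsTick(e)) // [101]
}
```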
### Security: What's Explicitly Excluded
The MetaReader has NO methods for:
- `GetCommentBodies` — injection vector
- `GetCommitMessages` — can contain crafted instructions
- `GetPRDescription` — attacker-controlled in fork PRs
- `GetReviewThreadContent` — untrusted input
Implementation uses `gh api` with `--jq` filters that strip content at the query level. Content never enters the Go process.
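For example, a PR-meta fetch might be assembled like this — the endpoint and the `--jq` projection are a sketch, not the production filter:

```go
package main

import "fmt"

// prMetaArgs assembles a `gh api` invocation whose --jq filter projects
// only structural fields, so bodies and descriptions are dropped before
// the response reaches this process. Endpoint and filter are illustrative.
func prMetaArgs(repo string, pr int) []string {
	return []string{
		"api", fmt.Sprintf("repos/%s/pulls/%d", repo, pr),
		"--jq", `{number: .number, state: .state, head: .head.sha}`,
	}
}

func main() {
	fmt.Println(prMetaArgs("lthn/core-agent", 264))
}
```

The slice feeds `exec.Command("gh", args...)`; nothing in the filter selects `.body`, so content is stripped at the query level.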
## Three-Stage Pipeline
```
STAGE 1: AUDIT (flow: audit-issues)
  Input:  Repo with [Audit] issues
  Output: Implementation issues (1 per finding)
  → Classify findings (severity, type, scope, complexity)
  → Detect patterns (3+ similar → framework issue)
  → Close audit issues, link to children

STAGE 2: ORGANISE (flow: create-epic)
  Input:  Implementation issues
  Output: Epic parent with children, branch, phase ordering
  → Group by theme (security, quality, testing)
  → Order into phases (blockers → parallel → cleanup)
  → Create epic branch off dev

STAGE 3: EXECUTE (flow: issue-epic)
  Input:  Epic with children, branch
  Output: Merged PRs, closed issues, training data
  → Dispatch Phase 1 to agents
  → Monitor: CI, reviews, conflicts, merges
  → Intervene: fix reviews / fix conflicts
  → Phase complete → dispatch next phase
  → Epic complete → merge epic branch to dev
```
## Gotchas (Battle-Tested)
| Gotcha | Fix |
|--------|-----|
| Jules creates PRs as user, not bot | Match by branch/issue linkage, not author |
| `git push origin dev` ambiguous (tag+branch) | Use `HEAD:refs/heads/dev` |
| Base branch gofmt breaks ALL PRs | Fix base first, not the PRs |
| Auto-merge needs explicit permissions in caller | Add `permissions: contents: write, pull-requests: write` |
| `--squash` conflicts with merge queue | Use `--auto` alone — queue controls strategy |
## Knowledge Accumulation (Discussions Strategy)
Non-actionable findings (nitpicks, patterns, style preferences) get posted to a queryable knowledge base (Forge/OpenBrain). When patterns emerge, humans create issues.
```
Build → Agents review → Actionable → Fix immediately
                      → Non-actionable → Post to knowledge base
                                       → Patterns emerge
                                       → Human creates Issue
                                       → Agent picks up via pipeline
```
### Discussion Categories
| Channel | Category | Purpose |
|---------|----------|---------|
| 🚧 dev | PR build findings | Per-PR QA findings |
| 🛩️ alpha | Canary findings | Early testing |
| 🛸 beta | Integration findings | Integration testing |
| 🚀 stable | Release audit | Production audit |
### Naming: `{tool}:v{VERSION}`
`qa:v0.0.4.pr.264`, `lint:v0.0.4-alpha.42`, `audit:v0.0.4`
Tool prefixes: `qa:`, `lint:`, `static:`, `docker:`, `e2e:`, `perf:`, `security:`, `audit:`
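Parsing these names back apart is a single cut on `:v` — a sketch:

```go
package main

import (
	"fmt"
	"strings"
)

// splitDiscussion splits a `{tool}:v{VERSION}` title into its tool prefix
// and version part, e.g. "qa:v0.0.4.pr.264" → ("qa", "0.0.4.pr.264").
func splitDiscussion(title string) (tool, version string, ok bool) {
	tool, rest, found := strings.Cut(title, ":v")
	if !found || tool == "" || rest == "" {
		return "", "", false
	}
	return tool, rest, true
}

func main() {
	t, v, _ := splitDiscussion("lint:v0.0.4-alpha.42")
	fmt.Println(t, v) // lint 0.0.4-alpha.42
}
```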
### Pattern Detection
Query discussions to surface patterns across builds:
```bash
# 47 aria-label mentions across dev discussions → time for a11y audit issue
gh api graphql ... | grep -c "aria-label"
```
### CLI Integration
```bash
core go qa --post-findings # Post lint findings to discussion
core php qa --post-findings # Same for PHP
core qa # Aggregated summary
```
### Connection to Training
Discussion patterns → Issue → Agent implements → PR merged → findings captured as LEM training data. This is the feedback loop that makes agents better at conventions over time.
---
## Related RFCs
- `code/core/agent/flow/` — Flow YAML system
- `code/core/agent/RFC.md` — Agent dispatch system
- `project/lthn/lem/RFC-TRAINING-PIPELINE.md` — Findings → training data