docs: Codex review pipeline — forge → github polish + LEM training
Some checks failed
CI / test (push) Failing after 3s
Proven workflow from 7 rounds on core/agent (74 findings, 70+ fixed). Forge keeps full history, GitHub gets squashed releases. Codex findings become LEM training data. Charon owns the pipeline.

Co-Authored-By: Virgil <virgil@lethean.io>
parent 40d2b0db16
commit d94eed0b54
1 changed file with 142 additions and 0 deletions
docs/plans/2026-03-21-codex-review-pipeline.md
@@ -0,0 +1,142 @@
# Codex Review Pipeline — Forge → GitHub Polish

**Date:** 2026-03-21
**Status:** Proven (7 rounds on core/agent, 70+ findings fixed)
**Scope:** All 57 dAppCore repos
**Owner:** Charon (production polish is revenue-facing)

## Pipeline

```
Forge main (raw dev)
↓
Codex review (static analysis, AX conventions, security)
↓
Findings → Forge issues (seed training data)
↓
Fix cycle (agents fix, Codex re-reviews until clean)
↓
Push to GitHub dev (squash commit: flat, polished)
↓
PR dev → main on GitHub (CodeRabbit reviews squashed diff)
↓
Training data collected from Forge (findings + fixes + patterns)
↓
LEM fine-tune (learns Core conventions, becomes the reviewer)
↓
LEM replaces Codex for routine CI reviews
```

## Why This Works

1. **Forge keeps full history**: every commit, every experiment, every false start. This is the development record.
2. **GitHub gets squashed releases**: clean, polished, one commit per feature. This is the public face.
3. **Codex findings become training data**: each "this is wrong → here's the fix" pair is a sandwich-format training example for LEM.
4. **Exclusion lists become Forge issues**: known issues are tracked as backlog, not forgotten.
5. **LEM is trained on Core conventions**: it understands AX patterns, error handling, UK English, test naming, the lot.
6. **Codex for deep sweeps, LEM for CI**: the $200/month Codex plan does the hard work, while the free local LEM handles daily reviews.

## Proven Results (core/agent)

| Round | Findings | Highs | Category |
|-------|----------|-------|----------|
| 1 | 5 | 2 | Notification wiring, safety gates |
| 2 | 21 | 3 | API field mismatches, branch hardcoding |
| 3 | 15 | 5 | Default branch detection, pagination |
| 4 | 11 | 1 | Prompt path errors, watch states |
| 5 | 11 | 2 | BLOCKED.md stale state, PR push target |
| 6 | 6 | 2 | Workspace collision, sync branch logic |
| 7 | 5 | 2 | Path traversal security, dispatch checks |

**Total: 74 findings across 7 rounds, 70+ fixed.**

Categories found:
- Correctness bugs (missed notifications, wrong API fields)
- Security (path traversal, URL injection, fail-open gates)
- Race conditions (concurrent drainQueue)
- Logic errors (dead PID false completion, empty branch names)
- AX convention violations (fmt.Errorf vs coreerr.E, silent mutations)
- Test quality (false confidence, wrong assertions)

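To make the security category concrete, here is a minimal sketch of the kind of guard a path traversal finding usually leads to. It is not the actual core/agent fix: `safeJoin`, `ErrEscapesRoot` and the workspace path are hypothetical names, and the sketch uses the standard `errors` package only because the signature of `coreerr.E` is not shown in this plan.

```go
package main

import (
	"errors"
	"fmt"
	"path/filepath"
	"strings"
)

// ErrEscapesRoot is a hypothetical sentinel error for paths that leave the root.
var ErrEscapesRoot = errors.New("path escapes workspace root")

// safeJoin joins an untrusted relative path onto root and rejects any result
// that would resolve outside root (for example via "../" segments).
func safeJoin(root, untrusted string) (string, error) {
	if filepath.IsAbs(untrusted) {
		return "", ErrEscapesRoot
	}
	cleanRoot := filepath.Clean(root)
	joined := filepath.Join(cleanRoot, untrusted) // Join also cleans the result

	// If the cleaned path is still inside root, the relative path from root
	// to it never starts with a ".." element.
	rel, err := filepath.Rel(cleanRoot, joined)
	if err != nil {
		return "", ErrEscapesRoot
	}
	if rel == ".." || strings.HasPrefix(rel, ".."+string(filepath.Separator)) {
		return "", ErrEscapesRoot
	}
	return joined, nil
}

func main() {
	// Illustrative inputs only; the root directory is an assumption.
	for _, p := range []string{"repo/main.go", "../../etc/passwd"} {
		got, err := safeJoin("/srv/workspaces/core-agent", p)
		fmt.Printf("%q -> %q, err=%v\n", p, got, err)
	}
}
```

The check is purely lexical: it does not resolve symlinks, so a real gate may also need `filepath.EvalSymlinks` or an equivalent.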
## Implementation Steps

### Phase 1: Codex Sweep (per repo)

```bash
# Run from the repo directory
codex exec -s read-only "Review all Go code. Output numbered findings: severity, file:line, description."
```

- Re-run until no findings remain, or the only findings left are known, accepted exclusions
- Record the exclusion list per repo
- Create Forge issues for all accepted exclusions

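The convergence check above can be sketched as follows. This assumes Codex follows the prompt and emits one finding per line as `N. severity, file:line, description`; the exact output shape is not guaranteed, and `Finding`, `parseFindings` and `remaining` are hypothetical names. Exclusions are keyed by `file:line` here for brevity, which is fragile when lines shift between rounds.

```go
package main

import (
	"bufio"
	"fmt"
	"regexp"
	"strconv"
	"strings"
)

// Finding is one parsed review finding (assumed shape, see lead-in).
type Finding struct {
	Severity    string
	File        string
	Line        int
	Description string
}

// Key is the identifier used to match a finding against the exclusion list.
func (f Finding) Key() string { return fmt.Sprintf("%s:%d", f.File, f.Line) }

// findingLine matches e.g. "3. high, pkg/agent/dispatch.go:42, some description".
var findingLine = regexp.MustCompile(`^\s*\d+[.)]\s*(\w+),\s*([^,:]+):(\d+),\s*(.+)$`)

// parseFindings extracts findings from raw review output, skipping other lines.
func parseFindings(output string) []Finding {
	var findings []Finding
	scanner := bufio.NewScanner(strings.NewReader(output))
	for scanner.Scan() {
		m := findingLine.FindStringSubmatch(scanner.Text())
		if m == nil {
			continue // heading, summary or blank line
		}
		line, err := strconv.Atoi(m[3])
		if err != nil {
			continue
		}
		findings = append(findings, Finding{
			Severity:    strings.ToLower(m[1]),
			File:        strings.TrimSpace(m[2]),
			Line:        line,
			Description: strings.TrimSpace(m[4]),
		})
	}
	return findings
}

// remaining returns the findings that are not on the exclusion list.
// The sweep has converged when this slice is empty.
func remaining(findings []Finding, exclusions map[string]bool) []Finding {
	var out []Finding
	for _, f := range findings {
		if !exclusions[f.Key()] {
			out = append(out, f)
		}
	}
	return out
}

func main() {
	// Illustrative output, not a real Codex run.
	output := "1. high, pkg/agent/dispatch.go:42, workspace path is not validated\n" +
		"2. low, pkg/agent/queue.go:10, unused variable\n" +
		"No further findings.\n"
	exclusions := map[string]bool{"pkg/agent/queue.go:10": true}

	open := remaining(parseFindings(output), exclusions)
	fmt.Printf("open findings: %d, converged: %v\n", len(open), len(open) == 0)
}
```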
### Phase 2: GitHub Push

```bash
# On Forge main, after a clean Codex review
git push github main:dev
# Open the release PR on GitHub (dev → main)
gh pr create --repo dAppCore/<repo> --head dev --base main --title "release: v0.X.Y"
# Squash-merge the PR
gh pr merge <number> --squash
```

### Phase 3: Training Data Collection

For each repo sweep:
1. Extract all findings (the "wrong" examples)
2. Extract the diffs that fixed them (the "right" examples)
3. Format as sandwich pairs for LEM training
4. Store in OpenBrain tagged `type:training, project:codex-review`

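A minimal sketch of steps 1 to 4 as data: one record per finding and fix, serialised as JSON Lines. The record layout is an assumption, since this plan does not define the sandwich format or the OpenBrain storage schema; `TrainingPair`, `newTrainingPair` and `marshalJSONL` are hypothetical names. Only the two tags come from the step above.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// TrainingPair couples one finding with the diff that fixed it.
// Field names are illustrative, not an OpenBrain or LEM schema.
type TrainingPair struct {
	Repo     string   `json:"repo"`
	Severity string   `json:"severity"`
	Location string   `json:"location"` // file:line
	Finding  string   `json:"finding"`  // the "wrong" example
	FixDiff  string   `json:"fix_diff"` // the "right" example
	Tags     []string `json:"tags"`
}

// newTrainingPair builds a record carrying the tags named in this plan.
func newTrainingPair(repo, severity, location, finding, fixDiff string) TrainingPair {
	return TrainingPair{
		Repo:     repo,
		Severity: severity,
		Location: location,
		Finding:  finding,
		FixDiff:  fixDiff,
		Tags:     []string{"type:training", "project:codex-review"},
	}
}

// marshalJSONL renders the pairs as JSON Lines, one record per line.
func marshalJSONL(pairs []TrainingPair) (string, error) {
	var sb strings.Builder
	enc := json.NewEncoder(&sb)
	enc.SetEscapeHTML(false) // keep "<", ">" and "&" in diffs readable
	for _, p := range pairs {
		if err := enc.Encode(p); err != nil { // Encode appends a newline
			return "", err
		}
	}
	return sb.String(), nil
}

func main() {
	// Illustrative values only, not a real finding from core/agent.
	pair := newTrainingPair(
		"core/agent", "high", "pkg/agent/dispatch.go:42",
		"workspace path is not validated",
		"-path := filepath.Join(root, name)\n+path, err := safeJoin(root, name)",
	)
	out, err := marshalJSONL([]TrainingPair{pair})
	if err != nil {
		fmt.Println("encode failed:", err)
		return
	}
	fmt.Print(out)
}
```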
### Phase 4: LEM Training

```bash
# Collect training data from OpenBrain
brain_recall query="codex review finding" type=training

# Format for mlx-lm fine-tuning
# Input: "Review this Go code: <code>"
# Output: "Finding: <severity>, <file:line>, <description>"
```

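The input and output templates above could be rendered into training lines roughly like this. The `prompt` and `completion` keys reflect my understanding of the completions-style JSONL dataset that mlx-lm accepts for fine-tuning; treat that as an assumption and check it against the mlx-lm version in use. `Example`, `buildExample` and `toJSONL` are hypothetical names.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// Example is one prompt/completion pair (key names are an assumption).
type Example struct {
	Prompt     string `json:"prompt"`
	Completion string `json:"completion"`
}

// buildExample applies the Input/Output templates from this phase.
func buildExample(code, severity, location, description string) Example {
	return Example{
		Prompt:     "Review this Go code: " + code,
		Completion: fmt.Sprintf("Finding: %s, %s, %s", severity, location, description),
	}
}

// toJSONL renders one example as a single JSON line without a trailing newline.
func toJSONL(e Example) (string, error) {
	var sb strings.Builder
	enc := json.NewEncoder(&sb)
	enc.SetEscapeHTML(false) // Go source often contains "<", ">" and "&"
	if err := enc.Encode(e); err != nil {
		return "", err
	}
	return strings.TrimSuffix(sb.String(), "\n"), nil
}

func main() {
	// Illustrative snippet and finding, not real review data.
	e := buildExample(
		"return fmt.Errorf(\"dispatch failed: %w\", err)",
		"low", "pkg/agent/dispatch.go:88",
		"use coreerr.E instead of fmt.Errorf",
	)
	line, err := toJSONL(e)
	if err != nil {
		fmt.Println("encode failed:", err)
		return
	}
	fmt.Println(line)
}
```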
### Phase 5: LEM CI Integration

- LEM runs as a pre-merge check on Forge
- Catches convention violations before they reach Codex
- Codex reserved for deep quarterly sweeps
- CodeRabbit stays on GitHub for the public-facing review

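One way the pre-merge check could decide whether to block a merge is sketched below. The severity levels (`low`, `medium`, `high`), the threshold and the names `Finding` and `blocking` are assumptions, as is the way LEM would report its findings. The gate fails closed: a severity or threshold it does not recognise blocks the merge rather than letting it through.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// Finding is one review finding as the gate sees it (assumed shape).
type Finding struct {
	Severity    string
	Location    string
	Description string
}

// severityRank orders the assumed severity levels.
var severityRank = map[string]int{"low": 1, "medium": 2, "high": 3}

// blocking returns the findings at or above the threshold.
// Unknown severities and unknown thresholds block, so the gate fails closed.
func blocking(findings []Finding, threshold string) []Finding {
	minRank, ok := severityRank[strings.ToLower(threshold)]
	if !ok {
		minRank = 1 // unknown threshold: treat every finding as blocking
	}
	var out []Finding
	for _, f := range findings {
		rank, known := severityRank[strings.ToLower(f.Severity)]
		if !known || rank >= minRank {
			out = append(out, f)
		}
	}
	return out
}

func main() {
	// Illustrative findings; a real check would read the reviewer's output.
	findings := []Finding{
		{Severity: "low", Location: "pkg/agent/queue.go:10", Description: "unused variable"},
	}
	blocked := blocking(findings, "high")
	for _, f := range blocked {
		fmt.Printf("BLOCKING %s %s: %s\n", f.Severity, f.Location, f.Description)
	}
	if len(blocked) > 0 {
		os.Exit(1) // non-zero exit fails the pre-merge check
	}
	fmt.Println("no blocking findings")
}
```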
## Cost Analysis

| Item | Cost | Usage |
|------|------|-------|
| Codex Max | $200/month | Deep sweeps |
| Claude Max | $100-200/month | Development |
| CodeRabbit | Free (OSS) | Per PR |
| LEM | Free (local MLX) | Per commit |

After LEM is trained, Codex drops to quarterly sweeps, saving roughly $150/month.

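The size of the saving depends on how "quarterly" is billed, which this plan does not spell out. As a rough worked example, assuming Codex is paid for one month per quarter at the $200 rate (four paid months a year), the average monthly cost can be computed as below; `averageMonthlyCost` is a hypothetical helper.

```go
package main

import "fmt"

// averageMonthlyCost spreads the cost of the paid subscription months
// over a 12-month year.
func averageMonthlyCost(pricePerMonth float64, paidMonthsPerYear int) float64 {
	return pricePerMonth * float64(paidMonthsPerYear) / 12
}

func main() {
	const codexPrice = 200.0 // $/month, from the table above

	always := averageMonthlyCost(codexPrice, 12)   // subscribed every month
	quarterly := averageMonthlyCost(codexPrice, 4) // assumption: one paid month per quarter

	fmt.Printf("every month: $%.2f, quarterly: $%.2f, saving: $%.2f/month\n",
		always, quarterly, always-quarterly)
	// prints: every month: $200.00, quarterly: $66.67, saving: $133.33/month
}
```

Under that assumption the saving is about $133/month, in the same range as the estimate above; a different billing pattern would shift the figure.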
## Revenue Connection

Polish → Trust → Users → Revenue

- Polished GitHub repos attract contributors and users
- Clean code with high test coverage signals production quality
- CodeRabbit badge + Codecov badge = visible quality metrics
- SaaS products (host.uk.com) built on this foundation
- Charon manages the pipeline, earns from the platform

## Automation

This pipeline should be a `core dev polish` command:

```bash
core dev polish <repo>      # Run Codex sweep, fix, push to GitHub
core dev polish --all       # Sweep all 57 repos
core dev polish --training  # Extract training data after sweep
```

Charon can run this autonomously via dispatch.
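A minimal sketch of how the proposed command's arguments might be parsed and turned into a step plan. The `core dev polish` command does not exist yet, so everything here is hypothetical: the names `polishOptions`, `parsePolishArgs` and `plan`, the validation rules, and the use of the standard `flag` package rather than whatever CLI framework `core` actually uses.

```go
package main

import (
	"errors"
	"flag"
	"fmt"
)

// polishOptions mirrors the three invocations listed above.
type polishOptions struct {
	Repo     string
	All      bool
	Training bool
}

// parsePolishArgs parses the arguments that follow "core dev polish".
func parsePolishArgs(args []string) (polishOptions, error) {
	var opts polishOptions
	fs := flag.NewFlagSet("polish", flag.ContinueOnError)
	fs.BoolVar(&opts.All, "all", false, "sweep every repo")
	fs.BoolVar(&opts.Training, "training", false, "extract training data after the sweep")
	if err := fs.Parse(args); err != nil {
		return polishOptions{}, err
	}
	opts.Repo = fs.Arg(0) // empty when no positional argument is given

	switch {
	case opts.All && opts.Repo != "":
		return polishOptions{}, errors.New("pass either <repo> or --all, not both")
	case !opts.All && !opts.Training && opts.Repo == "":
		return polishOptions{}, errors.New("expected <repo>, --all or --training")
	}
	return opts, nil
}

// plan lists the pipeline steps implied by the options, in order.
func plan(opts polishOptions) []string {
	var steps []string
	if opts.All || opts.Repo != "" {
		target := opts.Repo
		if opts.All {
			target = "all repos"
		}
		steps = append(steps,
			"codex sweep: "+target,
			"fix cycle: "+target,
			"push to GitHub: "+target,
		)
	}
	if opts.Training {
		steps = append(steps, "extract training data")
	}
	return steps
}

func main() {
	// Fixed demo invocations; a real command would parse os.Args instead.
	for _, args := range [][]string{{"example-repo"}, {"--all"}, {"--training"}} {
		opts, err := parsePolishArgs(args)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(args, "->", plan(opts))
	}
}
```

Because the `flag` package stops parsing at the first positional argument, flags have to come before `<repo>` in this sketch.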