docs: add lint pattern catalog & polish skill design

Three-layer system: core/lint (pattern catalog + regex matcher), go-ai MCP subsystem (lint tools for agents), core/agent polish skill (multi-AI review orchestration). Seeded with 18 patterns from the Go ecosystem sweep. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-09 10:44:42 +00:00 · 2026-03-09 10:44:42 +00:00 · 2e2560dc60
commit 2e2560dc60
parent e2a68fc283
1 changed files with 259 additions and 0 deletions
--- a/docs/plans/2026-03-09-lint-pattern-catalog-design.md
+++ b/docs/plans/2026-03-09-lint-pattern-catalog-design.md
@ -0,0 +1,259 @@
+# Lint Pattern Catalog & Polish Skill Design
+
+**Goal:** A structured pattern catalog (`core/lint`) that captures recurring code quality findings as regex rules, exposes them via MCP tools in `go-ai`, and orchestrates multi-AI code review via a Claude Code skill in `core/agent`.
+
+**Architecture:** Three layers — a standalone catalog+matcher library (`core/lint`), an MCP subsystem in `go-ai` that exposes lint tools to agents, and a Claude Code plugin in `core/agent` that orchestrates the "polish" workflow (deterministic checks + AI reviewers + feedback loop into the catalog).
+
+**Tech Stack:** Go (catalog, matcher, CLI, MCP subsystem), YAML (rule definitions), JSONL (findings output, compatible with `~/.core/ai/metrics/`), Claude Code plugin format (hooks.json, commands/*.md, plugin.json).
+
+---
+
+## Context
+
+During a code review sweep of 18 Go repos (March 2026), AI reviewers (Gemini, Claude) found ~20 recurring patterns: SQL injection, path traversal, XSS, missing constant-time comparison, goroutine leaks, Go 1.26 modernisation opportunities, and more. Many of these patterns repeat across repos.
+
+Currently these findings exist only as commit messages. This design captures them as a reusable, machine-readable catalog that:
+1. Deterministic tools can run immediately (regex matching)
+2. MCP-connected agents can query and apply
+3. LEM models can train on for "does this comply with CoreGo standards?" judgements
+4. Grows automatically as AI reviewers find new patterns
+
+## Layer 1: `core/lint` — Pattern Catalog & Matcher
+
+### Repository Structure
+
+```
+core/lint/
+├── go.mod                        # forge.lthn.ai/core/lint
+├── catalog/
+│   ├── go-security.yaml          # SQL injection, path traversal, XSS, constant-time
+│   ├── go-modernise.yaml         # Go 1.26: slices.Clone, wg.Go, maps.Keys, range-over-int
+│   ├── go-correctness.yaml       # Deadlocks, goroutine leaks, nil guards, error handling
+│   ├── php-security.yaml         # XSS, CSRF, mass assignment, SQL injection
+│   ├── ts-security.yaml          # DOM XSS, prototype pollution
+│   └── cpp-safety.yaml           # Buffer overflow, use-after-free
+├── pkg/lint/
+│   ├── catalog.go                # Load + parse YAML catalog files
+│   ├── rule.go                   # Rule struct definition
+│   ├── matcher.go                # Regex matcher against file contents
+│   ├── report.go                 # Structured findings output (JSON/JSONL/text)
+│   ├── catalog_test.go
+│   ├── matcher_test.go
+│   └── report_test.go
+├── cmd/core-lint/
+│   └── main.go                   # `core-lint check ./...` CLI
+└── .core/
+    └── build.yaml                # Produces core-lint binary
+```
+
+### Rule Schema (YAML)
+
+```yaml
+- id: go-sec-001
+  title: "SQL wildcard injection in LIKE clauses"
+  severity: high          # critical, high, medium, low, info
+  languages: [go]
+  tags: [security, injection, owasp-a03]
+  pattern: 'LIKE\s+\?\s*,\s*["\x60]%\s*\+'
+  exclude_pattern: 'EscapeLike'              # suppress if this also matches
+  fix: "Use parameterised LIKE with explicit escaping of % and _ characters"
+  found_in: [go-store]                       # repos where first discovered
+  example_bad: |
+    db.Where("name LIKE ?", "%"+input+"%")
+  example_good: |
+    db.Where("name LIKE ?", EscapeLike(input))
+  first_seen: "2026-03-09"
+  detection: regex        # future: ast, semantic
+  auto_fixable: false     # future: true when we add codemods
+```
+
+### Rule Struct (Go)
+
+```go
+type Rule struct {
+    ID             string   `yaml:"id"`
+    Title          string   `yaml:"title"`
+    Severity       string   `yaml:"severity"`
+    Languages      []string `yaml:"languages"`
+    Tags           []string `yaml:"tags"`
+    Pattern        string   `yaml:"pattern"`
+    ExcludePattern string   `yaml:"exclude_pattern,omitempty"`
+    Fix            string   `yaml:"fix"`
+    FoundIn        []string `yaml:"found_in,omitempty"`
+    ExampleBad     string   `yaml:"example_bad,omitempty"`
+    ExampleGood    string   `yaml:"example_good,omitempty"`
+    FirstSeen      string   `yaml:"first_seen"`
+    Detection      string   `yaml:"detection"`     // regex | ast | semantic
+    AutoFixable    bool     `yaml:"auto_fixable"`
+}
+```
+
+### Finding Struct (Go)
+
+Designed to align with go-ai's `ScanAlert` shape and `~/.core/ai/metrics/` JSONL format:
+
+```go
+type Finding struct {
+    RuleID   string `json:"rule_id"`
+    Title    string `json:"title"`
+    Severity string `json:"severity"`
+    File     string `json:"file"`
+    Line     int    `json:"line"`
+    Match    string `json:"match"`        // matched text
+    Fix      string `json:"fix"`
+    Repo     string `json:"repo,omitempty"`
+}
+```
+
+### CLI Interface
+
+```bash
+# Check current directory against all catalogs for detected languages
+core-lint check ./...
+
+# Check specific languages/catalogs
+core-lint check --lang go --catalog go-security ./pkg/...
+
+# Output as JSON (for piping to other tools)
+core-lint check --format json ./...
+
+# List available rules
+core-lint catalog list
+core-lint catalog list --lang go --severity high
+
+# Show a specific rule with examples
+core-lint catalog show go-sec-001
+```
+
+## Layer 2: `go-ai` Lint MCP Subsystem
+
+New subsystem registered alongside files/rag/ml/brain:
+
+```go
+type LintSubsystem struct {
+    catalog *lint.Catalog
+    root    string  // workspace root for scanning
+}
+
+func (s *LintSubsystem) Name() string { return "lint" }
+
+func (s *LintSubsystem) RegisterTools(server *mcp.Server) {
+    // lint_check  - run rules against workspace files
+    // lint_catalog - list/search available rules
+    // lint_report - get findings summary for a path
+}
+```
+
+### MCP Tools
+
+| Tool | Input | Output | Group |
+|------|-------|--------|-------|
+| `lint_check` | `{path: string, lang?: string, severity?: string}` | `{findings: []Finding}` | lint |
+| `lint_catalog` | `{lang?: string, tags?: []string, severity?: string}` | `{rules: []Rule}` | lint |
+| `lint_report` | `{path: string, format?: "summary" or "detailed"}` | `{summary: ReportSummary}` | lint |
+
+This means any MCP-connected agent (Claude, LEM, Codex) can call `lint_check` to scan code against the catalog.
+
+## Layer 3: `core/agent` Polish Skill
+
+Claude Code plugin at `core/agent/claude/polish/`:
+
+```
+core/agent/claude/polish/
+├── plugin.json
+├── hooks.json              # optional: PostToolUse after git commit
+├── commands/
+│   └── polish.md           # /polish slash command
+└── scripts/
+    └── run-lint.sh         # shells out to core-lint
+```
+
+### `/polish` Command Flow
+
+1. Run `core-lint check ./...` for fast deterministic findings
+2. Report findings to user
+3. Optionally run AI reviewers (Gemini CLI, Codex) for deeper analysis
+4. Deduplicate AI findings against catalog (already-known patterns)
+5. Propose new patterns as catalog additions (PR to core/lint)
+
+### Subagent Configuration (`.core/agents/`)
+
+Repos can configure polish behaviour:
+
+```yaml
+# any-repo/.core/agents/polish.yaml
+languages: [go]
+catalogs: [go-security, go-modernise, go-correctness]
+reviewers: [gemini]          # which AI tools to invoke
+exclude: [vendor/, testdata/, *_test.go]
+severity_threshold: medium   # only report medium+ findings
+```
+
+## Findings to LEM Pipeline
+
+```
+core-lint check -> findings.json
+    |
+    v
+~/.core/ai/metrics/YYYY-MM-DD.jsonl    (audit trail)
+    |
+    v
+LEM training data:
+  - Rule examples (bad/good pairs) -> supervised training signal
+  - Finding frequency -> pattern importance weighting
+  - Rule descriptions -> natural language understanding of "why"
+    |
+    v
+LEM tool: "does this code comply with CoreGo standards?"
+  -> queries lint_catalog via MCP
+  -> applies learned pattern recognition
+  -> reports violations with rule IDs and fixes
+```
+
+## Initial Catalog Seed
+
+From the March 2026 ecosystem sweep:
+
+| ID | Title | Severity | Language | Found In |
+|----|-------|----------|----------|----------|
+| go-sec-001 | SQL wildcard injection | high | go | go-store |
+| go-sec-002 | Path traversal in cache keys | high | go | go-cache |
+| go-sec-003 | XSS in HTML output | high | go | go-html |
+| go-sec-004 | Non-constant-time auth comparison | high | go | go-crypt |
+| go-sec-005 | Log injection via unescaped input | medium | go | go-log |
+| go-sec-006 | Key material in log output | high | go | go-log |
+| go-cor-001 | Goroutine leak (no WaitGroup) | high | go | core/go |
+| go-cor-002 | Shutdown deadlock (wg.Wait no timeout) | high | go | core/go |
+| go-cor-003 | Silent error swallowing | medium | go | go-process, go-ratelimit |
+| go-cor-004 | Panic in library code | medium | go | go-i18n |
+| go-cor-005 | Delete without path validation | high | go | go-io |
+| go-mod-001 | Manual slice clone (append nil pattern) | low | go | core/go |
+| go-mod-002 | Manual sort instead of slices.Sorted | low | go | core/go |
+| go-mod-003 | Manual reverse loop instead of slices.Backward | low | go | core/go |
+| go-mod-004 | sync.WaitGroup Add+Done instead of Go() | low | go | core/go |
+| go-mod-005 | Manual map key collection instead of maps.Keys | low | go | core/go |
+| go-cor-006 | Missing error return from API calls | medium | go | go-forge, go-git |
+| go-cor-007 | Signal handler uses wrong type | medium | go | go-process |
+
+## Dependencies
+
+```
+core/lint (standalone, zero core deps)
+    ^
+    |
+go-ai/mcp/lint/ (imports core/lint for catalog + matcher)
+    ^
+    |
+core/agent/claude/polish/ (shells out to core-lint CLI)
+```
+
+`core/lint` has no dependency on `core/go` or any other framework module. It is a standalone library + CLI, like `core/go-io`.
+
+## Future Extensions (Not Built Now)
+
+- **AST-based detection** (layer 2): Parse Go/PHP AST, match structural patterns
+- **Semantic detection** (layer 3): LEM judges code against rule descriptions
+- **Auto-fix codemods**: `core-lint fix` applies known fixes automatically
+- **CI integration**: GitHub Actions workflow runs `core-lint check` on PRs
+- **CodeRabbit integration**: Import CodeRabbit findings as catalog entries
+- **Cross-repo dashboard**: Aggregate findings across all repos in workspace