lint/docs/architecture.md
Snider e876b62045 docs: add human-friendly documentation
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-11 13:02:40 +00:00

---
title: Architecture
description: Internal design of core/lint -- types, data flow, and extension points
---
# Architecture
This document explains how `core/lint` works internally. It covers the core library (`pkg/lint`), the PHP quality pipeline (`pkg/php`), and the QA command layer (`cmd/qa`).
## Overview
The system is organised into three layers:
```
cmd/core-lint    CLI entry point (lint check, lint catalog)
cmd/qa           QA workflow commands (watch, review, health, issues, PHP tools)
        |
pkg/lint         Core library: rules, catalog, matcher, scanner, reporting
pkg/php          PHP tool wrappers: format, analyse, audit, security, test
pkg/detect       Project type detection
        |
catalog/*.yaml   Embedded rule definitions
```
The root `lint.go` file ties the catalog layer to the library:
```go
//go:embed catalog/*.yaml
var catalogFS embed.FS

func LoadEmbeddedCatalog() (*lintpkg.Catalog, error) {
	return lintpkg.LoadFS(catalogFS, "catalog")
}
```
Because of the `//go:embed` directive, every YAML rule in `catalog/` is compiled into the binary, so the built-in catalog needs no file lookups at runtime.
## Core Types (pkg/lint)
### Rule
A `Rule` represents a single lint check loaded from YAML. Key fields:
```go
type Rule struct {
	ID             string   `yaml:"id"`
	Title          string   `yaml:"title"`
	Severity       string   `yaml:"severity"`        // info, low, medium, high, critical
	Languages      []string `yaml:"languages"`       // e.g. ["go"], ["go", "php"]
	Tags           []string `yaml:"tags"`            // e.g. ["security", "injection"]
	Pattern        string   `yaml:"pattern"`         // Regex pattern to match
	ExcludePattern string   `yaml:"exclude_pattern"` // Regex to suppress false positives
	Fix            string   `yaml:"fix"`             // Human-readable remediation
	Detection      string   `yaml:"detection"`       // "regex" (extensible to other types)
	AutoFixable    bool     `yaml:"auto_fixable"`
	ExampleBad     string   `yaml:"example_bad"`
	ExampleGood    string   `yaml:"example_good"`
	FoundIn        []string `yaml:"found_in"`        // Repos where pattern was observed
	FirstSeen      string   `yaml:"first_seen"`
}
```
Each rule validates itself via `Validate()`, which checks required fields and compiles regex patterns. Severity is constrained to five levels: `info`, `low`, `medium`, `high`, `critical`.
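As a rough illustration of the checks described above, the following sketch validates a trimmed-down rule. The `rule` type, the `validate` method, and the error messages are hypothetical stand-ins; which fields the real `Validate()` treats as required is an assumption here.

```go
package main

import (
	"fmt"
	"regexp"
)

// rule is a hypothetical, trimmed stand-in for lint.Rule.
type rule struct {
	ID, Title, Severity, Pattern, ExcludePattern string
}

// validSeverities holds the five levels named in this document.
var validSeverities = map[string]bool{
	"info": true, "low": true, "medium": true, "high": true, "critical": true,
}

// validate checks required fields (assumed: id, title, pattern), the
// severity level, and that both regex patterns compile.
func (r rule) validate() error {
	if r.ID == "" {
		return fmt.Errorf("rule has no id")
	}
	if r.Title == "" {
		return fmt.Errorf("rule %s: title is required", r.ID)
	}
	if !validSeverities[r.Severity] {
		return fmt.Errorf("rule %s: unknown severity %q", r.ID, r.Severity)
	}
	if r.Pattern == "" {
		return fmt.Errorf("rule %s: pattern is required", r.ID)
	}
	if _, err := regexp.Compile(r.Pattern); err != nil {
		return fmt.Errorf("rule %s: bad pattern: %w", r.ID, err)
	}
	if r.ExcludePattern != "" {
		if _, err := regexp.Compile(r.ExcludePattern); err != nil {
			return fmt.Errorf("rule %s: bad exclude_pattern: %w", r.ID, err)
		}
	}
	return nil
}

func main() {
	r := rule{ID: "go-xxx-001", Title: "demo", Severity: "urgent", Pattern: `foo\(`}
	fmt.Println(r.validate()) // reports the unknown severity
}
```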
### Catalog
A `Catalog` is a flat collection of rules with query methods:
- `ForLanguage(lang)` -- returns rules targeting a specific language
- `AtSeverity(threshold)` -- returns rules at or above a severity level
- `ByID(id)` -- looks up a single rule
Loading is done via `LoadDir(dir)` for filesystem paths or `LoadFS(fsys, dir)` for embedded filesystems. Both read all `.yaml` files in the directory and parse them into `[]Rule`.
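The query methods can be pictured with a small sketch. The lowercase types and methods below are hypothetical stand-ins for `Catalog` and `Rule`; the numeric severity ranking, the rule IDs, and the behaviour for an unknown threshold (returning nothing) are assumptions, not the library's documented behaviour.

```go
package main

import "fmt"

// rule and catalog are hypothetical stand-ins for lint.Rule and lint.Catalog.
type rule struct {
	ID        string
	Severity  string
	Languages []string
}

type catalog struct{ Rules []rule }

// severityRank orders the five levels from least to most severe.
var severityRank = map[string]int{
	"info": 0, "low": 1, "medium": 2, "high": 3, "critical": 4,
}

// forLanguage returns the rules that list lang among their languages.
func (c catalog) forLanguage(lang string) []rule {
	var out []rule
	for _, r := range c.Rules {
		for _, l := range r.Languages {
			if l == lang {
				out = append(out, r)
				break
			}
		}
	}
	return out
}

// atSeverity returns the rules at or above threshold.
// An unknown threshold yields no rules (an assumption of this sketch).
func (c catalog) atSeverity(threshold string) []rule {
	floor, ok := severityRank[threshold]
	if !ok {
		return nil
	}
	var out []rule
	for _, r := range c.Rules {
		if rank, known := severityRank[r.Severity]; known && rank >= floor {
			out = append(out, r)
		}
	}
	return out
}

// byID looks up a single rule and reports whether it was found.
func (c catalog) byID(id string) (rule, bool) {
	for _, r := range c.Rules {
		if r.ID == id {
			return r, true
		}
	}
	return rule{}, false
}

func main() {
	c := catalog{Rules: []rule{
		{ID: "go-sec-001", Severity: "high", Languages: []string{"go"}},
		{ID: "php-sec-001", Severity: "low", Languages: []string{"php"}},
	}}
	fmt.Println(len(c.forLanguage("go")), len(c.atSeverity("medium")))
}
```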
### Matcher
The `Matcher` is the regex execution engine. It pre-compiles all regex-detection rules into `compiledRule` structs:
```go
type compiledRule struct {
	rule    Rule
	pattern *regexp.Regexp
	exclude *regexp.Regexp
}
```
`NewMatcher(rules)` compiles patterns once. `Match(filename, content)` then scans line by line:
1. For each compiled rule, check whether the filename itself matches the exclude pattern (e.g., `_test.go` files); if it does, skip that rule for the whole file.
2. For each line, test against the rule's pattern.
3. If the line matches, check the exclude pattern to suppress false positives.
4. Emit a `Finding` with file, line number, matched text, and remediation advice.
Non-regex detection types are silently skipped, allowing the catalog schema to support future detection mechanisms (AST, semantic) without breaking the matcher.
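The four matching steps can be sketched as follows. This is a simplified approximation, not the package's code: the lowercase types, the use of `FindStringIndex` to extract the matched text, and the sample rule are all assumptions made for illustration.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// Hypothetical, trimmed stand-ins for Rule, compiledRule, and Finding.
type rule struct {
	ID, Pattern, ExcludePattern, Detection, Fix string
}

type compiledRule struct {
	rule    rule
	pattern *regexp.Regexp
	exclude *regexp.Regexp // nil when the rule has no exclude pattern
}

type finding struct {
	RuleID, File string
	Line         int
	Match, Fix   string
}

type matcher struct{ rules []compiledRule }

// newMatcher compiles every regex rule once; other detection types are skipped.
func newMatcher(rules []rule) (*matcher, error) {
	m := &matcher{}
	for _, r := range rules {
		if r.Detection != "regex" {
			continue
		}
		cr := compiledRule{rule: r}
		var err error
		if cr.pattern, err = regexp.Compile(r.Pattern); err != nil {
			return nil, err
		}
		if r.ExcludePattern != "" {
			if cr.exclude, err = regexp.Compile(r.ExcludePattern); err != nil {
				return nil, err
			}
		}
		m.rules = append(m.rules, cr)
	}
	return m, nil
}

// match scans content line by line and returns one finding per matching line.
func (m *matcher) match(filename, content string) []finding {
	var out []finding
	lines := strings.Split(content, "\n")
	for _, cr := range m.rules {
		// Step 1: the exclude pattern can rule out the whole file by name.
		if cr.exclude != nil && cr.exclude.MatchString(filename) {
			continue
		}
		for i, line := range lines {
			loc := cr.pattern.FindStringIndex(line) // step 2
			if loc == nil {
				continue
			}
			// Step 3: suppress false positives on the matching line.
			if cr.exclude != nil && cr.exclude.MatchString(line) {
				continue
			}
			// Step 4: emit a finding with a 1-based line number.
			out = append(out, finding{
				RuleID: cr.rule.ID, File: filename, Line: i + 1,
				Match: line[loc[0]:loc[1]], Fix: cr.rule.Fix,
			})
		}
	}
	return out
}

func main() {
	m, err := newMatcher([]rule{{
		ID: "go-xxx-001", Pattern: `panic\(`, ExcludePattern: `_test\.go$|nolint`,
		Detection: "regex", Fix: "return an error instead",
	}})
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", m.match("main.go", "x := 1\npanic(\"boom\")\n"))
}
```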
### Scanner
The `Scanner` orchestrates directory walking and language-aware matching:
1. Walk the directory tree, skipping excluded directories (`vendor`, `node_modules`, `.git`, `testdata`, `.core`).
2. For each file, detect its language from the file extension using `DetectLanguage()`.
3. Filter the rule set to only rules targeting that language.
4. Build a language-scoped `Matcher` and run it against the file content.
Supported language extensions:
| Extension | Language |
|-----------|----------|
| `.go` | go |
| `.php` | php |
| `.ts`, `.tsx` | ts |
| `.js`, `.jsx` | js |
| `.cpp`, `.cc`, `.c`, `.h` | cpp |
| `.py` | py |
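The table maps naturally onto a lookup keyed by file extension. The sketch below is one way to write such a function; the name `detectLanguage`, the case-sensitive comparison, and returning an empty string for unknown extensions are assumptions rather than the documented behaviour of `DetectLanguage()`.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// extLang mirrors the extension table in this document.
var extLang = map[string]string{
	".go":  "go",
	".php": "php",
	".ts":  "ts", ".tsx": "ts",
	".js": "js", ".jsx": "js",
	".cpp": "cpp", ".cc": "cpp", ".c": "cpp", ".h": "cpp",
	".py": "py",
}

// detectLanguage returns the language for path, or "" when the
// extension is not in the table (an assumption of this sketch).
func detectLanguage(path string) string {
	return extLang[filepath.Ext(path)]
}

func main() {
	for _, p := range []string{"pkg/lint/matcher.go", "app/Models/User.php", "README.md"} {
		fmt.Printf("%s -> %q\n", p, detectLanguage(p))
	}
}
```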
### Finding
A `Finding` is the output of a match:
```go
type Finding struct {
	RuleID   string `json:"rule_id"`
	Title    string `json:"title"`
	Severity string `json:"severity"`
	File     string `json:"file"`
	Line     int    `json:"line"`
	Match    string `json:"match"`
	Fix      string `json:"fix"`
	Repo     string `json:"repo,omitempty"`
}
```
### Report
The `report.go` file provides three output formats:
- `WriteText(w, findings)` -- human-readable: `file:line [severity] title (rule-id)`
- `WriteJSON(w, findings)` -- pretty-printed JSON array
- `WriteJSONL(w, findings)` -- newline-delimited JSON (one object per line)
`Summarise(findings)` aggregates counts by severity.
## Data Flow
A typical scan follows this path:
```
YAML files ──> LoadFS() ──> Catalog{Rules}
                                |
                  ForLanguage() / AtSeverity()
                                |
                        []Rule (filtered)
                                |
                        NewScanner(rules)
                                |
                 ScanDir(root) / ScanFile(path)
                                |
                ┌───────────────┼───────────────┐
                │ Walk tree     │ Detect lang   │
                │ Skip dirs     │ Filter rules  │
                │               │ NewMatcher()  │
                │               │ Match()       │
                └───────────────┴───────────────┘
                                |
                            []Finding
                                |
            WriteText() / WriteJSON() / WriteJSONL()
```
## Cyclomatic Complexity Analysis (pkg/lint/complexity.go)
The module includes a native Go AST-based cyclomatic complexity analyser. It uses `go/parser` and `go/ast` -- no external tools required.
```go
results, err := lint.AnalyseComplexity(lint.ComplexityConfig{
	Threshold: 15,
	Path:      "./pkg/...",
})
```
Complexity is calculated by starting at 1 and incrementing for each branching construct:
- `if`, `for`, `range`, non-default `case` clauses, non-default `select` communication clauses (`comm`)
- `&&`, `||` binary expressions
- `type switch`, `select`
There is also `AnalyseComplexitySource(src, filename, threshold)` for testing without file I/O.
## Coverage Tracking (pkg/lint/coverage.go)
The coverage subsystem supports:
- **Parsing** Go coverage output (`ParseCoverProfile` for `-coverprofile` format, `ParseCoverOutput` for `-cover` output)
- **Snapshotting** via `CoverageSnapshot` (timestamp, per-package percentages, metadata)
- **Persistence** via `CoverageStore` (JSON file-backed append-only store)
- **Regression detection** via `CompareCoverage(previous, current)` which returns a `CoverageComparison` with regressions, improvements, new packages, and removed packages
## Vulnerability Checking (pkg/lint/vulncheck.go)
`VulnCheck` wraps `govulncheck -json` and parses its newline-delimited JSON output into structured `VulnFinding` objects. The parser handles three message types from govulncheck's wire format:
- `config` -- extracts the module path
- `osv` -- stores vulnerability metadata (ID, aliases, summary, affected ranges)
- `finding` -- maps OSV IDs to call traces and affected packages
## Toolkit (pkg/lint/tools.go)
The `Toolkit` struct wraps common developer commands into structured Go APIs. It executes subprocesses and parses their output:
| Method | Wraps | Returns |
|--------|-------|---------|
| `FindTODOs(dir)` | `git grep` | `[]TODO` |
| `Lint(pkg)` | `go vet` | `[]ToolFinding` |
| `Coverage(pkg)` | `go test -cover` | `[]CoverageReport` |
| `RaceDetect(pkg)` | `go test -race` | `[]RaceCondition` |
| `AuditDeps()` | `govulncheck` (text) | `[]Vulnerability` |
| `ScanSecrets(dir)` | `gitleaks` | `[]SecretLeak` |
| `GocycloComplexity(threshold)` | `gocyclo` | `[]ComplexFunc` |
| `DepGraph(pkg)` | `go mod graph` | `*Graph` |
| `GitLog(n)` | `git log` | `[]Commit` |
| `DiffStat()` | `git diff --stat` | `DiffSummary` |
| `UncommittedFiles()` | `git status` | `[]string` |
| `Build(targets...)` | `go build` | `[]BuildResult` |
| `TestCount(pkg)` | `go test -list` | `int` |
| `CheckPerms(dir)` | `filepath.Walk` | `[]PermIssue` |
| `ModTidy()` | `go mod tidy` | `error` |
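All methods that shell out use the `Run(name, args...)` helper, which captures stdout, stderr, and the exit code. `CheckPerms` is the exception in the table: it walks the filesystem with `filepath.Walk` rather than running a subprocess.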
## PHP Quality Pipeline (pkg/php)
The `pkg/php` package provides structured wrappers around PHP ecosystem tools. Each tool has:
1. **Detection** -- checks for config files and vendor binaries (e.g., `DetectAnalyser`, `DetectPsalm`, `DetectRector`)
2. **Options struct** -- configures the tool run
3. **Execution function** -- builds the command, runs it, and returns structured results
### Supported Tools
| Function | Tool | Purpose |
|----------|------|---------|
| `Format()` | Laravel Pint | Code style formatting |
| `Analyse()` | PHPStan / Larastan | Static analysis |
| `RunPsalm()` | Psalm | Type-level static analysis |
| `RunAudit()` | Composer audit + npm audit | Dependency vulnerability scanning |
| `RunSecurityChecks()` | Built-in checks | .env exposure, debug mode, filesystem security |
| `RunRector()` | Rector | Automated code refactoring |
| `RunInfection()` | Infection | Mutation testing |
| `RunTests()` | Pest / PHPUnit | Test execution |
### QA Pipeline
The pipeline system (`pipeline.go` + `runner.go`) organises checks into three stages:
- **Quick** -- audit, fmt, stan (fast, run on every push)
- **Standard** -- psalm (if available), test
- **Full** -- rector, infection (slow, run in full QA)
The `QARunner` builds `process.RunSpec` objects with dependency ordering (e.g., `stan` runs after `fmt`, `test` runs after `stan`). This allows future parallelisation while respecting ordering constraints.
## Project Detection (pkg/detect)
The `detect` package identifies project types by checking for marker files:
- `go.mod` present => Go project
- `composer.json` present => PHP project
`DetectAll(dir)` returns all detected types, enabling polyglot project support.
## QA Command Layer (cmd/qa)
The `cmd/qa` package provides workflow-level commands. Most integrate with GitHub via the `gh` CLI; `docblock` works on local Go source instead:
- **watch** -- polls GitHub Actions for a specific commit, shows real-time status, drills into failure details (failed job, step, error line from logs)
- **review** -- fetches open PRs, analyses CI status, review decisions, and merge readiness, suggests next actions
- **health** -- scans all repos in a `repos.yaml` registry, reports aggregate CI health with pass rates
- **issues** -- fetches issues across repos, categorises them (needs response, ready, blocked, triage), prioritises by labels and activity
- **docblock** -- parses Go source with `go/ast`, counts exported symbols with and without doc comments, enforces a coverage threshold
Commands register themselves via `cli.RegisterCommands` in an `init()` function, making them available when the package is imported.
## Extension Points
### Adding New Rules
Create a new YAML file in `catalog/` following the schema:
```yaml
- id: go-xxx-001
  title: "Description of the issue"
  severity: medium # info, low, medium, high, critical
  languages: [go]
  tags: [security]
  pattern: 'regex-pattern'
  exclude_pattern: 'false-positive-filter'
  fix: "How to fix the issue"
  detection: regex
  auto_fixable: false
  example_bad: 'problematic code'
  example_good: 'corrected code'
```
The file will be embedded automatically on the next build.
### Adding New Detection Types
The `Detection` field on `Rule` currently supports `"regex"`. The `Matcher` skips non-regex rules, so adding a new detection type (e.g., `"ast"` for Go AST patterns) requires:
1. Adding the new type to the `Validate()` method
2. Creating a new matcher implementation
3. Integrating it into `Scanner.ScanDir()`
### Loading External Catalogs
Use `LoadDir(path)` to load rules from a directory on disk rather than the embedded catalog:
```go
cat, err := lintpkg.LoadDir("/path/to/custom/rules")
```
This allows organisations to maintain private rule sets alongside the built-in catalog.