---
title: Architecture
description: Internal design of core/lint -- types, data flow, and extension points
---

# Architecture

This document explains how `core/lint` works internally. It covers the core library (`pkg/lint`), the PHP quality pipeline (`pkg/php`), and the QA command layer (`cmd/qa`).

## Overview

The system is organised into three layers:

```
cmd/core-lint     CLI entry point (lint check, lint catalog)
cmd/qa            QA workflow commands (watch, review, health, issues, PHP tools)
        |
pkg/lint          Core library: rules, catalog, matcher, scanner, reporting
pkg/php           PHP tool wrappers: format, analyse, audit, security, test
pkg/detect        Project type detection
        |
catalog/*.yaml    Embedded rule definitions
```

The root `lint.go` file ties the catalog layer to the library:

```go
//go:embed catalog/*.yaml
var catalogFS embed.FS

func LoadEmbeddedCatalog() (*lintpkg.Catalog, error) {
    return lintpkg.LoadFS(catalogFS, "catalog")
}
```

This means all YAML rules are baked into the binary at compile time. There are no runtime file lookups.

## Core Types (pkg/lint)

### Rule

A `Rule` represents a single lint check loaded from YAML. Key fields:

```go
type Rule struct {
    ID             string   `yaml:"id"`
    Title          string   `yaml:"title"`
    Severity       string   `yaml:"severity"`        // info, low, medium, high, critical
    Languages      []string `yaml:"languages"`       // e.g. ["go"], ["go", "php"]
    Tags           []string `yaml:"tags"`            // e.g. ["security", "injection"]
    Pattern        string   `yaml:"pattern"`         // Regex pattern to match
    ExcludePattern string   `yaml:"exclude_pattern"` // Regex to suppress false positives
    Fix            string   `yaml:"fix"`             // Human-readable remediation
    Detection      string   `yaml:"detection"`       // "regex" (extensible to other types)
    AutoFixable    bool     `yaml:"auto_fixable"`
    ExampleBad     string   `yaml:"example_bad"`
    ExampleGood    string   `yaml:"example_good"`
    FoundIn        []string `yaml:"found_in"`        // Repos where pattern was observed
    FirstSeen      string   `yaml:"first_seen"`
}
```

Each rule validates itself via `Validate()`, which checks required fields and compiles regex patterns. Severity is constrained to five levels: `info`, `low`, `medium`, `high`, `critical`.

### Catalog

A `Catalog` is a flat collection of rules with query methods:

- `ForLanguage(lang)` -- returns rules targeting a specific language
- `AtSeverity(threshold)` -- returns rules at or above a severity level
- `ByID(id)` -- looks up a single rule

Loading is done via `LoadDir(dir)` for filesystem paths or `LoadFS(fsys, dir)` for embedded filesystems. Both read all `.yaml` files in the directory and parse them into `[]Rule`.

### Matcher

The `Matcher` is the regex execution engine. It pre-compiles all regex-detection rules into `compiledRule` structs:

```go
type compiledRule struct {
    rule    Rule
    pattern *regexp.Regexp
    exclude *regexp.Regexp
}
```

`NewMatcher(rules)` compiles patterns once. `Match(filename, content)` then scans line by line:

1. For each compiled rule, check if the filename itself matches the exclude pattern (e.g., skip `_test.go` files).
2. For each line, test against the rule's pattern.
3. If the line matches, check the exclude pattern to suppress false positives.
4. Emit a `Finding` with file, line number, matched text, and remediation advice.

Non-regex detection types are silently skipped, allowing the catalog schema to support future detection mechanisms (AST, semantic) without breaking the matcher.
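
The four matching steps above can be illustrated with a small standalone sketch. It is not the code in `pkg/lint`: the `Rule` and `Finding` types are trimmed-down stand-ins, and `compileRules` and `match` are hypothetical names chosen for this example. It assumes the same exclude pattern is tested against both the filename and the matched line, as the steps describe.

```go
package main

import (
    "fmt"
    "regexp"
    "strings"
)

// Rule and Finding are simplified stand-ins for the real types (assumption).
type Rule struct {
    ID, Pattern, ExcludePattern, Fix, Detection string
}

type Finding struct {
    RuleID string
    File   string
    Line   int
    Match  string
    Fix    string
}

type compiledRule struct {
    rule    Rule
    pattern *regexp.Regexp
    exclude *regexp.Regexp // nil when the rule has no exclude pattern
}

// compileRules compiles regex rules once and skips every other detection type.
func compileRules(rules []Rule) ([]compiledRule, error) {
    var out []compiledRule
    for _, r := range rules {
        if r.Detection != "regex" {
            continue // non-regex rules are silently ignored
        }
        p, err := regexp.Compile(r.Pattern)
        if err != nil {
            return nil, fmt.Errorf("rule %s: %w", r.ID, err)
        }
        cr := compiledRule{rule: r, pattern: p}
        if r.ExcludePattern != "" {
            if cr.exclude, err = regexp.Compile(r.ExcludePattern); err != nil {
                return nil, fmt.Errorf("rule %s: %w", r.ID, err)
            }
        }
        out = append(out, cr)
    }
    return out, nil
}

// match scans content line by line and returns one Finding per matching line.
func match(rules []compiledRule, filename, content string) []Finding {
    var findings []Finding
    lines := strings.Split(content, "\n")
    for _, cr := range rules {
        // Step 1: skip the whole file if its name matches the exclude pattern.
        if cr.exclude != nil && cr.exclude.MatchString(filename) {
            continue
        }
        for i, line := range lines {
            // Step 2: test the line against the rule's pattern.
            if !cr.pattern.MatchString(line) {
                continue
            }
            // Step 3: suppress false positives on the line itself.
            if cr.exclude != nil && cr.exclude.MatchString(line) {
                continue
            }
            // Step 4: emit a finding with a 1-based line number.
            findings = append(findings, Finding{
                RuleID: cr.rule.ID,
                File:   filename,
                Line:   i + 1,
                Match:  strings.TrimSpace(line),
                Fix:    cr.rule.Fix,
            })
        }
    }
    return findings
}

func main() {
    rules, err := compileRules([]Rule{
        {ID: "demo-001", Pattern: `fmt\.Print`, ExcludePattern: `_test\.go|nolint`, Fix: "use a logger", Detection: "regex"},
        {ID: "demo-002", Pattern: `ignored`, Detection: "ast"}, // skipped: not regex
    })
    if err != nil {
        panic(err)
    }
    src := "package x\nfmt.Println(1)\nfmt.Println(2) // nolint\n"
    for _, f := range match(rules, "main.go", src) {
        fmt.Printf("%s:%d %s\n", f.File, f.Line, f.RuleID) // prints "main.go:2 demo-001"
    }
}
```

Compiling once and reusing the compiled patterns for every file is what keeps the per-line cost down to a pair of `MatchString` calls.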

### Scanner

The `Scanner` orchestrates directory walking and language-aware matching:

1. Walk the directory tree, skipping excluded directories (`vendor`, `node_modules`, `.git`, `testdata`, `.core`).
2. For each file, detect its language from the file extension using `DetectLanguage()`.
3. Filter the rule set to only rules targeting that language.
4. Build a language-scoped `Matcher` and run it against the file content.

Supported language extensions:

| Extension | Language |
|-----------|----------|
| `.go` | go |
| `.php` | php |
| `.ts`, `.tsx` | ts |
| `.js`, `.jsx` | js |
| `.cpp`, `.cc`, `.c`, `.h` | cpp |
| `.py` | py |

### Finding

A `Finding` is the output of a match:

```go
type Finding struct {
    RuleID   string `json:"rule_id"`
    Title    string `json:"title"`
    Severity string `json:"severity"`
    File     string `json:"file"`
    Line     int    `json:"line"`
    Match    string `json:"match"`
    Fix      string `json:"fix"`
    Repo     string `json:"repo,omitempty"`
}
```

### Report

The `report.go` file provides three output formats:

- `WriteText(w, findings)` -- human-readable: `file:line [severity] title (rule-id)`
- `WriteJSON(w, findings)` -- pretty-printed JSON array
- `WriteJSONL(w, findings)` -- newline-delimited JSON (one object per line)

`Summarise(findings)` aggregates counts by severity.

## Data Flow

A typical scan follows this path:

```
YAML files ──> LoadFS() ──> Catalog{Rules}
                                |
                    ForLanguage() / AtSeverity()
                                |
                        []Rule (filtered)
                                |
                        NewScanner(rules)
                                |
                  ScanDir(root) / ScanFile(path)
                                |
                ┌───────────────┼───────────────┐
                │ Walk tree     │ Detect lang   │
                │ Skip dirs     │ Filter rules  │
                │               │ NewMatcher()  │
                │               │ Match()       │
                └───────────────┴───────────────┘
                                |
                           []Finding
                                |
              WriteText() / WriteJSON() / WriteJSONL()
```

## Cyclomatic Complexity Analysis (pkg/lint/complexity.go)

The module includes a native Go AST-based cyclomatic complexity analyser. It uses `go/parser` and `go/ast` -- no external tools required.

```go
results, err := lint.AnalyseComplexity(lint.ComplexityConfig{
    Threshold: 15,
    Path:      "./pkg/...",
})
```

Complexity is calculated by starting at 1 and incrementing for each branching construct:

- `if`, `for`, `range`, `case` (non-default), `comm` (non-default)
- `&&`, `||` binary expressions
- `type switch`, `select`

There is also `AnalyseComplexitySource(src, filename, threshold)` for testing without file I/O.

## Coverage Tracking (pkg/lint/coverage.go)

The coverage subsystem supports:

- **Parsing** Go coverage output (`ParseCoverProfile` for `-coverprofile` format, `ParseCoverOutput` for `-cover` output)
- **Snapshotting** via `CoverageSnapshot` (timestamp, per-package percentages, metadata)
- **Persistence** via `CoverageStore` (JSON file-backed append-only store)
- **Regression detection** via `CompareCoverage(previous, current)` which returns a `CoverageComparison` with regressions, improvements, new packages, and removed packages

## Vulnerability Checking (pkg/lint/vulncheck.go)

`VulnCheck` wraps `govulncheck -json` and parses its newline-delimited JSON output into structured `VulnFinding` objects. The parser handles three message types from govulncheck's wire format:

- `config` -- extracts the module path
- `osv` -- stores vulnerability metadata (ID, aliases, summary, affected ranges)
- `finding` -- maps OSV IDs to call traces and affected packages

## Toolkit (pkg/lint/tools.go)

The `Toolkit` struct wraps common developer commands into structured Go APIs.
It executes subprocesses and parses their output:

| Method | Wraps | Returns |
|--------|-------|---------|
| `FindTODOs(dir)` | `git grep` | `[]TODO` |
| `Lint(pkg)` | `go vet` | `[]ToolFinding` |
| `Coverage(pkg)` | `go test -cover` | `[]CoverageReport` |
| `RaceDetect(pkg)` | `go test -race` | `[]RaceCondition` |
| `AuditDeps()` | `govulncheck` (text) | `[]Vulnerability` |
| `ScanSecrets(dir)` | `gitleaks` | `[]SecretLeak` |
| `GocycloComplexity(threshold)` | `gocyclo` | `[]ComplexFunc` |
| `DepGraph(pkg)` | `go mod graph` | `*Graph` |
| `GitLog(n)` | `git log` | `[]Commit` |
| `DiffStat()` | `git diff --stat` | `DiffSummary` |
| `UncommittedFiles()` | `git status` | `[]string` |
| `Build(targets...)` | `go build` | `[]BuildResult` |
| `TestCount(pkg)` | `go test -list` | `int` |
| `CheckPerms(dir)` | `filepath.Walk` | `[]PermIssue` |
| `ModTidy()` | `go mod tidy` | `error` |

All methods use the `Run(name, args...)` helper which captures stdout, stderr, and exit code.

## PHP Quality Pipeline (pkg/php)

The `pkg/php` package provides structured wrappers around PHP ecosystem tools. Each tool has:

1. **Detection** -- checks for config files and vendor binaries (e.g., `DetectAnalyser`, `DetectPsalm`, `DetectRector`)
2. **Options struct** -- configures the tool run
3. **Execution function** -- builds the command, runs it, and returns structured results

### Supported Tools

| Function | Tool | Purpose |
|----------|------|---------|
| `Format()` | Laravel Pint | Code style formatting |
| `Analyse()` | PHPStan / Larastan | Static analysis |
| `RunPsalm()` | Psalm | Type-level static analysis |
| `RunAudit()` | Composer audit + npm audit | Dependency vulnerability scanning |
| `RunSecurityChecks()` | Built-in checks | .env exposure, debug mode, filesystem security |
| `RunRector()` | Rector | Automated code refactoring |
| `RunInfection()` | Infection | Mutation testing |
| `RunTests()` | Pest / PHPUnit | Test execution |

### QA Pipeline

The pipeline system (`pipeline.go` + `runner.go`) organises checks into three stages:

- **Quick** -- audit, fmt, stan (fast, run on every push)
- **Standard** -- psalm (if available), test
- **Full** -- rector, infection (slow, run in full QA)

The `QARunner` builds `process.RunSpec` objects with dependency ordering (e.g., `stan` runs after `fmt`, `test` runs after `stan`). This allows future parallelisation while respecting ordering constraints.

### Project Detection (pkg/detect)

The `detect` package identifies project types by checking for marker files:

- `go.mod` present => Go project
- `composer.json` present => PHP project

`DetectAll(dir)` returns all detected types, enabling polyglot project support.

## QA Command Layer (cmd/qa)

The `cmd/qa` package provides workflow-level commands that integrate with GitHub via the `gh` CLI:

- **watch** -- polls GitHub Actions for a specific commit, shows real-time status, drills into failure details (failed job, step, error line from logs)
- **review** -- fetches open PRs, analyses CI status, review decisions, and merge readiness, suggests next actions
- **health** -- scans all repos in a `repos.yaml` registry, reports aggregate CI health with pass rates
- **issues** -- fetches issues across repos, categorises them (needs response, ready, blocked, triage), prioritises by labels and activity
- **docblock** -- parses Go source with `go/ast`, counts exported symbols with and without doc comments, enforces a coverage threshold

Commands register themselves via `cli.RegisterCommands` in an `init()` function, making them available when the package is imported.

## Extension Points

### Adding New Rules

Create a new YAML file in `catalog/` following the schema:

```yaml
- id: go-xxx-001
  title: "Description of the issue"
  severity: medium           # info, low, medium, high, critical
  languages: [go]
  tags: [security]
  pattern: 'regex-pattern'
  exclude_pattern: 'false-positive-filter'
  fix: "How to fix the issue"
  detection: regex
  auto_fixable: false
  example_bad: 'problematic code'
  example_good: 'corrected code'
```

The file will be embedded automatically on the next build.

### Adding New Detection Types

The `Detection` field on `Rule` currently supports `"regex"`. The `Matcher` skips non-regex rules, so adding a new detection type (e.g., `"ast"` for Go AST patterns) requires:

1. Adding the new type to the `Validate()` method
2. Creating a new matcher implementation
3. Integrating it into `Scanner.ScanDir()`

### Loading External Catalogs

Use `LoadDir(path)` to load rules from a directory on disk rather than the embedded catalog:

```go
cat, err := lintpkg.LoadDir("/path/to/custom/rules")
```

This allows organisations to maintain private rule sets alongside the built-in catalog.
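
This document does not describe how a private rule set is combined with the built-in one. One possible approach, sketched below under that caveat, is to concatenate the two rule slices and reject duplicate IDs so that a custom rule can never silently shadow a built-in rule. The `Rule` type here is a trimmed stand-in and `mergeRules` is a hypothetical helper, not part of `pkg/lint`. In practice the inputs would be the rule slices of the catalogs returned by `LoadEmbeddedCatalog()` and `LoadDir()`, assuming the catalog exposes them.

```go
package main

import "fmt"

// Rule is a trimmed stand-in for the real rule type (assumption).
type Rule struct {
    ID    string
    Title string
}

// mergeRules returns builtin followed by custom, failing on any repeated ID.
func mergeRules(builtin, custom []Rule) ([]Rule, error) {
    seen := make(map[string]bool, len(builtin)+len(custom))
    merged := make([]Rule, 0, len(builtin)+len(custom))
    for _, set := range [][]Rule{builtin, custom} {
        for _, r := range set {
            if seen[r.ID] {
                return nil, fmt.Errorf("duplicate rule id %q", r.ID)
            }
            seen[r.ID] = true
            merged = append(merged, r)
        }
    }
    return merged, nil
}

func main() {
    builtin := []Rule{{ID: "go-sec-001", Title: "built-in rule"}}
    custom := []Rule{{ID: "acme-001", Title: "private rule"}}

    merged, err := mergeRules(builtin, custom)
    if err != nil {
        panic(err)
    }
    fmt.Println(len(merged)) // prints "2"

    // A private rule reusing a built-in ID is reported instead of merged.
    _, err = mergeRules(builtin, []Rule{{ID: "go-sec-001"}})
    fmt.Println(err != nil) // prints "true"
}
```

Failing on a collision is a deliberate choice in this sketch. A project that wants private rules to override built-in ones would replace the error with a last-writer-wins policy.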