Snider e876b62045 docs: add human-friendly documentation
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-11 13:02:40 +00:00


---
title: Architecture
description: Internal design of core/lint -- types, data flow, and extension points
---

# Architecture

This document explains how core/lint works internally. It covers the core library (pkg/lint), the PHP quality pipeline (pkg/php), and the QA command layer (cmd/qa).

## Overview

The system is organised into three layers:

```text
cmd/core-lint     CLI entry point (lint check, lint catalog)
cmd/qa            QA workflow commands (watch, review, health, issues, PHP tools)
   |
pkg/lint          Core library: rules, catalog, matcher, scanner, reporting
pkg/php           PHP tool wrappers: format, analyse, audit, security, test
pkg/detect        Project type detection
   |
catalog/*.yaml    Embedded rule definitions
```

The root lint.go file ties the catalog layer to the library:

```go
//go:embed catalog/*.yaml
var catalogFS embed.FS

func LoadEmbeddedCatalog() (*lintpkg.Catalog, error) {
    return lintpkg.LoadFS(catalogFS, "catalog")
}
```

This means all YAML rules are baked into the binary at compile time. There are no runtime file lookups.

## Core Types (pkg/lint)

### Rule

A Rule represents a single lint check loaded from YAML. Key fields:

```go
type Rule struct {
    ID             string   `yaml:"id"`
    Title          string   `yaml:"title"`
    Severity       string   `yaml:"severity"`        // info, low, medium, high, critical
    Languages      []string `yaml:"languages"`       // e.g. ["go"], ["go", "php"]
    Tags           []string `yaml:"tags"`            // e.g. ["security", "injection"]
    Pattern        string   `yaml:"pattern"`         // Regex pattern to match
    ExcludePattern string   `yaml:"exclude_pattern"` // Regex to suppress false positives
    Fix            string   `yaml:"fix"`             // Human-readable remediation
    Detection      string   `yaml:"detection"`       // "regex" (extensible to other types)
    AutoFixable    bool     `yaml:"auto_fixable"`
    ExampleBad     string   `yaml:"example_bad"`
    ExampleGood    string   `yaml:"example_good"`
    FoundIn        []string `yaml:"found_in"`        // Repos where pattern was observed
    FirstSeen      string   `yaml:"first_seen"`
}
```

Each rule validates itself via Validate(), which checks required fields and compiles regex patterns. Severity is constrained to five levels: info, low, medium, high, critical.
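The validation described above can be sketched in a few lines. This is a hypothetical stand-in for `Rule.Validate` (the real method may check more fields); `validateRule` and its error messages are illustrative.

```go
package main

import (
	"fmt"
	"regexp"
)

// Rule mirrors the YAML-backed struct, trimmed to the validated fields.
type Rule struct {
	ID       string
	Severity string
	Pattern  string
}

// validSeverities is the five-level whitelist from the catalog schema.
var validSeverities = map[string]bool{
	"info": true, "low": true, "medium": true, "high": true, "critical": true,
}

// validateRule checks required fields, constrains severity, and confirms
// the pattern compiles -- the three checks the document attributes to Validate.
func validateRule(r Rule) error {
	if r.ID == "" {
		return fmt.Errorf("rule is missing an id")
	}
	if !validSeverities[r.Severity] {
		return fmt.Errorf("rule %s: invalid severity %q", r.ID, r.Severity)
	}
	if _, err := regexp.Compile(r.Pattern); err != nil {
		return fmt.Errorf("rule %s: bad pattern: %w", r.ID, err)
	}
	return nil
}

func main() {
	fmt.Println(validateRule(Rule{ID: "go-sql-001", Severity: "high", Pattern: `Sprintf\(.*SELECT`}))
	fmt.Println(validateRule(Rule{ID: "go-bad-001", Severity: "urgent", Pattern: `x`}))
}
```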

### Catalog

A Catalog is a flat collection of rules with query methods:

- `ForLanguage(lang)` -- returns rules targeting a specific language
- `AtSeverity(threshold)` -- returns rules at or above a severity level
- `ByID(id)` -- looks up a single rule

Loading is done via LoadDir(dir) for filesystem paths or LoadFS(fsys, dir) for embedded filesystems. Both read all .yaml files in the directory and parse them into []Rule.
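The query methods can be sketched as plain filters over the flat rule slice. This is a simplified model, not the package's exact implementation; the severity ranking assumes the five-level ordering given earlier.

```go
package main

import "fmt"

type Rule struct {
	ID        string
	Severity  string
	Languages []string
}

type Catalog struct{ Rules []Rule }

// severityRank encodes the five-level ordering: info < low < medium < high < critical.
var severityRank = map[string]int{"info": 0, "low": 1, "medium": 2, "high": 3, "critical": 4}

// ForLanguage returns rules that list lang in their Languages field.
func (c *Catalog) ForLanguage(lang string) []Rule {
	var out []Rule
	for _, r := range c.Rules {
		for _, l := range r.Languages {
			if l == lang {
				out = append(out, r)
				break
			}
		}
	}
	return out
}

// AtSeverity returns rules at or above the given threshold.
func (c *Catalog) AtSeverity(threshold string) []Rule {
	var out []Rule
	for _, r := range c.Rules {
		if severityRank[r.Severity] >= severityRank[threshold] {
			out = append(out, r)
		}
	}
	return out
}

func main() {
	cat := &Catalog{Rules: []Rule{
		{ID: "go-001", Severity: "high", Languages: []string{"go"}},
		{ID: "php-001", Severity: "low", Languages: []string{"php"}},
	}}
	fmt.Println(len(cat.ForLanguage("go")), len(cat.AtSeverity("medium")))
}
```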

### Matcher

The Matcher is the regex execution engine. It pre-compiles all regex-detection rules into compiledRule structs:

```go
type compiledRule struct {
    rule    Rule
    pattern *regexp.Regexp
    exclude *regexp.Regexp
}
```

NewMatcher(rules) compiles patterns once. Match(filename, content) then scans line by line:

  1. For each compiled rule, check if the filename itself matches the exclude pattern (e.g., skip _test.go files).
  2. For each line, test against the rule's pattern.
  3. If the line matches, check the exclude pattern to suppress false positives.
  4. Emit a Finding with file, line number, matched text, and remediation advice.

Non-regex detection types are silently skipped, allowing the catalog schema to support future detection mechanisms (AST, semantic) without breaking the matcher.

### Scanner

The Scanner orchestrates directory walking and language-aware matching:

  1. Walk the directory tree, skipping excluded directories (vendor, node_modules, .git, testdata, .core).
  2. For each file, detect its language from the file extension using DetectLanguage().
  3. Filter the rule set to only rules targeting that language.
  4. Build a language-scoped Matcher and run it against the file content.

Supported language extensions:

| Extension | Language |
| --- | --- |
| `.go` | `go` |
| `.php` | `php` |
| `.ts`, `.tsx` | `ts` |
| `.js`, `.jsx` | `js` |
| `.cpp`, `.cc`, `.c`, `.h` | `cpp` |
| `.py` | `py` |
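The extension table can be codified as a switch on the file extension. `detectLanguage` is a hypothetical stand-in for the package's `DetectLanguage`; the mapping follows the table above.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// detectLanguage maps a filename's extension to its language tag;
// an empty string means the file is not scanned.
func detectLanguage(filename string) string {
	switch filepath.Ext(filename) {
	case ".go":
		return "go"
	case ".php":
		return "php"
	case ".ts", ".tsx":
		return "ts"
	case ".js", ".jsx":
		return "js"
	case ".cpp", ".cc", ".c", ".h":
		return "cpp"
	case ".py":
		return "py"
	default:
		return ""
	}
}

func main() {
	fmt.Println(detectLanguage("scanner.go"), detectLanguage("app.tsx"), detectLanguage("README.md"))
}
```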

### Finding

A Finding is the output of a match:

```go
type Finding struct {
    RuleID   string `json:"rule_id"`
    Title    string `json:"title"`
    Severity string `json:"severity"`
    File     string `json:"file"`
    Line     int    `json:"line"`
    Match    string `json:"match"`
    Fix      string `json:"fix"`
    Repo     string `json:"repo,omitempty"`
}
```

### Report

The report.go file provides three output formats:

- `WriteText(w, findings)` -- human-readable: `file:line [severity] title (rule-id)`
- `WriteJSON(w, findings)` -- pretty-printed JSON array
- `WriteJSONL(w, findings)` -- newline-delimited JSON (one object per line)

Summarise(findings) aggregates counts by severity.

## Data Flow

A typical scan follows this path:

```text
YAML files ──> LoadFS() ──> Catalog{Rules}
                                |
                     ForLanguage() / AtSeverity()
                                |
                           []Rule (filtered)
                                |
                          NewScanner(rules)
                                |
                  ScanDir(root) / ScanFile(path)
                                |
                ┌───────────────┼───────────────┐
                │  Walk tree    │  Detect lang   │
                │  Skip dirs    │  Filter rules  │
                │               │  NewMatcher()  │
                │               │  Match()       │
                └───────────────┴───────────────┘
                                |
                          []Finding
                                |
              WriteText() / WriteJSON() / WriteJSONL()
```

## Cyclomatic Complexity Analysis (pkg/lint/complexity.go)

The module includes a native Go AST-based cyclomatic complexity analyser. It uses go/parser and go/ast -- no external tools required.

```go
results, err := lint.AnalyseComplexity(lint.ComplexityConfig{
    Threshold: 15,
    Path:      "./pkg/...",
})
```

Complexity is calculated by starting at 1 and incrementing for each branching construct:

- `if`, `for`, `range`, `case` (non-default), comm clause (non-default)
- `&&`, `||` binary expressions
- type switch, `select`

There is also AnalyseComplexitySource(src, filename, threshold) for testing without file I/O.

## Coverage Tracking (pkg/lint/coverage.go)

The coverage subsystem supports:

- Parsing Go coverage output (`ParseCoverProfile` for `-coverprofile` format, `ParseCoverOutput` for `-cover` output)
- Snapshotting via `CoverageSnapshot` (timestamp, per-package percentages, metadata)
- Persistence via `CoverageStore` (JSON file-backed append-only store)
- Regression detection via `CompareCoverage(previous, current)`, which returns a `CoverageComparison` with regressions, improvements, new packages, and removed packages
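The comparison logic can be sketched as a diff over per-package percentage maps. Field names and the map-based signature are illustrative; the real `CompareCoverage` works on `CoverageSnapshot` values.

```go
package main

import "fmt"

// CoverageComparison sketches the four result categories described above.
type CoverageComparison struct {
	Regressions  map[string]float64 // package -> percentage-point drop
	Improvements map[string]float64 // package -> percentage-point gain
	NewPackages  []string
	Removed      []string
}

func compareCoverage(prev, curr map[string]float64) CoverageComparison {
	c := CoverageComparison{
		Regressions:  map[string]float64{},
		Improvements: map[string]float64{},
	}
	for pkg, p := range prev {
		cv, ok := curr[pkg]
		switch {
		case !ok:
			c.Removed = append(c.Removed, pkg)
		case cv < p:
			c.Regressions[pkg] = p - cv
		case cv > p:
			c.Improvements[pkg] = cv - p
		}
	}
	for pkg := range curr {
		if _, ok := prev[pkg]; !ok {
			c.NewPackages = append(c.NewPackages, pkg)
		}
	}
	return c
}

func main() {
	prev := map[string]float64{"pkg/lint": 81.2, "pkg/php": 64.0}
	curr := map[string]float64{"pkg/lint": 78.5, "pkg/php": 70.1, "pkg/detect": 90.0}
	fmt.Printf("%+v\n", compareCoverage(prev, curr))
}
```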

## Vulnerability Checking (pkg/lint/vulncheck.go)

VulnCheck wraps govulncheck -json and parses its newline-delimited JSON output into structured VulnFinding objects. The parser handles three message types from govulncheck's wire format:

- `config` -- extracts the module path
- `osv` -- stores vulnerability metadata (ID, aliases, summary, affected ranges)
- `finding` -- maps OSV IDs to call traces and affected packages
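A dispatch over the three message types can be sketched with an envelope struct: each object in the stream carries exactly one of the keys. The field names inside `osvMsg` and `findingMsg` are illustrative, and the sketch assumes one object per line as described above.

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"strings"
)

type osvMsg struct {
	ID      string `json:"id"`
	Summary string `json:"summary"`
}

type findingMsg struct {
	OSV string `json:"osv"`
}

// envelope mirrors the wire format: exactly one field is set per object.
type envelope struct {
	Config  json.RawMessage `json:"config"`
	OSV     *osvMsg         `json:"osv"`
	Finding *findingMsg     `json:"finding"`
}

// tally counts each message type in a newline-delimited stream,
// skipping lines that fail to parse.
func tally(stream string) map[string]int {
	counts := map[string]int{}
	sc := bufio.NewScanner(strings.NewReader(stream))
	for sc.Scan() {
		var e envelope
		if json.Unmarshal(sc.Bytes(), &e) != nil {
			continue
		}
		if len(e.Config) > 0 {
			counts["config"]++
		}
		if e.OSV != nil {
			counts["osv"]++
		}
		if e.Finding != nil {
			counts["finding"]++
		}
	}
	return counts
}

func main() {
	stream := `{"config":{}}
{"osv":{"id":"GO-2024-0001","summary":"example"}}
{"finding":{"osv":"GO-2024-0001"}}`
	fmt.Println(tally(stream))
}
```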

## Toolkit (pkg/lint/tools.go)

The Toolkit struct wraps common developer commands into structured Go APIs. It executes subprocesses and parses their output:

| Method | Wraps | Returns |
| --- | --- | --- |
| `FindTODOs(dir)` | `git grep` | `[]TODO` |
| `Lint(pkg)` | `go vet` | `[]ToolFinding` |
| `Coverage(pkg)` | `go test -cover` | `[]CoverageReport` |
| `RaceDetect(pkg)` | `go test -race` | `[]RaceCondition` |
| `AuditDeps()` | `govulncheck` (text) | `[]Vulnerability` |
| `ScanSecrets(dir)` | `gitleaks` | `[]SecretLeak` |
| `GocycloComplexity(threshold)` | `gocyclo` | `[]ComplexFunc` |
| `DepGraph(pkg)` | `go mod graph` | `*Graph` |
| `GitLog(n)` | `git log` | `[]Commit` |
| `DiffStat()` | `git diff --stat` | `DiffSummary` |
| `UncommittedFiles()` | `git status` | `[]string` |
| `Build(targets...)` | `go build` | `[]BuildResult` |
| `TestCount(pkg)` | `go test -list` | `int` |
| `CheckPerms(dir)` | `filepath.Walk` | `[]PermIssue` |
| `ModTidy()` | `go mod tidy` | `error` |

All methods use the Run(name, args...) helper which captures stdout, stderr, and exit code.
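A helper of this shape can be sketched with `os/exec`. This is a hypothetical stand-in for the package's `Run` (names and the exact return shape are assumptions); the key detail is treating a non-zero exit as data rather than an error.

```go
package main

import (
	"bytes"
	"fmt"
	"os/exec"
)

// runResult captures what every Toolkit method needs from a subprocess.
type runResult struct {
	Stdout   string
	Stderr   string
	ExitCode int
}

func run(name string, args ...string) (runResult, error) {
	cmd := exec.Command(name, args...)
	var out, errBuf bytes.Buffer
	cmd.Stdout = &out
	cmd.Stderr = &errBuf
	err := cmd.Run()
	res := runResult{Stdout: out.String(), Stderr: errBuf.String()}
	if err != nil {
		if exitErr, ok := err.(*exec.ExitError); ok {
			res.ExitCode = exitErr.ExitCode()
			return res, nil // non-zero exit is data, not a failure
		}
		return res, err // e.g. binary not found
	}
	return res, nil
}

func main() {
	res, err := run("sh", "-c", "echo out; echo err >&2; exit 3")
	fmt.Printf("%q %q %d %v\n", res.Stdout, res.Stderr, res.ExitCode, err)
}
```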

## PHP Quality Pipeline (pkg/php)

The pkg/php package provides structured wrappers around PHP ecosystem tools. Each tool has:

  1. Detection -- checks for config files and vendor binaries (e.g., DetectAnalyser, DetectPsalm, DetectRector)
  2. Options struct -- configures the tool run
  3. Execution function -- builds the command, runs it, and returns structured results

### Supported Tools

| Function | Tool | Purpose |
| --- | --- | --- |
| `Format()` | Laravel Pint | Code style formatting |
| `Analyse()` | PHPStan / Larastan | Static analysis |
| `RunPsalm()` | Psalm | Type-level static analysis |
| `RunAudit()` | Composer audit + npm audit | Dependency vulnerability scanning |
| `RunSecurityChecks()` | Built-in checks | `.env` exposure, debug mode, filesystem security |
| `RunRector()` | Rector | Automated code refactoring |
| `RunInfection()` | Infection | Mutation testing |
| `RunTests()` | Pest / PHPUnit | Test execution |

### QA Pipeline

The pipeline system (pipeline.go + runner.go) organises checks into three stages:

- Quick -- `audit`, `fmt`, `stan` (fast, run on every push)
- Standard -- `psalm` (if available), `test`
- Full -- `rector`, `infection` (slow, run in full QA)

The QARunner builds process.RunSpec objects with dependency ordering (e.g., stan runs after fmt, test runs after stan). This allows future parallelisation while respecting ordering constraints.
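Dependency ordering of this kind is typically resolved with a topological sort. The sketch below is illustrative (the `spec` type and `After` field are assumptions, not the real `process.RunSpec`), using a Kahn-style sort that assumes no cycles.

```go
package main

import "fmt"

// spec is a hypothetical stand-in for a RunSpec with ordering constraints.
type spec struct {
	Name  string
	After []string // names that must complete before this one starts
}

// order returns the specs in an order that respects every After constraint.
func order(specs []spec) []string {
	deps := map[string]int{}        // remaining unmet dependencies per spec
	dependents := map[string][]string{} // reverse edges
	for _, s := range specs {
		deps[s.Name] = len(s.After)
		for _, a := range s.After {
			dependents[a] = append(dependents[a], s.Name)
		}
	}
	var queue, out []string
	for _, s := range specs {
		if deps[s.Name] == 0 {
			queue = append(queue, s.Name)
		}
	}
	for len(queue) > 0 {
		n := queue[0]
		queue = queue[1:]
		out = append(out, n)
		for _, d := range dependents[n] {
			if deps[d]--; deps[d] == 0 {
				queue = append(queue, d)
			}
		}
	}
	return out
}

func main() {
	specs := []spec{
		{Name: "test", After: []string{"stan"}},
		{Name: "stan", After: []string{"fmt"}},
		{Name: "fmt"},
		{Name: "audit"},
	}
	fmt.Println(order(specs))
}
```

Anything sitting in the queue at the same time (here `fmt` and `audit`) has no mutual ordering constraint and could run in parallel.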

## Project Detection (pkg/detect)

The detect package identifies project types by checking for marker files:

- `go.mod` present => Go project
- `composer.json` present => PHP project

DetectAll(dir) returns all detected types, enabling polyglot project support.

## QA Command Layer (cmd/qa)

The cmd/qa package provides workflow-level commands that integrate with GitHub via the gh CLI:

- `watch` -- polls GitHub Actions for a specific commit, shows real-time status, and drills into failure details (failed job, step, and error line from logs)
- `review` -- fetches open PRs, analyses CI status, review decisions, and merge readiness, and suggests next actions
- `health` -- scans all repos in a `repos.yaml` registry and reports aggregate CI health with pass rates
- `issues` -- fetches issues across repos, categorises them (needs response, ready, blocked, triage), and prioritises by labels and activity
- `docblock` -- parses Go source with `go/ast`, counts exported symbols with and without doc comments, and enforces a coverage threshold

Commands register themselves via cli.RegisterCommands in an init() function, making them available when the package is imported.

## Extension Points

### Adding New Rules

Create a new YAML file in catalog/ following the schema:

```yaml
- id: go-xxx-001
  title: "Description of the issue"
  severity: medium             # info, low, medium, high, critical
  languages: [go]
  tags: [security]
  pattern: 'regex-pattern'
  exclude_pattern: 'false-positive-filter'
  fix: "How to fix the issue"
  detection: regex
  auto_fixable: false
  example_bad: 'problematic code'
  example_good: 'corrected code'
```

The file will be embedded automatically on the next build.

### Adding New Detection Types

The Detection field on Rule currently supports "regex". The Matcher skips non-regex rules, so adding a new detection type (e.g., "ast" for Go AST patterns) requires:

  1. Adding the new type to the Validate() method
  2. Creating a new matcher implementation
  3. Integrating it into Scanner.ScanDir()

### Loading External Catalogs

Use LoadDir(path) to load rules from a directory on disk rather than the embedded catalog:

```go
cat, err := lintpkg.LoadDir("/path/to/custom/rules")
```

This allows organisations to maintain private rule sets alongside the built-in catalog.