Snider e876b62045 docs: add human-friendly documentation
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-11 13:02:40 +00:00


---
title: Architecture
description: Internal design of core/lint -- types, data flow, and extension points
---

# Architecture

This document explains how core/lint works internally. It covers the core library (pkg/lint), the PHP quality pipeline (pkg/php), and the QA command layer (cmd/qa).

## Overview

The system is organised into three layers:

```text
cmd/core-lint     CLI entry point (lint check, lint catalog)
cmd/qa            QA workflow commands (watch, review, health, issues, PHP tools)
   |
pkg/lint          Core library: rules, catalog, matcher, scanner, reporting
pkg/php           PHP tool wrappers: format, analyse, audit, security, test
pkg/detect        Project type detection
   |
catalog/*.yaml    Embedded rule definitions
```

The root lint.go file ties the catalog layer to the library:

```go
//go:embed catalog/*.yaml
var catalogFS embed.FS

func LoadEmbeddedCatalog() (*lintpkg.Catalog, error) {
    return lintpkg.LoadFS(catalogFS, "catalog")
}
```

This means all YAML rules are baked into the binary at compile time. There are no runtime file lookups.

## Core Types (pkg/lint)

### Rule

A Rule represents a single lint check loaded from YAML. Key fields:

```go
type Rule struct {
    ID             string   `yaml:"id"`
    Title          string   `yaml:"title"`
    Severity       string   `yaml:"severity"`        // info, low, medium, high, critical
    Languages      []string `yaml:"languages"`       // e.g. ["go"], ["go", "php"]
    Tags           []string `yaml:"tags"`            // e.g. ["security", "injection"]
    Pattern        string   `yaml:"pattern"`         // Regex pattern to match
    ExcludePattern string   `yaml:"exclude_pattern"` // Regex to suppress false positives
    Fix            string   `yaml:"fix"`             // Human-readable remediation
    Detection      string   `yaml:"detection"`       // "regex" (extensible to other types)
    AutoFixable    bool     `yaml:"auto_fixable"`
    ExampleBad     string   `yaml:"example_bad"`
    ExampleGood    string   `yaml:"example_good"`
    FoundIn        []string `yaml:"found_in"`        // Repos where pattern was observed
    FirstSeen      string   `yaml:"first_seen"`
}
```

Each rule validates itself via Validate(), which checks required fields and compiles regex patterns. Severity is constrained to five levels: info, low, medium, high, critical.
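The validation described above can be sketched in a few lines. This is a hypothetical stand-in for `Rule.Validate` (the real method may check more fields); `validateRule` and its error messages are illustrative.

```go
package main

import (
	"fmt"
	"regexp"
)

// Rule mirrors the YAML-backed struct, trimmed to the validated fields.
type Rule struct {
	ID       string
	Severity string
	Pattern  string
}

// validSeverities is the five-level whitelist from the catalog schema.
var validSeverities = map[string]bool{
	"info": true, "low": true, "medium": true, "high": true, "critical": true,
}

// validateRule checks required fields, constrains severity, and confirms
// the pattern compiles -- the three checks the document attributes to Validate.
func validateRule(r Rule) error {
	if r.ID == "" {
		return fmt.Errorf("rule is missing an id")
	}
	if !validSeverities[r.Severity] {
		return fmt.Errorf("rule %s: invalid severity %q", r.ID, r.Severity)
	}
	if _, err := regexp.Compile(r.Pattern); err != nil {
		return fmt.Errorf("rule %s: bad pattern: %w", r.ID, err)
	}
	return nil
}

func main() {
	fmt.Println(validateRule(Rule{ID: "go-sql-001", Severity: "high", Pattern: `Sprintf\(.*SELECT`}))
	fmt.Println(validateRule(Rule{ID: "go-bad-001", Severity: "urgent", Pattern: `x`}))
}
```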

### Catalog

A Catalog is a flat collection of rules with query methods:

- `ForLanguage(lang)` -- returns rules targeting a specific language
- `AtSeverity(threshold)` -- returns rules at or above a severity level
- `ByID(id)` -- looks up a single rule

Loading is done via LoadDir(dir) for filesystem paths or LoadFS(fsys, dir) for embedded filesystems. Both read all .yaml files in the directory and parse them into []Rule.
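The query methods can be sketched as plain filters over the flat rule slice. This is a simplified model, not the package's exact implementation; the severity ranking assumes the five-level ordering given earlier.

```go
package main

import "fmt"

type Rule struct {
	ID        string
	Severity  string
	Languages []string
}

type Catalog struct{ Rules []Rule }

// severityRank encodes the five-level ordering: info < low < medium < high < critical.
var severityRank = map[string]int{"info": 0, "low": 1, "medium": 2, "high": 3, "critical": 4}

// ForLanguage returns rules that list lang in their Languages field.
func (c *Catalog) ForLanguage(lang string) []Rule {
	var out []Rule
	for _, r := range c.Rules {
		for _, l := range r.Languages {
			if l == lang {
				out = append(out, r)
				break
			}
		}
	}
	return out
}

// AtSeverity returns rules at or above the given threshold.
func (c *Catalog) AtSeverity(threshold string) []Rule {
	var out []Rule
	for _, r := range c.Rules {
		if severityRank[r.Severity] >= severityRank[threshold] {
			out = append(out, r)
		}
	}
	return out
}

func main() {
	cat := &Catalog{Rules: []Rule{
		{ID: "go-001", Severity: "high", Languages: []string{"go"}},
		{ID: "php-001", Severity: "low", Languages: []string{"php"}},
	}}
	fmt.Println(len(cat.ForLanguage("go")), len(cat.AtSeverity("medium")))
}
```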

### Matcher

The Matcher is the regex execution engine. It pre-compiles all regex-detection rules into compiledRule structs:

```go
type compiledRule struct {
    rule    Rule
    pattern *regexp.Regexp
    exclude *regexp.Regexp
}
```

NewMatcher(rules) compiles patterns once. Match(filename, content) then scans line by line:

  1. For each compiled rule, check if the filename itself matches the exclude pattern (e.g., skip _test.go files).
  2. For each line, test against the rule's pattern.
  3. If the line matches, check the exclude pattern to suppress false positives.
  4. Emit a Finding with file, line number, matched text, and remediation advice.

Non-regex detection types are silently skipped, allowing the catalog schema to support future detection mechanisms (AST, semantic) without breaking the matcher.

### Scanner

The Scanner orchestrates directory walking and language-aware matching:

  1. Walk the directory tree, skipping excluded directories (vendor, node_modules, .git, testdata, .core).
  2. For each file, detect its language from the file extension using DetectLanguage().
  3. Filter the rule set to only rules targeting that language.
  4. Build a language-scoped Matcher and run it against the file content.

Supported language extensions:

| Extension | Language |
| --- | --- |
| `.go` | `go` |
| `.php` | `php` |
| `.ts`, `.tsx` | `ts` |
| `.js`, `.jsx` | `js` |
| `.cpp`, `.cc`, `.c`, `.h` | `cpp` |
| `.py` | `py` |
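The extension table can be codified as a switch on the file extension. `detectLanguage` is a hypothetical stand-in for the package's `DetectLanguage`; the mapping follows the table above.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// detectLanguage maps a filename's extension to its language tag;
// an empty string means the file is not scanned.
func detectLanguage(filename string) string {
	switch filepath.Ext(filename) {
	case ".go":
		return "go"
	case ".php":
		return "php"
	case ".ts", ".tsx":
		return "ts"
	case ".js", ".jsx":
		return "js"
	case ".cpp", ".cc", ".c", ".h":
		return "cpp"
	case ".py":
		return "py"
	default:
		return ""
	}
}

func main() {
	fmt.Println(detectLanguage("scanner.go"), detectLanguage("app.tsx"), detectLanguage("README.md"))
}
```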

### Finding

A Finding is the output of a match:

```go
type Finding struct {
    RuleID   string `json:"rule_id"`
    Title    string `json:"title"`
    Severity string `json:"severity"`
    File     string `json:"file"`
    Line     int    `json:"line"`
    Match    string `json:"match"`
    Fix      string `json:"fix"`
    Repo     string `json:"repo,omitempty"`
}
```

### Report

The report.go file provides three output formats:

- `WriteText(w, findings)` -- human-readable: `file:line [severity] title (rule-id)`
- `WriteJSON(w, findings)` -- pretty-printed JSON array
- `WriteJSONL(w, findings)` -- newline-delimited JSON (one object per line)

Summarise(findings) aggregates counts by severity.

## Data Flow

A typical scan follows this path:

```text
YAML files ──> LoadFS() ──> Catalog{Rules}
                                |
                     ForLanguage() / AtSeverity()
                                |
                           []Rule (filtered)
                                |
                          NewScanner(rules)
                                |
                  ScanDir(root) / ScanFile(path)
                                |
                ┌───────────────┼───────────────┐
                │  Walk tree    │  Detect lang   │
                │  Skip dirs    │  Filter rules  │
                │               │  NewMatcher()  │
                │               │  Match()       │
                └───────────────┴───────────────┘
                                |
                          []Finding
                                |
              WriteText() / WriteJSON() / WriteJSONL()
```

## Cyclomatic Complexity Analysis (pkg/lint/complexity.go)

The module includes a native Go AST-based cyclomatic complexity analyser. It uses go/parser and go/ast -- no external tools required.

```go
results, err := lint.AnalyseComplexity(lint.ComplexityConfig{
    Threshold: 15,
    Path:      "./pkg/...",
})
```

Complexity is calculated by starting at 1 and incrementing for each branching construct:

- `if`, `for`, `range`, `case` (non-default), comm clause (non-default)
- `&&`, `||` binary expressions
- type switch, `select`

There is also AnalyseComplexitySource(src, filename, threshold) for testing without file I/O.

## Coverage Tracking (pkg/lint/coverage.go)

The coverage subsystem supports:

- Parsing Go coverage output (`ParseCoverProfile` for `-coverprofile` format, `ParseCoverOutput` for `-cover` output)
- Snapshotting via `CoverageSnapshot` (timestamp, per-package percentages, metadata)
- Persistence via `CoverageStore` (JSON file-backed append-only store)
- Regression detection via `CompareCoverage(previous, current)`, which returns a `CoverageComparison` with regressions, improvements, new packages, and removed packages
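The comparison logic can be sketched as a diff over per-package percentage maps. Field names and the map-based signature are illustrative; the real `CompareCoverage` works on `CoverageSnapshot` values.

```go
package main

import "fmt"

// CoverageComparison sketches the four result categories described above.
type CoverageComparison struct {
	Regressions  map[string]float64 // package -> percentage-point drop
	Improvements map[string]float64 // package -> percentage-point gain
	NewPackages  []string
	Removed      []string
}

func compareCoverage(prev, curr map[string]float64) CoverageComparison {
	c := CoverageComparison{
		Regressions:  map[string]float64{},
		Improvements: map[string]float64{},
	}
	for pkg, p := range prev {
		cv, ok := curr[pkg]
		switch {
		case !ok:
			c.Removed = append(c.Removed, pkg)
		case cv < p:
			c.Regressions[pkg] = p - cv
		case cv > p:
			c.Improvements[pkg] = cv - p
		}
	}
	for pkg := range curr {
		if _, ok := prev[pkg]; !ok {
			c.NewPackages = append(c.NewPackages, pkg)
		}
	}
	return c
}

func main() {
	prev := map[string]float64{"pkg/lint": 81.2, "pkg/php": 64.0}
	curr := map[string]float64{"pkg/lint": 78.5, "pkg/php": 70.1, "pkg/detect": 90.0}
	fmt.Printf("%+v\n", compareCoverage(prev, curr))
}
```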

## Vulnerability Checking (pkg/lint/vulncheck.go)

VulnCheck wraps govulncheck -json and parses its newline-delimited JSON output into structured VulnFinding objects. The parser handles three message types from govulncheck's wire format:

- `config` -- extracts the module path
- `osv` -- stores vulnerability metadata (ID, aliases, summary, affected ranges)
- `finding` -- maps OSV IDs to call traces and affected packages
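A dispatch over the three message types can be sketched with an envelope struct: each object in the stream carries exactly one of the keys. The field names inside `osvMsg` and `findingMsg` are illustrative, and the sketch assumes one object per line as described above.

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"strings"
)

type osvMsg struct {
	ID      string `json:"id"`
	Summary string `json:"summary"`
}

type findingMsg struct {
	OSV string `json:"osv"`
}

// envelope mirrors the wire format: exactly one field is set per object.
type envelope struct {
	Config  json.RawMessage `json:"config"`
	OSV     *osvMsg         `json:"osv"`
	Finding *findingMsg     `json:"finding"`
}

// tally counts each message type in a newline-delimited stream,
// skipping lines that fail to parse.
func tally(stream string) map[string]int {
	counts := map[string]int{}
	sc := bufio.NewScanner(strings.NewReader(stream))
	for sc.Scan() {
		var e envelope
		if json.Unmarshal(sc.Bytes(), &e) != nil {
			continue
		}
		if len(e.Config) > 0 {
			counts["config"]++
		}
		if e.OSV != nil {
			counts["osv"]++
		}
		if e.Finding != nil {
			counts["finding"]++
		}
	}
	return counts
}

func main() {
	stream := `{"config":{}}
{"osv":{"id":"GO-2024-0001","summary":"example"}}
{"finding":{"osv":"GO-2024-0001"}}`
	fmt.Println(tally(stream))
}
```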

## Toolkit (pkg/lint/tools.go)

The Toolkit struct wraps common developer commands into structured Go APIs. It executes subprocesses and parses their output:

| Method | Wraps | Returns |
| --- | --- | --- |
| `FindTODOs(dir)` | `git grep` | `[]TODO` |
| `Lint(pkg)` | `go vet` | `[]ToolFinding` |
| `Coverage(pkg)` | `go test -cover` | `[]CoverageReport` |
| `RaceDetect(pkg)` | `go test -race` | `[]RaceCondition` |
| `AuditDeps()` | `govulncheck` (text) | `[]Vulnerability` |
| `ScanSecrets(dir)` | `gitleaks` | `[]SecretLeak` |
| `GocycloComplexity(threshold)` | `gocyclo` | `[]ComplexFunc` |
| `DepGraph(pkg)` | `go mod graph` | `*Graph` |
| `GitLog(n)` | `git log` | `[]Commit` |
| `DiffStat()` | `git diff --stat` | `DiffSummary` |
| `UncommittedFiles()` | `git status` | `[]string` |
| `Build(targets...)` | `go build` | `[]BuildResult` |
| `TestCount(pkg)` | `go test -list` | `int` |
| `CheckPerms(dir)` | `filepath.Walk` | `[]PermIssue` |
| `ModTidy()` | `go mod tidy` | `error` |

All methods use the Run(name, args...) helper which captures stdout, stderr, and exit code.
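A helper of this shape can be sketched with `os/exec`. This is a hypothetical stand-in for the package's `Run` (names and the exact return shape are assumptions); the key detail is treating a non-zero exit as data rather than an error.

```go
package main

import (
	"bytes"
	"fmt"
	"os/exec"
)

// runResult captures what every Toolkit method needs from a subprocess.
type runResult struct {
	Stdout   string
	Stderr   string
	ExitCode int
}

func run(name string, args ...string) (runResult, error) {
	cmd := exec.Command(name, args...)
	var out, errBuf bytes.Buffer
	cmd.Stdout = &out
	cmd.Stderr = &errBuf
	err := cmd.Run()
	res := runResult{Stdout: out.String(), Stderr: errBuf.String()}
	if err != nil {
		if exitErr, ok := err.(*exec.ExitError); ok {
			res.ExitCode = exitErr.ExitCode()
			return res, nil // non-zero exit is data, not a failure
		}
		return res, err // e.g. binary not found
	}
	return res, nil
}

func main() {
	res, err := run("sh", "-c", "echo out; echo err >&2; exit 3")
	fmt.Printf("%q %q %d %v\n", res.Stdout, res.Stderr, res.ExitCode, err)
}
```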

## PHP Quality Pipeline (pkg/php)

The pkg/php package provides structured wrappers around PHP ecosystem tools. Each tool has:

  1. Detection -- checks for config files and vendor binaries (e.g., DetectAnalyser, DetectPsalm, DetectRector)
  2. Options struct -- configures the tool run
  3. Execution function -- builds the command, runs it, and returns structured results

### Supported Tools

| Function | Tool | Purpose |
| --- | --- | --- |
| `Format()` | Laravel Pint | Code style formatting |
| `Analyse()` | PHPStan / Larastan | Static analysis |
| `RunPsalm()` | Psalm | Type-level static analysis |
| `RunAudit()` | Composer audit + npm audit | Dependency vulnerability scanning |
| `RunSecurityChecks()` | Built-in checks | `.env` exposure, debug mode, filesystem security |
| `RunRector()` | Rector | Automated code refactoring |
| `RunInfection()` | Infection | Mutation testing |
| `RunTests()` | Pest / PHPUnit | Test execution |

### QA Pipeline

The pipeline system (pipeline.go + runner.go) organises checks into three stages:

- Quick -- `audit`, `fmt`, `stan` (fast, run on every push)
- Standard -- `psalm` (if available), `test`
- Full -- `rector`, `infection` (slow, run in full QA)

The QARunner builds process.RunSpec objects with dependency ordering (e.g., stan runs after fmt, test runs after stan). This allows future parallelisation while respecting ordering constraints.
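Dependency ordering of this kind is typically resolved with a topological sort. The sketch below is illustrative (the `spec` type and `After` field are assumptions, not the real `process.RunSpec`), using a Kahn-style sort that assumes no cycles.

```go
package main

import "fmt"

// spec is a hypothetical stand-in for a RunSpec with ordering constraints.
type spec struct {
	Name  string
	After []string // names that must complete before this one starts
}

// order returns the specs in an order that respects every After constraint.
func order(specs []spec) []string {
	deps := map[string]int{}        // remaining unmet dependencies per spec
	dependents := map[string][]string{} // reverse edges
	for _, s := range specs {
		deps[s.Name] = len(s.After)
		for _, a := range s.After {
			dependents[a] = append(dependents[a], s.Name)
		}
	}
	var queue, out []string
	for _, s := range specs {
		if deps[s.Name] == 0 {
			queue = append(queue, s.Name)
		}
	}
	for len(queue) > 0 {
		n := queue[0]
		queue = queue[1:]
		out = append(out, n)
		for _, d := range dependents[n] {
			if deps[d]--; deps[d] == 0 {
				queue = append(queue, d)
			}
		}
	}
	return out
}

func main() {
	specs := []spec{
		{Name: "test", After: []string{"stan"}},
		{Name: "stan", After: []string{"fmt"}},
		{Name: "fmt"},
		{Name: "audit"},
	}
	fmt.Println(order(specs))
}
```

Anything sitting in the queue at the same time (here `fmt` and `audit`) has no mutual ordering constraint and could run in parallel.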

## Project Detection (pkg/detect)

The detect package identifies project types by checking for marker files:

- `go.mod` present => Go project
- `composer.json` present => PHP project

DetectAll(dir) returns all detected types, enabling polyglot project support.

## QA Command Layer (cmd/qa)

The cmd/qa package provides workflow-level commands that integrate with GitHub via the gh CLI:

- `watch` -- polls GitHub Actions for a specific commit, shows real-time status, and drills into failure details (failed job, step, and error line from logs)
- `review` -- fetches open PRs, analyses CI status, review decisions, and merge readiness, and suggests next actions
- `health` -- scans all repos in a `repos.yaml` registry and reports aggregate CI health with pass rates
- `issues` -- fetches issues across repos, categorises them (needs response, ready, blocked, triage), and prioritises by labels and activity
- `docblock` -- parses Go source with `go/ast`, counts exported symbols with and without doc comments, and enforces a coverage threshold

Commands register themselves via cli.RegisterCommands in an init() function, making them available when the package is imported.

## Extension Points

### Adding New Rules

Create a new YAML file in catalog/ following the schema:

```yaml
- id: go-xxx-001
  title: "Description of the issue"
  severity: medium             # info, low, medium, high, critical
  languages: [go]
  tags: [security]
  pattern: 'regex-pattern'
  exclude_pattern: 'false-positive-filter'
  fix: "How to fix the issue"
  detection: regex
  auto_fixable: false
  example_bad: 'problematic code'
  example_good: 'corrected code'
```

The file will be embedded automatically on the next build.

### Adding New Detection Types

The Detection field on Rule currently supports "regex". The Matcher skips non-regex rules, so adding a new detection type (e.g., "ast" for Go AST patterns) requires:

  1. Adding the new type to the Validate() method
  2. Creating a new matcher implementation
  3. Integrating it into Scanner.ScanDir()

### Loading External Catalogs

Use LoadDir(path) to load rules from a directory on disk rather than the embedded catalog:

```go
cat, err := lintpkg.LoadDir("/path/to/custom/rules")
```

This allows organisations to maintain private rule sets alongside the built-in catalog.