go-scm/docs/security/scan-attack-vector-mapping-report.md
Virgil a54bd834ff fix(agentci): restore security helpers and map attack vectors
Co-Authored-By: Virgil <virgil@lethean.io>
2026-03-24 13:15:55 +00:00

22 KiB

Security Attack Vector Mapping Report

Execution plan in docs/plans/2026-03-24-security-attack-vector-mapping.md, results in docs/security/scan-attack-vector-mapping-report.md.

  • Issue: [scan] Security attack vector mapping
  • Repo: core/go-scm
  • Branch: agent/create-an-implementation-plan-for-this-s
  • Date: 2026-03-24
  • Agent: Codex

Findings Matrix

Status meanings:

  • open: validation gap or exploitable flow remains.
  • validated: explicit hardening exists and was verified in code.
  • mapped: flow is real, but the current risk is limited or mostly delegated to downstream tools.
file:line | input source | flows into | validation | attack vector | severity | confidence | verification
agentci/config.go:34 | config file `agentci.agents.*` (`host`, `queue_dir`, `forgejo_user`, models, runner) | `LoadAgents()` -> `Spinner.FindByForgejoUser()` -> `jobrunner/handlers.DispatchHandler` SSH target selection | only presence/default checks; queue dir is validated later, host is never allowlisted | SSH target manipulation / unauthorised agent routing if config is attacker-controlled | medium | med | mapped
cmd/collect/cmd.go:56 | CLI flag `--output` | `newConfig()` -> `collect.NewConfigWithMedium()` -> `cfg.OutputDir` and `.collect-state.json` | no canonicalisation or root policy | arbitrary local file write/read root for all collectors and processors | low | high | open
collect/bitcointalk.go:64 | positional `<topic-id>` via `cmd/collect/cmd_bitcointalk.go` | `BitcoinTalkCollector.Collect()` -> `filepath.Join(cfg.OutputDir, "bitcointalk", topicID, "posts")` -> `EnsureDir`/`Write` | only non-empty topic ID required | path traversal / arbitrary file write | high | high | open
collect/bitcointalk.go:89 | positional `<topic-id>` via `cmd/collect/cmd_bitcointalk.go` | `fmt.Sprintf("https://bitcointalk.org/index.php?topic=%s.%d", topicID, offset)` -> `http.NewRequestWithContext` | no query escaping or numeric check | malformed request / same-host request steering | medium | med | open
collect/github.go:123 | CLI `org/repo`, `--org`, or `excavate <project>` fallback org | `exec.CommandContext("gh", "repo", "list", g.Org, ...)` | no org format validation; `exec.CommandContext` blocks shell metachar expansion | scope expansion / excessive API enumeration; low injection surface | low | med | mapped
collect/github.go:179 | CLI `org/repo` and GitHub repo names from `gh repo list` | `filepath.Join(cfg.OutputDir, "github", g.Org, repo, "issues")` -> `EnsureDir`/`Write` | no path-segment validation on `g.Org` or `repo` | path traversal / arbitrary file write | high | high | open
collect/github.go:238 | CLI `org/repo` and GitHub repo names from `gh repo list` | `filepath.Join(cfg.OutputDir, "github", g.Org, repo, "pulls")` -> `EnsureDir`/`Write` | no path-segment validation on `g.Org` or `repo` | path traversal / arbitrary file write | high | high | open
collect/github.go:284 | GitHub API `issue.body` / `pr.body` | `formatIssueMarkdown()` -> markdown files under `cfg.Output` | no sanitisation before persistence | stored markdown/HTML/prompt injection in downstream renderers or LLM ingestion | medium | high | open
collect/market.go:84 | CLI `<coin>` via `cmd/collect/cmd_market.go` | `filepath.Join(cfg.OutputDir, "market", m.CoinID)` -> `EnsureDir` | only non-empty `CoinID` check | path traversal / arbitrary file write | high | high | open
collect/market.go:132 | CLI `<coin>` via `cmd/collect/cmd_market.go` | `fmt.Sprintf("%s/coins/%s", coinGeckoBaseURL, m.CoinID)` -> `fetchJSON()` | no path escaping or allowlist for `CoinID` | same-host endpoint injection / SSRF-style path steering | medium | high | open
collect/market.go:184 | CLI `<coin>` and `--from` | `fmt.Sprintf("%s/coins/%s/market_chart?vs_currency=usd&days=%s", ...)` -> `fetchJSON()` | `FromDate` parsing bounds `days`; `CoinID` remains unescaped | same-host endpoint injection / excessive upstream query shaping | medium | high | open
collect/papers.go:113 | CLI `--query` | `url.QueryEscape(p.Query)` -> IACR search request | query is escaped before request construction | query injection blocked; residual risk is upstream content trust only | low | high | validated
collect/papers.go:145 | IACR HTML `href` values parsed from remote search results | `parseIACREntry()` -> derived `ppr.ID` -> `filepath.Join(baseDir, ppr.ID+".md")` | no filename sanitisation on remote-derived ID | path traversal / arbitrary file write if upstream HTML is hostile or compromised | high | med | open
collect/papers.go:199 | CLI `--query` and `--category` | `url.QueryEscape(...)` -> arXiv API request, then `arxivEntryToPaper()` replaces `/` and `:` in IDs | query and category are escaped; arXiv IDs are partially canonicalised | request injection largely blocked; residual risk is remote content trust | low | high | validated
collect/process.go:50 | CLI `<dir>` via `cmd/collect/cmd_process.go` | `cfg.Output.List(p.Dir)` and `cfg.Output.Read(filepath.Join(p.Dir, name))` | no root containment or canonicalisation of `p.Dir` | arbitrary local read/list when using `io.Local` medium | high | high | open
collect/process.go:55 | CLI `<source>` via `cmd/collect/cmd_process.go` | `filepath.Join(cfg.OutputDir, "processed", p.Source)` -> `EnsureDir`/`Write` | no path validation on `p.Source` | path traversal / arbitrary file write | high | high | open
collect/process.go:214 | source HTML `href` attributes from files under `p.Dir` | `nodeToMarkdown()` -> `[text](href)` persisted to markdown | no scheme allowlist or URL sanitisation | stored `javascript:` / `data:` link injection | medium | high | open
collect/ratelimit.go:44 | concurrent caller-controlled source keys | `RateLimiter.Wait()` -> shared `last[source]` map | mutex released before slot reservation; no second eligibility check after sleep | burst bypass / API quota exhaustion / availability degradation | medium | high | open
forge/config.go:40 | config file `forge.url`, `forge.token`, env `FORGE_URL`, `FORGE_TOKEN`, CLI overrides | `ResolveConfig()` -> `forge.New()` -> Forgejo SDK client | token required, URL not allowlisted or canonicalised | SSRF / credential exfiltration to attacker-controlled Forgejo endpoint | high | high | open
cmd/forge/cmd_migrate.go:46 | CLI `<clone-url>`, `--service`, `--token`, `--org` | `runMigrate()` -> `forge.Client.MigrateRepo()` with `CloneAddr` and `AuthToken` | repo name is derived from URL tail only; clone URL itself is not constrained | server-side clone SSRF / internal repo import / credential misuse | high | high | open
cmd/forge/cmd_sync.go:95 | CLI repo args and `--base-path` | `syncRepoNameFromArg()` + `ResolvePathWithinRoot()` -> `git -C <localPath>` / `git push` | repo names validated via `ValidatePathElement`; local path confined to base path | path traversal mitigated; residual risk limited to untrusted Forge URL config | low | high | validated
gitea/config.go:40 | config file `gitea.url`, `gitea.token`, env `GITEA_URL`, `GITEA_TOKEN`, CLI overrides | `ResolveConfig()` -> `gitea.New()` -> Gitea SDK client | token required, URL not allowlisted or canonicalised | SSRF / credential exfiltration to attacker-controlled Gitea endpoint | high | high | open
cmd/gitea/cmd_mirror.go:45 | CLI `<github-owner/repo>` and optional GitHub token | `runMirror()` -> fixed `https://github.com/<owner>/<repo>.git` -> `gitea.CreateMirror()` | owner/repo are split, but not allowlisted; source token may be forwarded | server-side clone of attacker-chosen repo and token relay to remote Gitea | high | med | open
cmd/gitea/cmd_sync.go:95 | CLI repo args and `--base-path` | `repoNameFromArg()` + `ResolvePathWithinRoot()` -> `git -C <localPath>` / `git push` | repo names validated via `ValidatePathElement`; local path confined to base path | path traversal mitigated; residual risk limited to untrusted Gitea URL config | low | high | validated
jobrunner/forgejo/signals.go:16 | epic issue body checklist text | `parseEpicChildren()` -> child issue numbers -> `pollRepo()` dispatch decisions | regex only checks `#<digits>`; no repository or ownership cross-check | workflow spoofing / unintended child issue dispatch | medium | high | open
jobrunner/forgejo/signals.go:36 | PR body text | `findLinkedPR()` -> child issue association -> downstream handler execution | first `#<n>` match wins; no keyword, owner, or repo validation | PR-body spoofing causes wrong issue state transitions | medium | high | open
jobrunner/forgejo/source.go:72 | handler `result.Error` strings | `Report()` -> Forgejo issue comment code block | no sanitisation, truncation, or mention filtering | comment injection / mention spam / log forging of pipeline output | medium | high | open
jobrunner/handlers/dispatch.go:89 | agent config `queue_dir` plus `signal.RepoOwner` / `signal.RepoName` | `Execute()` -> remote queue path building and Forgejo label/comment operations | remote dir is validated; repo owner/name still pass through `SanitizePath(filepath.Base(...))` | retargeting/confusion if a forged signal supplies slash-containing names | medium | med | open
jobrunner/handlers/dispatch.go:151 | issue title/body, assignee, repo metadata from `PipelineSignal` | `DispatchTicket` JSON -> remote agent queue file | JSON encoding only; content is not filtered, signed, or policy-checked | prompt injection / remote agent control-surface abuse | high | high | open
jobrunner/handlers/dispatch.go:212 | local Forge token | `.env.<ticketID>` content -> remote worker filesystem | remote path validated and file mode set to `0600` after transfer | credential exposure on remote worker host, snapshots, or backups | medium | high | mapped
jobrunner/handlers/dispatch.go:255 | runtime `reason` strings and raw `signal.RepoOwner` / `signal.RepoName` | `failDispatch()` -> labels and issue comments on Forgejo | no sanitisation; fail path uses raw signal repo fields | comment injection and repo-retargeting if upstream signal is forged | medium | med | open
jobrunner/handlers/completion.go:58 | agent completion `signal.Message` / `signal.Error` | `CreateIssueComment()` on child issue | no sanitisation, truncation, or mention filtering | stored comment injection / mention spam / log forging | medium | high | open
jobrunner/journal.go:130 | `signal.RepoOwner` / `signal.RepoName` | `Append()` -> `<base>/<owner>/<repo>/<date>.jsonl` | strict component regex, absolute-prefix containment check, and mutexed writes | path traversal blocked; safe write pattern verified | low | high | validated
git/service.go:72 | concurrent query/task caller inputs `Paths` and service calls | `handleQuery()` / `handleTask()` -> shared `s.lastStatus` cache | no mutex or atomic protection around `lastStatus` | data race / stale or torn state under concurrent queries | medium | high | open
cmd/scm/cmd_compile.go:54 | CLI `--sign-key` hex string | `hex.DecodeString()` -> `manifest.Compile()` -> `manifest.Sign()` -> `ed25519.Sign()` | hex decoding only; no private key length check | panic-triggered DoS via invalid Ed25519 private key length | high | high | open
cmd/scm/cmd_export.go:28 | CLI `--dir` | `io.NewSandboxed(dir)` -> `manifest.LoadCompiled()` / `manifest.Load()` -> stdout | sandboxed medium confines access to the chosen root | local file disclosure limited to caller-selected directory | low | high | mapped
cmd/scm/cmd_index.go:39 | CLI `--dir`, `--output`, `--base-url`, `--org` | `marketplace.Builder.BuildFromDirs()` / `marketplace.WriteIndex()` | no path or URL validation beyond OS errors | arbitrary local read/write and untrusted repo URL generation in produced index | medium | high | open
pkg/api/provider.go:202 | HTTP query params `q` and `category` | `listMarketplace()` -> in-memory `Index.Search()` / `ByCategory()` | no server-side escaping needed; no filesystem or outbound request sink | low-risk search-scope manipulation only | low | high | mapped
pkg/api/provider.go:466 | HTTP path param `:code` | `marketplaceCodeParam()` -> install/update/remove lookups | `PathUnescape()` + `ValidatePathElement()` | path traversal blocked for marketplace code paths | low | high | validated
pkg/api/provider.go:309 | HTTP JSON body `public_key` | `hex.DecodeString()` -> `manifest.Verify()` -> `ed25519.Verify()` | hex decoding only; no public key length check | panic-triggered DoS via malformed Ed25519 public key | high | high | open
pkg/api/provider.go:346 | HTTP JSON body `private_key` | `hex.DecodeString()` -> `manifest.Sign()` -> `ed25519.Sign()` | hex decoding only; no private key length check | panic-triggered DoS via malformed Ed25519 private key | high | high | open
marketplace/installer.go:50 | marketplace index `Module.Code` and `Module.Repo` | `resolveModulePath()` -> `git clone <repo> <dest>` -> manifest load/store | module code is now confined with `ResolvePathWithinRoot`; repo URL is unvalidated | arbitrary git transport / SSRF / local-path clone from untrusted marketplace index | high | high | open
marketplace/installer.go:195 | marketplace index `Module.SignKey` | `hex.DecodeString()` -> `manifest.LoadVerified()` -> `ed25519.Verify()` | hex decoding only; no public key length check | panic-triggered DoS during install/update from malformed marketplace metadata | high | high | open
plugin/installer.go:181 | plugin source `org/repo@version` | `ParseSource()` -> registry name/path selection -> clone arguments | org and repo validated as path elements; version only checked for non-empty string | branch/ref abuse and unexpected checkout target selection | medium | med | mapped
plugin/installer.go:165 | plugin `@version` suffix | `gh repo clone <org/repo> <dest> -- --branch <version>` | `exec.CommandContext` blocks shell injection; no ref allowlist | malicious ref names can force unexpected refs/tags or failure loops | medium | med | open
plugin/loader.go:51 | caller-supplied plugin name | `filepath.Join(baseDir, name, "plugin.json")` -> `LoadManifest()` | no validation on `name` | path traversal / arbitrary local file read | high | high | open
repos/registry.go:66 | caller-supplied registry path | `io.Medium.Read(path)` -> YAML parse | no path validation | arbitrary local file read via caller-chosen registry path | medium | high | open
repos/registry.go:81 | YAML `base_path` and per-repo `path` values | expanded `Repo.Path` -> later `Exists()`, `IsGitRepo()`, git config probing | tilde expansion only; no root confinement | workspace escape / probing of arbitrary directories from untrusted registry | medium | high | open
repos/kbconfig.go:69 | caller-supplied workspace root | `.core/kb.yaml` read/write | no canonicalisation beyond join | arbitrary local read/write under caller-selected root | low | high | mapped
repos/kbconfig.go:110 | KB config `Wiki.Remote`, `Wiki.Dir`, and caller-supplied `repoName` | `WikiRepoURL()` and `WikiLocalPath()` | no validation of remote URL, local subdir, or repo name | path traversal in local wiki path and clone redirection to attacker-chosen remote | medium | high | open

Deduplicated Attack Vectors

  • high Path traversal and arbitrary local read/write from unsanitised path joins in collect/, plugin/loader.go, repos/registry.go, and repos/kbconfig.go.
  • high SSRF, server-side clone, and credential exfiltration via unvalidated Forgejo/Gitea base URLs, migration clone URLs, and marketplace repo URLs.
  • high Panic-triggered availability loss from unchecked Ed25519 key lengths in CLI and HTTP signing/verification flows.
  • high Prompt injection and remote worker influence via raw issue title/body content written into dispatch tickets.
  • medium Stored markdown/comment injection via GitHub bodies, processed HTML links, jobrunner error comments, and completion messages.
  • medium Workflow spoofing from weak issue/PR text parsing in jobrunner/forgejo/signals.go.
  • medium Branch/ref steering and unexpected checkout behaviour in plugin installation.
  • medium Concurrency races in collect/ratelimit.go and git/service.go.
  • low Broad CLI-selected filesystem scope in cmd/collect, cmd/scm export, and repos/kbconfig.go remains expected but should not be delegated to lower-trust callers.
  • low Verified mitigations exist for journal path safety, marketplace code path validation, and sync local-path confinement.

Verification Status

  • Total mapped flows: 50
  • open: 37
  • validated: 6
  • mapped: 7
  • High severity: 18
  • Medium severity: 20
  • Low severity: 12

Subsystem Summary

  • P1 collect/: the highest concentration of exploitable issues is in collector output path construction and persistence of raw third-party content. collect/github.go, collect/market.go, collect/bitcointalk.go, collect/papers.go, and collect/process.go all need path and content hardening.
  • P1 forge/gitea/marketplace/pkg/api: base URL trust, migration/mirror clone sources, and Ed25519 key handling are the main confidentiality and availability risks. These are central control-plane paths and should be fixed before widening API exposure.
  • P2 jobrunner/: epic/PR text parsing, remote dispatch ticket generation, and comment reporting create workflow spoofing and prompt-injection surfaces. The journal implementation is the notable exception and is already hardened.
  • P2 workspace/plugin/repos/git: plugin loading, registry path handling, wiki path generation, and shared git-service state carry medium-risk filesystem and race-condition exposure.
  • P3 validated controls: jobrunner/journal.go, cmd/forge/cmd_sync.go, cmd/gitea/cmd_sync.go, and pkg/api/provider.go marketplace code normalisation are solid reference implementations for future hardening work.

High-Priority Follow-Up

  • Replace every collector and loader filepath.Join(..., userValue, ...) call with strict path-element validation before directory creation or file reads/writes.
  • Add scheme, host, and canonicalisation allowlists for Forgejo/Gitea base URLs, migration clone URLs, marketplace repo URLs, and wiki remotes.
  • Reject invalid Ed25519 key lengths before calling ed25519.Sign or ed25519.Verify in CLI, HTTP, and installer code paths.
  • Treat dispatch tickets as untrusted agent prompts: strip or isolate raw issue bodies, add provenance metadata, and avoid shipping long-lived Forge tokens to remote workers where possible.
  • Harden jobrunner/forgejo linking logic so epic children and PR references must match explicit repo/issue relationships, not bare #123 text.
  • Sanitize or encode user-visible comments and stored markdown links before writing them to Forgejo or output files.
  • Fix collect.RateLimiter.Wait and git.Service.lastStatus with proper reservation and synchronisation.

Scan Coverage

The full scan set from the issue plan was reviewed in package order. Files below either map directly to the matrix above or are forward-only wrappers/helpers with no additional unique external entry point beyond the flows already captured.

  • agentci: clotho.go (strategy helper only), config.go (matrix), security.go (validation utilities only).
  • cmd/collect: cmd.go (matrix), cmd_bitcointalk.go, cmd_excavate.go, cmd_github.go, cmd_market.go, cmd_papers.go, cmd_process.go (all covered by collect/* rows), cmd_dispatch.go (local event enum validation only; no command execution sink).
  • cmd/forge: cmd_auth.go and cmd_config.go feed the forge/config.go row; cmd_migrate.go and cmd_sync.go are in the matrix; cmd_forge.go, cmd_issues.go, cmd_labels.go, cmd_orgs.go, cmd_prs.go, cmd_repos.go, cmd_status.go, helpers.go are CLI/API wrappers only.
  • cmd/gitea: cmd_config.go feeds the gitea/config.go row; cmd_mirror.go and cmd_sync.go are in the matrix; cmd_gitea.go, cmd_issues.go, cmd_prs.go, cmd_repos.go are CLI/API wrappers only.
  • cmd/scm: cmd_compile.go, cmd_export.go, cmd_index.go are in the matrix; cmd_scm.go is a root command wrapper only.
  • collect: bitcointalk.go, collect.go, github.go, market.go, papers.go, process.go, ratelimit.go are in the matrix; events.go and excavate.go add no new sink beyond already-mapped config and collector flows; state.go is subsumed by the shared --output path row.
  • forge: client.go, config.go, repos.go are in the matrix; issues.go, labels.go, meta.go, orgs.go, prs.go, webhooks.go forward typed inputs into the SDK without additional local command/path construction.
  • git: service.go is in the matrix; git.go is command-wrapper logic without additional external entry surface beyond caller-supplied repo paths.
  • gitea: client.go, config.go, repos.go are in the matrix; issues.go and meta.go are SDK wrappers only.
  • jobrunner/forgejo: signals.go and source.go are in the matrix.
  • jobrunner/handlers: completion.go and dispatch.go are in the matrix; enable_auto_merge.go, publish_draft.go, resolve_threads.go, send_fix_command.go, and tick_parent.go consume already-mapped PipelineSignal fields and emit fixed API calls/messages only.
  • jobrunner: journal.go is in the matrix; poller.go and types.go define orchestration and data structures without introducing new external input parsing.
  • locales: embed.go contains no runtime external input handling.
  • manifest: compile.go and sign.go are covered through cmd/scm and pkg/api/provider.go; loader.go and manifest.go parse fixed project-local files only.
  • marketplace: builder.go and installer.go are in the matrix; discovery.go and marketplace.go are read-only helpers around already-mapped local filesystem or in-memory index flows.
  • pkg/api: provider.go is in the matrix; embed.go contains no external input handling.
  • plugin: installer.go, loader.go, and registry.go are in the matrix; config.go, manifest.go, and plugin.go define data structures and validation helpers only.
  • repos: kbconfig.go and registry.go are in the matrix; gitstate.go and workconfig.go are config/struct helpers without new sink construction.