[scan] Security attack vector mapping #8

Open

opened 2026-03-23 12:54:01 +00:00 by Virgil · 1 comment

Virgil commented

2026-03-23 12:54:01 +00:00

Member

Map every external input entry point: function, file:line, input source, flows into, validation, attack vector.

Implementation Plan (Spark)

[scan] Security attack vector mapping — implementation plan

Use this issue body as the execution plan for the scan.

1) Files to scan

Scope includes all source and test files in the repository, plus CLI/CI entry files that can change execution behaviour:

- `cmd/security/cmd_security.go`
- `cmd/security/cmd_scan.go`
- `cmd/security/cmd_jobs.go`
- `cmd/security/cmd_alerts.go`
- `cmd/security/cmd_deps.go`
- `cmd/security/cmd_secrets.go`
- `cmd/security/cmd.go`
- `cmd/metrics/cmd.go`
- `cmd/metrics/cmd_test.go`
- `cmd/rag/cmd.go`
- `cmd/lab/cmd_lab.go`
- `cmd/embed-bench/main.go`
- `ai/ai.go`
- `ai/metrics.go`
- `ai/rag.go`
- `ai/metrics_test.go`
- `ai/metrics_bench_test.go`
- `.forgejo/workflows/security-scan.yml`

2) What to inspect per file

For each file, map:

- external input sources (CLI flags, environment-driven config, file paths, HTTP endpoints, command arguments, fixed data from external services)
- validation gaps (length checks, enum checks, format checks, allow/deny lists)
- injection vectors (command execution, network URL/path injection, log/format injection, header/body poisoning)
- race conditions (shared state, non-locking read/write, mutable globals, shared clients)
- data-flow paths from input to sensitive sink (process execution, file writes, HTTP calls, output rendering)
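
As an illustration of the format and allow-list checks named above, here is a minimal sketch of stricter `owner/repo` validation. The function name `validateTarget` and both patterns are hypothetical and are not taken from the repository; the character classes are deliberately conservative assumptions and should be checked against what the forge actually accepts.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// Hypothetical patterns: conservative character classes for owner and repo
// names. They are assumptions for this sketch, not the forge's real rules.
var (
	ownerPattern = regexp.MustCompile(`^[A-Za-z0-9](?:[A-Za-z0-9-]{0,37}[A-Za-z0-9])?$`)
	repoPattern  = regexp.MustCompile(`^[A-Za-z0-9._-]{1,100}$`)
)

// validateTarget splits "owner/repo" and rejects any part that falls outside
// the allowed character classes, including "." and ".." as repo names.
func validateTarget(target string) (owner, repo string, err error) {
	parts := strings.Split(target, "/")
	if len(parts) != 2 {
		return "", "", fmt.Errorf("target %q must have the form owner/repo", target)
	}
	owner, repo = parts[0], parts[1]
	if !ownerPattern.MatchString(owner) {
		return "", "", fmt.Errorf("invalid owner %q", owner)
	}
	if !repoPattern.MatchString(repo) || repo == "." || repo == ".." {
		return "", "", fmt.Errorf("invalid repo %q", repo)
	}
	return owner, repo, nil
}

func main() {
	for _, t := range []string{"acme/tools", "acme/../etc", "acme/tools?per_page=1", "-x/repo"} {
		_, _, err := validateTarget(t)
		fmt.Printf("%q accepted=%v\n", t, err == nil)
	}
}
```

A check of this kind would sit in front of every sink that receives the target (endpoint builders, issue labels, log lines), so that a split-only validation is no longer the sole gate.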

3) Output format for every finding

Add one row per finding in this order:

| file:line | input source | flows into | validation status | attack vector |
|---|---|---|---|---|
| `cmd/security/cmd_security.go:16` | shared globals (`securityTarget`, `securityRepo`, `securitySeverity`, `securityRegistryPath`, `securityJSON`) | all security subcommands | Observed (via function-specific flag checks only) | global mutable state can leak between runs if commands are reused in tests/tools |
| `cmd/security/cmd_scan.go:27` | `--registry`, `--repo`, `--severity`, `--tool`, `--target`, `--json` flags | `loadRegistry`, filtering, API fetches, terminal output | Partial (basic parsing, no enum/regex constraints for repo/target/severity) | invalid owner/repo can poison endpoint construction and output context |
| `cmd/security/cmd_alerts.go:21` | same flags as above for `alerts` | alerts fetchers and table output | Partial | same as scan path; secret/code/dependabot alert processing |
| `cmd/security/cmd_deps.go:21` | same flags as above for `deps` | Dependabot fetcher and upgrade summary output | Partial | package/version strings can influence downstream output formatting |
| `cmd/security/cmd_secrets.go:21` | same flags as above for `secrets` | secret alert fetcher and CLI output | Partial | secret metadata and state strings from API are rendered unsanitised |
| `cmd/security/cmd_jobs.go:22` | `--targets`, `--issue-repo`, `--dry-run`, `--copies` | issue creation (`gh issue create`) and metric writes | Partial (`target` format check only) | command arg injection through crafted issue-repo and owner/repo labels; large `targets` payload can exhaust command output/report size |
| `cmd/security/cmd_security.go:130` | `gh` binary path lookup + external command execution | `runGHAPI` runs `gh api` with `endpoint` arg | Partial (no command output sanitisation) | endpoint path includes user-controlled repo name; command execution boundary should be threat-modeled |
| `cmd/security/cmd_security.go:108` | `securityRegistryPath` string flag | `repos.LoadRegistry(io.Local, registryPath)` and fallback discovery | Partial | path traversal/symlink and arbitrary file read concerns in registry loading path |
| `cmd/security/cmd_security.go:194` | `target` string (owner/repo) split validation | all `run*ForTarget` entrypoints and API endpoint builders | Weak (`owner/repo` only split, no char/class checks) | malformed targets can influence external tool input (`gh api`, issue labels) and logs |
| `cmd/security/cmd_security.go:300` | `repoFullName` in GitHub endpoint builders | fetch functions `fetchDependabotAlerts`/`fetchCodeScanningAlerts`/`fetchSecretScanningAlerts` | Weak | endpoint query parameters are hardcoded but path segment is untrusted input |
| `cmd/lab/cmd_lab.go:42` | `--bind` CLI flag | `http.Server{Addr: cfg.Addr}` and route handlers | Partial (no address validation) | bind to unexpected interfaces/ports, local-privilege boundary exposure if cmd used in shared hosts |
| `cmd/lab/cmd_lab.go:24` | `cfg` from runtime config (`cfg.ForgeURL`, tokens, intervals, etc.) | handler wiring, external collectors, HTTP polling intervals | Unknown (external package implementation) | trust boundary extends through external collectors and tokens (inspect downstream integrations separately) |
| `cmd/embed-bench/main.go:28` | `--ollama` URL flag | HTTP client + URL concatenation for `/api/embeddings`, `/api/tags` | Weak (no scheme/host allowlist, no host lock) | SSRF/LAN pivot + plaintext credentials exposure if non-TLS endpoint is provided |
| `cmd/embed-bench/main.go:223` | TLS config on shared HTTP client (`InsecureSkipVerify`) | all outbound requests | Weak | MITM risk and cert validation bypass for embedding calls |
| `cmd/embed-bench/main.go:238` | text payload from `queries`/memory arrays | JSON POST body + decoding response | None for memory strings; constant data | low immediate risk in this file; still inspect response decode and size handling |
| `cmd/metrics/cmd.go:31` | `--since` string | `parseDuration` -> `time.Now().Add(-since)` | Partial | unsupported token/large unit handling and duration overflow edge cases should be confirmed |
| `cmd/metrics/cmd.go:105` | `--since` value format | `parseDuration` numeric parsing and unit switch | Good for basic format, no bounds | integer overflow and negative/zero edge path tested but should re-check for very large inputs |
| `ai/metrics.go:46` | `metricsSince` runtime-derived file path/time window | `Record`/`ReadEvents` file writes and reads | Weak | write lock only for `Record`; concurrent `ReadEvents` may observe partially written lines without lock |
| `ai/metrics.go:87` | path date iteration from `time.Time` input | `readMetricsFile` scanner loop | Good (time-bounded) | file lock semantics and scanner token limits should be reviewed under high-volume or malformed JSONL |
| `ai/rag.go:22` | `TaskInfo.Title` and `TaskInfo.Description` | concatenated query -> RAG clients (`rag.Query`) | No input validation in this package | prompt/injection style input to model/RAG service; verify client side escaping expectations |
| `ai/metrics_test.go` / `ai/metrics_bench_test.go` / `cmd/metrics/cmd_test.go` | test data and temp env vars | all public metric APIs | N/A | ensure test helpers cannot influence production code paths through build tags or shared env assumptions |
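
The `--ollama` row notes that there is no scheme or host allowlist. The following is a minimal sketch of what such a check might look like, using only `net/url` and `net` from the standard library. `validateBaseURL` and `allowedHosts` are hypothetical names introduced for illustration, and the allowlist contents are an assumption rather than the tool's real configuration.

```go
package main

import (
	"fmt"
	"net"
	"net/url"
)

// allowedHosts is a hypothetical allowlist for this sketch; a real tool would
// probably load it from configuration.
var allowedHosts = map[string]bool{
	"localhost": true,
	"127.0.0.1": true,
}

// isLoopback reports whether host is "localhost" or a loopback IP literal.
func isLoopback(host string) bool {
	if host == "localhost" {
		return true
	}
	ip := net.ParseIP(host)
	return ip != nil && ip.IsLoopback()
}

// validateBaseURL accepts only http/https URLs without embedded credentials
// whose host is allowlisted, and permits plain http for loopback hosts only.
func validateBaseURL(raw string) (*url.URL, error) {
	u, err := url.Parse(raw)
	if err != nil {
		return nil, fmt.Errorf("parse %q: %w", raw, err)
	}
	if u.Scheme != "http" && u.Scheme != "https" {
		return nil, fmt.Errorf("unsupported scheme %q", u.Scheme)
	}
	if u.User != nil {
		return nil, fmt.Errorf("credentials in the URL are not allowed")
	}
	host := u.Hostname()
	if !allowedHosts[host] {
		return nil, fmt.Errorf("host %q is not in the allowlist", host)
	}
	if u.Scheme == "http" && !isLoopback(host) {
		return nil, fmt.Errorf("plain http is only allowed for loopback hosts")
	}
	return u, nil
}

func main() {
	for _, raw := range []string{
		"http://localhost:11434",
		"http://169.254.169.254/",
		"file:///etc/passwd",
		"http://user:pw@localhost:11434",
	} {
		_, err := validateBaseURL(raw)
		fmt.Printf("%q accepted=%v\n", raw, err == nil)
	}
}
```

A check like this narrows the SSRF surface but does not replace certificate verification; the `InsecureSkipVerify` row is a separate finding.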

4) Where to write the report

Write the completed mapping results to:
- SECURITY_ATTACK_VECTOR_MAPPING_REPORT.md
Keep rows in the schema above.
Include an executive summary first, then:
- High
- Medium
- Low
- No issue

Execution steps (for Codex agent)

1. Start at `cmd/security/cmd_security.go`, then expand into each security subcommand file and shared helpers.
2. Validate every `exec.Command` call boundary (`gh`, lab config flow, issue creation) and enumerate argument construction.
3. Verify path/registry loading and `target` parsing for format and traversal/command/endpoint poisoning.
4. Inspect outbound network boundaries (`embed-bench`, `ai/rag`, lab collectors via `cfg`) for trust boundaries and hardening.
5. Review concurrency and shared state: `metricsMu`, shared CLI variables, and test overrides.
6. Populate the report table for every confirmed external input point and include evidence references (`file:line` + snippet context).
7. Confirm output format and severity in final issue body, then submit as a follow-up comment/reference to this issue.
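
When enumerating argument construction at an `exec.Command` boundary, it helps to remember that `exec.Command` does not invoke a shell, so the realistic risks are flag injection and path manipulation rather than shell metacharacters. The sketch below shows one way an argument vector for `gh api` could be built defensively. `ghAPIArgs` is a hypothetical helper, not the repository's `runGHAPI`, and the rejected character set is an assumption.

```go
package main

import (
	"fmt"
	"os/exec"
	"strings"
)

// ghAPIArgs builds the argument vector for `gh api <endpoint>`.
// It rejects owner/repo segments that could be parsed as a flag or that
// could alter the endpoint path or query string.
func ghAPIArgs(owner, repo, resource string) ([]string, error) {
	for _, seg := range []string{owner, repo} {
		if seg == "" || seg == "." || seg == ".." ||
			strings.HasPrefix(seg, "-") ||
			strings.ContainsAny(seg, "/?#% \t\r\n") {
			return nil, fmt.Errorf("unsafe path segment %q", seg)
		}
	}
	// The endpoint always starts with a fixed "repos/" prefix, so it can
	// never be interpreted as a command-line flag.
	endpoint := "repos/" + owner + "/" + repo + "/" + resource
	return []string{"api", endpoint}, nil
}

func main() {
	args, err := ghAPIArgs("acme", "tools", "dependabot/alerts")
	if err != nil {
		fmt.Println("rejected:", err)
		return
	}
	// The command is constructed but not run in this sketch.
	cmd := exec.Command("gh", args...)
	fmt.Println(cmd.Args)

	if _, err := ghAPIArgs("acme", "tools?state=open", "dependabot/alerts"); err != nil {
		fmt.Println("rejected:", err)
	}
}
```

Each user-controlled value is passed as its own element of the argument slice and is never concatenated into a single command string; that property is worth recording as evidence for every call site reviewed.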


Virgil commented

2026-03-23 13:03:33 +00:00

Author

Member