go-scm/docs/architecture.md
Snider 4558504499 docs: graduate TODO/FINDINGS into production documentation
Replace internal task tracking (TODO.md, FINDINGS.md) with structured
documentation in docs/. Trim CLAUDE.md to agent instructions only.

Co-Authored-By: Virgil <virgil@lethean.io>
2026-02-20 15:02:22 +00:00

go-scm Architecture

Module path: forge.lthn.ai/core/go-scm

go-scm provides SCM integration, CI dispatch automation, and data collection for the Lethean ecosystem. It comprises six top-level packages, each with a distinct responsibility, totalling approximately 9,000 lines of Go across roughly 70 source files.


Package Overview

forge.lthn.ai/core/go-scm
├── forge/        Forgejo API client (repos, issues, PRs, labels, webhooks, orgs)
├── gitea/        Gitea API client (repos, issues, meta) for public mirror
├── git/          Multi-repo git operations (status, commit, push, pull)
├── agentci/      Clotho Protocol orchestrator — agent config and security
├── jobrunner/    PR automation pipeline (poll → dispatch → journal)
│   ├── forgejo/  Forgejo signal source (epic issue parsing)
│   └── handlers/ Pipeline action handlers
└── collect/      Data collection (BitcoinTalk, GitHub, market, papers, events)

SCM Abstraction Layer

forge/ — Forgejo API Client

forge/ wraps the Forgejo SDK (codeberg.org/mvdkleijn/forgejo-sdk/forgejo/v2) with config-based authentication and contextual error wrapping. It provides thin, typed wrappers for every API surface used by the Lethean platform.

Client construction:

// Config-resolved client (preferred)
client, err := forge.NewFromConfig(flagURL, flagToken)

// Direct construction
client, err := forge.New(url, token)

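Auth resolution layers three sources, with later entries overriding earlier ones. The default URL in item 4 is a fallback that applies only when none of them supplies a URL: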

  1. ~/.core/config.yaml keys forge.url and forge.token (lowest priority)
  2. FORGE_URL and FORGE_TOKEN environment variables
  3. Flag overrides passed at call time (highest priority)
  4. Default URL http://localhost:4000 if nothing is configured
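The layering above can be sketched as a small pure function. This is an illustrative sketch, not the code in forge/config.go: the fileConfig type, the resolveAuth name, and the injected getenv parameter are all hypothetical, and it assumes that an empty value never overrides a lower-priority one.

```go
package main

import (
	"fmt"
	"os"
)

// defaultForgeURL is the fallback URL named in the list above.
const defaultForgeURL = "http://localhost:4000"

// fileConfig stands in for the forge.url and forge.token keys read from
// ~/.core/config.yaml. Hypothetical type for this sketch.
type fileConfig struct {
	URL, Token string
}

// resolveAuth layers the sources from lowest to highest priority:
// config file, then environment, then flags. Empty values never override
// (an assumption). getenv is injected so the function can be exercised
// without touching the real process environment.
func resolveAuth(cfg fileConfig, getenv func(string) string, flagURL, flagToken string) (url, token string) {
	url, token = cfg.URL, cfg.Token
	if v := getenv("FORGE_URL"); v != "" {
		url = v
	}
	if v := getenv("FORGE_TOKEN"); v != "" {
		token = v
	}
	if flagURL != "" {
		url = flagURL
	}
	if flagToken != "" {
		token = flagToken
	}
	if url == "" {
		url = defaultForgeURL // nothing configured anywhere
	}
	return url, token
}

func main() {
	cfg := fileConfig{URL: "https://forge.example.test", Token: "from-file"}
	url, token := resolveAuth(cfg, os.Getenv, "", "")
	fmt.Println("url:", url, "token set:", token != "")
}
```

Keeping the resolution in one function makes the precedence easy to test without a config file or real environment variables.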

Available operations:

File         Operations
repos.go     CreateRepo, ListRepos, CreateMirrorRepo, CreateOrgRepo
issues.go    GetIssue, CreateIssue, ListIssues, CreateIssueComment, AssignIssue, CloseIssue, EditIssue
prs.go       CreatePullRequest, ListPullRequests, MergePullRequest, SetPRDraft, GetCombinedStatus
labels.go    CreateLabel, GetLabelByName, EnsureLabel, AddIssueLabels, RemoveIssueLabel
webhooks.go  CreateWebhook, ListWebhooks, DeleteWebhook
orgs.go      CreateOrg, ListOrgs, ListOrgRepos
meta.go      GetVersion

SDK limitation: The Forgejo SDK v2 does not accept context.Context on any API method. All SDK calls are synchronous and blocking. Context propagation through the forge/ and gitea/ wrappers is therefore nominal: contexts are accepted at the wrapper boundary but cannot be passed to the SDK, so cancelling a context does not interrupt an in-flight call. This will be resolved when the SDK adds context support.

gitea/ — Gitea API Client

gitea/ mirrors the structure of forge/ but wraps the Gitea SDK (code.gitea.io/sdk/gitea) for the public mirror instance at git.lthn.ai. The two clients are intentionally structurally identical — same pattern of client.go, config.go, repos.go, issues.go, meta.go — to reduce cognitive load when working across both.

Auth resolution follows the same priority order as forge/, using GITEA_URL/GITEA_TOKEN environment variables and gitea.url/gitea.token config keys. The default URL is https://gitea.snider.dev.

Infrastructure split:

  • forge.lthn.ai — production Forgejo instance, source of truth, full IP/research data
  • git.lthn.ai — public Gitea mirror with sensitive data stripped, breach-safe

git/ — Multi-Repo Git Operations

git/ provides context-aware git operations across multiple repositories. Unlike the API clients, all operations in this package propagate context.Context correctly via exec.CommandContext.

Core types:

type RepoStatus struct {
    Name, Path, Branch string
    Modified, Untracked, Staged int  // working tree counts
    Ahead, Behind               int  // commits vs upstream
    Error                       error
}

func (s *RepoStatus) IsDirty() bool    { ... }
func (s *RepoStatus) HasUnpushed() bool { ... }
func (s *RepoStatus) HasUnpulled() bool { ... }

Parallel status across repos:

statuses := git.Status(ctx, git.StatusOptions{
    Paths: []string{"/path/to/repo-a", "/path/to/repo-b"},
    Names: map[string]string{"/path/to/repo-a": "repo-a"},
})

Status checks run in parallel via goroutines. Push and pull operations are sequential because SSH passphrase prompts require terminal interaction.
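The parallel fan-out described above can be sketched with a sync.WaitGroup. This is a minimal sketch under assumptions: the repoStatus and checkFunc types and the statusAll function are hypothetical, and the per-repository check is injected rather than shelling out to git, which is what the real package does via exec.CommandContext.

```go
package main

import (
	"context"
	"fmt"
	"sync"
)

// repoStatus is a trimmed, hypothetical stand-in for git.RepoStatus.
type repoStatus struct {
	Path  string
	Dirty bool
	Err   error
}

// checkFunc inspects one repository. It is injected here so the sketch
// runs without a git binary.
type checkFunc func(ctx context.Context, path string) repoStatus

// statusAll runs check for every path concurrently and returns the results
// in input order. Each goroutine writes only its own slice index, so no
// extra locking is needed.
func statusAll(ctx context.Context, paths []string, check checkFunc) []repoStatus {
	results := make([]repoStatus, len(paths))
	var wg sync.WaitGroup
	for i, p := range paths {
		wg.Add(1)
		go func(i int, p string) {
			defer wg.Done()
			results[i] = check(ctx, p)
		}(i, p)
	}
	wg.Wait()
	return results
}

// demoStatuses runs statusAll with a fake check that marks repo-b dirty.
func demoStatuses() []repoStatus {
	fake := func(ctx context.Context, path string) repoStatus {
		return repoStatus{Path: path, Dirty: path == "/path/to/repo-b", Err: ctx.Err()}
	}
	return statusAll(context.Background(), []string{"/path/to/repo-a", "/path/to/repo-b"}, fake)
}

func main() {
	for _, s := range demoStatuses() {
		fmt.Println(s.Path, "dirty:", s.Dirty)
	}
}
```

Writing into a pre-sized slice keeps the output order stable regardless of which goroutine finishes first.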

Service integration: git.Service embeds framework.ServiceRuntime and registers query/task handlers on the core framework's message bus. Queries (QueryStatus, QueryDirtyRepos, QueryAheadRepos) return from a cached lastStatus field. Tasks (TaskPush, TaskPull, TaskPushMultiple) execute immediately.


AgentCI Dispatch Pipeline

Overview

The AgentCI pipeline automates the lifecycle of issues assigned to AI agents: detecting unstarted work, dispatching tickets to agent machines, monitoring PR state, and updating the parent epic on merge.

Forgejo instance
      │
      │ poll (epic issues, child PRs, combined status)
      ▼
ForgejoSource.Poll()
      │
      │ []PipelineSignal
      ▼
Poller.RunOnce()
      │
      │ Match(signal) → first matching handler
      ├─► DispatchHandler     — NeedsCoding=true, known agent assignee
      ├─► TickParentHandler   — PRState=MERGED
      ├─► EnableAutoMerge     — checks passing, mergeable
      ├─► PublishDraft        — draft PR ready
      ├─► SendFixCommand      — checks failing
      └─► CompletionHandler   — agent completion signal
      │
      │ ActionResult
      ▼
Journal.Append()   — JSONL audit trail
ForgejoSource.Report()  — comment on epic issue

jobrunner/ — Poller and Interfaces

jobrunner/ defines the interfaces and orchestration loop shared by all pipeline participants.

Interfaces:

type JobSource interface {
    Name() string
    Poll(ctx context.Context) ([]*PipelineSignal, error)
    Report(ctx context.Context, result *ActionResult) error
}

type JobHandler interface {
    Name() string
    Match(signal *PipelineSignal) bool
    Execute(ctx context.Context, signal *PipelineSignal) (*ActionResult, error)
}

PipelineSignal carries the full structural snapshot of a child issue/PR at the moment of polling:

type PipelineSignal struct {
    EpicNumber, ChildNumber, PRNumber int
    RepoOwner, RepoName               string
    PRState                           string  // OPEN, MERGED, CLOSED
    IsDraft                           bool
    Mergeable                         string  // MERGEABLE, CONFLICTING, UNKNOWN
    CheckStatus                       string  // SUCCESS, FAILURE, PENDING
    ThreadsTotal, ThreadsResolved     int
    NeedsCoding                       bool    // true if no PR exists yet
    Assignee                          string  // Forgejo username
    IssueTitle, IssueBody             string  // for dispatch prompt
}

Poller runs a blocking poll-dispatch loop. On each tick it snapshots sources and handlers (under a read lock), calls each source's Poll, matches the first applicable handler per signal, executes it, appends to the journal, and calls Report on the source. Dry-run mode logs what would execute without running handlers.

poller := jobrunner.NewPoller(jobrunner.PollerConfig{
    Sources:      []jobrunner.JobSource{forgejoSrc},
    Handlers:     []jobrunner.JobHandler{dispatch, tickParent, autoMerge},
    Journal:      journal,
    PollInterval: 60 * time.Second,
})
poller.Run(ctx)  // blocks until ctx cancelled
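The first-match dispatch and dry-run behaviour described above can be sketched as follows. This is a simplified illustration, not the Poller implementation: the signal and handler types are trimmed stand-ins, runOnce and demoHandlers are hypothetical names, and the journal is reduced to a slice of action names.

```go
package main

import "fmt"

// signal is a trimmed stand-in for PipelineSignal.
type signal struct {
	PRState     string
	NeedsCoding bool
}

// handler is a trimmed stand-in for JobHandler.
type handler struct {
	name    string
	match   func(signal) bool
	execute func(signal) string // returns the action name to journal
}

// runOnce matches each signal against the handlers in order. Only the first
// matching handler runs. In dry-run mode nothing executes and nothing is
// journalled.
func runOnce(handlers []handler, signals []signal, dryRun bool) []string {
	var journal []string
	for _, s := range signals {
		for _, h := range handlers {
			if !h.match(s) {
				continue
			}
			if dryRun {
				fmt.Printf("dry-run: would execute %s\n", h.name)
			} else {
				journal = append(journal, h.execute(s))
			}
			break // first match wins
		}
	}
	return journal
}

// demoHandlers returns two handlers whose order defines their priority.
func demoHandlers() []handler {
	return []handler{
		{
			name:    "dispatch",
			match:   func(s signal) bool { return s.NeedsCoding },
			execute: func(signal) string { return "dispatch" },
		},
		{
			name:    "tick_parent",
			match:   func(s signal) bool { return s.PRState == "MERGED" },
			execute: func(signal) string { return "tick_parent" },
		},
	}
}

func main() {
	signals := []signal{{NeedsCoding: true}, {PRState: "MERGED"}, {PRState: "CLOSED"}}
	fmt.Println(runOnce(demoHandlers(), signals, false))
}
```

Because only the first match runs, the order of the Handlers slice in PollerConfig acts as a priority list.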

jobrunner/forgejo/ — Signal Source

ForgejoSource polls a list of repositories for epic issues (labelled epic, state open). For each epic, it parses the issue body for unchecked task list items (- [ ] #N), then for each unchecked child either:

  • Builds a PipelineSignal with PR state, draft status, check status, and thread counts (if a linked PR exists), or
  • Builds a NeedsCoding=true signal carrying the child issue title and body (if no PR exists and the issue has an assignee)

Combined commit status is fetched per head SHA via forge.GetCombinedStatus.
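Extracting the unchecked task list items (- [ ] #N) from an epic body can be sketched with a regular expression. This is an assumption about the approach, not the parser used by ForgejoSource; the uncheckedChildren name and the exact pattern are hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// uncheckedItem matches a task list line such as "- [ ] #42", allowing
// leading indentation. Checked items ("- [x] #42") do not match.
var uncheckedItem = regexp.MustCompile(`(?m)^[ \t]*- \[ \] #(\d+)`)

// uncheckedChildren returns the issue numbers of all unchecked task list
// items in body, in document order.
func uncheckedChildren(body string) []int {
	var nums []int
	for _, m := range uncheckedItem.FindAllStringSubmatch(body, -1) {
		n, err := strconv.Atoi(m[1])
		if err != nil {
			continue // number too large to be a real issue; skip it
		}
		nums = append(nums, n)
	}
	return nums
}

func main() {
	body := "## Tasks\n- [x] #10\n- [ ] #11\n- [ ] #12 follow-up\n"
	fmt.Println(uncheckedChildren(body)) // prints [11 12]
}
```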

jobrunner/handlers/ — Action Handlers

Handler                 Match condition                                      Action
DispatchHandler         NeedsCoding=true, assignee is a known agent          Build DispatchTicket JSON, transfer via SSH, post comment
TickParentHandler       PRState=MERGED                                       Tick checkbox in epic body, close child issue
EnableAutoMergeHandler  CheckStatus=SUCCESS, Mergeable=MERGEABLE, not draft  Enable auto-merge on PR
PublishDraftHandler     Is draft, threads resolved                           Publish draft PR
SendFixCommandHandler   CheckStatus=FAILURE                                  Post fix command comment to agent
CompletionHandler       Type=agent_completion                                Record agent completion result

agentci/ — Clotho Protocol

agentci/ manages agent configuration and the Clotho Protocol for dual-run verification.

Agent configuration is loaded from ~/.core/config.yaml under the agentci.agents key:

agentci:
  clotho:
    strategy: clotho-verified   # or: direct
    validation_threshold: 0.85
  agents:
    charon:
      host: build-server.leth.in
      queue_dir: /home/claude/ai-work/queue
      forgejo_user: charon
      model: sonnet
      runner: claude             # claude, codex, or gemini
      verify_model: gemini-1.5-pro
      dual_run: false
      active: true

Spinner is the Clotho orchestrator. Its DeterminePlan method decides between standard and dual run modes:

  1. If the global strategy is not clotho-verified, always standard.
  2. If the agent's dual_run flag is set, dual.
  3. If the repository name is core or contains security, dual (Axiom 1: critical repos always verified).
  4. Otherwise, standard.
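The four rules above translate directly into a short decision function. This is a sketch of the rule order only: the runMode type, its constants, and the flat parameter list are hypothetical and do not reflect the real Spinner.DeterminePlan signature.

```go
package main

import (
	"fmt"
	"strings"
)

// runMode is a hypothetical type for this sketch.
type runMode string

const (
	modeStandard runMode = "standard"
	modeDual     runMode = "dual"
)

// determinePlan applies the rules in the order listed above.
func determinePlan(strategy string, agentDualRun bool, repoName string) runMode {
	if strategy != "clotho-verified" {
		return modeStandard // rule 1: verification disabled globally
	}
	if agentDualRun {
		return modeDual // rule 2: agent opted in
	}
	if repoName == "core" || strings.Contains(repoName, "security") {
		return modeDual // rule 3: critical repos are always verified
	}
	return modeStandard // rule 4
}

func main() {
	fmt.Println(determinePlan("clotho-verified", false, "core"))      // prints dual
	fmt.Println(determinePlan("direct", true, "core"))                // prints standard
	fmt.Println(determinePlan("clotho-verified", false, "go-devops")) // prints standard
}
```

Note that rule 1 takes precedence over everything else: with the direct strategy, even an agent with dual_run set and a critical repository get a standard run.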

In dual-run mode, DispatchHandler populates DispatchTicket.VerifyModel and DispatchTicket.DualRun=true. The agent runner is responsible for executing both the primary and verifier models and calling Spinner.Weave to compare outputs. Weave currently performs a byte-equal comparison; semantic diff logic is reserved for a future phase.

Security functions in agentci/security.go:

  • SanitizePath(input string) — returns filepath.Base(input) after validating against ^[a-zA-Z0-9\-\_\.]+$. Protects against path traversal by stripping directory components rather than rejecting the input.
  • EscapeShellArg(arg string) — wraps a string in single quotes with internal single-quote escaping, for safe insertion into SSH remote commands.
  • SecureSSHCommandContext(ctx, host, cmd string) — constructs an exec.Cmd with StrictHostKeyChecking=yes, BatchMode=yes, and ConnectTimeout=10.
  • MaskToken(token string) — returns a masked version safe for logging.
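The first three helpers can be approximated as below. This is a sketch under assumptions, not the contents of agentci/security.go: the lowercase function names are hypothetical, the order of operations in sanitizePath (strip first, then validate, rejecting "." and "..") is a guess, and the character class is written with the hyphen last, which matches the same set as the pattern quoted above.

```go
package main

import (
	"errors"
	"fmt"
	"path/filepath"
	"regexp"
	"strings"
)

// safeName accepts letters, digits, hyphen, underscore, and dot.
var safeName = regexp.MustCompile(`^[a-zA-Z0-9_.-]+$`)

// sanitizePath keeps only the final path element, then validates it.
// Rejecting "." and ".." is an assumption made for this sketch.
func sanitizePath(input string) (string, error) {
	base := filepath.Base(input)
	if base == "." || base == ".." || !safeName.MatchString(base) {
		return "", errors.New("invalid path component")
	}
	return base, nil
}

// escapeShellArg wraps arg in single quotes. An embedded single quote is
// emitted as '\'' (close quote, escaped quote, reopen quote).
func escapeShellArg(arg string) string {
	return "'" + strings.ReplaceAll(arg, "'", `'\''`) + "'"
}

// sshArgs builds the argument list for a hardened ssh invocation. It could
// be passed to exec.CommandContext(ctx, "ssh", sshArgs(host, cmd)...).
func sshArgs(host, cmd string) []string {
	return []string{
		"-o", "StrictHostKeyChecking=yes",
		"-o", "BatchMode=yes",
		"-o", "ConnectTimeout=10",
		host, cmd,
	}
}

func main() {
	name, err := sanitizePath("../../etc/passwd")
	fmt.Println(name, err)
	fmt.Println(escapeShellArg("it's"))
	fmt.Println(sshArgs("build-server.leth.in", "ls "+escapeShellArg("/home/claude/ai-work/queue")))
}
```

BatchMode=yes makes ssh fail instead of prompting, which matters for an unattended dispatcher; StrictHostKeyChecking=yes refuses hosts whose keys are not already known.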

Dispatch ticket transfer:

DispatchHandler.Execute()
  ├── SanitizePath(owner), SanitizePath(repo)
  ├── EnsureLabel(in-progress)
  ├── Check issue not already in-progress or completed
  ├── AssignIssue, AddIssueLabels
  ├── DeterminePlan(signal, agentName) → runMode
  ├── Marshal DispatchTicket to JSON
  ├── ticketExists() via SSH (dedup check)
  ├── secureTransfer(ticket JSON, 0644)  ← cat > path via SSH stdin
  ├── secureTransfer(.env with FORGE_TOKEN, 0600)
  └── CreateIssueComment (dispatch confirmation)

The Forge token is written as a separate .env.$ticketID file with 0600 permissions rather than embedded in the ticket JSON, to avoid the token appearing in queue directory listings.

Journal

jobrunner.Journal writes append-only JSONL files partitioned by date and repository:

{baseDir}/{owner}/{repo}/2026-02-20.jsonl

Each line is a JournalEntry with a signal snapshot (PR state at time of action) and a result snapshot (success, error, duration). Path components are validated against a strict regex and resolved to absolute paths to prevent traversal. Writes are mutex-protected for concurrent safety.
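The partitioned, mutex-protected append can be sketched as follows. This is not the jobrunner.Journal implementation: the journal and entry types, the validation pattern, and the choice of UTC for the date partition are all assumptions made for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
	"path/filepath"
	"regexp"
	"sync"
	"time"
)

// validComponent must start with a letter or digit, which also rejects
// "." and "..". The exact pattern is an assumption.
var validComponent = regexp.MustCompile(`^[a-zA-Z0-9][a-zA-Z0-9_.-]*$`)

// entry is a trimmed, hypothetical stand-in for JournalEntry.
type entry struct {
	Action  string `json:"action"`
	Success bool   `json:"success"`
}

type journal struct {
	mu      sync.Mutex
	baseDir string
}

// pathFor returns {baseDir}/{owner}/{repo}/YYYY-MM-DD.jsonl as an absolute path.
func (j *journal) pathFor(owner, repo string, day time.Time) (string, error) {
	if !validComponent.MatchString(owner) || !validComponent.MatchString(repo) {
		return "", fmt.Errorf("invalid owner or repo: %q/%q", owner, repo)
	}
	base, err := filepath.Abs(j.baseDir)
	if err != nil {
		return "", err
	}
	return filepath.Join(base, owner, repo, day.UTC().Format("2006-01-02")+".jsonl"), nil
}

// write appends one JSON line; the mutex serialises concurrent writers.
func (j *journal) write(owner, repo string, day time.Time, e entry) error {
	path, err := j.pathFor(owner, repo, day)
	if err != nil {
		return err
	}
	line, err := json.Marshal(e)
	if err != nil {
		return err
	}
	j.mu.Lock()
	defer j.mu.Unlock()
	if err := os.MkdirAll(filepath.Dir(path), 0o755); err != nil {
		return err
	}
	f, err := os.OpenFile(path, os.O_APPEND|os.O_CREATE|os.O_WRONLY, 0o644)
	if err != nil {
		return err
	}
	defer f.Close()
	_, err = f.Write(append(line, '\n'))
	return err
}

// demoPath resolves a path for a fixed date under a fixed base directory.
func demoPath(owner, repo string) (string, error) {
	j := &journal{baseDir: "/var/journal"}
	return j.pathFor(owner, repo, time.Date(2026, 2, 20, 12, 0, 0, 0, time.UTC))
}

func main() {
	dir, err := os.MkdirTemp("", "journal-sketch")
	if err != nil {
		panic(err)
	}
	defer os.RemoveAll(dir)
	j := &journal{baseDir: dir}
	if err := j.write("core", "go-scm", time.Now(), entry{Action: "dispatch", Success: true}); err != nil {
		panic(err)
	}
	fmt.Println("appended one entry under", dir)
}
```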

Replay filtering (via journal_replay_test.go patterns, not yet a public API): entries can be filtered by action name, repo full name, and time range by scanning the JSONL file.


Data Collection

collect/ — Collection Pipeline

collect/ provides a pluggable pipeline for gathering data from external sources.

Collector interface:

type Collector interface {
    Name() string
    Collect(ctx context.Context, cfg *Config) (*Result, error)
}

Available collectors:

File            Source                             Rate limit
bitcointalk.go  BitcoinTalk forum (HTTP scraping)  2 s per request
github.go       GitHub API via gh CLI              500 ms, pauses at 75% usage
market.go       CoinGecko market data              1.5 s per request
papers.go       IACR and arXiv research papers     1 s per request
events.go       Event tracking

Excavator orchestrates sequential execution of multiple collectors with state-based resume support:

exc := &collect.Excavator{
    Collectors: []collect.Collector{githubCollector, marketCollector},
    Resume:     true,
}
result, err := exc.Run(ctx, cfg)

If Resume=true, collectors that already have a non-zero item count in the persisted state file are skipped. Context cancellation is checked between collectors.
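The resume and cancellation behaviour can be sketched as a sequential loop. This is an illustration under assumptions: the collector type is a trimmed stand-in, the state is reduced to a map of item counts, and stopping at the first collector error is a guess about the real Excavator.

```go
package main

import (
	"context"
	"fmt"
)

// collector is a trimmed, hypothetical stand-in for collect.Collector.
type collector struct {
	name    string
	collect func(ctx context.Context) (items int, err error)
}

// runSequential runs collectors one at a time. With resume set, any collector
// whose persisted item count is non-zero is skipped. Cancellation is checked
// between collectors, not during one.
func runSequential(ctx context.Context, cs []collector, state map[string]int, resume bool) ([]string, error) {
	var ran []string
	for _, c := range cs {
		if err := ctx.Err(); err != nil {
			return ran, err
		}
		if resume && state[c.name] > 0 {
			continue // already collected in an earlier run
		}
		n, err := c.collect(ctx)
		if err != nil {
			return ran, fmt.Errorf("%s: %w", c.name, err)
		}
		state[c.name] = n
		ran = append(ran, c.name)
	}
	return ran, nil
}

// demoRun uses a state in which "github" already has items recorded.
func demoRun(resume bool) ([]string, error) {
	fixed := func(n int) func(context.Context) (int, error) {
		return func(context.Context) (int, error) { return n, nil }
	}
	cs := []collector{{"github", fixed(12)}, {"market", fixed(3)}}
	state := map[string]int{"github": 40}
	return runSequential(context.Background(), cs, state, resume)
}

func main() {
	ran, err := demoRun(true)
	fmt.Println(ran, err) // prints [market] <nil>
}
```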

Rate limiter tracks per-source last-request timestamps. Wait(ctx, source) blocks for the configured delay minus elapsed time, then releases. The mutex is released during the wait to avoid holding it across a timer. GitHub rate limiting queries the gh api rate_limit endpoint and automatically increases the GitHub delay to 5 s when usage exceeds 75%.
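The wait logic, including releasing the mutex before sleeping, can be sketched as below. This is a simplified illustration, not the collect rate limiter: the rateLimiter type and its fields are hypothetical, and two goroutines waiting on the same source at the same moment could both proceed, which a production limiter would need to handle.

```go
package main

import (
	"context"
	"fmt"
	"sync"
	"time"
)

type rateLimiter struct {
	mu     sync.Mutex
	delays map[string]time.Duration // configured delay per source
	last   map[string]time.Time     // time of the last request per source
}

func newRateLimiter(delays map[string]time.Duration) *rateLimiter {
	return &rateLimiter{delays: delays, last: map[string]time.Time{}}
}

// wait blocks for the configured delay minus the time already elapsed since
// the last request to source. The mutex is held only while reading and
// writing the maps, never across the timer.
func (r *rateLimiter) wait(ctx context.Context, source string) error {
	r.mu.Lock()
	remaining := r.delays[source] - time.Since(r.last[source])
	r.mu.Unlock()

	if remaining > 0 {
		timer := time.NewTimer(remaining)
		defer timer.Stop()
		select {
		case <-ctx.Done():
			return ctx.Err()
		case <-timer.C:
		}
	}

	r.mu.Lock()
	r.last[source] = time.Now()
	r.mu.Unlock()
	return nil
}

// demoSecondWaitMillis measures how long a second, back-to-back call blocks
// when the delay is 50 ms.
func demoSecondWaitMillis() int64 {
	r := newRateLimiter(map[string]time.Duration{"market": 50 * time.Millisecond})
	ctx := context.Background()
	_ = r.wait(ctx, "market") // no previous request, returns immediately
	start := time.Now()
	_ = r.wait(ctx, "market") // sleeps for the remaining delay
	return time.Since(start).Milliseconds()
}

// demoCancelled reports whether a wait aborts when its context is cancelled.
func demoCancelled() bool {
	r := newRateLimiter(map[string]time.Duration{"market": time.Hour})
	ctx, cancel := context.WithCancel(context.Background())
	_ = r.wait(ctx, "market")
	cancel()
	return r.wait(ctx, "market") != nil
}

func main() {
	fmt.Println("second wait blocked for", demoSecondWaitMillis(), "ms")
	fmt.Println("cancelled wait returned an error:", demoCancelled())
}
```

Using a timer inside a select, rather than time.Sleep, is what lets a cancelled context cut the wait short.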

State persists collection progress to a JSON file via an io.Medium abstraction, enabling incremental runs. Each StateEntry stores the last run timestamp, item count, and an opaque cursor for pagination resumption.

Process pipeline (process.go) handles post-collection transformation. The Dispatcher in events.go emits typed events (start, progress, error, complete) during collection runs.


Dependency Graph

collect/  ─────────────────────────────────────────────┐
                                                        │
git/      ──────────────────────────────────────────┐  │
                                                    │  │
gitea/    ────────────────────────────────────┐     │  │
                                              │     │  │
forge/    ────────────────────────────┐       │     │  │
                                      │       │     │  │
agentci/  ──────────────────────────┐ │       │     │  │
                                    │ │       │     │  │
jobrunner/          ────────────────┘ │       │     │  │
jobrunner/forgejo/  ──────────────────┘       │     │  │
jobrunner/handlers/ ──────────────────────────┘     │  │
                                                    │  │
forge.lthn.ai/core/go  (framework, log, config) ───┴──┘

External SDK dependencies:

  • codeberg.org/mvdkleijn/forgejo-sdk/forgejo/v2 — Forgejo API
  • code.gitea.io/sdk/gitea — Gitea API
  • github.com/stretchr/testify — test assertions
  • golang.org/x/net — HTTP utilities