go-scm/docs/architecture.md
Snider e7db4b163f
Some checks failed
Security Scan / security (push) Failing after 8s
Test / test (push) Failing after 1m19s
refactor: update import path from go-config to core/config
Co-Authored-By: Virgil <virgil@lethean.io>
2026-03-14 10:26:01 +00:00

24 KiB

title description
Architecture Internal design of go-scm -- key types, data flow, and subsystem interaction.

Architecture

go-scm is organised into five major subsystems, each with a clear responsibility boundary:

  1. Forge Clients (forge/, gitea/) -- API wrappers for Forgejo and Gitea
  2. Git Operations (git/) -- multi-repo status, push, pull
  3. AgentCI Pipeline (jobrunner/, agentci/) -- automated PR lifecycle for AI agents
  4. Data Collection (collect/) -- pluggable scrapers with rate limiting and state
  5. Workspace Management (repos/, manifest/, marketplace/, plugin/) -- multi-repo registry, manifests, extensibility

1. Forge Clients

Both forge/ (Forgejo) and gitea/ (Gitea) follow an identical pattern: a thin Client struct wrapping the upstream SDK client with config-based authentication and contextual error handling.

Client Lifecycle

NewFromConfig(flagURL, flagToken)
    |
    v
ResolveConfig()        <- config file -> env vars -> flags
    |
    v
New(url, token)        <- creates SDK client
    |
    v
Client{api, url, token}

Key Types -- forge package

// Client wraps the Forgejo SDK with config-based auth.
type Client struct {
    api   *forgejo.Client
    url   string
    token string
}

// PRMeta holds structural signals from a pull request.
type PRMeta struct {
    Number       int64
    Title        string
    State        string
    Author       string
    Branch       string
    BaseBranch   string
    Labels       []string
    Assignees    []string
    IsMerged     bool
    CreatedAt    time.Time
    UpdatedAt    time.Time
    CommentCount int
}

// ListIssuesOpts configures issue listing.
type ListIssuesOpts struct {
    State  string   // "open", "closed", "all"
    Labels []string
    Page   int
    Limit  int
}

Auth Resolution

Authentication follows a fixed priority order (lowest to highest):

  1. ~/.core/config.yaml keys forge.url and forge.token
  2. FORGE_URL and FORGE_TOKEN environment variables
  3. Flag overrides passed at call time
  4. Default URL http://localhost:4000 if nothing is configured

The gitea/ package mirrors this using GITEA_URL/GITEA_TOKEN and gitea.* config keys, with a default of https://gitea.snider.dev.

Available Operations

forge/

File Operations
client.go New, NewFromConfig, GetCurrentUser, ForkRepo, CreatePullRequest
repos.go ListOrgRepos, ListOrgReposIter, ListUserRepos, ListUserReposIter, GetRepo, CreateOrgRepo, DeleteRepo, MigrateRepo
issues.go ListIssues, GetIssue, CreateIssue, EditIssue, AssignIssue, ListPullRequests, ListPullRequestsIter, GetPullRequest, CreateIssueComment, ListIssueComments, CloseIssue
labels.go ListOrgLabels, ListRepoLabels, CreateRepoLabel, GetLabelByName, EnsureLabel, AddIssueLabels, RemoveIssueLabel
prs.go MergePullRequest, SetPRDraft, ListPRReviews, GetCombinedStatus, DismissReview
webhooks.go CreateRepoWebhook, ListRepoWebhooks
orgs.go ListMyOrgs, GetOrg, CreateOrg
meta.go GetPRMeta, GetCommentBodies, GetIssueBody

Pagination

All list methods handle pagination internally. Slice-returning methods exhaust all pages and return the full collection. Iterator-returning methods (suffixed Iter) yield items lazily via Go iter.Seq2:

// Collects everything into a slice
repos, err := client.ListOrgRepos("core")

// Lazy iteration -- stops early if the consumer breaks
for repo, err := range client.ListOrgReposIter("core") {
    if repo.Name == "go-scm" { break }
}

forge vs gitea

The two packages are structurally parallel but intentionally not unified behind an interface. They wrap different SDK libraries (forgejo-sdk/v2 vs gitea-sdk), and the Forgejo client has additional capabilities not present in the Gitea client:

  • Labels management (create, ensure, add, remove)
  • Organisation creation
  • Webhooks
  • PR merge, draft status, reviews, combined status, review dismissal
  • Repository migration (full import with issues/labels/PRs)

The Gitea client has a CreateMirror method for setting up pull mirrors from GitHub -- a capability specific to the public mirror workflow.

SDK limitation: The Forgejo SDK v2 does not accept context.Context on API methods. All SDK calls are synchronous. Context propagation through the wrapper layer is nominal -- contexts are accepted at the boundary but cannot be forwarded.


2. Git Operations

The git/ package provides two layers: stateless functions and a DI-integrated service.

Functions (Stateless)

// Parallel status check across many repos
statuses := git.Status(ctx, git.StatusOptions{Paths: paths, Names: names})

// Push/pull a single repo (interactive -- attaches to terminal for SSH prompts)
git.Push(ctx, "/path/to/repo")
git.Pull(ctx, "/path/to/repo")

// Sequential multi-push with iterator
for result := range git.PushMultipleIter(ctx, paths, names) {
    fmt.Println(result.Name, result.Success)
}

Status checks run in parallel via goroutines, one per repository. Each goroutine shells out to git status --porcelain and git rev-list --count via exec.CommandContext. Push and pull operations are sequential because SSH passphrase prompts require terminal interaction -- Stdin, Stdout, and Stderr are connected to the process terminal.

RepoStatus

type RepoStatus struct {
    Name      string
    Path      string
    Modified  int    // Working tree modifications
    Untracked int    // Untracked files
    Staged    int    // Index changes
    Ahead     int    // Commits ahead of upstream
    Behind    int    // Commits behind upstream
    Branch    string
    Error     error
}

func (s *RepoStatus) IsDirty() bool     // Modified > 0 || Untracked > 0 || Staged > 0
func (s *RepoStatus) HasUnpushed() bool  // Ahead > 0
func (s *RepoStatus) HasUnpulled() bool  // Behind > 0

GitError

type GitError struct {
    Err    error
    Stderr string
}

All git command errors wrap stderr output for diagnostics. The IsNonFastForward helper checks error text for common rejection patterns.

Service (DI-Integrated)

The Service struct integrates with the Core DI framework via ServiceRuntime[ServiceOptions]. On startup it registers query and task handlers:

Message Type Struct Behaviour
Query QueryStatus Runs parallel status check, caches result
Query QueryDirtyRepos Filters cached status for dirty repos
Query QueryAheadRepos Filters cached status for repos with unpushed commits
Task TaskPush Pushes a single repo
Task TaskPull Pulls a single repo
Task TaskPushMultiple Pushes multiple repos sequentially

3. AgentCI Pipeline

The AgentCI subsystem automates the lifecycle of AI-agent-generated pull requests. It follows a poll-dispatch-journal architecture.

Data Flow

[Forgejo API]
      |
      v
  ForgejoSource.Poll()      <- Finds epic issues, parses checklists, resolves linked PRs
      |
      v
  []PipelineSignal           <- One signal per unchecked child issue
      |
      v
  Poller.RunOnce()           <- For each signal, find first matching handler
      |
      v
  Handler.Execute()          <- Performs the action (merge, comment, dispatch, etc.)
      |
      v
  Journal.Append()           <- JSONL audit log, date-partitioned by repo
      |
      v
  Source.Report()            <- Posts result as comment on the epic issue

PipelineSignal

The central data carrier. It captures the structural state of a child issue and its linked PR at poll time:

type PipelineSignal struct {
    EpicNumber      int
    ChildNumber     int
    PRNumber        int
    RepoOwner       string
    RepoName        string
    PRState         string    // OPEN, MERGED, CLOSED
    IsDraft         bool
    Mergeable       string    // MERGEABLE, CONFLICTING, UNKNOWN
    CheckStatus     string    // SUCCESS, FAILURE, PENDING
    ThreadsTotal    int
    ThreadsResolved int
    LastCommitSHA   string
    LastCommitAt    time.Time
    LastReviewAt    time.Time
    NeedsCoding     bool      // true if no PR exists yet
    Assignee        string    // Forgejo username
    IssueTitle      string
    IssueBody       string
    Type            string    // e.g. "agent_completion"
    Success         bool
    Error           string
    Message         string
}

Epic Issue Structure

The ForgejoSource expects epic issues labelled epic with a Markdown checklist body:

- [ ] #42   <- unchecked = work needed
- [x] #43   <- checked = completed
- [ ] #44

Each unchecked child is polled. If the child has a linked PR (body references #42), a signal with PR metadata is emitted. If no PR exists but the issue is assigned to a known agent, a NeedsCoding signal is emitted instead.

Interfaces

type JobSource interface {
    Name() string
    Poll(ctx context.Context) ([]*PipelineSignal, error)
    Report(ctx context.Context, result *ActionResult) error
}

type JobHandler interface {
    Name() string
    Match(signal *PipelineSignal) bool
    Execute(ctx context.Context, signal *PipelineSignal) (*ActionResult, error)
}

Poller

The Poller runs a blocking poll-dispatch loop. On each tick it snapshots sources and handlers (under a mutex), calls each source's Poll, matches the first applicable handler per signal, executes it, appends to the journal, and calls Report on the source. Dry-run mode logs what would execute without running handlers.

poller := jobrunner.NewPoller(jobrunner.PollerConfig{
    Sources:      []jobrunner.JobSource{forgejoSrc},
    Handlers:     []jobrunner.JobHandler{dispatch, tickParent, autoMerge},
    Journal:      journal,
    PollInterval: 60 * time.Second,
})
poller.Run(ctx)  // blocks until ctx cancelled

Sources and handlers can be added dynamically via AddSource and AddHandler.

Handlers

Handlers are checked in registration order. The first match wins.

Handler Match Condition Action
DispatchHandler NeedsCoding=true, assignee is a known agent Build DispatchTicket JSON, transfer via SSH to agent queue, add in-progress label
CompletionHandler Type="agent_completion" Update labels (agent-completed or agent-failed), post status comment
PublishDraftHandler Draft PR, checks passing Remove draft status via raw HTTP PATCH
EnableAutoMergeHandler Open, mergeable, checks passing, no unresolved threads Squash-merge the PR
DismissReviewsHandler Open, has unresolved threads Dismiss stale "request changes" reviews
SendFixCommandHandler Open, conflicting or failing with unresolved threads Post comment asking for fixes
TickParentHandler PRState=MERGED Tick checkbox in epic body (- [ ] #N to - [x] #N), close child issue

Journal

Journal writes append-only JSONL files partitioned by date and repository:

{baseDir}/{owner}/{repo}/2026-03-11.jsonl

Each line is a JournalEntry with a signal snapshot (PR state, check status, mergeability) and a result snapshot (success, error, duration in milliseconds). Path components are validated against ^[a-zA-Z0-9][a-zA-Z0-9._-]*$ and resolved to absolute paths to prevent traversal. Writes are mutex-protected.

Clotho Protocol

The agentci.Spinner orchestrator determines whether a dispatch should use standard or dual-run verification mode.

Agent configuration lives in ~/.core/config.yaml:

agentci:
  clotho:
    strategy: clotho-verified   # or: direct
    validation_threshold: 0.85
  agents:
    charon:
      host: build-server.leth.in
      queue_dir: /home/claude/ai-work/queue
      forgejo_user: charon
      model: sonnet
      runner: claude
      verify_model: gemini-1.5-pro
      dual_run: false
      active: true

DeterminePlan decides between ModeStandard and ModeDual:

  1. If the global strategy is not clotho-verified, always standard.
  2. If the agent's dual_run flag is set, dual.
  3. If the repository name is core or contains security, dual (Axiom 1: critical repos always verified).
  4. Otherwise, standard.

In dual-run mode, DispatchHandler populates DispatchTicket.VerifyModel and DispatchTicket.DualRun=true. The Weave method compares primary and verifier outputs for convergence (currently byte-equal; semantic diff reserved for a future phase).

Dispatch Ticket Transfer

DispatchHandler.Execute()
  +-- SanitizePath(owner), SanitizePath(repo)
  +-- EnsureLabel(in-progress), check not already dispatched
  +-- AssignIssue, AddIssueLabels(in-progress), RemoveIssueLabel(agent-ready)
  +-- DeterminePlan(signal, agentName) -> runMode
  +-- Marshal DispatchTicket to JSON
  +-- ticketExists() via SSH (dedup check across queue/active/done)
  +-- secureTransfer(ticket JSON, mode 0644) via SSH stdin
  +-- secureTransfer(.env with FORGE_TOKEN, mode 0600) via SSH stdin
  +-- CreateIssueComment (dispatch confirmation)

The Forge token is transferred as a separate .env.$ticketID file with 0600 permissions, never embedded in the ticket JSON.

Security Functions

Function Purpose
SanitizePath(input) Returns filepath.Base(input) after validating against ^[a-zA-Z0-9\-\_\.]+$
EscapeShellArg(arg) Wraps in single quotes with internal quote escaping
SecureSSHCommand(host, cmd) SSH with StrictHostKeyChecking=yes, BatchMode=yes, ConnectTimeout=10
MaskToken(token) Returns first 4 + **** + last 4 characters

4. Data Collection

The collect/ package provides a pluggable framework for gathering data from external sources.

Collector Interface

type Collector interface {
    Name() string
    Collect(ctx context.Context, cfg *Config) (*Result, error)
}

Built-in Collectors

Collector Source Method Rate Limit
GitHubCollector GitHub issues and PRs gh CLI 500ms, auto-pauses at 75% API usage
BitcoinTalkCollector Forum topic pages HTTP scraping + HTML parse 2s
MarketCollector CoinGecko current + historical data HTTP JSON API 1.5s
PapersCollector IACR ePrint + arXiv HTTP (HTML scrape + Atom XML) 1s
Processor Local HTML/JSON/Markdown files Filesystem None

All collectors write Markdown output files, organised by source under the configured output directory:

{outputDir}/github/{org}/{repo}/issues/42.md
{outputDir}/bitcointalk/{topicID}/posts/1.md
{outputDir}/market/{coinID}/current.json
{outputDir}/market/{coinID}/summary.md
{outputDir}/papers/iacr/{id}.md
{outputDir}/papers/arxiv/{id}.md
{outputDir}/processed/{source}/{file}.md

Excavator

The Excavator orchestrates multiple collectors sequentially:

excavator := &collect.Excavator{
    Collectors: []collect.Collector{github, market, papers},
    Resume:     true,   // skip previously completed collectors
    ScanOnly:   false,  // true = report what would run without executing
}
result, err := excavator.Run(ctx, cfg)

Features:

  • Rate limit respect between API calls
  • Incremental state tracking (skip previously completed collectors on resume)
  • Context cancellation between collectors
  • Aggregated results via MergeResults

Config

type Config struct {
    Output     io.Medium      // Storage backend (filesystem abstraction)
    OutputDir  string         // Base directory for all output
    Limiter    *RateLimiter   // Per-source rate limits
    State      *State         // Incremental run tracking
    Dispatcher *Dispatcher    // Event dispatch for progress reporting
    Verbose    bool
    DryRun     bool           // Simulate without writing
}

Rate Limiting

The RateLimiter tracks per-source last-request timestamps. Wait(ctx, source) blocks for the configured delay minus elapsed time. The mutex is released during the wait to avoid holding it across a timer.

Default delays:

Source Delay
github 500ms
bitcointalk 2s
coingecko 1.5s
iacr 1s
arxiv 1s

The CheckGitHubRateLimitCtx method queries gh api rate_limit and automatically increases the GitHub delay to 5 seconds when usage exceeds 75%.

Events

The Dispatcher provides synchronous event dispatch with five event types:

Constant Meaning
EventStart Collector begins its run
EventProgress Incremental progress update
EventItem Single item collected
EventError Error during collection
EventComplete Collector finished

Register handlers with dispatcher.On(eventType, handler). Convenience methods EmitStart, EmitProgress, EmitItem, EmitError, EmitComplete are provided.

State Persistence

The State tracker serialises per-source progress to .collect-state.json via an io.Medium backend. Each StateEntry records:

  • Source name
  • Last run timestamp
  • Last item ID (opaque)
  • Total items collected
  • Pagination cursor (opaque)

Thread-safe via mutex. Returns copies from Get to prevent callers mutating internal state.


5. Workspace Management

repos.yaml Registry

The repos/ package reads a repos.yaml file defining a multi-repo workspace:

version: 1
org: core
base_path: ~/Code/core
defaults:
  ci: forgejo
  license: EUPL-1.2
  branch: main
repos:
  go-scm:
    type: module
    depends_on: [go-io, go-log, config]
    description: SCM integration
  go-ai:
    type: module
    depends_on: [go-ml, go-rag]

Repository types: foundation, module, product, template.

The Registry provides:

  • Lookups: List(), Get(name), ByType(t)
  • Dependency sorting: TopologicalOrder() -- returns repos in dependency order, detects cycles
  • Discovery: FindRegistry(medium) searches cwd, parent directories, and well-known home paths
  • Fallback: ScanDirectory(medium, dir) scans for .git directories when no repos.yaml exists

Each Repo struct has computed fields (Path, Name) and methods (Exists(), IsGitRepo()). The Clone field (pointer to bool) allows excluding repos from cloning operations (nil defaults to true).

WorkConfig and GitState

Workspace sync behaviour is split into two files:

File Scope Git-tracked?
.core/work.yaml Sync policy (intervals, auto-pull/push, agent heartbeats) Yes
.core/git.yaml Per-machine state (last pull/push times, agent presence) No (.gitignored)

WorkConfig controls:

type SyncConfig struct {
    Interval     time.Duration  // How often to sync
    AutoPull     bool           // Pull automatically
    AutoPush     bool           // Push automatically
    CloneMissing bool           // Clone repos not yet present
}

type AgentPolicy struct {
    Heartbeat     time.Duration  // How often agents check in
    StaleAfter    time.Duration  // When to consider an agent stale
    WarnOnOverlap bool           // Warn if multiple agents touch same repo
}

GitState tracks:

  • Per-repo: last pull/push timestamps, branch, remote, ahead/behind counts
  • Per-agent: last seen timestamp, list of active repos
  • Methods: TouchPull, TouchPush, UpdateRepo, Heartbeat, StaleAgents, ActiveAgentsFor, NeedsPull

KBConfig

The .core/kb.yaml file configures a knowledge base layer:

type KBConfig struct {
    Wiki   WikiConfig  // Local wiki mirroring from Forgejo
    Search KBSearch    // Vector search via Qdrant + Ollama embeddings
}

The WikiRepoURL and WikiLocalPath methods compute clone URLs and local paths for wiki repos.

Manifest

The manifest/ package handles .core/manifest.yaml files describing application modules:

code: my-module
name: My Module
version: 1.0.0
permissions:
  read: ["/data"]
  write: ["/output"]
  net: ["api.example.com"]
  run: ["./worker"]
modules: [dep-a, dep-b]
daemons:
  worker:
    binary: ./worker
    args: ["--port", "8080"]
    health: http://localhost:8080/health
    default: true

Key operations:

Function Purpose
Parse(data) Decode YAML bytes into a Manifest
Load(medium, root) Read .core/manifest.yaml from a directory
LoadVerified(medium, root, pubKey) Load and verify ed25519 signature
Sign(manifest, privKey) Compute ed25519 signature, store as base64 in Sign field
Verify(manifest, pubKey) Check the Sign field against the public key
SlotNames() Deduplicated component names from the slots map
DefaultDaemon() Resolve the default daemon (explicit Default: true or sole daemon)

Signing works by zeroing the Sign field, marshalling to YAML, and computing ed25519.Sign over the canonical bytes. The base64-encoded signature is stored back in Sign.

Marketplace

The marketplace/ package provides a module catalogue and installer:

// Catalogue
index, _ := marketplace.ParseIndex(jsonData)
results := index.Search("analytics")
byCategory := index.ByCategory("monitoring")
mod, found := index.Find("my-module")

// Installation
installer := marketplace.NewInstaller("/path/to/modules", store)
installer.Install(ctx, mod)     // Clone, verify manifest, register
installer.Update(ctx, "code")   // Pull, re-verify, update metadata
installer.Remove("code")        // Delete files and store entry
installed, _ := installer.Installed()  // List all installed modules

The installer:

  1. Clones the module repo with --depth=1
  2. Loads the manifest via a sandboxed io.Medium
  3. If a SignKey is present on the catalogue entry, verifies the ed25519 signature
  4. Registers metadata (code, name, version, permissions, entry point) in a store.Store
  5. Cleans up the cloned directory on any failure after clone

Plugin System

The plugin/ package provides a CLI extension mechanism:

type Plugin interface {
    Name() string
    Version() string
    Init(ctx context.Context) error
    Start(ctx context.Context) error
    Stop(ctx context.Context) error
}

BasePlugin provides a default (no-op) implementation for embedding.

Components:

Type Purpose
Manifest plugin.json with name, version, description, author, entrypoint, dependencies
Registry JSON-backed store of installed plugins (registry.json)
Loader Discovers plugins by scanning directories for plugin.json
Installer Clones from GitHub via gh, validates manifest, registers

Source format: org/repo or org/repo@v1.0. The ParseSource function splits these into organisation, repository, and version components.


Dependency Graph

                                     forge.lthn.ai/core/go  (DI, log, config, io)
                                           |
                    +----------------------+----------------------+
                    |                      |                      |
                forge/                  gitea/                  git/
                    |                      |                      |
            +-------+-------+              |                      |
            |               |              |                      |
        agentci/        jobrunner/         |                      |
            |           |       |          |                      |
            |   forgejo/source  |          |                      |
            |           |       |          |                      |
            +-----------+-------+          |                      |
                    |                      |                      |
              handlers/                    |                      |
                                           |                      |
                collect/  -----------------+                      |
                                                                  |
                repos/  ------------------------------------------+
                manifest/
                marketplace/ (depends on manifest/, io/)
                plugin/ (depends on io/)

External SDK dependencies:

  • codeberg.org/mvdkleijn/forgejo-sdk/forgejo/v2 -- Forgejo API
  • code.gitea.io/sdk/gitea -- Gitea API
  • github.com/stretchr/testify -- test assertions
  • golang.org/x/net -- HTML parsing
  • gopkg.in/yaml.v3 -- YAML parsing