From b263a9ea7d833c8648d4975814a18e073063fbb6 Mon Sep 17 00:00:00 2001 From: Snider Date: Wed, 11 Mar 2026 13:02:40 +0000 Subject: [PATCH] docs: add human-friendly documentation Co-Authored-By: Claude Opus 4.6 --- docs/architecture.md | 784 +++++++++++++++++++++++++++++++------------ docs/development.md | 238 +++++++------ docs/index.md | 150 +++++++++ 3 files changed, 858 insertions(+), 314 deletions(-) create mode 100644 docs/index.md diff --git a/docs/architecture.md b/docs/architecture.md index 4b8ac81..074b2a7 100644 --- a/docs/architecture.md +++ b/docs/architecture.md @@ -1,145 +1,270 @@ -# go-scm Architecture +--- +title: Architecture +description: Internal design of go-scm -- key types, data flow, and subsystem interaction. +--- -Module path: `forge.lthn.ai/core/go-scm` +# Architecture -go-scm provides SCM integration, CI dispatch automation, and data collection for the Lethean ecosystem. It is composed of six packages, each with a distinct responsibility, and approximately 9,000 lines of Go across roughly 70 source files. +go-scm is organised into five major subsystems, each with a clear responsibility boundary: + +1. **Forge Clients** (`forge/`, `gitea/`) -- API wrappers for Forgejo and Gitea +2. **Git Operations** (`git/`) -- multi-repo status, push, pull +3. **AgentCI Pipeline** (`jobrunner/`, `agentci/`) -- automated PR lifecycle for AI agents +4. **Data Collection** (`collect/`) -- pluggable scrapers with rate limiting and state +5. **Workspace Management** (`repos/`, `manifest/`, `marketplace/`, `plugin/`) -- multi-repo registry, manifests, extensibility --- -## Package Overview +## 1. Forge Clients + +Both `forge/` (Forgejo) and `gitea/` (Gitea) follow an identical pattern: a thin `Client` struct wrapping the upstream SDK client with config-based authentication and contextual error handling. 
+ +### Client Lifecycle ``` -forge.lthn.ai/core/go-scm -├── forge/ Forgejo API client (repos, issues, PRs, labels, webhooks, orgs) -├── gitea/ Gitea API client (repos, issues, meta) for public mirror -├── git/ Multi-repo git operations (status, commit, push, pull) -├── agentci/ Clotho Protocol orchestrator — agent config and security -├── jobrunner/ PR automation pipeline (poll → dispatch → journal) -│ ├── forgejo/ Forgejo signal source (epic issue parsing) -│ └── handlers/ Pipeline action handlers -└── collect/ Data collection (BitcoinTalk, GitHub, market, papers, events) +NewFromConfig(flagURL, flagToken) + | + v +ResolveConfig() <- config file -> env vars -> flags + | + v +New(url, token) <- creates SDK client + | + v +Client{api, url, token} ``` ---- - -## SCM Abstraction Layer - -### forge/ — Forgejo API Client - -`forge/` wraps the Forgejo SDK (`codeberg.org/mvdkleijn/forgejo-sdk/forgejo/v2`) with config-based authentication and contextual error wrapping. It provides thin, typed wrappers for every API surface used by the Lethean platform. - -**Client construction:** +### Key Types -- forge package ```go -// Config-resolved client (preferred) -client, err := forge.NewFromConfig(flagURL, flagToken) +// Client wraps the Forgejo SDK with config-based auth. +type Client struct { + api *forgejo.Client + url string + token string +} -// Direct construction -client, err := forge.New(url, token) +// PRMeta holds structural signals from a pull request. +type PRMeta struct { + Number int64 + Title string + State string + Author string + Branch string + BaseBranch string + Labels []string + Assignees []string + IsMerged bool + CreatedAt time.Time + UpdatedAt time.Time + CommentCount int +} + +// ListIssuesOpts configures issue listing. +type ListIssuesOpts struct { + State string // "open", "closed", "all" + Labels []string + Page int + Limit int +} ``` -**Auth resolution** follows a fixed priority order: +### Auth Resolution -1. 
`~/.core/config.yaml` keys `forge.url` and `forge.token` (lowest priority) +Authentication follows a fixed priority order (lowest to highest): + +1. `~/.core/config.yaml` keys `forge.url` and `forge.token` 2. `FORGE_URL` and `FORGE_TOKEN` environment variables -3. Flag overrides passed at call time (highest priority) +3. Flag overrides passed at call time 4. Default URL `http://localhost:4000` if nothing is configured -**Available operations:** +The `gitea/` package mirrors this using `GITEA_URL`/`GITEA_TOKEN` and `gitea.*` config keys, with a default of `https://gitea.snider.dev`. + +### Available Operations + +**forge/** | File | Operations | |------|-----------| -| `repos.go` | `CreateRepo`, `ListRepos`, `CreateMirrorRepo`, `CreateOrgRepo` | -| `issues.go` | `GetIssue`, `CreateIssue`, `ListIssues`, `CreateIssueComment`, `AssignIssue`, `CloseIssue`, `EditIssue` | -| `prs.go` | `CreatePullRequest`, `ListPullRequests`, `MergePullRequest`, `SetPRDraft`, `GetCombinedStatus` | -| `labels.go` | `CreateLabel`, `GetLabelByName`, `EnsureLabel`, `AddIssueLabels`, `RemoveIssueLabel` | -| `webhooks.go` | `CreateWebhook`, `ListWebhooks`, `DeleteWebhook` | -| `orgs.go` | `CreateOrg`, `ListOrgs`, `ListOrgRepos` | -| `meta.go` | `GetVersion` | +| `client.go` | `New`, `NewFromConfig`, `GetCurrentUser`, `ForkRepo`, `CreatePullRequest` | +| `repos.go` | `ListOrgRepos`, `ListOrgReposIter`, `ListUserRepos`, `ListUserReposIter`, `GetRepo`, `CreateOrgRepo`, `DeleteRepo`, `MigrateRepo` | +| `issues.go` | `ListIssues`, `GetIssue`, `CreateIssue`, `EditIssue`, `AssignIssue`, `ListPullRequests`, `ListPullRequestsIter`, `GetPullRequest`, `CreateIssueComment`, `ListIssueComments`, `CloseIssue` | +| `labels.go` | `ListOrgLabels`, `ListRepoLabels`, `CreateRepoLabel`, `GetLabelByName`, `EnsureLabel`, `AddIssueLabels`, `RemoveIssueLabel` | +| `prs.go` | `MergePullRequest`, `SetPRDraft`, `ListPRReviews`, `GetCombinedStatus`, `DismissReview` | +| `webhooks.go` | `CreateRepoWebhook`, 
`ListRepoWebhooks` | +| `orgs.go` | `ListMyOrgs`, `GetOrg`, `CreateOrg` | +| `meta.go` | `GetPRMeta`, `GetCommentBodies`, `GetIssueBody` | -**SDK limitation:** The Forgejo SDK v2 does not accept `context.Context` on any API method. All SDK calls are synchronous and blocking. Context propagation through forge/ and gitea/ wrappers is therefore nominal — contexts are accepted at the wrapper boundary but cannot be passed to the SDK. This will be resolved when the SDK adds context support. +### Pagination -### gitea/ — Gitea API Client - -`gitea/` mirrors the structure of `forge/` but wraps the Gitea SDK (`code.gitea.io/sdk/gitea`) for the public mirror instance at `git.lthn.ai`. The two clients are intentionally structurally identical — same pattern of `client.go`, `config.go`, `repos.go`, `issues.go`, `meta.go` — to reduce cognitive load when working across both. - -**Auth resolution** follows the same priority order as forge/, using `GITEA_URL`/`GITEA_TOKEN` environment variables and `gitea.url`/`gitea.token` config keys. The default URL is `https://gitea.snider.dev`. - -**Infrastructure split:** - -- `forge.lthn.ai` — production Forgejo instance, source of truth, full IP/research data -- `git.lthn.ai` — public Gitea mirror with sensitive data stripped, breach-safe - -### git/ — Multi-Repo Git Operations - -`git/` provides context-aware git operations across multiple repositories. Unlike the API clients, all operations in this package propagate `context.Context` correctly via `exec.CommandContext`. - -**Core types:** +All list methods handle pagination internally. Slice-returning methods exhaust all pages and return the full collection. 
Iterator-returning methods (suffixed `Iter`) yield items lazily via Go `iter.Seq2`: ```go -type RepoStatus struct { - Name, Path, Branch string - Modified, Untracked, Staged int // working tree counts - Ahead, Behind int // commits vs upstream - Error error +// Collects everything into a slice +repos, err := client.ListOrgRepos("core") + +// Lazy iteration -- stops early if the consumer breaks +for repo, err := range client.ListOrgReposIter("core") { + if repo.Name == "go-scm" { break } } - -func (s *RepoStatus) IsDirty() bool { ... } -func (s *RepoStatus) HasUnpushed() bool { ... } -func (s *RepoStatus) HasUnpulled() bool { ... } ``` -**Parallel status across repos:** +### forge vs gitea -```go -statuses := git.Status(ctx, git.StatusOptions{ - Paths: []string{"/path/to/repo-a", "/path/to/repo-b"}, - Names: map[string]string{"/path/to/repo-a": "repo-a"}, -}) -``` +The two packages are structurally parallel but intentionally not unified behind an interface. They wrap different SDK libraries (`forgejo-sdk/v2` vs `gitea-sdk`), and the Forgejo client has additional capabilities not present in the Gitea client: -Status checks run in parallel via goroutines. Push and pull operations are sequential because SSH passphrase prompts require terminal interaction. +- Labels management (create, ensure, add, remove) +- Organisation creation +- Webhooks +- PR merge, draft status, reviews, combined status, review dismissal +- Repository migration (full import with issues/labels/PRs) -**Service integration:** `git.Service` embeds `framework.ServiceRuntime` and registers query/task handlers on the core framework's message bus. Queries (`QueryStatus`, `QueryDirtyRepos`, `QueryAheadRepos`) return from a cached `lastStatus` field. Tasks (`TaskPush`, `TaskPull`, `TaskPushMultiple`) execute immediately. +The Gitea client has a `CreateMirror` method for setting up pull mirrors from GitHub -- a capability specific to the public mirror workflow. 
+ +**SDK limitation:** The Forgejo SDK v2 does not accept `context.Context` on API methods. All SDK calls are synchronous. Context propagation through the wrapper layer is nominal -- contexts are accepted at the boundary but cannot be forwarded. --- -## AgentCI Dispatch Pipeline +## 2. Git Operations -### Overview +The `git/` package provides two layers: stateless functions and a DI-integrated service. -The AgentCI pipeline automates the lifecycle of issues assigned to AI agents: detecting unstarted work, dispatching tickets to agent machines, monitoring PR state, and updating the parent epic on merge. +### Functions (Stateless) -``` -Forgejo instance - │ - │ poll (epic issues, child PRs, combined status) - ▼ -ForgejoSource.Poll() - │ - │ []PipelineSignal - ▼ -Poller.RunOnce() - │ - │ Match(signal) → first matching handler - ├─► DispatchHandler — NeedsCoding=true, known agent assignee - ├─► TickParentHandler — PRState=MERGED - ├─► EnableAutoMerge — checks passing, mergeable - ├─► PublishDraft — draft PR ready - ├─► SendFixCommand — checks failing - └─► CompletionHandler — agent completion signal - │ - │ ActionResult - ▼ -Journal.Append() — JSONL audit trail -ForgejoSource.Report() — comment on epic issue +```go +// Parallel status check across many repos +statuses := git.Status(ctx, git.StatusOptions{Paths: paths, Names: names}) + +// Push/pull a single repo (interactive -- attaches to terminal for SSH prompts) +git.Push(ctx, "/path/to/repo") +git.Pull(ctx, "/path/to/repo") + +// Sequential multi-push with iterator +for result := range git.PushMultipleIter(ctx, paths, names) { + fmt.Println(result.Name, result.Success) +} ``` -### jobrunner/ — Poller and Interfaces +Status checks run in parallel via goroutines, one per repository. Each goroutine shells out to `git status --porcelain` and `git rev-list --count` via `exec.CommandContext`. 
Push and pull operations are sequential because SSH passphrase prompts require terminal interaction -- `Stdin`, `Stdout`, and `Stderr` are connected to the process terminal. -`jobrunner/` defines the interfaces and orchestration loop shared by all pipeline participants. +### RepoStatus -**Interfaces:** +```go +type RepoStatus struct { + Name string + Path string + Modified int // Working tree modifications + Untracked int // Untracked files + Staged int // Index changes + Ahead int // Commits ahead of upstream + Behind int // Commits behind upstream + Branch string + Error error +} + +func (s *RepoStatus) IsDirty() bool // Modified > 0 || Untracked > 0 || Staged > 0 +func (s *RepoStatus) HasUnpushed() bool // Ahead > 0 +func (s *RepoStatus) HasUnpulled() bool // Behind > 0 +``` + +### GitError + +```go +type GitError struct { + Err error + Stderr string +} +``` + +All git command errors wrap stderr output for diagnostics. The `IsNonFastForward` helper checks error text for common rejection patterns. + +### Service (DI-Integrated) + +The `Service` struct integrates with the Core DI framework via `ServiceRuntime[ServiceOptions]`. On startup it registers query and task handlers: + +| Message Type | Struct | Behaviour | +|-------------|--------|-----------| +| Query | `QueryStatus` | Runs parallel status check, caches result | +| Query | `QueryDirtyRepos` | Filters cached status for dirty repos | +| Query | `QueryAheadRepos` | Filters cached status for repos with unpushed commits | +| Task | `TaskPush` | Pushes a single repo | +| Task | `TaskPull` | Pulls a single repo | +| Task | `TaskPushMultiple` | Pushes multiple repos sequentially | + +--- + +## 3. AgentCI Pipeline + +The AgentCI subsystem automates the lifecycle of AI-agent-generated pull requests. It follows a poll-dispatch-journal architecture. 
+ +### Data Flow + +``` +[Forgejo API] + | + v + ForgejoSource.Poll() <- Finds epic issues, parses checklists, resolves linked PRs + | + v + []PipelineSignal <- One signal per unchecked child issue + | + v + Poller.RunOnce() <- For each signal, find first matching handler + | + v + Handler.Execute() <- Performs the action (merge, comment, dispatch, etc.) + | + v + Journal.Append() <- JSONL audit log, date-partitioned by repo + | + v + Source.Report() <- Posts result as comment on the epic issue +``` + +### PipelineSignal + +The central data carrier. It captures the structural state of a child issue and its linked PR at poll time: + +```go +type PipelineSignal struct { + EpicNumber int + ChildNumber int + PRNumber int + RepoOwner string + RepoName string + PRState string // OPEN, MERGED, CLOSED + IsDraft bool + Mergeable string // MERGEABLE, CONFLICTING, UNKNOWN + CheckStatus string // SUCCESS, FAILURE, PENDING + ThreadsTotal int + ThreadsResolved int + LastCommitSHA string + LastCommitAt time.Time + LastReviewAt time.Time + NeedsCoding bool // true if no PR exists yet + Assignee string // Forgejo username + IssueTitle string + IssueBody string + Type string // e.g. "agent_completion" + Success bool + Error string + Message string +} +``` + +### Epic Issue Structure + +The `ForgejoSource` expects epic issues labelled `epic` with a Markdown checklist body: + +```markdown +- [ ] #42 <- unchecked = work needed +- [x] #43 <- checked = completed +- [ ] #44 +``` + +Each unchecked child is polled. If the child has a linked PR (body references `#42`), a signal with PR metadata is emitted. If no PR exists but the issue is assigned to a known agent, a `NeedsCoding` signal is emitted instead. 
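The checklist convention above can be parsed with a short regex pass. This sketch is illustrative -- `childRef`, `parseChecklist`, and the exact pattern are assumptions, not the real `ForgejoSource` parser:

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// childRef is one checklist entry parsed from an epic body.
type childRef struct {
	Number int
	Done   bool
}

// taskItem matches Forgejo-style task list lines such as
// "- [ ] #42" or "- [x] #43" at the start of a line.
var taskItem = regexp.MustCompile(`(?m)^- \[( |x)\] #(\d+)`)

// parseChecklist extracts every child reference, preserving order.
func parseChecklist(body string) []childRef {
	var refs []childRef
	for _, m := range taskItem.FindAllStringSubmatch(body, -1) {
		n, _ := strconv.Atoi(m[2])
		refs = append(refs, childRef{Number: n, Done: m[1] == "x"})
	}
	return refs
}

func main() {
	body := "- [ ] #42\n- [x] #43\n- [ ] #44\n"
	for _, r := range parseChecklist(body) {
		fmt.Printf("#%d done=%v\n", r.Number, r.Done)
	}
}
```

Unchecked entries (`Done == false`) are the ones the source polls for linked PRs or `NeedsCoding` dispatch.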
+ +### Interfaces ```go type JobSource interface { @@ -155,24 +280,9 @@ type JobHandler interface { } ``` -**PipelineSignal** carries the full structural snapshot of a child issue/PR at the moment of polling: +### Poller -```go -type PipelineSignal struct { - EpicNumber, ChildNumber, PRNumber int - RepoOwner, RepoName string - PRState string // OPEN, MERGED, CLOSED - IsDraft bool - Mergeable string // MERGEABLE, CONFLICTING, UNKNOWN - CheckStatus string // SUCCESS, FAILURE, PENDING - ThreadsTotal, ThreadsResolved int - NeedsCoding bool // true if no PR exists yet - Assignee string // Forgejo username - IssueTitle, IssueBody string // for dispatch prompt -} -``` - -**Poller** runs a blocking poll-dispatch loop. On each tick it snapshots sources and handlers (under a read lock), calls each source's `Poll`, matches the first applicable handler per signal, executes it, appends to the journal, and calls `Report` on the source. Dry-run mode logs what would execute without running handlers. +The `Poller` runs a blocking poll-dispatch loop. On each tick it snapshots sources and handlers (under a mutex), calls each source's `Poll`, matches the first applicable handler per signal, executes it, appends to the journal, and calls `Report` on the source. Dry-run mode logs what would execute without running handlers. ```go poller := jobrunner.NewPoller(jobrunner.PollerConfig{ @@ -184,31 +294,37 @@ poller := jobrunner.NewPoller(jobrunner.PollerConfig{ poller.Run(ctx) // blocks until ctx cancelled ``` -### jobrunner/forgejo/ — Signal Source +Sources and handlers can be added dynamically via `AddSource` and `AddHandler`. -`ForgejoSource` polls a list of repositories for epic issues (labelled `epic`, state `open`). 
For each epic, it parses the issue body for unchecked task list items (`- [ ] #N`), then for each unchecked child either: +### Handlers -- Builds a `PipelineSignal` with PR state, draft status, check status, and thread counts (if a linked PR exists), or -- Builds a `NeedsCoding=true` signal carrying the child issue title and body (if no PR exists and the issue has an assignee) +Handlers are checked in registration order. The first match wins. -Combined commit status is fetched per head SHA via `forge.GetCombinedStatus`. - -### jobrunner/handlers/ — Action Handlers - -| Handler | Match condition | Action | +| Handler | Match Condition | Action | |---------|----------------|--------| -| `DispatchHandler` | `NeedsCoding=true`, assignee is a known agent | Build `DispatchTicket` JSON, transfer via SSH, post comment | -| `TickParentHandler` | `PRState=MERGED` | Tick checkbox in epic body, close child issue | -| `EnableAutoMergeHandler` | `CheckStatus=SUCCESS`, `Mergeable=MERGEABLE`, not draft | Enable auto-merge on PR | -| `PublishDraftHandler` | Is draft, threads resolved | Publish draft PR | -| `SendFixCommandHandler` | `CheckStatus=FAILURE` | Post fix command comment to agent | -| `CompletionHandler` | `Type=agent_completion` | Record agent completion result | +| `DispatchHandler` | `NeedsCoding=true`, assignee is a known agent | Build `DispatchTicket` JSON, transfer via SSH to agent queue, add `in-progress` label | +| `CompletionHandler` | `Type="agent_completion"` | Update labels (`agent-completed` or `agent-failed`), post status comment | +| `PublishDraftHandler` | Draft PR, checks passing | Remove draft status via raw HTTP PATCH | +| `EnableAutoMergeHandler` | Open, mergeable, checks passing, no unresolved threads | Squash-merge the PR | +| `DismissReviewsHandler` | Open, has unresolved threads | Dismiss stale "request changes" reviews | +| `SendFixCommandHandler` | Open, conflicting or failing with unresolved threads | Post comment asking for fixes | +| 
`TickParentHandler` | `PRState=MERGED` | Tick checkbox in epic body (`- [ ] #N` to `- [x] #N`), close child issue | -### agentci/ — Clotho Protocol +### Journal -`agentci/` manages agent configuration and the Clotho Protocol for dual-run verification. +`Journal` writes append-only JSONL files partitioned by date and repository: -**Agent configuration** is loaded from `~/.core/config.yaml` under the `agentci.agents` key: +``` +{baseDir}/{owner}/{repo}/2026-03-11.jsonl +``` + +Each line is a `JournalEntry` with a signal snapshot (PR state, check status, mergeability) and a result snapshot (success, error, duration in milliseconds). Path components are validated against `^[a-zA-Z0-9][a-zA-Z0-9._-]*$` and resolved to absolute paths to prevent traversal. Writes are mutex-protected. + +### Clotho Protocol + +The `agentci.Spinner` orchestrator determines whether a dispatch should use standard or dual-run verification mode. + +**Agent configuration** lives in `~/.core/config.yaml`: ```yaml agentci: @@ -221,67 +337,54 @@ agentci: queue_dir: /home/claude/ai-work/queue forgejo_user: charon model: sonnet - runner: claude # claude, codex, or gemini + runner: claude verify_model: gemini-1.5-pro dual_run: false active: true ``` -**Spinner** is the Clotho orchestrator. Its `DeterminePlan` method decides between `standard` and `dual` run modes: +`DeterminePlan` decides between `ModeStandard` and `ModeDual`: -1. If the global strategy is not `clotho-verified`, always `standard`. -2. If the agent's `dual_run` flag is set, `dual`. -3. If the repository name is `core` or contains `security`, `dual` (Axiom 1: critical repos always verified). -4. Otherwise, `standard`. +1. If the global strategy is not `clotho-verified`, always standard. +2. If the agent's `dual_run` flag is set, dual. +3. If the repository name is `core` or contains `security`, dual (Axiom 1: critical repos always verified). +4. Otherwise, standard. 
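The decision ladder reduces to a small pure function. A sketch, hedged: the real `DeterminePlan` method takes signal and agent structs, so the flat parameters here are an illustrative simplification:

```go
package main

import (
	"fmt"
	"strings"
)

const (
	ModeStandard = "standard"
	ModeDual     = "dual"
)

// determinePlan reproduces the four decision steps in order.
func determinePlan(strategy, repo string, agentDualRun bool) string {
	if strategy != "clotho-verified" {
		return ModeStandard // 1. verification disabled globally
	}
	if agentDualRun {
		return ModeDual // 2. agent opted in to dual runs
	}
	// 3. Axiom 1: critical repos always verified
	if repo == "core" || strings.Contains(repo, "security") {
		return ModeDual
	}
	return ModeStandard // 4. default
}

func main() {
	fmt.Println(determinePlan("clotho-verified", "go-scm", false)) // standard
	fmt.Println(determinePlan("clotho-verified", "core", false))   // dual
	fmt.Println(determinePlan("best-effort", "core", false))       // standard
}
```

Note that the strategy check precedes everything else: even a critical repo runs standard mode when the global strategy is not `clotho-verified`.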
-In dual-run mode, `DispatchHandler` populates `DispatchTicket.VerifyModel` and `DispatchTicket.DualRun=true`. The agent runner is responsible for executing both the primary and verifier models and calling `Spinner.Weave` to compare outputs. `Weave` currently performs a byte-equal comparison; semantic diff logic is reserved for a future phase. +In dual-run mode, `DispatchHandler` populates `DispatchTicket.VerifyModel` and `DispatchTicket.DualRun=true`. The `Weave` method compares primary and verifier outputs for convergence (currently byte-equal; semantic diff reserved for a future phase). -**Security functions** in `agentci/security.go`: - -- `SanitizePath(input string)` — returns `filepath.Base(input)` after validating against `^[a-zA-Z0-9\-\_\.]+$`. Protects against path traversal by stripping directory components rather than rejecting the input. -- `EscapeShellArg(arg string)` — wraps a string in single quotes with internal single-quote escaping, for safe insertion into SSH remote commands. -- `SecureSSHCommandContext(ctx, host, cmd string)` — constructs an `exec.Cmd` with `StrictHostKeyChecking=yes`, `BatchMode=yes`, and `ConnectTimeout=10`. -- `MaskToken(token string)` — returns a masked version safe for logging. 
- -**Dispatch ticket transfer:** +### Dispatch Ticket Transfer ``` DispatchHandler.Execute() - ├── SanitizePath(owner), SanitizePath(repo) - ├── EnsureLabel(in-progress) - ├── Check issue not already in-progress or completed - ├── AssignIssue, AddIssueLabels - ├── DeterminePlan(signal, agentName) → runMode - ├── Marshal DispatchTicket to JSON - ├── ticketExists() via SSH (dedup check) - ├── secureTransfer(ticket JSON, 0644) ← cat > path via SSH stdin - ├── secureTransfer(.env with FORGE_TOKEN, 0600) - └── CreateIssueComment (dispatch confirmation) + +-- SanitizePath(owner), SanitizePath(repo) + +-- EnsureLabel(in-progress), check not already dispatched + +-- AssignIssue, AddIssueLabels(in-progress), RemoveIssueLabel(agent-ready) + +-- DeterminePlan(signal, agentName) -> runMode + +-- Marshal DispatchTicket to JSON + +-- ticketExists() via SSH (dedup check across queue/active/done) + +-- secureTransfer(ticket JSON, mode 0644) via SSH stdin + +-- secureTransfer(.env with FORGE_TOKEN, mode 0600) via SSH stdin + +-- CreateIssueComment (dispatch confirmation) ``` -The Forge token is written as a separate `.env.$ticketID` file with `0600` permissions rather than embedded in the ticket JSON, to avoid the token appearing in queue directory listings. +The Forge token is transferred as a separate `.env.$ticketID` file with `0600` permissions, never embedded in the ticket JSON. -### Journal +### Security Functions -`jobrunner.Journal` writes append-only JSONL files partitioned by date and repository: - -``` -{baseDir}/{owner}/{repo}/2026-02-20.jsonl -``` - -Each line is a `JournalEntry` with a signal snapshot (PR state at time of action) and a result snapshot (success, error, duration). Path components are validated against a strict regex and resolved to absolute paths to prevent traversal. Writes are mutex-protected for concurrent safety. 
- -**Replay filtering** (via `journal_replay_test.go` patterns, not yet a public API): entries can be filtered by action name, repo full name, and time range by scanning the JSONL file. +| Function | Purpose | +|----------|---------| +| `SanitizePath(input)` | Returns `filepath.Base(input)` after validating against `^[a-zA-Z0-9\-\_\.]+$` | +| `EscapeShellArg(arg)` | Wraps in single quotes with internal quote escaping | +| `SecureSSHCommand(host, cmd)` | SSH with `StrictHostKeyChecking=yes`, `BatchMode=yes`, `ConnectTimeout=10` | +| `MaskToken(token)` | Returns first 4 + `****` + last 4 characters | --- -## Data Collection +## 4. Data Collection -### collect/ — Collection Pipeline +The `collect/` package provides a pluggable framework for gathering data from external sources. -`collect/` provides a pluggable pipeline for gathering data from external sources. - -**Collector interface:** +### Collector Interface ```go type Collector interface { @@ -290,58 +393,309 @@ type Collector interface { } ``` -**Available collectors:** +### Built-in Collectors -| File | Source | Rate limit | -|------|--------|-----------| -| `bitcointalk.go` | BitcoinTalk forum (HTTP scraping) | 2 s per request | -| `github.go` | GitHub API via `gh` CLI | 500 ms, pauses at 75% usage | -| `market.go` | CoinGecko market data | 1.5 s per request | -| `papers.go` | IACR and arXiv research papers | 1 s per request | -| `events.go` | Event tracking | — | +| Collector | Source | Method | Rate Limit | +|-----------|--------|--------|-----------| +| `GitHubCollector` | GitHub issues and PRs | `gh` CLI | 500ms, auto-pauses at 75% API usage | +| `BitcoinTalkCollector` | Forum topic pages | HTTP scraping + HTML parse | 2s | +| `MarketCollector` | CoinGecko current + historical data | HTTP JSON API | 1.5s | +| `PapersCollector` | IACR ePrint + arXiv | HTTP (HTML scrape + Atom XML) | 1s | +| `Processor` | Local HTML/JSON/Markdown files | Filesystem | None | -**Excavator** orchestrates sequential execution of 
multiple collectors with state-based resume support: +All collectors write Markdown output files, organised by source under the configured output directory: -```go -exc := &collect.Excavator{ - Collectors: []collect.Collector{githubCollector, marketCollector}, - Resume: true, -} -result, err := exc.Run(ctx, cfg) +``` +{outputDir}/github/{org}/{repo}/issues/42.md +{outputDir}/bitcointalk/{topicID}/posts/1.md +{outputDir}/market/{coinID}/current.json +{outputDir}/market/{coinID}/summary.md +{outputDir}/papers/iacr/{id}.md +{outputDir}/papers/arxiv/{id}.md +{outputDir}/processed/{source}/{file}.md ``` -If `Resume=true`, collectors that already have a non-zero item count in the persisted state file are skipped. Context cancellation is checked between collectors. +### Excavator -**Rate limiter** tracks per-source last-request timestamps. `Wait(ctx, source)` blocks for the configured delay minus elapsed time, then releases. The mutex is released during the wait to avoid holding it across a timer. GitHub rate limiting queries the `gh api rate_limit` endpoint and automatically increases the GitHub delay to 5 s when usage exceeds 75%. +The `Excavator` orchestrates multiple collectors sequentially: -**State** persists collection progress to a JSON file via an `io.Medium` abstraction, enabling incremental runs. Each `StateEntry` stores the last run timestamp, item count, and an opaque cursor for pagination resumption. +```go +excavator := &collect.Excavator{ + Collectors: []collect.Collector{github, market, papers}, + Resume: true, // skip previously completed collectors + ScanOnly: false, // true = report what would run without executing +} +result, err := excavator.Run(ctx, cfg) +``` -**Process pipeline** (`process.go`) handles post-collection transformation. The `Dispatcher` in `events.go` emits typed events (`start`, `progress`, `error`, `complete`) during collection runs. 
+Features: + +- Rate limit respect between API calls +- Incremental state tracking (skip previously completed collectors on resume) +- Context cancellation between collectors +- Aggregated results via `MergeResults` + +### Config + +```go +type Config struct { + Output io.Medium // Storage backend (filesystem abstraction) + OutputDir string // Base directory for all output + Limiter *RateLimiter // Per-source rate limits + State *State // Incremental run tracking + Dispatcher *Dispatcher // Event dispatch for progress reporting + Verbose bool + DryRun bool // Simulate without writing +} +``` + +### Rate Limiting + +The `RateLimiter` tracks per-source last-request timestamps. `Wait(ctx, source)` blocks for the configured delay minus elapsed time. The mutex is released during the wait to avoid holding it across a timer. + +Default delays: + +| Source | Delay | +|--------|-------| +| `github` | 500ms | +| `bitcointalk` | 2s | +| `coingecko` | 1.5s | +| `iacr` | 1s | +| `arxiv` | 1s | + +The `CheckGitHubRateLimitCtx` method queries `gh api rate_limit` and automatically increases the GitHub delay to 5 seconds when usage exceeds 75%. + +### Events + +The `Dispatcher` provides synchronous event dispatch with five event types: + +| Constant | Meaning | +|----------|---------| +| `EventStart` | Collector begins its run | +| `EventProgress` | Incremental progress update | +| `EventItem` | Single item collected | +| `EventError` | Error during collection | +| `EventComplete` | Collector finished | + +Register handlers with `dispatcher.On(eventType, handler)`. Convenience methods `EmitStart`, `EmitProgress`, `EmitItem`, `EmitError`, `EmitComplete` are provided. + +### State Persistence + +The `State` tracker serialises per-source progress to `.collect-state.json` via an `io.Medium` backend. Each `StateEntry` records: + +- Source name +- Last run timestamp +- Last item ID (opaque) +- Total items collected +- Pagination cursor (opaque) + +Thread-safe via mutex. 
Returns copies from `Get` to prevent callers mutating internal state. + +--- + +## 5. Workspace Management + +### repos.yaml Registry + +The `repos/` package reads a `repos.yaml` file defining a multi-repo workspace: + +```yaml +version: 1 +org: core +base_path: ~/Code/core +defaults: + ci: forgejo + license: EUPL-1.2 + branch: main +repos: + go-scm: + type: module + depends_on: [go-io, go-log, go-config] + description: SCM integration + go-ai: + type: module + depends_on: [go-ml, go-rag] +``` + +**Repository types:** `foundation`, `module`, `product`, `template`. + +The `Registry` provides: + +- **Lookups:** `List()`, `Get(name)`, `ByType(t)` +- **Dependency sorting:** `TopologicalOrder()` -- returns repos in dependency order, detects cycles +- **Discovery:** `FindRegistry(medium)` searches cwd, parent directories, and well-known home paths +- **Fallback:** `ScanDirectory(medium, dir)` scans for `.git` directories when no `repos.yaml` exists + +Each `Repo` struct has computed fields (`Path`, `Name`) and methods (`Exists()`, `IsGitRepo()`). The `Clone` field (pointer to bool) allows excluding repos from cloning operations (nil defaults to true). + +### WorkConfig and GitState + +Workspace sync behaviour is split into two files: + +| File | Scope | Git-tracked? 
| +|------|-------|-------------| +| `.core/work.yaml` | Sync policy (intervals, auto-pull/push, agent heartbeats) | Yes | +| `.core/git.yaml` | Per-machine state (last pull/push times, agent presence) | No (.gitignored) | + +**WorkConfig** controls: + +```go +type SyncConfig struct { + Interval time.Duration // How often to sync + AutoPull bool // Pull automatically + AutoPush bool // Push automatically + CloneMissing bool // Clone repos not yet present +} + +type AgentPolicy struct { + Heartbeat time.Duration // How often agents check in + StaleAfter time.Duration // When to consider an agent stale + WarnOnOverlap bool // Warn if multiple agents touch same repo +} +``` + +**GitState** tracks: + +- Per-repo: last pull/push timestamps, branch, remote, ahead/behind counts +- Per-agent: last seen timestamp, list of active repos +- Methods: `TouchPull`, `TouchPush`, `UpdateRepo`, `Heartbeat`, `StaleAgents`, `ActiveAgentsFor`, `NeedsPull` + +### KBConfig + +The `.core/kb.yaml` file configures a knowledge base layer: + +```go +type KBConfig struct { + Wiki WikiConfig // Local wiki mirroring from Forgejo + Search KBSearch // Vector search via Qdrant + Ollama embeddings +} +``` + +The `WikiRepoURL` and `WikiLocalPath` methods compute clone URLs and local paths for wiki repos. 
+ +### Manifest + +The `manifest/` package handles `.core/manifest.yaml` files describing application modules: + +```yaml +code: my-module +name: My Module +version: 1.0.0 +permissions: + read: ["/data"] + write: ["/output"] + net: ["api.example.com"] + run: ["./worker"] +modules: [dep-a, dep-b] +daemons: + worker: + binary: ./worker + args: ["--port", "8080"] + health: http://localhost:8080/health + default: true +``` + +**Key operations:** + +| Function | Purpose | +|----------|---------| +| `Parse(data)` | Decode YAML bytes into a `Manifest` | +| `Load(medium, root)` | Read `.core/manifest.yaml` from a directory | +| `LoadVerified(medium, root, pubKey)` | Load and verify ed25519 signature | +| `Sign(manifest, privKey)` | Compute ed25519 signature, store as base64 in `Sign` field | +| `Verify(manifest, pubKey)` | Check the `Sign` field against the public key | +| `SlotNames()` | Deduplicated component names from the slots map | +| `DefaultDaemon()` | Resolve the default daemon (explicit `Default: true` or sole daemon) | + +Signing works by zeroing the `Sign` field, marshalling to YAML, and computing `ed25519.Sign` over the canonical bytes. The base64-encoded signature is stored back in `Sign`. + +### Marketplace + +The `marketplace/` package provides a module catalogue and installer: + +```go +// Catalogue +index, _ := marketplace.ParseIndex(jsonData) +results := index.Search("analytics") +byCategory := index.ByCategory("monitoring") +mod, found := index.Find("my-module") + +// Installation +installer := marketplace.NewInstaller("/path/to/modules", store) +installer.Install(ctx, mod) // Clone, verify manifest, register +installer.Update(ctx, "code") // Pull, re-verify, update metadata +installer.Remove("code") // Delete files and store entry +installed, _ := installer.Installed() // List all installed modules +``` + +The installer: + +1. Clones the module repo with `--depth=1` +2. Loads the manifest via a sandboxed `io.Medium` +3. 
If a `SignKey` is present on the catalogue entry, verifies the ed25519 signature +4. Registers metadata (code, name, version, permissions, entry point) in a `store.Store` +5. Cleans up the cloned directory on any failure after clone + +### Plugin System + +The `plugin/` package provides a CLI extension mechanism: + +```go +type Plugin interface { + Name() string + Version() string + Init(ctx context.Context) error + Start(ctx context.Context) error + Stop(ctx context.Context) error +} +``` + +`BasePlugin` provides a default (no-op) implementation for embedding. + +**Components:** + +| Type | Purpose | +|------|---------| +| `Manifest` | `plugin.json` with name, version, description, author, entrypoint, dependencies | +| `Registry` | JSON-backed store of installed plugins (`registry.json`) | +| `Loader` | Discovers plugins by scanning directories for `plugin.json` | +| `Installer` | Clones from GitHub via `gh`, validates manifest, registers | + +Source format: `org/repo` or `org/repo@v1.0`. The `ParseSource` function splits these into organisation, repository, and version components. 
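The `org/repo@version` split performed by `ParseSource` amounts to plain string slicing. A minimal sketch under that assumption (`parseSource` here is illustrative, not the shipped implementation, which may differ in validation and error handling):

```go
package main

import (
	"fmt"
	"strings"
)

// parseSource splits "org/repo" or "org/repo@v1.0" into its
// organisation, repository, and optional version components.
func parseSource(src string) (org, repo, version string, err error) {
	// Peel off the optional @version suffix first.
	if at := strings.LastIndex(src, "@"); at != -1 {
		src, version = src[:at], src[at+1:]
	}
	parts := strings.SplitN(src, "/", 2)
	if len(parts) != 2 || parts[0] == "" || parts[1] == "" {
		return "", "", "", fmt.Errorf("parseSource: want org/repo, got %q", src)
	}
	return parts[0], parts[1], version, nil
}

func main() {
	org, repo, ver, _ := parseSource("lethean-io/my-plugin@v1.0")
	fmt.Println(org, repo, ver) // prints lethean-io my-plugin v1.0
}
```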
--- ## Dependency Graph ``` -collect/ ─────────────────────────────────────────────┐ - │ -git/ ──────────────────────────────────────────┐ │ - │ │ -gitea/ ────────────────────────────────────┐ │ │ - │ │ │ -forge/ ────────────────────────────┐ │ │ │ - │ │ │ │ -agentci/ ──────────────────────────┐ │ │ │ │ - │ │ │ │ │ -jobrunner/ ────────────────┘ │ │ │ │ -jobrunner/forgejo/ ──────────────────┘ │ │ │ -jobrunner/handlers/ ──────────────────────────┘ │ │ - │ │ -forge.lthn.ai/core/go (framework, log, config) ───┴──┘ + forge.lthn.ai/core/go (DI, log, config, io) + | + +----------------------+----------------------+ + | | | + forge/ gitea/ git/ + | | | + +-------+-------+ | | + | | | | + agentci/ jobrunner/ | | + | | | | | + | forgejo/source | | | + | | | | | + +-----------+-------+ | | + | | | + handlers/ | | + | | + collect/ -----------------+ | + | + repos/ ------------------------------------------+ + manifest/ + marketplace/ (depends on manifest/, io/) + plugin/ (depends on io/) ``` External SDK dependencies: -- `codeberg.org/mvdkleijn/forgejo-sdk/forgejo/v2` — Forgejo API -- `code.gitea.io/sdk/gitea` — Gitea API -- `github.com/stretchr/testify` — test assertions -- `golang.org/x/net` — HTTP utilities + +- `codeberg.org/mvdkleijn/forgejo-sdk/forgejo/v2` -- Forgejo API +- `code.gitea.io/sdk/gitea` -- Gitea API +- `github.com/stretchr/testify` -- test assertions +- `golang.org/x/net` -- HTML parsing +- `gopkg.in/yaml.v3` -- YAML parsing diff --git a/docs/development.md b/docs/development.md index 57991fb..957c850 100644 --- a/docs/development.md +++ b/docs/development.md @@ -1,14 +1,19 @@ -# go-scm Development Guide +--- +title: Development Guide +description: How to build, test, and contribute to go-scm. 
+--- + +# Development Guide --- ## Prerequisites -- Go 1.25 or later -- Git (for `git/` package integration tests) -- `gh` CLI (for `collect/github.go` and rate limit checking — not required for unit tests) -- SSH access to agent machines (for `agentci/` integration — not required for unit tests) -- Access to `forge.lthn.ai/core/go` for the framework dependency +- **Go 1.26** or later +- **Git** (for `git/` package tests) +- **`gh` CLI** (for `collect/github.go` and rate limit checking -- not required for unit tests) +- SSH access to agent machines (for `agentci/` integration -- not required for unit tests) +- Access to `forge.lthn.ai/core/go` and sibling modules for the framework dependency --- @@ -16,29 +21,45 @@ ``` go-scm/ -├── go.mod Module definition (forge.lthn.ai/core/go-scm) -├── forge/ Forgejo API client + tests -├── gitea/ Gitea API client + tests -├── git/ Multi-repo git operations + tests -├── agentci/ Clotho Protocol + security + tests -├── jobrunner/ Poller, journal, types + tests -│ ├── forgejo/ Forgejo signal source + tests -│ └── handlers/ Pipeline handlers + tests -├── collect/ Data collection pipeline + tests -└── docs/ Architecture, development, history ++-- go.mod Module definition (forge.lthn.ai/core/go-scm) ++-- forge/ Forgejo API client + tests ++-- gitea/ Gitea API client + tests ++-- git/ Multi-repo git operations + tests ++-- agentci/ Clotho Protocol, agent config, security + tests ++-- jobrunner/ Poller, journal, types + tests +| +-- forgejo/ Forgejo signal source + tests +| +-- handlers/ Pipeline handlers + tests ++-- collect/ Data collection pipeline + tests ++-- manifest/ Application manifests, ed25519 signing + tests ++-- marketplace/ Module catalogue and installer + tests ++-- plugin/ CLI plugin system + tests ++-- repos/ Workspace registry, work config, git state + tests ++-- cmd/ +| +-- forge/ CLI commands for `core forge` +| +-- gitea/ CLI commands for `core gitea` +| +-- collect/ CLI commands for data collection ++-- docs/ 
Documentation ++-- .core/ Build and release configuration ``` --- ## Building -This module has no binary targets — it is a library. Build validation is via tests and `go vet`: +This module is primarily a library. Build validation: ```bash go build ./... # Compile all packages go vet ./... # Static analysis ``` +If using the `core` CLI with a `.core/build.yaml` present: + +```bash +core go qa # fmt + vet + lint + test +core go qa full # + race, vuln, security +``` + --- ## Testing @@ -62,7 +83,7 @@ go test -v ./agentci/... go test -race ./... ``` -Race detection is particularly important for `git/` (parallel status), `jobrunner/` (concurrent poller cycles), and `collect/` (concurrent rate limiter). +Race detection is particularly important for `git/` (parallel status), `jobrunner/` (concurrent poller cycles), and `collect/` (concurrent rate limiter access). ### Coverage @@ -71,50 +92,36 @@ go test -coverprofile=cover.out ./... go tool cover -html=cover.out ``` -Current coverage targets (as of Phase 3 completion): - -| Package | Coverage | -|---------|---------| -| forge/ | 91.2% | -| gitea/ | 89.2% | -| git/ | 96.7% | -| agentci/ | 94.5% | -| jobrunner/ | 86.4% | -| jobrunner/forgejo/ | 95.0% | -| jobrunner/handlers/ | 83.8% | -| collect/ | 83.0% | - --- ## Local Dependencies -`go-scm` depends on `forge.lthn.ai/core/go` for the framework, logging, config, and IO abstractions. The `go.mod` file includes a `replace` directive pointing to a sibling directory: +`go-scm` depends on several `forge.lthn.ai/core/*` modules. 
The recommended approach is to use a Go workspace file: -``` -replace forge.lthn.ai/core/go => ../go -``` - -**Preferred approach:** use a `go.work` file in your workspace root to avoid editing `go.mod` for local development: - -``` -// go.work -go 1.25 +```go +// ~/Code/go.work +go 1.26 use ( - ./go - ./go-scm + ./core/go + ./core/go-io + ./core/go-log + ./core/go-config + ./core/go-scm + ./core/go-i18n + ./core/go-crypt ) ``` -With a workspace file in place, the `replace` directive in `go.mod` is superseded and can be left as a fallback. +With a workspace file in place, `replace` directives in `go.mod` are superseded and local edits across modules work seamlessly. --- ## Test Patterns -### forge/ and gitea/ — httptest mock server +### forge/ and gitea/ -- httptest Mock Server -Both SDK wrappers require a live HTTP server because the Forgejo SDK makes an HTTP GET to `/api/v1/version` during client construction. Use `net/http/httptest` for all tests: +Both SDK wrappers require a live HTTP server because the Forgejo/Gitea SDKs make an HTTP GET to `/api/v1/version` during client construction. Use `net/http/httptest`: ```go func setupServer(t *testing.T) (*forge.Client, *httptest.Server) { @@ -132,12 +139,7 @@ func setupServer(t *testing.T) (*forge.Client, *httptest.Server) { } ``` -SDK route patterns sometimes differ from the public API documentation. Notable divergences discovered during test construction: - -- `CreateOrgRepo` uses `/api/v1/org/{name}/repos` (singular `org`) -- `ListOrgRepos` uses `/api/v1/orgs/{name}/repos` (plural `orgs`) - -**Config isolation:** always isolate the config file from the real development machine during tests: +**Config isolation** -- always isolate the config file from the real machine: ```go t.Setenv("HOME", t.TempDir()) @@ -145,11 +147,14 @@ t.Setenv("FORGE_TOKEN", "test-token") t.Setenv("FORGE_URL", srv.URL) ``` -**Gitea mirror validation:** `CreateMirrorRepo` with `Service: GitServiceGithub` requires a non-empty `AuthToken`. 
The SDK rejects the request locally before sending to the server if the token is absent. +**SDK route divergences** discovered during testing: -### git/ — real git repositories +- `CreateOrgRepo` uses `/api/v1/org/{name}/repos` (singular `org`) +- `ListOrgRepos` uses `/api/v1/orgs/{name}/repos` (plural `orgs`) -`git/` tests use real temporary git repositories rather than mocks. The standard setup pattern: +### git/ -- Real Git Repositories + +`git/` tests use real temporary git repos rather than mocks: ```go func setupRepo(t *testing.T) string { @@ -176,9 +181,7 @@ clone := t.TempDir() exec.Command("git", "clone", bare, clone).Run() ``` -**Service layer:** `git.Service`, `OnStartup`, `handleQuery`, and `handleTask` depend on `framework.Core`. Test these indirectly through `DirtyRepos()`, `AheadRepos()`, and `Status()` by setting `lastStatus` directly, or via integration tests. - -### agentci/ — unit tests only +### agentci/ -- Unit Tests Only `agentci/` functions are pure (no I/O except SSH exec construction) and test without mocks: @@ -190,11 +193,9 @@ func TestSanitizePath_Good(t *testing.T) { } ``` -`SanitizePath("../secret")` returns `"secret"` — it strips the directory component via `filepath.Base` rather than rejecting the input. This is the documented, correct behaviour. +### jobrunner/ -- Table-Driven Handler Tests -### jobrunner/ — table-driven handler tests - -Handler tests use the `JobHandler` interface directly with a mock `forge.Client` constructed via `httptest`. The preferred pattern is table-driven: +Handler tests use the `JobHandler` interface directly with a mock `forge.Client`: ```go tests := []struct { @@ -212,11 +213,9 @@ for _, tt := range tests { } ``` -Integration tests in `handlers/integration_test.go` test the full signal-to-result flow: construct a Poller, register sources and handlers, call `RunOnce`, verify journal entries and Forgejo API calls on the mock server. 
+### collect/ -- Mixed Unit and HTTP Mock -### collect/ — mixed unit and HTTP mock - -Pure functions (state management, rate limiter logic, event dispatch) test without I/O. HTTP-dependent collectors (`Collect` methods for BitcoinTalk, GitHub, IACR, arXiv) require mock HTTP servers: +Pure functions (state, rate limiter, events) test without I/O. HTTP-dependent collectors use mock servers: ```go srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) { @@ -226,40 +225,70 @@ srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.R t.Cleanup(srv.Close) ``` +The `SetHTTPClient` function allows injecting a custom HTTP client for tests. + +### manifest/, marketplace/, plugin/ -- io.Medium Mocks + +These packages use the `io.Medium` abstraction. Tests use `io.NewMockMedium()` to avoid filesystem interaction: + +```go +m := io.NewMockMedium() +m.Write(".core/manifest.yaml", yamlContent) +manifest, err := manifest.Load(m, ".") +``` + +### repos/ -- io.Medium with Seed Data + +```go +m := io.NewMockMedium() +m.Write("repos.yaml", registryYAML) +reg, err := repos.LoadRegistry(m, "repos.yaml") +``` + +--- + +## Test Naming Convention + +Tests use the `_Good` / `_Bad` / `_Ugly` suffix pattern: + +| Suffix | Meaning | +|--------|---------| +| `_Good` | Happy path -- expected success | +| `_Bad` | Expected error conditions | +| `_Ugly` | Panic, edge cases, malformed input | + --- ## Coding Standards ### Language -Use UK English throughout: colour, organisation, centre, licence (noun), authorise, behaviour. Never American spellings. +Use **UK English** throughout: colour, organisation, centre, licence (noun), authorise, behaviour. Never American spellings. -### Go style +### Go Style - All parameters and return types must have explicit type declarations. -- Error strings follow `"package.Function: context: %w"` for wrapped errors, `"package.Function: message"` for sentinel errors. 
No bare `fmt.Errorf("something failed")` strings.
 - Import groups: stdlib, then `forge.lthn.ai/...`, then third-party, each separated by a blank line.
-- Use `testify/require` for fatal assertions, `testify/assert` for non-fatal. Prefer `require.NoError` over `assert.NoError` when subsequent test steps depend on the result.
-- Test naming convention: `_Good` (happy path), `_Bad` (expected error), `_Ugly` (panic/edge case).
+- Use `testify/require` for fatal assertions, `testify/assert` for non-fatal. Prefer `require.NoError` when subsequent steps depend on the result.
 
-### Error wrapping
+### Error Wrapping
 
 ```go
-// Correct — contextual prefix with package.Function
-return nil, fmt.Errorf("forge.CreateRepo: marshal options: %w", err)
-
-// Correct — using the log.E helper from core/go
+// Correct -- using the log.E helper from core/go-log
 return nil, log.E("forge.CreateRepo", "failed to create repository", err)
 
-// Incorrect — bare error with no context
+// Correct -- contextual prefix with package.Function
+return nil, fmt.Errorf("forge.CreateRepo: marshal options: %w", err)
+
+// Incorrect -- bare error with no context
 return nil, fmt.Errorf("failed")
 ```
 
-### Context propagation
+### Context Propagation
 
 - `git/` and `collect/` propagate context correctly via `exec.CommandContext`.
-- `forge/` and `gitea/` accept context at the wrapper boundary but cannot pass it to the SDK (SDK limitation, see architecture.md).
-- `agentci/` uses `SecureSSHCommandContext` for all SSH operations — never use `SecureSSHCommand` (deprecated).
+- `forge/` and `gitea/` accept context at the wrapper boundary but cannot pass it to the SDK (SDK limitation).
+- `agentci/` uses `SecureSSHCommandContext` for all SSH operations -- never use the deprecated `SecureSSHCommand`.
--- @@ -272,6 +301,7 @@ feat(forge): add GetCombinedStatus wrapper fix(jobrunner): prevent double-dispatch on in-progress issues test(git): add ahead/behind with bare remote docs(agentci): document Clotho dual-run flow +refactor(collect): extract common HTTP fetch into generic function ``` Valid types: `feat`, `fix`, `test`, `docs`, `refactor`, `chore`. @@ -284,9 +314,33 @@ Co-Authored-By: Virgil --- -## Licence +## Adding a New Package -EUPL-1.2. All source files must carry the EUPL-1.2 licence header if one is added to the project. The licence is compatible with GPL v2/v3 and AGPL v3. +1. Create the package directory under the module root. +2. Add `package ` with a doc comment describing the package's purpose. +3. Follow the existing `client.go` / `config.go` / `types.go` naming pattern where applicable. +4. Write tests from the start -- avoid creating packages without at least a skeleton test file. +5. Add the package to the architecture documentation. +6. Maintain import group ordering: stdlib, then `forge.lthn.ai/...`, then third-party. + +## Adding a New Handler + +1. Create `jobrunner/handlers/.go` with a struct implementing `jobrunner.JobHandler`. +2. `Name()` returns a lowercase identifier (e.g. `"tick_parent"`). +3. `Match(signal)` should be narrow -- handlers are checked in registration order and the first match wins. +4. `Execute(ctx, signal)` must always return an `*ActionResult`, even on partial failure. +5. Add a corresponding `_test.go` with at minimum one `_Good` and one `_Bad` test. +6. Register the handler in `Poller` configuration alongside existing handlers. + +## Adding a New Collector + +1. Create a new file in `collect/` (e.g. `collect/mynewsource.go`). +2. Implement the `Collector` interface (`Name()` and `Collect(ctx, cfg)`). +3. Use `cfg.Limiter.Wait(ctx, "source-name")` before each HTTP request. +4. Emit events via `cfg.Dispatcher` for progress reporting. +5. 
Write output via `cfg.Output` (the `io.Medium`), not directly to the filesystem. +6. Honour `cfg.DryRun` -- log what would be done without writing. +7. Return a `*Result` with accurate `Items`, `Errors`, `Skipped`, and `Files` counts. --- @@ -299,24 +353,10 @@ git push origin main # Remote: ssh://git@forge.lthn.ai:2223/core/go-scm.git ``` -HTTPS authentication to `forge.lthn.ai` is not configured — always use SSH. The SSH port is 2223. +HTTPS authentication to `forge.lthn.ai` is not configured -- always use SSH on port 2223. --- -## Adding a New Package +## Licence -1. Create the package directory under the module root. -2. Add `package ` with a doc comment describing the package's purpose. -3. Follow the existing `client.go` / `config.go` / `types.go` naming pattern where applicable. -4. Write tests from the start — avoid creating packages without at least a skeleton test file. -5. Add the package to the dependency graph in `docs/architecture.md`. -6. Import groups must be maintained: stdlib, then `forge.lthn.ai/...`, then third-party. - -## Adding a New Handler - -1. Create `jobrunner/handlers/.go` with a struct implementing `jobrunner.JobHandler`. -2. `Name()` returns a lowercase identifier (e.g. `"tick_parent"`). -3. `Match(signal)` should be narrow — handlers are checked in registration order and the first match wins. -4. `Execute(ctx, signal)` must always return an `*ActionResult`, even on partial failure. -5. Add a corresponding `_test.go` with at minimum one `_Good` and one `_Bad` test. -6. Register the handler in `Poller` configuration alongside existing handlers. +EUPL-1.2. The licence is compatible with GPL v2/v3 and AGPL v3. diff --git a/docs/index.md b/docs/index.md new file mode 100644 index 0000000..c33903e --- /dev/null +++ b/docs/index.md @@ -0,0 +1,150 @@ +--- +title: go-scm +description: SCM integration, AgentCI automation, and data collection for the Lethean ecosystem. 
+--- + +# go-scm + +`go-scm` provides source control management integration for the Lethean ecosystem. It wraps the Forgejo and Gitea APIs behind ergonomic Go clients, runs an automated PR pipeline for AI agent workflows, collects data from external sources, and manages multi-repo workspaces via a declarative registry. + +**Module path:** `forge.lthn.ai/core/go-scm` +**Go version:** 1.26 +**Licence:** EUPL-1.2 + +## Quick Start + +### Forgejo API Client + +```go +import "forge.lthn.ai/core/go-scm/forge" + +// Create a client from config file / env / flags +client, err := forge.NewFromConfig("", "") + +// List open issues +issues, err := client.ListIssues("core", "go-scm", forge.ListIssuesOpts{ + State: "open", +}) + +// List repos in an organisation (paginated iterator) +for repo, err := range client.ListOrgReposIter("core") { + fmt.Println(repo.Name) +} +``` + +### Multi-Repo Git Status + +```go +import "forge.lthn.ai/core/go-scm/git" + +statuses := git.Status(ctx, git.StatusOptions{ + Paths: []string{"/home/dev/core/go-scm", "/home/dev/core/go-ai"}, + Names: map[string]string{"/home/dev/core/go-scm": "go-scm"}, +}) + +for _, s := range statuses { + if s.IsDirty() { + fmt.Printf("%s: %d modified, %d untracked\n", s.Name, s.Modified, s.Untracked) + } +} +``` + +### AgentCI Poll-Dispatch Loop + +```go +import ( + "forge.lthn.ai/core/go-scm/jobrunner" + "forge.lthn.ai/core/go-scm/jobrunner/forgejo" + "forge.lthn.ai/core/go-scm/jobrunner/handlers" +) + +source := forgejo.New(forgejo.Config{Repos: []string{"core/go-scm"}}, forgeClient) +poller := jobrunner.NewPoller(jobrunner.PollerConfig{ + Sources: []jobrunner.JobSource{source}, + Handlers: []jobrunner.JobHandler{ + handlers.NewDispatchHandler(forgeClient, forgeURL, token, spinner), + handlers.NewTickParentHandler(forgeClient), + handlers.NewEnableAutoMergeHandler(forgeClient), + }, + PollInterval: 60 * time.Second, +}) +poller.Run(ctx) +``` + +### Data Collection + +```go +import "forge.lthn.ai/core/go-scm/collect" 
+ +cfg := collect.NewConfig("/tmp/collected") +excavator := &collect.Excavator{ + Collectors: []collect.Collector{ + &collect.GitHubCollector{Org: "lethean-io"}, + &collect.MarketCollector{CoinID: "lethean", Historical: true}, + &collect.PapersCollector{Source: "all", Query: "cryptography VPN"}, + }, + Resume: true, +} +result, err := excavator.Run(ctx, cfg) +``` + +## Package Layout + +| Package | Import Path | Description | +|---------|-------------|-------------| +| `forge` | `go-scm/forge` | Forgejo API client -- repos, issues, PRs, labels, webhooks, organisations, PR metadata | +| `gitea` | `go-scm/gitea` | Gitea API client -- repos, issues, PRs, mirroring, PR metadata | +| `git` | `go-scm/git` | Multi-repo git operations -- parallel status checks, push, pull; Core DI service | +| `jobrunner` | `go-scm/jobrunner` | AgentCI pipeline engine -- signal types, poller loop, JSONL audit journal | +| `jobrunner/forgejo` | `go-scm/jobrunner/forgejo` | Forgejo job source -- polls epic issues for unchecked children, builds signals | +| `jobrunner/handlers` | `go-scm/jobrunner/handlers` | Pipeline handlers -- dispatch, completion, auto-merge, publish-draft, dismiss-reviews, fix-command, tick-parent | +| `agentci` | `go-scm/agentci` | Clotho Protocol orchestrator -- agent config, SSH security helpers, dual-run verification | +| `collect` | `go-scm/collect` | Data collection framework -- collector interface, rate limiting, state tracking, event dispatch | +| `manifest` | `go-scm/manifest` | Application manifest -- YAML parsing, ed25519 signing and verification | +| `marketplace` | `go-scm/marketplace` | Module marketplace -- catalogue index, search, git-based installer with signature verification | +| `plugin` | `go-scm/plugin` | CLI plugin system -- plugin interface, JSON registry, loader, GitHub-based installer | +| `repos` | `go-scm/repos` | Workspace management -- `repos.yaml` registry, topological sorting, work config, git state, KB config | +| `cmd/forge` | 
`go-scm/cmd/forge` | CLI commands for the `core forge` subcommand |
+| `cmd/gitea` | `go-scm/cmd/gitea` | CLI commands for the `core gitea` subcommand |
+| `cmd/collect` | `go-scm/cmd/collect` | CLI commands for data collection |
+
+## Dependencies
+
+### Direct
+
+| Module | Purpose |
+|--------|---------|
+| `codeberg.org/mvdkleijn/forgejo-sdk/forgejo/v2` | Forgejo API SDK |
+| `code.gitea.io/sdk/gitea` | Gitea API SDK |
+| `forge.lthn.ai/core/cli` | CLI framework (Cobra, TUI) |
+| `forge.lthn.ai/core/go-config` | Layered config (`~/.core/config.yaml`) |
+| `forge.lthn.ai/core/go-io` | Filesystem abstraction (Medium, Sandbox, Store) |
+| `forge.lthn.ai/core/go-log` | Structured logging and contextual error helper |
+| `forge.lthn.ai/core/go-i18n` | Internationalisation |
+| `github.com/stretchr/testify` | Test assertions |
+| `golang.org/x/net` | HTML parsing for collectors |
+| `gopkg.in/yaml.v3` | YAML parsing for manifests and registries |
+
+### Indirect
+
+The module transitively pulls in `forge.lthn.ai/core/go` (DI framework) via `go-config`, plus `spf13/viper`, `spf13/cobra`, Charmbracelet TUI libraries, and Go standard library extensions.
+
+## Configuration
+
+Authentication for both Forgejo and Gitea is resolved through a three-tier priority chain:
+
+1. **Config file** -- `~/.core/config.yaml` keys `forge.url`, `forge.token` (or `gitea.*`)
+2. **Environment variables** -- `FORGE_URL`, `FORGE_TOKEN` (or `GITEA_URL`, `GITEA_TOKEN`)
+3. **CLI flags** -- `--url`, `--token` (highest priority)
+
+Set credentials once:
+
+```bash
+core forge config --url https://forge.lthn.ai --token <token>
+core gitea config --url https://gitea.snider.dev --token <token>
+```
+
+## Further Reading
+
+- [Architecture](architecture.md) -- internal design, key types, data flow
+- [Development Guide](development.md) -- building, testing, contributing