Snider b19cdd91a5 refactor: replace os file ops and fmt.Errorf/errors.New with go-io/go-log equivalents

Replace all os.ReadFile/os.WriteFile/os.MkdirAll in production code with
coreio.Local equivalents (Read, Write, EnsureDir) from go-io. Replace all
fmt.Errorf and errors.New with coreerr.E() from go-log, adding structured
operation context to all error returns. Promote go-log from indirect to
direct dependency in go.mod.

Co-Authored-By: Virgil <virgil@lethean.io>

2026-03-16 19:17:34 +00:00

19 KiB

Host Hub Task Protocol

Version: 2.1
Created: 2026-01-01
Updated: 2026-01-16
Purpose: Ensure agent work is verified before being marked complete, and provide patterns for efficient parallel implementation.

Lesson learned (Jan 2026): Task files written as checklists without implementation evidence led to 6+ "complete" tasks that were actually 70-85% done. Planning ≠ implementation. Evidence required.

The Problem

Agents optimise for conversation completion, not task completion. Saying "done" is computationally cheaper than doing the work. Context compaction loses task state. Nobody verifies output against spec.

The Solution

Separation of concerns:

Planning Agent — writes the spec
Implementation Agent — does the work
Verification Agent — checks the work against spec
Human — approves or rejects based on verification

Directory Structure

doc/
├── TASK_PROTOCOL.md          # This file
└── ...                       # Reference documentation

tasks/
├── TODO.md                   # Active task summary
├── TASK-XXX-feature.md       # Active task specs
├── agentic-tasks/            # Agentic system tasks
└── future-products/          # Parked product plans

archive/
├── released/                 # Completed tasks (for reference)
└── ...                       # Historical snapshots

Task File Schema

Every task file follows this structure:

# TASK-XXX: [Short Title]

**Status:** draft | ready | in_progress | needs_verification | verified | approved
**Created:** YYYY-MM-DD
**Last Updated:** YYYY-MM-DD HH:MM by [agent/human]
**Assignee:** [agent session or human]
**Verifier:** [different agent session]

---

## Objective

[One paragraph: what does "done" look like?]

---

## Acceptance Criteria

- [ ] AC1: [Specific, verifiable condition]
- [ ] AC2: [Specific, verifiable condition]
- [ ] AC3: [Specific, verifiable condition]

Each criterion must be:
- Binary (yes/no, not "mostly")
- Verifiable by code inspection or test
- Independent (can check without context)

---

## Implementation Checklist

- [ ] File: `path/to/file.php` — [what it should contain]
- [ ] File: `path/to/other.php` — [what it should contain]
- [ ] Test: `tests/Feature/XxxTest.php` passes
- [ ] Migration: runs without error

---

## Verification Results

### Check 1: [Date] by [Agent]

| Criterion | Status | Evidence |
|-----------|--------|----------|
| AC1 | ✅ PASS | File exists at path, contains X |
| AC2 | ❌ FAIL | Missing method Y in class Z |
| AC3 | ⚠️ PARTIAL | 3 of 5 tests pass |

**Verdict:** FAIL — AC2 not met

### Check 2: [Date] by [Agent]

| Criterion | Status | Evidence |
|-----------|--------|----------|
| AC1 | ✅ PASS | File exists at path, contains X |
| AC2 | ✅ PASS | Method Y added, verified |
| AC3 | ✅ PASS | All 5 tests pass |

**Verdict:** PASS — ready for human approval

---

## Notes

[Any context, blockers, decisions made during implementation]
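A verification script could start by reading the header fields of a task file. The following is a minimal Python sketch, assuming header lines use exactly the `**Field:** value` form shown in the template above and that the header ends at the first `---` rule. The names `parse_header`, `validate_header` and `ALLOWED_STATUSES` are hypothetical, not part of any existing tool.

```python
import re

# Status values listed in the task file template.
ALLOWED_STATUSES = {
    "draft", "ready", "in_progress",
    "needs_verification", "verified", "approved",
}

# Matches header lines such as "**Status:** ready".
HEADER_RE = re.compile(r"^\*\*(?P<field>[^*:]+):\*\*\s*(?P<value>.*)$")


def parse_header(text: str) -> dict[str, str]:
    """Collect **Field:** value pairs that appear before the first '---' rule."""
    fields: dict[str, str] = {}
    for line in text.splitlines():
        if line.strip() == "---":
            break  # the header ends at the first horizontal rule
        match = HEADER_RE.match(line.strip())
        if match:
            fields[match["field"].strip()] = match["value"].strip()
    return fields


def validate_header(fields: dict[str, str]) -> list[str]:
    """Return a list of problems; an empty list means the header looks valid."""
    problems = []
    status = fields.get("Status")
    if status not in ALLOWED_STATUSES:
        problems.append(f"unknown status: {status!r}")
    assignee, verifier = fields.get("Assignee"), fields.get("Verifier")
    if assignee and verifier and assignee == verifier:
        problems.append("verifier must be a different session from the assignee")
    return problems
```

The assignee/verifier comparison encodes the separation-of-concerns rule: the same session may not both implement and verify.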

Implementation Evidence (Required)

A checklist is not evidence. Prove the work exists.

Every completed phase MUST include:

1. Git Evidence

**Commits:**
- `abc123` - Add Domain model and migration
- `def456` - Add DomainController with CRUD
- `ghi789` - Add 28 domain tests

2. Test Count

**Tests:** 28 passing (run: `php artisan test app/Mod/Bio/Tests/Feature/DomainTest.php`)

3. File Manifest

**Files created/modified:**
- `app/Mod/Bio/Models/Domain.php` (new)
- `app/Mod/Bio/Http/Controllers/DomainController.php` (new)
- `database/migrations/2026_01_16_create_domains_table.php` (new)
- `app/Mod/Bio/Tests/Feature/DomainTest.php` (new)

4. "What Was Built" Summary

**Summary:** Custom domain management with DNS verification. Users can add domains,
system generates TXT record for verification, background job checks DNS propagation.
Includes SSL provisioning via Caddy API.
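The four evidence items can be checked mechanically before a phase is accepted. Below is a minimal Python sketch, assuming evidence is recorded with the bold labels used in the examples above (`**Commits:**`, `**Tests:**`, `**Files created/modified:**`, `**Summary:**`) and that commit hashes are hex abbreviations of 6 to 40 characters. `missing_evidence` is a hypothetical helper.

```python
import re

# Bold labels that introduce each of the four evidence items.
REQUIRED_LABELS = {
    "git evidence": "**Commits:**",
    "test count": "**Tests:**",
    "file manifest": "**Files created/modified:**",
    "summary": "**Summary:**",
}

# A commit line looks like: - `abc123` - Add Domain model and migration
COMMIT_RE = re.compile(r"^- `([0-9a-f]{6,40})` - \S")
# A test line looks like: **Tests:** 28 passing (run: `...`)
TESTS_RE = re.compile(r"\*\*Tests:\*\* (\d+) passing \(run: `[^`]+`\)")


def missing_evidence(block: str) -> list[str]:
    """Return the names of evidence items that are absent or malformed."""
    missing = [name for name, label in REQUIRED_LABELS.items() if label not in block]
    lines = [line.strip() for line in block.splitlines()]
    # A Commits label with no hash-bearing line is not evidence.
    if "git evidence" not in missing and not any(COMMIT_RE.match(line) for line in lines):
        missing.append("commit hashes")
    # The test count must be non-zero and come with the command to reproduce it.
    match = TESTS_RE.search(block)
    if "test count" not in missing and (match is None or int(match.group(1)) == 0):
        missing.append("test count with run command")
    return missing
```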

Why This Matters

In Jan 2026, an audit found:

Commerce Matrix Plan marked "95% done" was actually 75%
Internal WAF section was skipped entirely (extracted to Core Bouncer)
Warehouse/fulfillment (6 features) listed as "one item" in TODO
Task files read like planning documents, not completion logs

Without evidence, "done" means nothing.

Workflow

1. Task Creation

Human or planning agent creates task file in tasks/:

Status: draft
Must have clear acceptance criteria
Must have implementation checklist

2. Task Ready

Human reviews and sets:

Status: ready
Assignee: next available agent

3. Implementation

Implementation agent:

Sets status: in_progress
Works through implementation checklist
Checks boxes as work is done
When complete, sets status: needs_verification
MUST NOT mark acceptance criteria as passed

4. Verification

Different agent (verification agent):

Reads the task file
Independently checks each acceptance criterion
Records evidence in Verification Results section
Sets verdict: PASS or FAIL
If PASS: status → verified, move to archive/released/
If FAIL: status → in_progress, back to implementation agent

5. Human Approval

Human reviews verified task:

Spot-check the evidence
If satisfied: status → approved, can delete or keep in archive
If not: back to needs_verification with notes

Agent Instructions

For Implementation Agents

You are implementing TASK-XXX.

1. Read the full task file
2. Set status to "in_progress"
3. Work through the implementation checklist
4. Check boxes ONLY for work you have completed
5. When done, set status to "needs_verification"
6. DO NOT check acceptance criteria boxes
7. DO NOT mark the task as complete
8. Update "Last Updated" with current timestamp

Your job is to do the work, not to verify it.

For Verification Agents

You are verifying TASK-XXX.

1. Read the full task file
2. For EACH acceptance criterion:
   a. Check the codebase independently
   b. Record what you found (file paths, line numbers, test output)
   c. Mark as PASS, FAIL, or PARTIAL with evidence
3. Add a new "Verification Results" section with today's date
4. Set verdict: PASS or FAIL
5. If PASS: move file to archive/released/
6. If FAIL: set status back to "in_progress"
7. Update "Last Updated" with current timestamp

You are the gatekeeper. Be thorough. Trust nothing the implementation agent said.
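Because each criterion must be binary, the overall verdict is PASS only when every criterion passes; a PARTIAL counts against the task just like a FAIL. A minimal Python sketch of that rule, plus rendering rows in the Verification Results table format, is below. The `Result` type and function names are hypothetical, and unlike the template's example verdict line this sketch lists every non-PASS criterion, PARTIAL ones included.

```python
from dataclasses import dataclass

# Status labels used in the Verification Results table.
LABELS = {"PASS": "✅ PASS", "FAIL": "❌ FAIL", "PARTIAL": "⚠️ PARTIAL"}


@dataclass
class Result:
    criterion: str  # e.g. "AC1"
    status: str     # "PASS", "FAIL" or "PARTIAL"
    evidence: str   # what the verifier actually found


def verdict(results: list[Result]) -> tuple[str, list[str]]:
    """Return ("PASS", []) only if every criterion passed, else ("FAIL", failing)."""
    failing = [r.criterion for r in results if r.status != "PASS"]
    if not results or failing:  # nothing checked is never a pass
        return "FAIL", failing
    return "PASS", []


def render_table(results: list[Result]) -> str:
    """Format results as the markdown table used in the task file."""
    rows = ["| Criterion | Status | Evidence |", "|-----------|--------|----------|"]
    for r in results:
        rows.append(f"| {r.criterion} | {LABELS[r.status]} | {r.evidence} |")
    return "\n".join(rows)
```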

Status Flow

draft → ready → in_progress → needs_verification → verified → approved
                     ↑                    │
                     └────────────────────┘
                        (if verification fails)
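The status flow, including the rejection paths described in the Workflow section, can be enforced as a transition table. A minimal Python sketch follows; `TRANSITIONS` and `advance` are hypothetical names, and the verified → needs_verification edge models the human rejection path from step 5.

```python
# Allowed status transitions, from the Workflow and Status Flow sections.
TRANSITIONS = {
    "draft": {"ready"},
    "ready": {"in_progress"},
    "in_progress": {"needs_verification"},
    "needs_verification": {"verified", "in_progress"},  # verification passes or fails
    "verified": {"approved", "needs_verification"},     # human approves or rejects
    "approved": set(),                                  # terminal state
}


def advance(current: str, new: str, actor: str, implementer: str) -> str:
    """Return the new status, or raise ValueError if the move is not allowed."""
    if new not in TRANSITIONS.get(current, set()):
        raise ValueError(f"illegal transition {current} -> {new}")
    # The implementer may not record the verification outcome of its own work.
    if current == "needs_verification" and actor == implementer:
        raise ValueError("implementation agent cannot verify its own work")
    return new
```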

Phase-Based Decomposition

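Large tasks should be decomposed into independent phases that can be executed in parallel by multiple agents. Because independent phases run concurrently, total wall-clock time is bounded by the longest phase in each wave rather than by the sum of all phases.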

Phase Independence Rules

No shared state — Each phase writes to different files/tables
No blocking dependencies — Phase 3 shouldn't wait for Phase 2's output
Clear boundaries — Each phase has its own acceptance criteria
Testable isolation — Phase tests don't require other phases
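The "no shared state" rule can be checked before a wave is fired by comparing the file lists from each phase's implementation checklist. A minimal Python sketch follows; `find_conflicts` and its input shape (phase number → set of file paths) are hypothetical, and the same check would apply to table names.

```python
from itertools import combinations


def find_conflicts(phase_files: dict[int, set[str]]) -> list[tuple[int, int, set[str]]]:
    """Return (phase_a, phase_b, shared_files) for every pair touching the same file."""
    conflicts = []
    # Sorting by phase number keeps the output order deterministic.
    for (a, files_a), (b, files_b) in combinations(sorted(phase_files.items()), 2):
        shared = files_a & files_b
        if shared:
            conflicts.append((a, b, shared))
    return conflicts
```

Any pair reported here should either be merged into one phase or moved into separate waves.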

Example Decomposition

A feature like "BioHost Missing Features" might decompose into:

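| Phase | Focus | Can Run in Parallel With |
|-------|-------|--------------------------|
| 1 | Domain Management | 2, 3, 4 |
| 2 | Project System | 1, 3, 4 |
| 3 | Analytics Core | 1, 2, 4 |
| 4 | Form Submissions | 1, 2, 3 |
| 5 | Link Scheduling | 1, 2, 3, 4 |
| ... | ... | ... |
| 12 | MCP Tools (polish) | After 1-11 |
| 13 | Admin UI (polish) | After 1-11 |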

Phase Sizing

Target: 4-8 acceptance criteria per phase
Estimated time: 2-4 hours per phase
Test count: 15-40 tests per phase
File count: 3-10 files modified per phase
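The sizing targets can be applied as a quick lint on a planned phase. A minimal Python sketch follows; the metric keys and `sizing_warnings` are hypothetical, and the split/merge hints are suggestions of this sketch rather than rules of the protocol.

```python
# Target ranges from the Phase Sizing guidelines (inclusive).
TARGETS = {
    "acceptance_criteria": (4, 8),
    "estimated_hours": (2, 4),
    "tests": (15, 40),
    "files": (3, 10),
}


def sizing_warnings(phase: dict[str, float]) -> list[str]:
    """Flag any metric outside its target range; metrics not supplied are skipped."""
    warnings = []
    for metric, (low, high) in TARGETS.items():
        value = phase.get(metric)
        if value is not None and not low <= value <= high:
            hint = "split the phase" if value > high else "consider merging"
            warnings.append(f"{metric}={value} outside {low}-{high}: {hint}")
    return warnings
```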

Standard Phase Types

Every large task should include these phase types:

Core Implementation Phases (1-N)

The main feature work. Group by:

Resource type (domains, projects, analytics)
Functional area (CRUD, scheduling, notifications)
Data flow (input, processing, output)

Polish Phase: MCP Tools

Always include as second-to-last phase.

Exposes all implemented features to AI agents via the Model Context Protocol (MCP).

Standard acceptance criteria:

MCP tool class exists at app/Mcp/Tools/{Feature}Tools.php
All CRUD operations exposed as actions
Tool includes prompts for common workflows
Tool includes resources for data access
Tests verify all MCP actions return expected responses
Tool registered in MCP service provider

Polish Phase: Admin UI Integration

Always include as final phase.

Integrates features into the admin dashboard.

Standard acceptance criteria:

Sidebar navigation updated with feature section
Index/list page with filtering and search
Detail/edit pages for resources
Bulk actions where appropriate
Breadcrumb navigation
Role-based access control
Tests verify all admin routes respond correctly

Parallel Agent Execution

Firing Multiple Agents

When phases are independent, fire agents simultaneously:

Human: "Implement phases 1-4 in parallel"

Agent issues 4 Task tool calls simultaneously:
- Task(Phase 1: Domain Management)
- Task(Phase 2: Project System)
- Task(Phase 3: Analytics Core)
- Task(Phase 4: Form Submissions)
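Outside an agent harness, the same fire-then-collect pattern can be sketched with Python's `concurrent.futures`. Here `run_phase` is a hypothetical stand-in for launching one implementation agent; the point is that every phase starts at once and the caller waits for all of them before doing anything else.

```python
from concurrent.futures import ThreadPoolExecutor


def run_phase(phase: str) -> dict:
    """Hypothetical stand-in for one implementation agent working a phase."""
    # A real runner would launch an agent and return its reported evidence.
    return {"phase": phase, "tests": 0, "blockers": []}


def fire_wave(phases: list[str]) -> list[dict]:
    """Start every phase at once and block until all of them have reported."""
    with ThreadPoolExecutor(max_workers=max(1, len(phases))) as pool:
        # map() preserves input order, so reports line up with phases.
        return list(pool.map(run_phase, phases))


reports = fire_wave([
    "Phase 1: Domain Management",
    "Phase 2: Project System",
    "Phase 3: Analytics Core",
    "Phase 4: Form Submissions",
])
```

Leaving the `with` block waits for every worker, which matches the "collect results" rule: the next wave is only fired once all reports are in.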

Agent Prompt Template

You are implementing Phase X of TASK-XXX: [Task Title]

Read the task file at: tasks/TASK-XXX-feature-name.md

Your phase covers acceptance criteria ACxx through ACyy.

Implementation requirements:
1. Create all files listed in the Phase X implementation checklist
2. Write comprehensive Pest tests (target: 20-40 tests)
3. Follow existing codebase patterns
4. Use workspace-scoped multi-tenancy
5. Check entitlements for tier-gated features

When complete:
1. Update the task file marking Phase X checklist items done
2. Report: files created, test count, any blockers

Do NOT mark acceptance criteria as passed — verification agent does that.
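When firing many agents, the template can be filled programmatically so every phase gets identical instructions. A minimal Python sketch using the standard library's `string.Template`; the placeholder names (`phase`, `task_id`, `slug` and so on) and `build_prompt` are hypothetical, and the prompt text is abridged.

```python
from string import Template

# Abridged agent prompt; $-placeholders are filled per phase.
PROMPT = Template(
    "You are implementing Phase $phase of $task_id: $title\n\n"
    "Read the task file at: tasks/$task_id-$slug.md\n\n"
    "Your phase covers acceptance criteria $first_ac through $last_ac.\n\n"
    "Do NOT mark acceptance criteria as passed; the verification agent does that."
)


def build_prompt(phase: int, task_id: str, title: str, slug: str,
                 first_ac: int, last_ac: int) -> str:
    # substitute() raises KeyError if any placeholder is left unfilled.
    return PROMPT.substitute(
        phase=phase, task_id=task_id, title=title, slug=slug,
        first_ac=f"AC{first_ac}", last_ac=f"AC{last_ac}",
    )
```

Using `substitute()` rather than `safe_substitute()` means a missing value fails loudly instead of sending an agent a prompt with a literal `$placeholder` in it.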

Coordination Rules

Linter accepts all — Configure to auto-accept agent file modifications
No merge conflicts — Phases write to different files
Collect results — Wait for all agents, then fire next wave
Wave pattern — Group dependent phases into waves

Wave Execution Example

Wave 1 (parallel): Phases 1, 2, 3, 4
  ↓ (all complete)
Wave 2 (parallel): Phases 5, 6, 7, 8
  ↓ (all complete)
Wave 3 (parallel): Phases 9, 10, 11
  ↓ (all complete)
Wave 4 (sequential): Phase 12 (MCP), then Phase 13 (UI)
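Waves follow mechanically from the dependency map: a phase joins the first wave after all of its prerequisites have finished. A minimal Python sketch of that layered topological sort, assuming dependencies are given as a dict of phase → prerequisite phases (a hypothetical input format):

```python
def plan_waves(deps: dict[int, set[int]]) -> list[list[int]]:
    """Group phases into waves; each wave depends only on earlier waves."""
    unknown = set().union(*deps.values()) - set(deps)
    if unknown:
        raise ValueError(f"unknown prerequisite phases {sorted(unknown)}")
    remaining = {phase: set(pre) for phase, pre in deps.items()}
    done: set[int] = set()
    waves = []
    while remaining:
        # Every phase whose prerequisites are all complete can run now.
        ready = sorted(p for p, pre in remaining.items() if pre <= done)
        if not ready:
            raise ValueError(f"dependency cycle among phases {sorted(remaining)}")
        waves.append(ready)
        done.update(ready)
        for p in ready:
            del remaining[p]
    return waves
```

A sequential step such as Wave 4 above comes out as consecutive single-phase waves when Phase 13 lists Phase 12 as a prerequisite.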

Task File Schema (Extended)

For large phased tasks, extend the schema:

# TASK-XXX: [Feature Name]

**Status:** draft | ready | in_progress | needs_verification | verified | approved
**Created:** YYYY-MM-DD
**Last Updated:** YYYY-MM-DD HH:MM by [agent/human]
**Complexity:** small (1-3 phases) | medium (4-8 phases) | large (9+ phases)
**Estimated Phases:** N
**Completed Phases:** M/N

---

## Objective

[One paragraph: what does "done" look like?]

---

## Scope

- **Models:** X new, Y modified
- **Migrations:** Z new tables
- **Livewire Components:** A new
- **Tests:** B target test count
- **Estimated Hours:** C-D hours

---

## Phase Overview

| Phase | Name | Status | ACs | Tests |
|-------|------|--------|-----|-------|
| 1 | Domain Management | ✅ Done | AC1-5 | 28 |
| 2 | Project System | ✅ Done | AC6-10 | 32 |
| 3 | Analytics Core | 🔄 In Progress | AC11-16 | - |
| ... | ... | ... | ... | ... |
| 12 | MCP Tools | ⏳ Pending | AC47-53 | - |
| 13 | Admin UI | ⏳ Pending | AC54-61 | - |

---

## Acceptance Criteria

### Phase 1: Domain Management

- [ ] AC1: [Criterion]
- [ ] AC2: [Criterion]
...

### Phase 12: MCP Tools (Standard)

- [ ] AC47: MCP tool class exists with all feature actions
- [ ] AC48: CRUD operations for all resources exposed
- [ ] AC49: Bulk operations exposed (where applicable)
- [ ] AC50: Query/filter operations exposed
- [ ] AC51: MCP prompts created for common workflows
- [ ] AC52: MCP resources expose read-only data access
- [ ] AC53: Tests verify all MCP actions

### Phase 13: Admin UI Integration (Standard)

- [ ] AC54: Sidebar updated with feature navigation
- [ ] AC55: Feature has expandable submenu (if 3+ pages)
- [ ] AC56: Index pages with DataTable/filtering
- [ ] AC57: Create/Edit forms with validation
- [ ] AC58: Detail views with related data
- [ ] AC59: Bulk action support
- [ ] AC60: Breadcrumb navigation
- [ ] AC61: Role-based visibility

---

## Implementation Checklist

### Phase 1: Domain Management
- [ ] File: `app/Models/...`
- [ ] File: `app/Livewire/...`
- [ ] Test: `tests/Feature/...`

### Phase 12: MCP Tools
- [ ] File: `app/Mcp/Tools/{Feature}Tools.php`
- [ ] File: `app/Mcp/Prompts/{Feature}Prompts.php` (optional)
- [ ] File: `app/Mcp/Resources/{Feature}Resources.php` (optional)
- [ ] Test: `tests/Feature/Mcp/{Feature}ToolsTest.php`

### Phase 13: Admin UI
- [ ] File: `resources/views/admin/components/sidebar.blade.php` (update)
- [ ] File: `app/Livewire/Admin/{Feature}/Index.php`
- [ ] File: `resources/views/livewire/admin/{feature}/index.blade.php`
- [ ] Test: `tests/Feature/Admin/{Feature}Test.php`

---

## Verification Results

[Same as before]

---

## Phase Completion Log

### Phase 1: Domain Management
**Completed:** YYYY-MM-DD by [Agent ID]
**Tests:** 28 passing
**Files:** 8 created/modified
**Notes:** [Any context]

### Phase 2: Project System
**Completed:** YYYY-MM-DD by [Agent ID]
**Tests:** 32 passing
...

MCP Endpoint (Future)

When implemented, the MCP endpoint will expose:

GET  /tasks                    # List all tasks with status
GET  /tasks/{id}               # Get task details
POST /tasks/{id}/claim         # Agent claims a task
POST /tasks/{id}/complete      # Agent marks ready for verification
POST /tasks/{id}/verify        # Verification agent submits results
GET  /tasks/next               # Get next unclaimed task
GET  /tasks/verify-queue       # Get tasks needing verification
POST /tasks/{id}/phases/{n}/claim    # Claim specific phase
POST /tasks/{id}/phases/{n}/complete # Complete specific phase
GET  /tasks/{id}/phases              # List phase status
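Since the endpoint does not exist yet, the following is only a sketch of the semantics the routes imply, not an API: an in-memory Python store where "next" returns the first unclaimed ready task, claiming is exclusive, and the verify queue lists tasks awaiting verification. The `Task` and `TaskStore` names and methods are hypothetical, and phase-level routes are omitted.

```python
import threading
from dataclasses import dataclass
from typing import Optional


@dataclass
class Task:
    id: str
    status: str = "ready"
    claimed_by: Optional[str] = None


class TaskStore:
    """In-memory model of the planned /tasks routes (illustrative only)."""

    def __init__(self, tasks: list[Task]):
        self._tasks = {t.id: t for t in tasks}
        self._lock = threading.Lock()  # claims must be atomic across agents

    def next_unclaimed(self) -> Optional[Task]:  # GET /tasks/next
        return next((t for t in self._tasks.values()
                     if t.status == "ready" and t.claimed_by is None), None)

    def claim(self, task_id: str, agent: str) -> bool:  # POST /tasks/{id}/claim
        with self._lock:
            task = self._tasks[task_id]
            if task.claimed_by is not None or task.status != "ready":
                return False  # another agent got there first
            task.claimed_by, task.status = agent, "in_progress"
            return True

    def complete(self, task_id: str, agent: str) -> None:  # POST /tasks/{id}/complete
        task = self._tasks[task_id]
        if task.claimed_by != agent:
            raise PermissionError("only the claiming agent can complete a task")
        task.status = "needs_verification"

    def verify_queue(self) -> list[str]:  # GET /tasks/verify-queue
        return [t.id for t in self._tasks.values() if t.status == "needs_verification"]
```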

Metrics to Track

Tasks created vs completed (per week)
Verification pass rate on first attempt
Average time from ready → approved
Most common failure reasons
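Two of these metrics fall out of a status-change log. A minimal Python sketch, assuming each task keeps a list of timestamped status events (a hypothetical format): the first-attempt pass rate counts tasks whose first verification went straight to `verified`, and cycle time runs from the first `ready` to `approved`.

```python
from datetime import datetime, timedelta
from typing import Optional

# Each event is (timestamp, new_status); the log format is an assumption.
Event = tuple[datetime, str]


def first_attempt_pass_rate(logs: dict[str, list[Event]]) -> float:
    """Share of verified-at-least-once tasks whose first verification passed."""
    attempted = passed = 0
    for events in logs.values():
        statuses = [status for _, status in sorted(events)]
        if "needs_verification" in statuses:
            # The status right after the first needs_verification is the first verdict.
            i = statuses.index("needs_verification")
            if i + 1 < len(statuses):
                attempted += 1
                passed += statuses[i + 1] == "verified"
    return passed / attempted if attempted else 0.0


def mean_ready_to_approved(logs: dict[str, list[Event]]) -> Optional[timedelta]:
    """Average time from the first 'ready' to 'approved' over approved tasks."""
    durations = []
    for events in logs.values():
        first_seen: dict[str, datetime] = {}
        for ts, status in sorted(events):
            first_seen.setdefault(status, ts)  # keep the first time each status occurred
        if "ready" in first_seen and "approved" in first_seen:
            durations.append(first_seen["approved"] - first_seen["ready"])
    return sum(durations, timedelta()) / len(durations) if durations else None
```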

Cross-Cutting Concerns

When a feature applies to multiple modules, extract it.

Example: Core Bouncer

The Commerce Matrix Plan included an "Internal WAF" section — a request whitelisting system with training mode. During audit, we realised:

It's not commerce-specific
It applies to all admin routes, all API endpoints
It should be in Core/, not Commerce/

Action: Extracted to CORE_BOUNCER_PLAN.md as a framework-level concern.

Signs to Extract

Feature name doesn't include the module name naturally
You'd copy-paste it to other modules
It's about infrastructure, not business logic
Multiple modules would benefit independently

How to Extract

Create new task file for the cross-cutting concern
Add note to original plan: > **EXTRACTED:** Section moved to X
Update TODO.md with the new task
Don't delete from original — leave the note for context

Retrospective Audits

Periodically audit archived tasks against actual implementation.

When to Audit

Before starting dependent work
When resuming a project after a break
When something "complete" seems broken
Monthly for active projects

Audit Process

Read the archived task file
Check each acceptance criterion against codebase
Run the tests mentioned in the task
Document gaps found

Audit Template

## Audit: TASK-XXX
**Date:** YYYY-MM-DD
**Auditor:** [human/agent]

| Claimed | Actual | Gap |
|---------|--------|-----|
| Phase 1 complete | ✅ Verified | None |
| Phase 2 complete | ⚠️ Partial | Missing X service |
| Phase 3 complete | ❌ Not done | Only stubs exist |

**Action items:**
- [ ] Create TASK-YYY for Phase 2 gap
- [ ] Move Phase 3 back to TODO as incomplete
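Part of an audit can be automated: confirm that the files each claimed phase lists actually exist, then draft the Audit Template row. A minimal Python sketch using `pathlib`; `audit_phase` and its row wording are hypothetical. A file merely existing does not prove its criterion is met, so running the listed tests is still required.

```python
from pathlib import Path


def audit_phase(name: str, files: list[str], root: Path) -> str:
    """Return one Audit Template row comparing a claimed phase with the repo."""
    if not files:
        actual, gap = "⚠️ Unverifiable", "No files listed in task"
    else:
        missing = [f for f in files if not (root / f).is_file()]
        if len(missing) == len(files):
            actual, gap = "❌ Not done", "No listed files exist"
        elif missing:
            actual, gap = "⚠️ Partial", "Missing " + ", ".join(missing)
        else:
            actual, gap = "✅ Files present", "Run listed tests to confirm"
    return f"| {name} complete | {actual} | {gap} |"
```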

Anti-Patterns to Avoid

General

Same agent implements and verifies — defeats the purpose
Vague acceptance criteria — "it works" is not verifiable
Skipping verification — the whole point is independent checking
Bulk marking as done — verify one task at a time
Human approving without spot-check — trust but verify

Evidence & Documentation

Checklist without evidence — planning ≠ implementation
Skipping "What Was Built" summary — context lost on compaction
No test count — can't verify without knowing what to run
Marking section "done" without implementation — major gaps discovered in audits
Vague TODO items — "Warehouse system" hides 6 distinct features

Parallel Execution

Phases with shared files — causes merge conflicts
Sequential dependencies in same wave — blocks parallelism
Skipping polish phases — features hidden from agents and admins
Too many phases per wave — diminishing returns past 4-5 agents
No wave boundaries — chaos when phases actually do depend

MCP Tools

Exposing without testing — broken tools waste agent time
Missing bulk operations — agents do N calls instead of 1
No error context — agents can't debug failures

Admin UI

Flat navigation for large features — use expandable submenus
Missing breadcrumbs — users get lost
No bulk actions — tedious admin experience

Cross-Cutting Concerns

Burying framework features in module plans — extract them
Assuming module-specific when it's not — ask "would other modules need this?"

Quick Reference: Creating a New Task

Copy the extended schema template
Fill in objective and scope
Decompose into phases (aim for 4-8 ACs each)
Map phase dependencies → wave structure
Check for cross-cutting concerns — extract if needed
Always add Phase N-1: MCP Tools
Always add Phase N: Admin UI Integration
Set status to draft, get human review
When ready, fire Wave 1 agents in parallel
Collect results with evidence (commits, tests, files)
Fire next wave
After all phases, run verification agent
Human approval → move to archive/released/

Quick Reference: Completing a Phase

Do the work
Run the tests
Record evidence:
- Git commits (hashes + messages)
- Test count and command to run them
- Files created/modified
- "What Was Built" summary (2-3 sentences)
Update task file with Phase Completion Log entry
Set phase status to ✅ Done
Move to next phase or request verification

Quick Reference: Auditing Archived Work

Read archive/released/ task file
For each phase marked complete:
- Check files exist
- Run listed tests
- Verify against acceptance criteria
Document gaps using Audit Template
Create new tasks for missing work
Update TODO.md with accurate status

This protocol exists because agents lie (unintentionally). The system catches the lies. Parallel execution makes them lie faster, so we verify more. Evidence requirements ensure lies are caught before archiving.
