go-container/specs/TASK_PROTOCOL.md
Snider b19cdd91a5 refactor: replace os file ops and fmt.Errorf/errors.New with go-io/go-log equivalents
Replace all os.ReadFile/os.WriteFile/os.MkdirAll in production code with
coreio.Local equivalents (Read, Write, EnsureDir) from go-io. Replace all
fmt.Errorf and errors.New with coreerr.E() from go-log, adding structured
operation context to all error returns. Promote go-log from indirect to
direct dependency in go.mod.

Co-Authored-By: Virgil <virgil@lethean.io>
2026-03-16 19:17:34 +00:00

19 KiB

Host Hub Task Protocol

Version: 2.1 Created: 2026-01-01 Updated: 2026-01-16 Purpose: Ensure agent work is verified before being marked complete, and provide patterns for efficient parallel implementation.

Lesson learned (Jan 2026): Task files written as checklists without implementation evidence led to 6+ "complete" tasks that were actually 70-85% done. Planning ≠ implementation. Evidence required.


The Problem

Agents optimise for conversation completion, not task completion. Saying "done" is computationally cheaper than doing the work. Context compaction loses task state. Nobody verifies output against spec.

The Solution

Separation of concerns:

  1. Planning Agent — writes the spec
  2. Implementation Agent — does the work
  3. Verification Agent — checks the work against spec
  4. Human — approves or rejects based on verification

Directory Structure

doc/
├── TASK_PROTOCOL.md          # This file
└── ...                       # Reference documentation

tasks/
├── TODO.md                   # Active task summary
├── TASK-XXX-feature.md       # Active task specs
├── agentic-tasks/            # Agentic system tasks
└── future-products/          # Parked product plans

archive/
├── released/                 # Completed tasks (for reference)
└── ...                       # Historical snapshots

Task File Schema

Every task file follows this structure:

# TASK-XXX: [Short Title]

**Status:** draft | ready | in_progress | needs_verification | verified | approved
**Created:** YYYY-MM-DD
**Last Updated:** YYYY-MM-DD HH:MM by [agent/human]
**Assignee:** [agent session or human]
**Verifier:** [different agent session]

---

## Objective

[One paragraph: what does "done" look like?]

---

## Acceptance Criteria

- [ ] AC1: [Specific, verifiable condition]
- [ ] AC2: [Specific, verifiable condition]
- [ ] AC3: [Specific, verifiable condition]

Each criterion must be:
- Binary (yes/no, not "mostly")
- Verifiable by code inspection or test
- Independent (can check without context)

---

## Implementation Checklist

- [ ] File: `path/to/file.php` — [what it should contain]
- [ ] File: `path/to/other.php` — [what it should contain]
- [ ] Test: `tests/Feature/XxxTest.php` passes
- [ ] Migration: runs without error

---

## Verification Results

### Check 1: [Date] by [Agent]

| Criterion | Status | Evidence |
|-----------|--------|----------|
| AC1 | ✅ PASS | File exists at path, contains X |
| AC2 | ❌ FAIL | Missing method Y in class Z |
| AC3 | ⚠️ PARTIAL | 3 of 5 tests pass |

**Verdict:** FAIL — AC2 not met

### Check 2: [Date] by [Agent]

| Criterion | Status | Evidence |
|-----------|--------|----------|
| AC1 | ✅ PASS | File exists at path, contains X |
| AC2 | ✅ PASS | Method Y added, verified |
| AC3 | ✅ PASS | All 5 tests pass |

**Verdict:** PASS — ready for human approval

---

## Notes

[Any context, blockers, decisions made during implementation]

Implementation Evidence (Required)

A checklist is not evidence. Prove the work exists.

Every completed phase MUST include:

1. Git Evidence

**Commits:**
- `abc123` - Add Domain model and migration
- `def456` - Add DomainController with CRUD
- `ghi789` - Add 28 domain tests

2. Test Count

**Tests:** 28 passing (run: `php artisan test app/Mod/Bio/Tests/Feature/DomainTest.php`)

3. File Manifest

**Files created/modified:**
- `app/Mod/Bio/Models/Domain.php` (new)
- `app/Mod/Bio/Http/Controllers/DomainController.php` (new)
- `database/migrations/2026_01_16_create_domains_table.php` (new)
- `app/Mod/Bio/Tests/Feature/DomainTest.php` (new)

4. "What Was Built" Summary

**Summary:** Custom domain management with DNS verification. Users can add domains,
system generates TXT record for verification, background job checks DNS propagation.
Includes SSL provisioning via Caddy API.

Why This Matters

In Jan 2026, an audit found:

  • Commerce Matrix Plan marked "95% done" was actually 75%
  • Internal WAF section was skipped entirely (extracted to Core Bouncer)
  • Warehouse/fulfillment (6 features) listed as "one item" in TODO
  • Task files read like planning documents, not completion logs

Without evidence, "done" means nothing.


Workflow

1. Task Creation

Human or planning agent creates task file in tasks/:

  • Status: draft
  • Must have clear acceptance criteria
  • Must have implementation checklist

2. Task Ready

Human reviews and sets:

  • Status: ready
  • Assignee: next available agent

3. Implementation

Implementation agent:

  • Sets status: in_progress
  • Works through implementation checklist
  • Checks boxes as work is done
  • When complete, sets status: needs_verification
  • MUST NOT mark acceptance criteria as passed

4. Verification

Different agent (verification agent):

  • Reads the task file
  • Independently checks each acceptance criterion
  • Records evidence in Verification Results section
  • Sets verdict: PASS or FAIL
  • If PASS: status → verified, move to archive/released/
  • If FAIL: status → in_progress, back to implementation agent

5. Human Approval

Human reviews verified task:

  • Spot-check the evidence
  • If satisfied: status → approved, can delete or keep in archive
  • If not: back to needs_verification with notes

Agent Instructions

For Implementation Agents

You are implementing TASK-XXX.

1. Read the full task file
2. Set status to "in_progress"
3. Work through the implementation checklist
4. Check boxes ONLY for work you have completed
5. When done, set status to "needs_verification"
6. DO NOT check acceptance criteria boxes
7. DO NOT mark the task as complete
8. Update "Last Updated" with current timestamp

Your job is to do the work, not to verify it.

For Verification Agents

You are verifying TASK-XXX.

1. Read the full task file
2. For EACH acceptance criterion:
   a. Check the codebase independently
   b. Record what you found (file paths, line numbers, test output)
   c. Mark as PASS, FAIL, or PARTIAL with evidence
3. Add a new "Verification Results" section with today's date
4. Set verdict: PASS or FAIL
5. If PASS: move file to archive/released/
6. If FAIL: set status back to "in_progress"
7. Update "Last Updated" with current timestamp

You are the gatekeeper. Be thorough. Trust nothing the implementation agent said.

Status Flow

draft → ready → in_progress → needs_verification → verified → approved
                     ↑                    │
                     └────────────────────┘
                        (if verification fails)

Phase-Based Decomposition

Large tasks should be decomposed into independent phases that can be executed in parallel by multiple agents. This dramatically reduces implementation time.

Phase Independence Rules

  1. No shared state — Each phase writes to different files/tables
  2. No blocking dependencies — Phase 3 shouldn't wait for Phase 2's output
  3. Clear boundaries — Each phase has its own acceptance criteria
  4. Testable isolation — Phase tests don't require other phases

Example Decomposition

A feature like "BioHost Missing Features" might decompose into:

Phase Focus Can Parallel With
1 Domain Management 2, 3, 4
2 Project System 1, 3, 4
3 Analytics Core 1, 2, 4
4 Form Submissions 1, 2, 3
5 Link Scheduling 1, 2, 3, 4
... ... ...
12 MCP Tools (polish) After 1-11
13 Admin UI (polish) After 1-11

Phase Sizing

  • Target: 4-8 acceptance criteria per phase
  • Estimated time: 2-4 hours per phase
  • Test count: 15-40 tests per phase
  • File count: 3-10 files modified per phase

Standard Phase Types

Every large task should include these phase types:

Core Implementation Phases (1-N)

The main feature work. Group by:

  • Resource type (domains, projects, analytics)
  • Functional area (CRUD, scheduling, notifications)
  • Data flow (input, processing, output)

Polish Phase: MCP Tools

Always include as second-to-last phase.

Exposes all implemented features to AI agents via MCP protocol.

Standard acceptance criteria:

  • MCP tool class exists at app/Mcp/Tools/{Feature}Tools.php
  • All CRUD operations exposed as actions
  • Tool includes prompts for common workflows
  • Tool includes resources for data access
  • Tests verify all MCP actions return expected responses
  • Tool registered in MCP service provider

Polish Phase: Admin UI Integration

Always include as final phase.

Integrates features into the admin dashboard.

Standard acceptance criteria:

  • Sidebar navigation updated with feature section
  • Index/list page with filtering and search
  • Detail/edit pages for resources
  • Bulk actions where appropriate
  • Breadcrumb navigation
  • Role-based access control
  • Tests verify all admin routes respond correctly

Parallel Agent Execution

Firing Multiple Agents

When phases are independent, fire agents simultaneously:

Human: "Implement phases 1-4 in parallel"

Agent fires 4 Task tools simultaneously:
- Task(Phase 1: Domain Management)
- Task(Phase 2: Project System)
- Task(Phase 3: Analytics Core)
- Task(Phase 4: Form Submissions)

Agent Prompt Template

You are implementing Phase X of TASK-XXX: [Task Title]

Read the task file at: tasks/TASK-XXX-feature-name.md

Your phase covers acceptance criteria ACxx through ACyy.

Implementation requirements:
1. Create all files listed in the Phase X implementation checklist
2. Write comprehensive Pest tests (target: 20-40 tests)
3. Follow existing codebase patterns
4. Use workspace-scoped multi-tenancy
5. Check entitlements for tier-gated features

When complete:
1. Update the task file marking Phase X checklist items done
2. Report: files created, test count, any blockers

Do NOT mark acceptance criteria as passed — verification agent does that.

Coordination Rules

  1. Linter accepts all — Configure to auto-accept agent file modifications
  2. No merge conflicts — Phases write to different files
  3. Collect results — Wait for all agents, then fire next wave
  4. Wave pattern — Group dependent phases into waves

Wave Execution Example

Wave 1 (parallel): Phases 1, 2, 3, 4
  ↓ (all complete)
Wave 2 (parallel): Phases 5, 6, 7, 8
  ↓ (all complete)
Wave 3 (parallel): Phases 9, 10, 11
  ↓ (all complete)
Wave 4 (sequential): Phase 12 (MCP), then Phase 13 (UI)

Task File Schema (Extended)

For large phased tasks, extend the schema:

# TASK-XXX: [Feature Name]

**Status:** draft | ready | in_progress | needs_verification | verified | approved
**Created:** YYYY-MM-DD
**Last Updated:** YYYY-MM-DD HH:MM by [agent/human]
**Complexity:** small (1-3 phases) | medium (4-8 phases) | large (9+ phases)
**Estimated Phases:** N
**Completed Phases:** M/N

---

## Objective

[One paragraph: what does "done" look like?]

---

## Scope

- **Models:** X new, Y modified
- **Migrations:** Z new tables
- **Livewire Components:** A new
- **Tests:** B target test count
- **Estimated Hours:** C-D hours

---

## Phase Overview

| Phase | Name | Status | ACs | Tests |
|-------|------|--------|-----|-------|
| 1 | Domain Management | ✅ Done | AC1-5 | 28 |
| 2 | Project System | ✅ Done | AC6-10 | 32 |
| 3 | Analytics Core | 🔄 In Progress | AC11-16 | - |
| ... | ... | ... | ... | ... |
| 12 | MCP Tools | ⏳ Pending | AC47-53 | - |
| 13 | Admin UI | ⏳ Pending | AC54-61 | - |

---

## Acceptance Criteria

### Phase 1: Domain Management

- [ ] AC1: [Criterion]
- [ ] AC2: [Criterion]
...

### Phase 12: MCP Tools (Standard)

- [ ] AC47: MCP tool class exists with all feature actions
- [ ] AC48: CRUD operations for all resources exposed
- [ ] AC49: Bulk operations exposed (where applicable)
- [ ] AC50: Query/filter operations exposed
- [ ] AC51: MCP prompts created for common workflows
- [ ] AC52: MCP resources expose read-only data access
- [ ] AC53: Tests verify all MCP actions

### Phase 13: Admin UI Integration (Standard)

- [ ] AC54: Sidebar updated with feature navigation
- [ ] AC55: Feature has expandable submenu (if 3+ pages)
- [ ] AC56: Index pages with DataTable/filtering
- [ ] AC57: Create/Edit forms with validation
- [ ] AC58: Detail views with related data
- [ ] AC59: Bulk action support
- [ ] AC60: Breadcrumb navigation
- [ ] AC61: Role-based visibility

---

## Implementation Checklist

### Phase 1: Domain Management
- [ ] File: `app/Models/...`
- [ ] File: `app/Livewire/...`
- [ ] Test: `tests/Feature/...`

### Phase 12: MCP Tools
- [ ] File: `app/Mcp/Tools/{Feature}Tools.php`
- [ ] File: `app/Mcp/Prompts/{Feature}Prompts.php` (optional)
- [ ] File: `app/Mcp/Resources/{Feature}Resources.php` (optional)
- [ ] Test: `tests/Feature/Mcp/{Feature}ToolsTest.php`

### Phase 13: Admin UI
- [ ] File: `resources/views/admin/components/sidebar.blade.php` (update)
- [ ] File: `app/Livewire/Admin/{Feature}/Index.php`
- [ ] File: `resources/views/livewire/admin/{feature}/index.blade.php`
- [ ] Test: `tests/Feature/Admin/{Feature}Test.php`

---

## Verification Results

[Same as before]

---

## Phase Completion Log

### Phase 1: Domain Management
**Completed:** YYYY-MM-DD by [Agent ID]
**Tests:** 28 passing
**Files:** 8 created/modified
**Notes:** [Any context]

### Phase 2: Project System
**Completed:** YYYY-MM-DD by [Agent ID]
**Tests:** 32 passing
...

MCP Endpoint (Future)

When implemented, the MCP endpoint will expose:

GET  /tasks                    # List all tasks with status
GET  /tasks/{id}               # Get task details
POST /tasks/{id}/claim         # Agent claims a task
POST /tasks/{id}/complete      # Agent marks ready for verification
POST /tasks/{id}/verify        # Verification agent submits results
GET  /tasks/next               # Get next unclaimed task
GET  /tasks/verify-queue       # Get tasks needing verification
POST /tasks/{id}/phases/{n}/claim    # Claim specific phase
POST /tasks/{id}/phases/{n}/complete # Complete specific phase
GET  /tasks/{id}/phases              # List phase status

Metrics to Track

  • Tasks created vs completed (per week)
  • Verification pass rate on first attempt
  • Average time from ready → approved
  • Most common failure reasons

Cross-Cutting Concerns

When a feature applies to multiple modules, extract it.

Example: Core Bouncer

The Commerce Matrix Plan included an "Internal WAF" section — a request whitelisting system with training mode. During audit, we realised:

  • It's not commerce-specific
  • It applies to all admin routes, all API endpoints
  • It should be in Core/, not Commerce/

Action: Extracted to CORE_BOUNCER_PLAN.md as a framework-level concern.

Signs to Extract

  • Feature name doesn't include the module name naturally
  • You'd copy-paste it to other modules
  • It's about infrastructure, not business logic
  • Multiple modules would benefit independently

How to Extract

  1. Create new task file for the cross-cutting concern
  2. Add note to original plan: > **EXTRACTED:** Section moved to X
  3. Update TODO.md with the new task
  4. Don't delete from original — leave the note for context

Retrospective Audits

Periodically audit archived tasks against actual implementation.

When to Audit

  • Before starting dependent work
  • When resuming a project after a break
  • When something "complete" seems broken
  • Monthly for active projects

Audit Process

  1. Read the archived task file
  2. Check each acceptance criterion against codebase
  3. Run the tests mentioned in the task
  4. Document gaps found

Audit Template

## Audit: TASK-XXX
**Date:** YYYY-MM-DD
**Auditor:** [human/agent]

| Claimed | Actual | Gap |
|---------|--------|-----|
| Phase 1 complete | ✅ Verified | None |
| Phase 2 complete | ⚠️ Partial | Missing X service |
| Phase 3 complete | ❌ Not done | Only stubs exist |

**Action items:**
- [ ] Create TASK-YYY for Phase 2 gap
- [ ] Move Phase 3 back to TODO as incomplete

Anti-Patterns to Avoid

General

  1. Same agent implements and verifies — defeats the purpose
  2. Vague acceptance criteria — "it works" is not verifiable
  3. Skipping verification — the whole point is independent checking
  4. Bulk marking as done — verify one task at a time
  5. Human approving without spot-check — trust but verify

Evidence & Documentation

  1. Checklist without evidence — planning ≠ implementation
  2. Skipping "What Was Built" summary — context lost on compaction
  3. No test count — can't verify without knowing what to run
  4. Marking section "done" without implementation — major gaps discovered in audits
  5. Vague TODO items — "Warehouse system" hides 6 distinct features

Parallel Execution

  1. Phases with shared files — causes merge conflicts
  2. Sequential dependencies in same wave — blocks parallelism
  3. Skipping polish phases — features hidden from agents and admins
  4. Too many phases per wave — diminishing returns past 4-5 agents
  5. No wave boundaries — chaos when phases actually do depend

MCP Tools

  1. Exposing without testing — broken tools waste agent time
  2. Missing bulk operations — agents do N calls instead of 1
  3. No error context — agents can't debug failures

Admin UI

  1. Flat navigation for large features — use expandable submenus
  2. Missing breadcrumbs — users get lost
  3. No bulk actions — tedious admin experience

Cross-Cutting Concerns

  1. Burying framework features in module plans — extract them
  2. Assuming module-specific when it's not — ask "would other modules need this?"

Quick Reference: Creating a New Task

  1. Copy the extended schema template
  2. Fill in objective and scope
  3. Decompose into phases (aim for 4-8 ACs each)
  4. Map phase dependencies → wave structure
  5. Check for cross-cutting concerns — extract if needed
  6. Always add Phase N-1: MCP Tools
  7. Always add Phase N: Admin UI Integration
  8. Set status to draft, get human review
  9. When ready, fire Wave 1 agents in parallel
  10. Collect results with evidence (commits, tests, files)
  11. Fire next wave
  12. After all phases, run verification agent
  13. Human approval → move to archive/released/

Quick Reference: Completing a Phase

  1. Do the work
  2. Run the tests
  3. Record evidence:
    • Git commits (hashes + messages)
    • Test count and command to run them
    • Files created/modified
    • "What Was Built" summary (2-3 sentences)
  4. Update task file with Phase Completion Log entry
  5. Set phase status to Done
  6. Move to next phase or request verification

Quick Reference: Auditing Archived Work

  1. Read archive/released/ task file
  2. For each phase marked complete:
    • Check files exist
    • Run listed tests
    • Verify against acceptance criteria
  3. Document gaps using Audit Template
  4. Create new tasks for missing work
  5. Update TODO.md with accurate status

This protocol exists because agents lie (unintentionally). The system catches the lies. Parallel execution makes them lie faster, so we verify more. Evidence requirements ensure lies are caught before archiving.