Host Hub Task Protocol
**Version:** 2.1
**Created:** 2026-01-01
**Updated:** 2026-01-16
**Purpose:** Ensure agent work is verified before being marked complete, and provide patterns for efficient parallel implementation.
Lesson learned (Jan 2026): Task files written as checklists without implementation evidence led to 6+ "complete" tasks that were actually 70-85% done. Planning ≠ implementation. Evidence required.
The Problem
Agents optimise for conversation completion, not task completion. Saying "done" is computationally cheaper than doing the work. Context compaction loses task state. Nobody verifies output against spec.
The Solution
Separation of concerns:
- Planning Agent — writes the spec
- Implementation Agent — does the work
- Verification Agent — checks the work against spec
- Human — approves or rejects based on verification
Directory Structure
doc/
├── TASK_PROTOCOL.md # This file
└── ... # Reference documentation
tasks/
├── TODO.md # Active task summary
├── TASK-XXX-feature.md # Active task specs
├── agentic-tasks/ # Agentic system tasks
└── future-products/ # Parked product plans
archive/
├── released/ # Completed tasks (for reference)
└── ... # Historical snapshots
Task File Schema
Every task file follows this structure:
# TASK-XXX: [Short Title]
**Status:** draft | ready | in_progress | needs_verification | verified | approved
**Created:** YYYY-MM-DD
**Last Updated:** YYYY-MM-DD HH:MM by [agent/human]
**Assignee:** [agent session or human]
**Verifier:** [different agent session]
---
## Objective
[One paragraph: what does "done" look like?]
---
## Acceptance Criteria
- [ ] AC1: [Specific, verifiable condition]
- [ ] AC2: [Specific, verifiable condition]
- [ ] AC3: [Specific, verifiable condition]
Each criterion must be:
- Binary (yes/no, not "mostly")
- Verifiable by code inspection or test
- Independent (can check without context)
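Because criteria are machine-readable checkboxes, a verification tool can parse them directly. A minimal Python sketch, assuming the `- [ ] ACn: ...` format shown above (the helper name is illustrative, not existing tooling):

```python
import re

# Matches lines like "- [ ] AC1: ..." or "- [x] AC2: ..."
AC_PATTERN = re.compile(r"^- \[(?P<done>[ xX])\] (?P<id>AC\d+): (?P<text>.+)$")

def parse_criteria(task_markdown: str) -> list[dict]:
    """Extract acceptance criteria and their checkbox state from a task file."""
    criteria = []
    for line in task_markdown.splitlines():
        m = AC_PATTERN.match(line.strip())
        if m:
            criteria.append({
                "id": m.group("id"),
                "text": m.group("text"),
                "checked": m.group("done") in "xX",
            })
    return criteria
```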
---
## Implementation Checklist
- [ ] File: `path/to/file.php` — [what it should contain]
- [ ] File: `path/to/other.php` — [what it should contain]
- [ ] Test: `tests/Feature/XxxTest.php` passes
- [ ] Migration: runs without error
---
## Verification Results
### Check 1: [Date] by [Agent]
| Criterion | Status | Evidence |
|-----------|--------|----------|
| AC1 | ✅ PASS | File exists at path, contains X |
| AC2 | ❌ FAIL | Missing method Y in class Z |
| AC3 | ⚠️ PARTIAL | 3 of 5 tests pass |
**Verdict:** FAIL — AC2 not met
### Check 2: [Date] by [Agent]
| Criterion | Status | Evidence |
|-----------|--------|----------|
| AC1 | ✅ PASS | File exists at path, contains X |
| AC2 | ✅ PASS | Method Y added, verified |
| AC3 | ✅ PASS | All 5 tests pass |
**Verdict:** PASS — ready for human approval
---
## Notes
[Any context, blockers, decisions made during implementation]
Implementation Evidence (Required)
A checklist is not evidence. Prove the work exists.
Every completed phase MUST include:
1. Git Evidence
**Commits:**
- `abc123` - Add Domain model and migration
- `def456` - Add DomainController with CRUD
- `ghi789` - Add 28 domain tests
2. Test Count
**Tests:** 28 passing (run: `php artisan test app/Mod/Bio/Tests/Feature/DomainTest.php`)
3. File Manifest
**Files created/modified:**
- `app/Mod/Bio/Models/Domain.php` (new)
- `app/Mod/Bio/Http/Controllers/DomainController.php` (new)
- `database/migrations/2026_01_16_create_domains_table.php` (new)
- `app/Mod/Bio/Tests/Feature/DomainTest.php` (new)
4. "What Was Built" Summary
**Summary:** Custom domain management with DNS verification. Users can add domains,
system generates TXT record for verification, background job checks DNS propagation.
Includes SSL provisioning via Caddy API.
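The four evidence items are uniform enough to template. A hedged Python sketch that renders them as a markdown fragment (function and field names are assumptions, not existing tooling):

```python
def format_evidence(commits, test_count, test_cmd, files, summary):
    """Render the four required evidence items as a markdown fragment.

    commits: list of (sha, message); files: list of (path, status).
    """
    lines = ["**Commits:**"]
    lines += [f"- `{sha}` - {msg}" for sha, msg in commits]
    lines.append(f"**Tests:** {test_count} passing (run: `{test_cmd}`)")
    lines.append("**Files created/modified:**")
    lines += [f"- `{path}` ({status})" for path, status in files]
    lines.append(f"**Summary:** {summary}")
    return "\n".join(lines)
```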
Why This Matters
In Jan 2026, an audit found:
- Commerce Matrix Plan marked "95% done" was actually 75%
- Internal WAF section was skipped entirely (extracted to Core Bouncer)
- Warehouse/fulfillment (6 features) listed as "one item" in TODO
- Task files read like planning documents, not completion logs
Without evidence, "done" means nothing.
Workflow
1. Task Creation
Human or planning agent creates task file in `tasks/`:
- Status: `draft`
- Must have clear acceptance criteria
- Must have implementation checklist
2. Task Ready
Human reviews and sets:
- Status: `ready`
- Assignee: next available agent
3. Implementation
Implementation agent:
- Sets status: `in_progress`
- Works through implementation checklist
- Checks boxes as work is done
- When complete, sets status: `needs_verification`
- MUST NOT mark acceptance criteria as passed
4. Verification
Different agent (verification agent):
- Reads the task file
- Independently checks each acceptance criterion
- Records evidence in Verification Results section
- Sets verdict: PASS or FAIL
- If PASS: status → `verified`, move to `archive/released/`
- If FAIL: status → `in_progress`, back to implementation agent
5. Human Approval
Human reviews verified task:
- Spot-check the evidence
- If satisfied: status → `approved`, can delete or keep in archive
- If not: back to `needs_verification` with notes
Agent Instructions
For Implementation Agents
You are implementing TASK-XXX.
1. Read the full task file
2. Set status to "in_progress"
3. Work through the implementation checklist
4. Check boxes ONLY for work you have completed
5. When done, set status to "needs_verification"
6. DO NOT check acceptance criteria boxes
7. DO NOT mark the task as complete
8. Update "Last Updated" with current timestamp
Your job is to do the work, not to verify it.
For Verification Agents
You are verifying TASK-XXX.
1. Read the full task file
2. For EACH acceptance criterion:
a. Check the codebase independently
b. Record what you found (file paths, line numbers, test output)
c. Mark as PASS, FAIL, or PARTIAL with evidence
3. Add a new "Verification Results" section with today's date
4. Set verdict: PASS or FAIL
5. If PASS: move file to archive/released/
6. If FAIL: set status back to "in_progress"
7. Update "Last Updated" with current timestamp
You are the gatekeeper. Be thorough. Trust nothing the implementation agent said.
Status Flow
draft → ready → in_progress → needs_verification → verified → approved
                     ↑                  │
                     └──────────────────┘
                 (if verification fails)
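The flow above amounts to a small state machine, so a task tool could enforce it mechanically. A minimal Python sketch (the function and table names are illustrative, not existing project code):

```python
# Allowed transitions from the status flow; verification failure sends
# needs_verification back to in_progress, and a human can bounce a
# verified task back to needs_verification with notes.
TRANSITIONS = {
    "draft": {"ready"},
    "ready": {"in_progress"},
    "in_progress": {"needs_verification"},
    "needs_verification": {"verified", "in_progress"},
    "verified": {"approved", "needs_verification"},
    "approved": set(),
}

def advance(status: str, new_status: str) -> str:
    """Validate a status change against the protocol's flow."""
    if new_status not in TRANSITIONS.get(status, set()):
        raise ValueError(f"illegal transition: {status} -> {new_status}")
    return new_status
```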
Phase-Based Decomposition
Large tasks should be decomposed into independent phases that can be executed in parallel by multiple agents. This dramatically reduces implementation time.
Phase Independence Rules
- No shared state — Each phase writes to different files/tables
- No blocking dependencies — Phase 3 shouldn't wait for Phase 2's output
- Clear boundaries — Each phase has its own acceptance criteria
- Testable isolation — Phase tests don't require other phases
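The "no shared state" rule can be checked before firing agents by intersecting each phase's planned file list. A minimal Python sketch (the `phases` mapping is an assumed shape, not an existing schema):

```python
from itertools import combinations

def shared_files(phases: dict) -> list:
    """Return phase pairs that touch the same files, violating independence.

    phases maps phase name -> set of file paths that phase will write.
    """
    conflicts = []
    for (a, fa), (b, fb) in combinations(phases.items(), 2):
        overlap = fa & fb
        if overlap:
            conflicts.append((a, b, overlap))
    return conflicts
```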
Example Decomposition
A feature like "BioHost Missing Features" might decompose into:
| Phase | Focus | Can Parallel With |
|---|---|---|
| 1 | Domain Management | 2, 3, 4 |
| 2 | Project System | 1, 3, 4 |
| 3 | Analytics Core | 1, 2, 4 |
| 4 | Form Submissions | 1, 2, 3 |
| 5 | Link Scheduling | 1, 2, 3, 4 |
| ... | ... | ... |
| 12 | MCP Tools (polish) | After 1-11 |
| 13 | Admin UI (polish) | After 1-11 |
Phase Sizing
- Target: 4-8 acceptance criteria per phase
- Estimated time: 2-4 hours per phase
- Test count: 15-40 tests per phase
- File count: 3-10 files modified per phase
Standard Phase Types
Every large task should include these phase types:
Core Implementation Phases (1-N)
The main feature work. Group by:
- Resource type (domains, projects, analytics)
- Functional area (CRUD, scheduling, notifications)
- Data flow (input, processing, output)
Polish Phase: MCP Tools
Always include as second-to-last phase.
Exposes all implemented features to AI agents via MCP protocol.
Standard acceptance criteria:
- MCP tool class exists at `app/Mcp/Tools/{Feature}Tools.php`
- All CRUD operations exposed as actions
- Tool includes prompts for common workflows
- Tool includes resources for data access
- Tests verify all MCP actions return expected responses
- Tool registered in MCP service provider
Polish Phase: Admin UI Integration
Always include as final phase.
Integrates features into the admin dashboard.
Standard acceptance criteria:
- Sidebar navigation updated with feature section
- Index/list page with filtering and search
- Detail/edit pages for resources
- Bulk actions where appropriate
- Breadcrumb navigation
- Role-based access control
- Tests verify all admin routes respond correctly
Parallel Agent Execution
Firing Multiple Agents
When phases are independent, fire agents simultaneously:
Human: "Implement phases 1-4 in parallel"
Agent fires 4 Task tools simultaneously:
- Task(Phase 1: Domain Management)
- Task(Phase 2: Project System)
- Task(Phase 3: Analytics Core)
- Task(Phase 4: Form Submissions)
Agent Prompt Template
You are implementing Phase X of TASK-XXX: [Task Title]
Read the task file at: tasks/TASK-XXX-feature-name.md
Your phase covers acceptance criteria ACxx through ACyy.
Implementation requirements:
1. Create all files listed in the Phase X implementation checklist
2. Write comprehensive Pest tests (target: 20-40 tests)
3. Follow existing codebase patterns
4. Use workspace-scoped multi-tenancy
5. Check entitlements for tier-gated features
When complete:
1. Update the task file marking Phase X checklist items done
2. Report: files created, test count, any blockers
Do NOT mark acceptance criteria as passed — verification agent does that.
Coordination Rules
- Linter accepts all — Configure to auto-accept agent file modifications
- No merge conflicts — Phases write to different files
- Collect results — Wait for all agents, then fire next wave
- Wave pattern — Group dependent phases into waves
Wave Execution Example
Wave 1 (parallel): Phases 1, 2, 3, 4
↓ (all complete)
Wave 2 (parallel): Phases 5, 6, 7, 8
↓ (all complete)
Wave 3 (parallel): Phases 9, 10, 11
↓ (all complete)
Wave 4 (sequential): Phase 12 (MCP), then Phase 13 (UI)
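Wave grouping is just dependency levelling: a phase joins the earliest wave in which all of its dependencies have completed. A Python sketch (phase names and the `deps` shape are assumptions for illustration):

```python
def schedule_waves(deps: dict) -> list:
    """Group phases into waves from a phase -> set-of-dependencies map.

    Each wave contains every not-yet-scheduled phase whose dependencies
    are all satisfied by earlier waves.
    """
    done, waves = set(), []
    remaining = dict(deps)
    while remaining:
        wave = {p for p, d in remaining.items() if d <= done}
        if not wave:
            raise ValueError("circular dependency among phases")
        waves.append(wave)
        done |= wave
        for p in wave:
            del remaining[p]
    return waves
```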
Task File Schema (Extended)
For large phased tasks, extend the schema:
# TASK-XXX: [Feature Name]
**Status:** draft | ready | in_progress | needs_verification | verified | approved
**Created:** YYYY-MM-DD
**Last Updated:** YYYY-MM-DD HH:MM by [agent/human]
**Complexity:** small (1-3 phases) | medium (4-8 phases) | large (9+ phases)
**Estimated Phases:** N
**Completed Phases:** M/N
---
## Objective
[One paragraph: what does "done" look like?]
---
## Scope
- **Models:** X new, Y modified
- **Migrations:** Z new tables
- **Livewire Components:** A new
- **Tests:** B target test count
- **Estimated Hours:** C-D hours
---
## Phase Overview
| Phase | Name | Status | ACs | Tests |
|-------|------|--------|-----|-------|
| 1 | Domain Management | ✅ Done | AC1-5 | 28 |
| 2 | Project System | ✅ Done | AC6-10 | 32 |
| 3 | Analytics Core | 🔄 In Progress | AC11-16 | - |
| ... | ... | ... | ... | ... |
| 12 | MCP Tools | ⏳ Pending | AC47-53 | - |
| 13 | Admin UI | ⏳ Pending | AC54-61 | - |
---
## Acceptance Criteria
### Phase 1: Domain Management
- [ ] AC1: [Criterion]
- [ ] AC2: [Criterion]
...
### Phase 12: MCP Tools (Standard)
- [ ] AC47: MCP tool class exists with all feature actions
- [ ] AC48: CRUD operations for all resources exposed
- [ ] AC49: Bulk operations exposed (where applicable)
- [ ] AC50: Query/filter operations exposed
- [ ] AC51: MCP prompts created for common workflows
- [ ] AC52: MCP resources expose read-only data access
- [ ] AC53: Tests verify all MCP actions
### Phase 13: Admin UI Integration (Standard)
- [ ] AC54: Sidebar updated with feature navigation
- [ ] AC55: Feature has expandable submenu (if 3+ pages)
- [ ] AC56: Index pages with DataTable/filtering
- [ ] AC57: Create/Edit forms with validation
- [ ] AC58: Detail views with related data
- [ ] AC59: Bulk action support
- [ ] AC60: Breadcrumb navigation
- [ ] AC61: Role-based visibility
---
## Implementation Checklist
### Phase 1: Domain Management
- [ ] File: `app/Models/...`
- [ ] File: `app/Livewire/...`
- [ ] Test: `tests/Feature/...`
### Phase 12: MCP Tools
- [ ] File: `app/Mcp/Tools/{Feature}Tools.php`
- [ ] File: `app/Mcp/Prompts/{Feature}Prompts.php` (optional)
- [ ] File: `app/Mcp/Resources/{Feature}Resources.php` (optional)
- [ ] Test: `tests/Feature/Mcp/{Feature}ToolsTest.php`
### Phase 13: Admin UI
- [ ] File: `resources/views/admin/components/sidebar.blade.php` (update)
- [ ] File: `app/Livewire/Admin/{Feature}/Index.php`
- [ ] File: `resources/views/livewire/admin/{feature}/index.blade.php`
- [ ] Test: `tests/Feature/Admin/{Feature}Test.php`
---
## Verification Results
[Same as before]
---
## Phase Completion Log
### Phase 1: Domain Management
**Completed:** YYYY-MM-DD by [Agent ID]
**Tests:** 28 passing
**Files:** 8 created/modified
**Notes:** [Any context]
### Phase 2: Project System
**Completed:** YYYY-MM-DD by [Agent ID]
**Tests:** 32 passing
...
MCP Endpoint (Future)
When implemented, the MCP endpoint will expose:
GET /tasks # List all tasks with status
GET /tasks/{id} # Get task details
POST /tasks/{id}/claim # Agent claims a task
POST /tasks/{id}/complete # Agent marks ready for verification
POST /tasks/{id}/verify # Verification agent submits results
GET /tasks/next # Get next unclaimed task
GET /tasks/verify-queue # Get tasks needing verification
POST /tasks/{id}/phases/{n}/claim # Claim specific phase
POST /tasks/{id}/phases/{n}/complete # Complete specific phase
GET /tasks/{id}/phases # List phase status
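If the endpoint is built as listed, an agent's claim/complete loop might look like the following sketch. `http` stands in for any HTTP client; none of these routes exist yet, so treat this purely as a shape for the future API:

```python
def agent_loop(http, agent_id: str):
    """Claim the next unclaimed task, do the work, mark it for verification.

    `http` is any client exposing .get(path) and .post(path, body).
    Returns the claimed task id, or None if nothing was available.
    """
    task = http.get("/tasks/next")
    if task is None:
        return None  # nothing to claim
    http.post(f"/tasks/{task['id']}/claim", {"agent": agent_id})
    # ... implementation work would happen here ...
    http.post(f"/tasks/{task['id']}/complete", {"agent": agent_id})
    return task["id"]
```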
Metrics to Track
- Tasks created vs completed (per week)
- Verification pass rate on first attempt
- Average time from ready → approved
- Most common failure reasons
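For example, first-attempt verification pass rate could be computed from per-task check history. A sketch, assuming each task records its verdicts in order (the `checks` field is an assumed shape, not an existing schema):

```python
def first_pass_rate(tasks: list) -> float:
    """Share of checked tasks whose first verification verdict was PASS."""
    checked = [t for t in tasks if t.get("checks")]
    if not checked:
        return 0.0
    first_pass = sum(1 for t in checked if t["checks"][0] == "PASS")
    return first_pass / len(checked)
```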
Cross-Cutting Concerns
When a feature applies to multiple modules, extract it.
Example: Core Bouncer
The Commerce Matrix Plan included an "Internal WAF" section — a request whitelisting system with training mode. During audit, we realised:
- It's not commerce-specific
- It applies to all admin routes, all API endpoints
- It should be in `Core/`, not `Commerce/`
**Action:** Extracted to `CORE_BOUNCER_PLAN.md` as a framework-level concern.
Signs to Extract
- Feature name doesn't include the module name naturally
- You'd copy-paste it to other modules
- It's about infrastructure, not business logic
- Multiple modules would benefit independently
How to Extract
- Create new task file for the cross-cutting concern
- Add note to original plan:
  > **EXTRACTED:** Section moved to X
- Update TODO.md with the new task
- Don't delete from original — leave the note for context
Retrospective Audits
Periodically audit archived tasks against actual implementation.
When to Audit
- Before starting dependent work
- When resuming a project after a break
- When something "complete" seems broken
- Monthly for active projects
Audit Process
- Read the archived task file
- Check each acceptance criterion against codebase
- Run the tests mentioned in the task
- Document gaps found
Audit Template
## Audit: TASK-XXX
**Date:** YYYY-MM-DD
**Auditor:** [human/agent]
| Claimed | Actual | Gap |
|---------|--------|-----|
| Phase 1 complete | ✅ Verified | None |
| Phase 2 complete | ⚠️ Partial | Missing X service |
| Phase 3 complete | ❌ Not done | Only stubs exist |
**Action items:**
- [ ] Create TASK-YYY for Phase 2 gap
- [ ] Move Phase 3 back to TODO as incomplete
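The file-existence half of an audit is mechanical. A Python sketch (the `exists` hook is injected so the check stays testable; names are illustrative, not existing tooling):

```python
from pathlib import Path

def audit_files(manifest: list, exists=lambda p: Path(p).is_file()) -> dict:
    """Split a phase's claimed file manifest into present and missing paths."""
    present = [p for p in manifest if exists(p)]
    missing = [p for p in manifest if not exists(p)]
    return {"present": present, "missing": missing}
```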
Anti-Patterns to Avoid
General
- Same agent implements and verifies — defeats the purpose
- Vague acceptance criteria — "it works" is not verifiable
- Skipping verification — the whole point is independent checking
- Bulk marking as done — verify one task at a time
- Human approving without spot-check — trust but verify
Evidence & Documentation
- Checklist without evidence — planning ≠ implementation
- Skipping "What Was Built" summary — context lost on compaction
- No test count — can't verify without knowing what to run
- Marking section "done" without implementation — major gaps discovered in audits
- Vague TODO items — "Warehouse system" hides 6 distinct features
Parallel Execution
- Phases with shared files — causes merge conflicts
- Sequential dependencies in same wave — blocks parallelism
- Skipping polish phases — features hidden from agents and admins
- Too many phases per wave — diminishing returns past 4-5 agents
- No wave boundaries — chaos when phases actually do depend on each other
MCP Tools
- Exposing without testing — broken tools waste agent time
- Missing bulk operations — agents do N calls instead of 1
- No error context — agents can't debug failures
Admin UI
- Flat navigation for large features — use expandable submenus
- Missing breadcrumbs — users get lost
- No bulk actions — tedious admin experience
Cross-Cutting Concerns
- Burying framework features in module plans — extract them
- Assuming module-specific when it's not — ask "would other modules need this?"
Quick Reference: Creating a New Task
- Copy the extended schema template
- Fill in objective and scope
- Decompose into phases (aim for 4-8 ACs each)
- Map phase dependencies → wave structure
- Check for cross-cutting concerns — extract if needed
- Always add Phase N-1: MCP Tools
- Always add Phase N: Admin UI Integration
- Set status to `draft`, get human review
- When `ready`, fire Wave 1 agents in parallel
- Collect results with evidence (commits, tests, files)
- Fire next wave
- After all phases, run verification agent
- Human approval → move to `archive/released/`
Quick Reference: Completing a Phase
- Do the work
- Run the tests
- Record evidence:
- Git commits (hashes + messages)
- Test count and command to run them
- Files created/modified
- "What Was Built" summary (2-3 sentences)
- Update task file with Phase Completion Log entry
- Set phase status to ✅ Done
- Move to next phase or request verification
Quick Reference: Auditing Archived Work
- Read the `archive/released/` task file
- For each phase marked complete:
- Check files exist
- Run listed tests
- Verify against acceptance criteria
- Document gaps using Audit Template
- Create new tasks for missing work
- Update TODO.md with accurate status
This protocol exists because agents lie (unintentionally). The system catches the lies. Parallel execution makes them lie faster, so we verify more. Evidence requirements ensure lies are caught before archiving.