go-container/specs/TASK_PROTOCOL.md


# Host Hub Task Protocol
**Version:** 2.1
**Created:** 2026-01-01
**Updated:** 2026-01-16
**Purpose:** Ensure agent work is verified before being marked complete, and provide patterns for efficient parallel implementation.
> **Lesson learned (Jan 2026):** Task files written as checklists without implementation evidence led to 6+ "complete" tasks that were actually 70-85% done. Planning ≠ implementation. Evidence required.
---
## The Problem
Agents optimise for conversation completion, not task completion. Saying "done" is computationally cheaper than doing the work. Context compaction loses task state. Nobody verifies output against spec.
## The Solution
Separation of concerns:
1. **Planning Agent** — writes the spec
2. **Implementation Agent** — does the work
3. **Verification Agent** — checks the work against spec
4. **Human** — approves or rejects based on verification
---
## Directory Structure
```
doc/
├── TASK_PROTOCOL.md # This file
└── ... # Reference documentation
tasks/
├── TODO.md # Active task summary
├── TASK-XXX-feature.md # Active task specs
├── agentic-tasks/ # Agentic system tasks
└── future-products/ # Parked product plans
archive/
├── released/ # Completed tasks (for reference)
└── ... # Historical snapshots
```
---
## Task File Schema
Every task file follows this structure:
```markdown
# TASK-XXX: [Short Title]
**Status:** draft | ready | in_progress | needs_verification | verified | approved
**Created:** YYYY-MM-DD
**Last Updated:** YYYY-MM-DD HH:MM by [agent/human]
**Assignee:** [agent session or human]
**Verifier:** [different agent session]
---
## Objective
[One paragraph: what does "done" look like?]
---
## Acceptance Criteria
- [ ] AC1: [Specific, verifiable condition]
- [ ] AC2: [Specific, verifiable condition]
- [ ] AC3: [Specific, verifiable condition]
Each criterion must be:
- Binary (yes/no, not "mostly")
- Verifiable by code inspection or test
- Independent (can check without context)
---
## Implementation Checklist
- [ ] File: `path/to/file.php` — [what it should contain]
- [ ] File: `path/to/other.php` — [what it should contain]
- [ ] Test: `tests/Feature/XxxTest.php` passes
- [ ] Migration: runs without error
---
## Verification Results
### Check 1: [Date] by [Agent]
| Criterion | Status | Evidence |
|-----------|--------|----------|
| AC1 | ✅ PASS | File exists at path, contains X |
| AC2 | ❌ FAIL | Missing method Y in class Z |
| AC3 | ⚠️ PARTIAL | 3 of 5 tests pass |
**Verdict:** FAIL — AC2 not met
### Check 2: [Date] by [Agent]
| Criterion | Status | Evidence |
|-----------|--------|----------|
| AC1 | ✅ PASS | File exists at path, contains X |
| AC2 | ✅ PASS | Method Y added, verified |
| AC3 | ✅ PASS | All 5 tests pass |
**Verdict:** PASS — ready for human approval
---
## Notes
[Any context, blockers, decisions made during implementation]
```
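The header fields above lend themselves to mechanical checks before an agent picks up a task. A minimal sketch (Python for brevity — the real project is PHP/Laravel; the status values come from the schema, the function name is invented):

```python
import re

# Allowed status values, taken from the schema above.
STATUSES = {"draft", "ready", "in_progress", "needs_verification",
            "verified", "approved"}

def parse_status(task_md: str) -> str:
    """Extract and validate the **Status:** line of a task file."""
    match = re.search(r"^\*\*Status:\*\*\s*(\S+)", task_md, re.MULTILINE)
    if not match:
        raise ValueError("task file has no **Status:** line")
    status = match.group(1)
    if status not in STATUSES:
        raise ValueError(f"unknown status: {status}")
    return status
```

A check like this could run as a pre-commit hook on `tasks/` so malformed headers never reach an agent.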
---
## Implementation Evidence (Required)
**A checklist is not evidence. Prove the work exists.**
Every completed phase MUST include:
### 1. Git Evidence
```markdown
**Commits:**
- `abc123` - Add Domain model and migration
- `def456` - Add DomainController with CRUD
- `ghi789` - Add 28 domain tests
```
### 2. Test Count
```markdown
**Tests:** 28 passing (run: `php artisan test app/Mod/Bio/Tests/Feature/DomainTest.php`)
```
### 3. File Manifest
```markdown
**Files created/modified:**
- `app/Mod/Bio/Models/Domain.php` (new)
- `app/Mod/Bio/Http/Controllers/DomainController.php` (new)
- `database/migrations/2026_01_16_create_domains_table.php` (new)
- `app/Mod/Bio/Tests/Feature/DomainTest.php` (new)
```
### 4. "What Was Built" Summary
```markdown
**Summary:** Custom domain management with DNS verification. Users can add domains,
system generates TXT record for verification, background job checks DNS propagation.
Includes SSL provisioning via Caddy API.
```
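The four requirements could be linted automatically before a phase is accepted as complete. A hedged sketch (Python; the patterns mirror the templates above, the function name is an invention):

```python
import re

# One detection pattern per required evidence type from the section above.
REQUIRED_EVIDENCE = {
    "commits": r"\*\*Commits?:\*\*",
    "tests":   r"\*\*Tests?:\*\*",
    "files":   r"\*\*Files[^*]*\*\*",   # matches "**Files created/modified:**"
    "summary": r"\*\*Summary:\*\*",
}

def missing_evidence(phase_md: str) -> list[str]:
    """Return the evidence types absent from a phase-completion entry."""
    return [name for name, pattern in REQUIRED_EVIDENCE.items()
            if not re.search(pattern, phase_md)]
```

An empty return value means all four evidence types are present; anything else names exactly what the implementation agent still owes.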
### Why This Matters
In Jan 2026, an audit found:
- Commerce Matrix Plan marked "95% done" was actually 75%
- Internal WAF section was skipped entirely (extracted to Core Bouncer)
- Warehouse/fulfillment (6 features) listed as "one item" in TODO
- Task files read like planning documents, not completion logs
**Without evidence, "done" means nothing.**
---
## Workflow
### 1. Task Creation
Human or planning agent creates task file in `tasks/`:
- Status: `draft`
- Must have clear acceptance criteria
- Must have implementation checklist
### 2. Task Ready
Human reviews and sets:
- Status: `ready`
- Assignee: `next available agent`
### 3. Implementation
Implementation agent:
- Sets status: `in_progress`
- Works through implementation checklist
- Checks boxes as work is done
- When complete, sets status: `needs_verification`
- **MUST NOT** mark acceptance criteria as passed
### 4. Verification
Different agent (verification agent):
- Reads the task file
- Independently checks each acceptance criterion
- Records evidence in Verification Results section
- Sets verdict: PASS or FAIL
- If PASS: status → `verified`, move to `archive/released/`
- If FAIL: status → `in_progress`, back to implementation agent
### 5. Human Approval
Human reviews verified task:
- Spot-check the evidence
- If satisfied: status → `approved`, can delete or keep in archive
- If not: back to `needs_verification` with notes
---
## Agent Instructions
### For Implementation Agents
```
You are implementing TASK-XXX.
1. Read the full task file
2. Set status to "in_progress"
3. Work through the implementation checklist
4. Check boxes ONLY for work you have completed
5. When done, set status to "needs_verification"
6. DO NOT check acceptance criteria boxes
7. DO NOT mark the task as complete
8. Update "Last Updated" with current timestamp
Your job is to do the work, not to verify it.
```
### For Verification Agents
```
You are verifying TASK-XXX.
1. Read the full task file
2. For EACH acceptance criterion:
a. Check the codebase independently
b. Record what you found (file paths, line numbers, test output)
c. Mark as PASS, FAIL, or PARTIAL with evidence
3. Add a new "Verification Results" section with today's date
4. Set verdict: PASS or FAIL
5. If PASS: move file to archive/released/
6. If FAIL: set status back to "in_progress"
7. Update "Last Updated" with current timestamp
You are the gatekeeper. Be thorough. Trust nothing the implementation agent said.
```
---
## Status Flow
```
draft → ready → in_progress → needs_verification → verified → approved
                     ↑                 │
                     └─────────────────┘
                (if verification fails)
```
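The flow above is a small state machine, and enforcing it in code prevents an agent from jumping straight to `approved`. An illustrative sketch (Python; the transition table is read off the diagram, including the human-rejection loop from step 5 of the workflow):

```python
# Legal transitions per the status flow diagram. needs_verification can
# fall back to in_progress (verification FAIL); verified can fall back
# to needs_verification (human rejects with notes).
TRANSITIONS = {
    "draft": {"ready"},
    "ready": {"in_progress"},
    "in_progress": {"needs_verification"},
    "needs_verification": {"verified", "in_progress"},
    "verified": {"approved", "needs_verification"},
    "approved": set(),  # terminal
}

def transition(current: str, new: str) -> str:
    """Validate a status change; raise on anything the diagram forbids."""
    if new not in TRANSITIONS.get(current, set()):
        raise ValueError(f"illegal transition {current} -> {new}")
    return new
```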
---
## Phase-Based Decomposition
Large tasks should be decomposed into independent phases that can be executed in parallel by multiple agents. This dramatically reduces wall-clock implementation time.
### Phase Independence Rules
1. **No shared state** — Each phase writes to different files/tables
2. **No blocking dependencies** — Phase 3 shouldn't wait for Phase 2's output
3. **Clear boundaries** — Each phase has its own acceptance criteria
4. **Testable isolation** — Phase tests don't require other phases
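Rule 1 is checkable before any agent is fired: intersect the planned file lists of each phase and refuse to launch a wave with overlaps. A minimal sketch (Python; phase names and paths are placeholders):

```python
from collections import defaultdict

def shared_files(phases: dict[str, set[str]]) -> dict[str, set[str]]:
    """Map each file touched by more than one phase to the phases touching it.

    A non-empty result means the decomposition violates the no-shared-state
    rule and those phases cannot safely run in the same wave.
    """
    owners = defaultdict(set)
    for phase, files in phases.items():
        for f in files:
            owners[f].add(phase)
    return {f: ps for f, ps in owners.items() if len(ps) > 1}
```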
### Example Decomposition
A feature like "BioHost Missing Features" might decompose into:
| Phase | Focus | Can Parallel With |
|-------|-------|-------------------|
| 1 | Domain Management | 2, 3, 4 |
| 2 | Project System | 1, 3, 4 |
| 3 | Analytics Core | 1, 2, 4 |
| 4 | Form Submissions | 1, 2, 3 |
| 5 | Link Scheduling | 1, 2, 3, 4 |
| ... | ... | ... |
| 12 | MCP Tools (polish) | After 1-11 |
| 13 | Admin UI (polish) | After 1-11 |
### Phase Sizing
- **Target**: 4-8 acceptance criteria per phase
- **Estimated time**: 2-4 hours per phase
- **Test count**: 15-40 tests per phase
- **File count**: 3-10 files modified per phase
---
## Standard Phase Types
Every large task should include these phase types:
### Core Implementation Phases (1-N)
The main feature work. Group by:
- **Resource type** (domains, projects, analytics)
- **Functional area** (CRUD, scheduling, notifications)
- **Data flow** (input, processing, output)
### Polish Phase: MCP Tools
**Always include as second-to-last phase.**
Exposes all implemented features to AI agents via MCP protocol.
Standard acceptance criteria:
- [ ] MCP tool class exists at `app/Mcp/Tools/{Feature}Tools.php`
- [ ] All CRUD operations exposed as actions
- [ ] Tool includes prompts for common workflows
- [ ] Tool includes resources for data access
- [ ] Tests verify all MCP actions return expected responses
- [ ] Tool registered in MCP service provider
### Polish Phase: Admin UI Integration
**Always include as final phase.**
Integrates features into the admin dashboard.
Standard acceptance criteria:
- [ ] Sidebar navigation updated with feature section
- [ ] Index/list page with filtering and search
- [ ] Detail/edit pages for resources
- [ ] Bulk actions where appropriate
- [ ] Breadcrumb navigation
- [ ] Role-based access control
- [ ] Tests verify all admin routes respond correctly
---
## Parallel Agent Execution
### Firing Multiple Agents
When phases are independent, fire agents simultaneously:
```
Human: "Implement phases 1-4 in parallel"
Agent fires 4 Task tools simultaneously:
- Task(Phase 1: Domain Management)
- Task(Phase 2: Project System)
- Task(Phase 3: Analytics Core)
- Task(Phase 4: Form Submissions)
```
### Agent Prompt Template
```
You are implementing Phase X of TASK-XXX: [Task Title]
Read the task file at: tasks/TASK-XXX-feature-name.md
Your phase covers acceptance criteria ACxx through ACyy.
Implementation requirements:
1. Create all files listed in the Phase X implementation checklist
2. Write comprehensive Pest tests (target: 20-40 tests)
3. Follow existing codebase patterns
4. Use workspace-scoped multi-tenancy
5. Check entitlements for tier-gated features
When complete:
1. Update the task file marking Phase X checklist items done
2. Report: files created, test count, any blockers
Do NOT mark acceptance criteria as passed — verification agent does that.
```
### Coordination Rules
1. **Linter accepts all** — Configure to auto-accept agent file modifications
2. **No merge conflicts** — Phases write to different files
3. **Collect results** — Wait for all agents, then fire next wave
4. **Wave pattern** — Group dependent phases into waves
### Wave Execution Example
```
Wave 1 (parallel): Phases 1, 2, 3, 4
↓ (all complete)
Wave 2 (parallel): Phases 5, 6, 7, 8
↓ (all complete)
Wave 3 (parallel): Phases 9, 10, 11
↓ (all complete)
Wave 4 (sequential): Phase 12 (MCP), then Phase 13 (UI)
```
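The wave structure above can be derived mechanically from phase dependencies: each phase belongs to the first wave after all of its dependencies. A hedged sketch (Python; the dependency map is a made-up miniature of the example, not real task data):

```python
def waves(deps: dict[str, set[str]]) -> list[set[str]]:
    """Group phases into waves: a phase's wave is 1 + the deepest dependency."""
    level: dict[str, int] = {}

    def depth(p: str) -> int:
        if p not in level:
            level[p] = 1 + max((depth(d) for d in deps[p]), default=0)
        return level[p]

    for p in deps:
        depth(p)

    out: list[set[str]] = []
    for p, lv in level.items():
        while len(out) < lv:
            out.append(set())
        out[lv - 1].add(p)
    return out
```

Phases in the same returned set have no path between them and can be fired in parallel; successive sets are the wave boundaries.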
---
## Task File Schema (Extended)
For large phased tasks, extend the schema:
```markdown
# TASK-XXX: [Feature Name]
**Status:** draft | ready | in_progress | needs_verification | verified | approved
**Created:** YYYY-MM-DD
**Last Updated:** YYYY-MM-DD HH:MM by [agent/human]
**Complexity:** small (1-3 phases) | medium (4-8 phases) | large (9+ phases)
**Estimated Phases:** N
**Completed Phases:** M/N
---
## Objective
[One paragraph: what does "done" look like?]
---
## Scope
- **Models:** X new, Y modified
- **Migrations:** Z new tables
- **Livewire Components:** A new
- **Tests:** B target test count
- **Estimated Hours:** C-D hours
---
## Phase Overview
| Phase | Name | Status | ACs | Tests |
|-------|------|--------|-----|-------|
| 1 | Domain Management | ✅ Done | AC1-5 | 28 |
| 2 | Project System | ✅ Done | AC6-10 | 32 |
| 3 | Analytics Core | 🔄 In Progress | AC11-16 | - |
| ... | ... | ... | ... | ... |
| 12 | MCP Tools | ⏳ Pending | AC47-53 | - |
| 13 | Admin UI | ⏳ Pending | AC54-61 | - |
---
## Acceptance Criteria
### Phase 1: Domain Management
- [ ] AC1: [Criterion]
- [ ] AC2: [Criterion]
...
### Phase 12: MCP Tools (Standard)
- [ ] AC47: MCP tool class exists with all feature actions
- [ ] AC48: CRUD operations for all resources exposed
- [ ] AC49: Bulk operations exposed (where applicable)
- [ ] AC50: Query/filter operations exposed
- [ ] AC51: MCP prompts created for common workflows
- [ ] AC52: MCP resources expose read-only data access
- [ ] AC53: Tests verify all MCP actions
### Phase 13: Admin UI Integration (Standard)
- [ ] AC54: Sidebar updated with feature navigation
- [ ] AC55: Feature has expandable submenu (if 3+ pages)
- [ ] AC56: Index pages with DataTable/filtering
- [ ] AC57: Create/Edit forms with validation
- [ ] AC58: Detail views with related data
- [ ] AC59: Bulk action support
- [ ] AC60: Breadcrumb navigation
- [ ] AC61: Role-based visibility
---
## Implementation Checklist
### Phase 1: Domain Management
- [ ] File: `app/Models/...`
- [ ] File: `app/Livewire/...`
- [ ] Test: `tests/Feature/...`
### Phase 12: MCP Tools
- [ ] File: `app/Mcp/Tools/{Feature}Tools.php`
- [ ] File: `app/Mcp/Prompts/{Feature}Prompts.php` (optional)
- [ ] File: `app/Mcp/Resources/{Feature}Resources.php` (optional)
- [ ] Test: `tests/Feature/Mcp/{Feature}ToolsTest.php`
### Phase 13: Admin UI
- [ ] File: `resources/views/admin/components/sidebar.blade.php` (update)
- [ ] File: `app/Livewire/Admin/{Feature}/Index.php`
- [ ] File: `resources/views/livewire/admin/{feature}/index.blade.php`
- [ ] Test: `tests/Feature/Admin/{Feature}Test.php`
---
## Verification Results
[Same as before]
---
## Phase Completion Log
### Phase 1: Domain Management
**Completed:** YYYY-MM-DD by [Agent ID]
**Tests:** 28 passing
**Files:** 8 created/modified
**Notes:** [Any context]
### Phase 2: Project System
**Completed:** YYYY-MM-DD by [Agent ID]
**Tests:** 32 passing
...
```
---
## MCP Endpoint (Future)
When implemented, the MCP endpoint will expose:
```
GET /tasks # List all tasks with status
GET /tasks/{id} # Get task details
POST /tasks/{id}/claim # Agent claims a task
POST /tasks/{id}/complete # Agent marks ready for verification
POST /tasks/{id}/verify # Verification agent submits results
GET /tasks/next # Get next unclaimed task
GET /tasks/verify-queue # Get tasks needing verification
POST /tasks/{id}/phases/{n}/claim # Claim specific phase
POST /tasks/{id}/phases/{n}/complete # Complete specific phase
GET /tasks/{id}/phases # List phase status
```
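Since the endpoint does not exist yet, its semantics can be prototyped in memory to pin down the rules before any HTTP layer is written. A toy sketch (Python; class and method names are inventions that mirror the routes above):

```python
class TaskQueue:
    """In-memory model of the claim/complete/verify endpoint semantics."""

    def __init__(self):
        self.tasks = {}  # task_id -> {"status": str, "assignee": str | None}

    def add(self, task_id):
        self.tasks[task_id] = {"status": "ready", "assignee": None}

    def claim(self, task_id, agent):          # POST /tasks/{id}/claim
        t = self.tasks[task_id]
        assert t["status"] == "ready" and t["assignee"] is None
        t.update(status="in_progress", assignee=agent)

    def complete(self, task_id):              # POST /tasks/{id}/complete
        t = self.tasks[task_id]
        assert t["status"] == "in_progress"
        t["status"] = "needs_verification"

    def verify(self, task_id, passed):        # POST /tasks/{id}/verify
        t = self.tasks[task_id]
        assert t["status"] == "needs_verification"
        t["status"] = "verified" if passed else "in_progress"

    def next_unclaimed(self):                 # GET /tasks/next
        return next((i for i, t in self.tasks.items()
                     if t["status"] == "ready"), None)
```

Note the invariants baked in: only `ready` tasks can be claimed, only the claiming flow reaches `needs_verification`, and a failed verification sends work back rather than forward.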
---
## Metrics to Track
- Tasks created vs completed (per week)
- Verification pass rate on first attempt
- Average time from ready → approved
- Most common failure reasons
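The second metric is the most diagnostic one, and it falls directly out of the Verification Results history. A sketch of the computation (Python; the input shape is assumed, one chronological verdict list per task):

```python
def first_attempt_pass_rate(checks: dict[str, list[str]]) -> float:
    """Fraction of verified tasks whose FIRST check was already a PASS.

    `checks` maps task id -> chronological verdicts ("PASS"/"FAIL") taken
    from each task's Verification Results section.
    """
    verified = [v for v in checks.values() if v]
    if not verified:
        return 0.0
    return sum(v[0] == "PASS" for v in verified) / len(verified)
```

A falling first-attempt pass rate is an early signal that implementation agents are marking work done without evidence.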
---
## Cross-Cutting Concerns
When a feature applies to multiple modules, extract it.
### Example: Core Bouncer
The Commerce Matrix Plan included an "Internal WAF" section — a request whitelisting system with training mode. During audit, we realised:
- It's not commerce-specific
- It applies to all admin routes, all API endpoints
- It should be in `Core/`, not `Commerce/`
**Action:** Extracted to `CORE_BOUNCER_PLAN.md` as a framework-level concern.
### Signs to Extract
- Feature name doesn't include the module name naturally
- You'd copy-paste it to other modules
- It's about infrastructure, not business logic
- Multiple modules would benefit independently
### How to Extract
1. Create new task file for the cross-cutting concern
2. Add note to original plan: `> **EXTRACTED:** Section moved to X`
3. Update TODO.md with the new task
4. Don't delete from original — leave the note for context
---
## Retrospective Audits
Periodically audit archived tasks against actual implementation.
### When to Audit
- Before starting dependent work
- When resuming a project after a break
- When something "complete" seems broken
- Monthly for active projects
### Audit Process
1. Read the archived task file
2. Check each acceptance criterion against codebase
3. Run the tests mentioned in the task
4. Document gaps found
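Step 2 is largely mechanical for the file-manifest part of a task's evidence. A hedged sketch (Python; the function name is invented, and real audits still need the test run and criterion-by-criterion review):

```python
from pathlib import Path

def audit_files(claimed: list[str], root: str = ".") -> dict[str, bool]:
    """Check whether each file a task claims to have created actually exists."""
    return {f: (Path(root) / f).is_file() for f in claimed}
```

Any `False` in the result is an immediate gap for the audit table, before a single test has been run.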
### Audit Template
```markdown
## Audit: TASK-XXX
**Date:** YYYY-MM-DD
**Auditor:** [human/agent]
| Claimed | Actual | Gap |
|---------|--------|-----|
| Phase 1 complete | ✅ Verified | None |
| Phase 2 complete | ⚠️ Partial | Missing X service |
| Phase 3 complete | ❌ Not done | Only stubs exist |
**Action items:**
- [ ] Create TASK-YYY for Phase 2 gap
- [ ] Move Phase 3 back to TODO as incomplete
```
---
## Anti-Patterns to Avoid
### General
1. **Same agent implements and verifies** — defeats the purpose
2. **Vague acceptance criteria** — "it works" is not verifiable
3. **Skipping verification** — the whole point is independent checking
4. **Bulk marking as done** — verify one task at a time
5. **Human approving without spot-check** — trust but verify
### Evidence & Documentation
6. **Checklist without evidence** — planning ≠ implementation
7. **Skipping "What Was Built" summary** — context lost on compaction
8. **No test count** — can't verify without knowing what to run
9. **Marking section "done" without implementation** — major gaps discovered in audits
10. **Vague TODO items** — "Warehouse system" hides 6 distinct features
### Parallel Execution
11. **Phases with shared files** — causes merge conflicts
12. **Sequential dependencies in same wave** — blocks parallelism
13. **Skipping polish phases** — features hidden from agents and admins
14. **Too many phases per wave** — diminishing returns past 4-5 agents
15. **No wave boundaries** — chaos when phases actually do depend
### MCP Tools
16. **Exposing without testing** — broken tools waste agent time
17. **Missing bulk operations** — agents do N calls instead of 1
18. **No error context** — agents can't debug failures
### Admin UI
19. **Flat navigation for large features** — use expandable submenus
20. **Missing breadcrumbs** — users get lost
21. **No bulk actions** — tedious admin experience
### Cross-Cutting Concerns
22. **Burying framework features in module plans** — extract them
23. **Assuming module-specific when it's not** — ask "would other modules need this?"
---
## Quick Reference: Creating a New Task
1. Copy the extended schema template
2. Fill in objective and scope
3. Decompose into phases (aim for 4-8 ACs each)
4. Map phase dependencies → wave structure
5. Check for cross-cutting concerns — extract if needed
6. **Always add Phase N-1: MCP Tools**
7. **Always add Phase N: Admin UI Integration**
8. Set status to `draft`, get human review
9. When `ready`, fire Wave 1 agents in parallel
10. Collect results with evidence (commits, tests, files)
11. Fire next wave
12. After all phases, run verification agent
13. Human approval → move to `archive/released/`
---
## Quick Reference: Completing a Phase
1. Do the work
2. Run the tests
3. Record evidence:
- Git commits (hashes + messages)
- Test count and command to run them
- Files created/modified
- "What Was Built" summary (2-3 sentences)
4. Update task file with Phase Completion Log entry
5. Set phase status to ✅ Done
6. Move to next phase or request verification
---
## Quick Reference: Auditing Archived Work
1. Read `archive/released/` task file
2. For each phase marked complete:
- Check files exist
- Run listed tests
- Verify against acceptance criteria
3. Document gaps using Audit Template
4. Create new tasks for missing work
5. Update TODO.md with accurate status
---
*This protocol exists because agents lie (unintentionally). The system catches the lies. Parallel execution makes them lie faster, so we verify more. Evidence requirements ensure lies are caught before archiving.*