agent/claude/issues/006-collect-bitcointalk.md
Snider beb24f71d2 docs: add feature request issues for core CLI migration
12 issue files documenting features needed to replace shell scripts:

Claude Code hooks:
- 001: core ai session (state management)
- 002: core ai context (fact capture)
- 003: core ai hook (command validation)
- 004: core qa debug (debug statement detection)

Data collection:
- 005: core collect github (issues/PRs archive)
- 006: core collect bitcointalk (forum threads)
- 007: core collect market (CMC/CoinGecko)
- 008: core collect papers (whitepapers)
- 009: core collect excavate (project archaeology)
- 010: core collect process (HTML→MD)
- 011: core collect dispatch (event hooks)

000: Overview tracking issue

These will be submitted to host-uk/core when rate limit resets.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-01 18:49:35 +00:00

64 lines
1.5 KiB
Markdown

# feat(collect): Add BitcoinTalk thread collection
## Summary
Add `core collect bitcointalk` command to archive BitcoinTalk forum threads.
## Required Commands
```bash
core collect bitcointalk <topic-id> # Collect full thread
core collect bitcointalk <url> # Collect from URL
core collect bitcointalk <id> --pages=5 # Limit pages
core collect bitcointalk <id> --output=DIR # Custom output dir
```
## Current Shell Script Being Replaced
- `claude/skills/bitcointalk/collect.sh` - 270 lines of bash + embedded Python
## Features
1. **Rate limiting**
- Respectful delay between requests (default 2s)
- Configurable via `--delay=N`
2. **Post type detection**
- ANN: Original announcement (post #1)
- UPDATE: Contains [UPDATE]/[RELEASE]/[ANNOUNCEMENT]
- QUESTION: Contains question mark in first 200 chars
- COMMUNITY: General discussion
3. **Output structure**
```
bitcointalk-{topic}/
├── INDEX.md
├── pages/
│ ├── page-0.html
│ └── page-20.html
└── posts/
├── POST-0001.md
└── POST-0002.md
```
4. **Post metadata**
- Author
- Date
- Post type/score
- Original content
5. **Incremental collection**
- Resume interrupted collections
- Skip already-fetched pages
## Output Format
```json
{
"topic_id": "2769739",
"title": "Lethean - Privacy Blockchain VPN",
"posts": 1247,
"pages": 63,
"output": "bitcointalk-2769739/"
}
```