12 issue files documenting features needed to replace shell scripts:

Claude Code hooks:
- 001: core ai session (state management)
- 002: core ai context (fact capture)
- 003: core ai hook (command validation)
- 004: core qa debug (debug statement detection)

Data collection:
- 005: core collect github (issues/PRs archive)
- 006: core collect bitcointalk (forum threads)
- 007: core collect market (CMC/CoinGecko)
- 008: core collect papers (whitepapers)
- 009: core collect excavate (project archaeology)
- 010: core collect process (HTML→MD)
- 011: core collect dispatch (event hooks)

000: Overview tracking issue

These will be submitted to host-uk/core when rate limit resets.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
feat(collect): Add collected data processing
Summary
Add a core collect process command that converts collected HTML, JSON, and RSS/XML files into clean Markdown.
Required Commands
core collect process <source> <downloads-dir> # Process downloaded files
core collect process bitcointalk ./downloads # BitcoinTalk HTML → MD
core collect process reddit ./downloads # Reddit JSON → MD
core collect process wayback ./downloads # Wayback HTML → MD
core collect process medium ./downloads # Medium RSS → MD
Current Shell Script Being Replaced
claude/skills/job-collector/process.sh - 243 lines of bash + embedded Python
Supported Sources
- bitcointalk / btt
  - Input: HTML pages
  - Extract: posts, authors, dates
  - Output: POST-NNNN.md files
- reddit
  - Input: JSON from Reddit API
  - Extract: posts, comments, scores
  - Output: REDDIT-NNNN.md files
- wayback
  - Input: HTML from Wayback Machine
  - Extract: title, body text
  - Output: {basename}.md files
- medium
  - Input: RSS/XML feed
  - Extract: title, author, date, content
  - Output: MEDIUM-NNNN.md files
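The source list above can be read as a small lookup table. The following is only a sketch of that idea, not the planned implementation: the names SourceSpec, SOURCES, ALIASES, resolve_source, and output_name are hypothetical, and the four-digit zero padding for NNNN is an assumption taken from the POST-0001.md example in the output structure.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SourceSpec:
    """Hypothetical description of one supported source."""
    input_format: str
    # None means "reuse the input file's basename" (the wayback case).
    output_prefix: Optional[str]


# Table mirroring the "Supported Sources" list; all names are illustrative.
SOURCES = {
    "bitcointalk": SourceSpec("html", "POST"),
    "reddit": SourceSpec("json", "REDDIT"),
    "wayback": SourceSpec("html", None),
    "medium": SourceSpec("rss", "MEDIUM"),
}
ALIASES = {"btt": "bitcointalk"}


def resolve_source(name: str) -> SourceSpec:
    """Map a CLI source argument (or its alias) to its spec."""
    key = name.strip().lower()
    key = ALIASES.get(key, key)
    try:
        return SOURCES[key]
    except KeyError:
        raise ValueError(f"unsupported source: {name!r}") from None


def output_name(spec: SourceSpec, index: int, basename: str) -> str:
    """Build the output filename; assumes NNNN is a zero-padded 4-digit counter."""
    if spec.output_prefix is None:
        return f"{basename}.md"
    return f"{spec.output_prefix}-{index:04d}.md"
```

Keeping the alias map separate from the main table means btt never needs its own spec entry.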
Output Structure
processed/
├── INDEX.md
└── posts/
    ├── POST-0001.md
    ├── POST-0002.md
    └── ...
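A minimal sketch of how this layout could be produced, assuming extracted posts arrive as already-rendered markdown strings. The function name write_posts and its parameters are hypothetical, and the numbering scheme (starting at 1, zero-padded to four digits) is inferred from the tree above.

```python
from pathlib import Path
from typing import Iterable, List


def write_posts(out_root: Path, bodies: Iterable[str], prefix: str = "POST") -> List[Path]:
    """Write each markdown body to <out_root>/posts/<prefix>-NNNN.md.

    Returns the written paths in numbering order.
    """
    posts_dir = Path(out_root) / "posts"
    posts_dir.mkdir(parents=True, exist_ok=True)

    written = []
    for number, body in enumerate(bodies, start=1):
        path = posts_dir / f"{prefix}-{number:04d}.md"
        path.write_text(body, encoding="utf-8")
        written.append(path)
    return written
```

Returning the written paths lets a later step (index generation, summary counts) work from the same list without re-scanning the directory.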
Index Generation
Auto-generates INDEX.md with:
- Source metadata
- Post count
- Links to all posts
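The three index elements listed above could be assembled along these lines. This is a sketch only: the issue does not specify the exact INDEX.md layout, so the heading text, the bullet wording, and the function name build_index are all assumptions.

```python
from typing import Sequence


def build_index(source: str, post_files: Sequence[str]) -> str:
    """Return INDEX.md content: source metadata, post count, links to posts.

    The layout here is illustrative; links are relative to processed/.
    """
    lines = [
        "# Index",
        "",
        f"- Source: {source}",
        f"- Posts: {len(post_files)}",
        "",
        "## Posts",
        "",
    ]
    # Sort so the index order is stable regardless of processing order.
    for name in sorted(post_files):
        lines.append(f"- [{name}](posts/{name})")
    return "\n".join(lines) + "\n"
```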
Output Format
{
  "source": "bitcointalk",
  "input_files": 15,
  "posts_extracted": 347,
  "output": "processed/"
}
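For illustration, a summary with these four fields could be emitted as follows. The field names come from the example above; the function name build_summary and the default output directory are assumptions.

```python
import json


def build_summary(source: str, input_files: int, posts_extracted: int,
                  output_dir: str = "processed/") -> str:
    """Serialize the run summary using the field names from the example."""
    summary = {
        "source": source,
        "input_files": input_files,
        "posts_extracted": posts_extracted,
        "output": output_dir,
    }
    return json.dumps(summary, indent=2)
```

Calling build_summary("bitcointalk", 15, 347) yields an object with the same keys and values as the example.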