Add generation worker for distributed training data pipeline #1

Merged
Snider merged 1 commit from Charon/LEM:feat/generation-worker into main 2026-02-14 22:48:27 +00:00

Summary

  • Add worker/ directory with InfluxDB-coordinated generation scripts
  • Gold generation (lem_generate.py): axiom sandwich signing, finishes the 15K golden set
  • Expansion generation (lem_expand.py): trained LEM models, 46K+ regional prompts
  • Includes prompts data (16K gold + 46K expansion), setup script, requirements
  • Multiple workers can run in parallel without duplicating work
  • Supports MLX (Apple Silicon) and OpenAI-compatible API backends
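
One way to picture the "no duplicated work" behaviour is a sketch like the following. It is not the code in `worker/`. It assumes each worker can obtain the set of prompt IDs already recorded as completed (in this PR that record lives in InfluxDB) and skips those before taking its next batch. The function names, the ID format, and the in-memory `done_elsewhere` set are all hypothetical stand-ins for illustration.

```python
from typing import Iterable, List, Set


def remaining_prompts(all_ids: Iterable[str], completed: Set[str]) -> List[str]:
    """Return prompt IDs not yet recorded as completed, in original order."""
    return [pid for pid in all_ids if pid not in completed]


def next_batch(all_ids: Iterable[str], completed: Set[str], limit: int) -> List[str]:
    """Take at most `limit` outstanding prompt IDs for this worker."""
    return remaining_prompts(all_ids, completed)[:limit]


# Hypothetical data: six prompt IDs, two already finished by another worker.
prompts = [f"P{i:03d}" for i in range(6)]
done_elsewhere = {"P000", "P002"}  # stand-in for a completion query result

batch = next_batch(prompts, done_elsewhere, limit=3)
print(batch)
```

In this sketch `limit` plays the role that `--limit` plays on the command line, and printing the result of `remaining_prompts` is roughly what a dry run would report. The sketch does not handle two workers reading the completed set at the same instant; how the actual scripts deal with that is not described here.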

Test plan

  • Clone on Apple Silicon Mac, run bash worker/setup.sh
  • Verify python3 worker/lem_generate.py --dry-run shows remaining prompts
  • Generate a small batch with --limit 10 and confirm InfluxDB coordination
  • Test with 4B model on 8-16GB machines

Generated with Claude Code (claude.ai/code)

Charon added 1 commit 2026-02-14 22:47:27 +00:00
Includes both generation scripts, prompts data, setup script, and worker
instructions in README. Workers auto-coordinate via InfluxDB so multiple
machines can generate in parallel without duplicating work.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Snider approved these changes 2026-02-14 22:48:03 +00:00
Snider merged commit d722ba1b3d into main 2026-02-14 22:48:27 +00:00
Snider deleted branch feat/generation-worker 2026-02-14 22:48:27 +00:00
Reference: lthn/LEM#1