Add HuggingFace model cards, sync script, and Parquet export #2

Merged
Snider merged 1 commit from Charon/LEM:feat/hf-sync into main 2026-02-15 00:16:00 +00:00

Summary

  • Add 4 missing model cards (Gemma3-1B-layered v1+v2, Gemma3-27B, GPT-OSS-20B)
  • All 9 HuggingFace models now have cards in paper/hf-cards/
  • scripts/sync_hf.py — push cards, benchmarks, and training data to HuggingFace
  • scripts/export_parquet.py — convert JSONL training splits to Parquet format
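The claim that every model now has a card can be checked mechanically. The sketch below is hypothetical. `missing_cards` and `EXPECTED_MODELS` are illustrative names. Only the four models added in this PR are listed, because the other five are not named here. It also assumes each card is stored as `paper/hf-cards/<model-name>.md`, which this PR does not confirm.

```python
from pathlib import Path

# Hypothetical list: these four names come from this PR's summary; the other five
# models are not named in the PR, and the "<name>.md" file naming is an assumption.
EXPECTED_MODELS = [
    "Gemma3-1B-layered-v1",
    "Gemma3-1B-layered-v2",
    "Gemma3-27B",
    "GPT-OSS-20B",
]


def missing_cards(cards_dir, expected):
    """Return the expected model names that have no <name>.md card in cards_dir."""
    present = {p.stem for p in Path(cards_dir).glob("*.md")}
    return [name for name in expected if name not in present]
```

For example, `missing_cards("paper/hf-cards", EXPECTED_MODELS)` returns an empty list when every listed model has a card file.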

HF Sync Usage

pip install huggingface_hub pyarrow
huggingface-cli login

# Sync all model cards to HuggingFace
python3 scripts/sync_hf.py --dry-run
python3 scripts/sync_hf.py

# Export training data as Parquet
python3 scripts/export_parquet.py

# Sync everything (cards + benchmarks + training Parquet)
python3 scripts/sync_hf.py --all
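The following is a minimal sketch of how a card sync with `--dry-run` could be structured, assuming the `huggingface_hub` package installed above. It is not the PR's `sync_hf.py`. The `--cards-dir` and `--namespace` flags, the `<namespace>/<card-name>` repo naming, and the `plan_uploads`/`run` helpers are hypothetical, and the benchmark and Parquet steps behind `--all` are omitted. `HfApi.upload_file` is a real `huggingface_hub` call, and uploading a card as `README.md` is how the Hub displays a model card.

```python
import argparse
from pathlib import Path


def plan_uploads(cards_dir, repo_for_card):
    """Return (local_path, repo_id, path_in_repo) tuples, one per card file.

    Each card goes to README.md in its model repo, the file the Hub renders
    as the model card.
    """
    return [
        (str(card), repo_for_card(card.stem), "README.md")
        for card in sorted(Path(cards_dir).glob("*.md"))
    ]


def run(argv=None):
    parser = argparse.ArgumentParser(description="Sketch of a model-card sync")
    parser.add_argument("--dry-run", action="store_true",
                        help="print planned uploads without pushing")
    # Hypothetical flags for this sketch; not taken from sync_hf.py.
    parser.add_argument("--cards-dir", default="paper/hf-cards")
    parser.add_argument("--namespace", default="your-namespace")
    args = parser.parse_args(argv)

    plan = plan_uploads(args.cards_dir, lambda stem: f"{args.namespace}/{stem}")
    if args.dry_run:
        for local, repo_id, dest in plan:
            print(f"would upload {local} -> {repo_id}:{dest}")
        return plan

    # Imported lazily so --dry-run works even without huggingface_hub installed.
    from huggingface_hub import HfApi

    api = HfApi()  # no token passed: falls back to the one saved by `huggingface-cli login`
    for local, repo_id, dest in plan:
        api.upload_file(
            path_or_fileobj=local,
            path_in_repo=dest,
            repo_id=repo_id,
            repo_type="model",
            commit_message="Sync model card",
        )
    return plan
```

Calling `run(["--dry-run"])` prints the planned uploads without importing `huggingface_hub` or touching the network. That is why a dry run is a cheap check before the real sync.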

Parquet Schema

| Column | Type | Description |
|--------|------|-------------|
| prompt | string | User message |
| response | string | Assistant message |
| system | string | System prompt (if any) |
| messages | string (JSON) | Full chat format for training |
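The schema implies that each chat record is flattened into three plain-text columns, with the full conversation serialized as JSON. The sketch below shows that mapping. It assumes each JSONL line has the shape `{"messages": [{"role": ..., "content": ...}, ...]}`, which this PR does not confirm. It also assumes a missing system prompt becomes an empty string. `record_to_row`, `jsonl_to_rows`, and `write_parquet` are hypothetical names, not functions from `export_parquet.py`.

```python
import json

COLUMNS = ("prompt", "response", "system", "messages")


def record_to_row(record):
    """Flatten one chat record (assumed {"messages": [...]}) into the four columns."""
    messages = record["messages"]

    def first(role):
        # Content of the first turn with this role, or "" if there is none.
        return next((m["content"] for m in messages if m["role"] == role), "")

    return {
        "prompt": first("user"),
        "response": first("assistant"),
        "system": first("system"),
        # The full conversation is kept as a JSON-encoded string column.
        "messages": json.dumps(messages, ensure_ascii=False),
    }


def jsonl_to_rows(lines):
    """Convert an iterable of JSONL lines to rows, skipping blank lines."""
    return [record_to_row(json.loads(line)) for line in lines if line.strip()]


def write_parquet(rows, out_path):
    """Write rows with an explicit all-string schema (requires pyarrow)."""
    import pyarrow as pa
    import pyarrow.parquet as pq

    schema = pa.schema([(name, pa.string()) for name in COLUMNS])
    pq.write_table(pa.Table.from_pylist(rows, schema=schema), out_path)
```

Storing `messages` as a JSON string keeps the Parquet schema flat, with four string columns. It also preserves multi-turn conversations, which the `prompt` and `response` columns cannot represent because, in this sketch, they keep only the first user and assistant turns.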

Generated with Claude Code (claude.ai/code)

Charon added 1 commit 2026-02-14 23:50:39 +00:00
- Add 4 missing model cards: Gemma3-1B-layered (v1+v2), Gemma3-27B, GPT-OSS-20B
- All 9 HF models now have cards in paper/hf-cards/
- sync_hf.py: push cards + benchmarks + training data to HuggingFace
- export_parquet.py: convert JSONL training splits to Parquet (HF dataset format)
- Parquet schema: prompt, response, system, messages (JSON)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Snider force-pushed feat/hf-sync from b8f9191b05 to 2df0044ad9 2026-02-15 00:14:28 +00:00
Snider approved these changes 2026-02-15 00:15:53 +00:00
Snider merged commit 9138eb0a61 into main 2026-02-15 00:16:00 +00:00
Snider deleted branch feat/hf-sync 2026-02-15 00:16:00 +00:00
Reference: lthn/LEM#2