NeRF-inspired technique for learning relational dynamics of language.
Not what words mean, but how they behave together: rhythm, pacing,
punctuation patterns, style transitions.
v1: positional field over text (baseline, memorises)
v2: masked feature prediction (relational, actually works)
Trained on P. G. Wodehouse's "My Man Jeeves" (public domain, Project
Gutenberg). All 11 style features are highly relational: the field
learns that Wodehouse's style is a tightly coupled system.
Key finding: style interpolation between narrative and dialogue
produces sensible predictions for unmeasured features, suggesting
the continuous field captures real structural patterns.
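
To make the v2 objective concrete, here is a minimal sketch of the masking step only, written in Go. It assumes each text window is summarised as a vector of 11 style features and that the model sees the unmasked values and is asked to predict the hidden ones. The names maskFeatures and numFeatures are hypothetical, and the zero-fill convention is an assumption rather than the commit's actual implementation.

```go
package main

import "fmt"

// numFeatures matches the 11 style features mentioned in the commit message.
const numFeatures = 11

// maskFeatures hides the features at the given indices.
// It returns the model input (hidden values zeroed), a visibility mask,
// and the original values of the hidden features as prediction targets.
// Out-of-range indices are ignored.
func maskFeatures(features [numFeatures]float64, hidden []int) (
	input [numFeatures]float64, visible [numFeatures]bool, targets map[int]float64,
) {
	input = features
	for i := range visible {
		visible[i] = true
	}
	targets = make(map[int]float64, len(hidden))
	for _, i := range hidden {
		if i < 0 || i >= numFeatures {
			continue
		}
		targets[i] = features[i]
		input[i] = 0
		visible[i] = false
	}
	return input, visible, targets
}

func main() {
	var features [numFeatures]float64
	for i := range features {
		features[i] = float64(i + 1) // placeholder values, not real measurements
	}
	input, visible, targets := maskFeatures(features, []int{2, 7})
	fmt.Println(input)
	fmt.Println(visible)
	fmt.Println(targets)
}
```

A model trained this way can only recover the hidden features by using the visible ones, which is what makes the objective relational rather than positional.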
Co-Authored-By: Virgil <virgil@lethean.io>
The emotional register scorer only matched positive/neutral emotions
(joy, compassion, tender, etc.) and completely missed negative human
expressions (angry, furious, devastated, terrified, bleeding, screaming).
As a result, a real Reddit AITA post about a distressed mother scored
emotional_register=1 despite containing "screaming in pain", "pooping
blood", and "blind rage", which led to a false ai_generated verdict.
Changes:
- Add 4 new pattern groups: distress/anger, sadness/despair, fear/anxiety,
physical distress (~40 new vocabulary words)
- Switch from int count to weighted float64 scoring: intensity groups
(vulnerability, distress, physical) score 1.5-2.0x per match vs 1.0x
for common emotion words
- Round to 1 decimal place, cap at 10.0
- Update tests with distress/anger/physical cases including the Reddit
failure case from calibration findings
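
The weighted scoring described in the list above can be sketched roughly as follows. This is an illustration only: the patternGroup type, the scoreEmotionalRegister name, the three tiny word lists, and the exact weights are assumptions chosen to fit the 1.0x and 1.5-2.0x figures given here, not the scorer's real vocabulary.

```go
package main

import (
	"fmt"
	"math"
	"regexp"
)

// patternGroup pairs a word pattern with the weight each match contributes.
// Word lists and weights are illustrative placeholders.
type patternGroup struct {
	re     *regexp.Regexp
	weight float64
}

var groups = []patternGroup{
	{regexp.MustCompile(`(?i)\b(joy|compassion|tender)\b`), 1.0},    // common emotion words
	{regexp.MustCompile(`(?i)\b(angry|furious|rage)\b`), 1.5},       // distress/anger
	{regexp.MustCompile(`(?i)\b(screaming|bleeding|blood)\b`), 2.0}, // physical distress
}

// scoreEmotionalRegister sums weighted matches across all groups,
// rounds the total to one decimal place, and caps it at 10.0.
func scoreEmotionalRegister(text string) float64 {
	var total float64
	for _, g := range groups {
		total += float64(len(g.re.FindAllString(text, -1))) * g.weight
	}
	total = math.Round(total*10) / 10
	return math.Min(total, 10.0)
}

func main() {
	// "screaming" (2.0) + "rage" (1.5) gives 3.5
	fmt.Println(scoreEmotionalRegister("She was screaming in pain, and I felt blind rage."))
}
```

With weights like these, a text with only a few high-intensity matches already scores well above 1, which is the behaviour the Reddit failure case needed.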
Co-Authored-By: Virgil <virgil@lethean.io>
Export distill_results from DuckDB back to compressed JSONL.zst files,
completing the cold -> warm -> cold round-trip data pipeline.
Co-Authored-By: Virgil <virgil@lethean.io>
Register the setup command group with a data subcommand that hydrates
cold, compressed JSONL.zst training data into warm DuckDB tables.
Co-Authored-By: Virgil <virgil@lethean.io>
RunSetup decompresses .jsonl.zst training data into DuckDB tables
(training_examples, seeds, probes, distill_results) and optionally
backfills InfluxDB with aggregate stats.
Co-Authored-By: Virgil <virgil@lethean.io>
Add compressFileZstd, decompressZstd, and walkZstFiles helpers
using klauspost/compress. Promote zstd from indirect to direct dep.
Co-Authored-By: Virgil <virgil@lethean.io>