9 commits

Author SHA1 Message Date
Snider
5fb98dcedd docs: graduate TODO/FINDINGS into production documentation
Replace internal task tracking (TODO.md, FINDINGS.md) with structured
documentation in docs/. Trim CLAUDE.md to agent instructions only.

Co-Authored-By: Virgil <virgil@lethean.io>
2026-02-20 15:01:55 +00:00
Snider
0c496a0a17 docs: add 1B pre-sort pipeline implementation plan
5-task TDD plan: dependency setup, types+mapper, ClassifyCorpus with
mock tests, integration test with real model, docs update.

Co-Authored-By: Virgil <virgil@lethean.io>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-19 23:59:41 +00:00
Snider
c028c81c13 docs: add 1B pre-sort pipeline design
Streaming batch classification via go-inference Classify() API.
Package-level ClassifyCorpus() function with configurable batch size,
prompt template, and mock-friendly TextModel interface.

Co-Authored-By: Virgil <virgil@lethean.io>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-19 23:57:01 +00:00
Snider
24e60104d1 feat: add grammar table spec and French locale
Document full JSON schema for gram.* keys in docs/grammar-table-spec.md.
Add French grammar tables (50 verbs, 24 gendered nouns, signals).
Extend loader to parse article by_gender map. Completes Phase 3.

Co-Authored-By: Virgil <virgil@lethean.io>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-19 17:57:22 +00:00
Snider
d7fc2cda7d docs(reversal): add dual-class disambiguation implementation plan
10-task TDD plan: SignalData loading, dual-class entries, Token
confidence fields, TokeniserOption API, two-pass Tokenise with
7-signal scoring, imprint confidence weighting, multiplier compat,
round-trip tests, and documentation updates.

Co-Authored-By: Virgil <virgil@lethean.io>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-19 15:38:38 +00:00
Snider
3653383889 docs(reversal): add dual-class word disambiguation design
Multi-signal probabilistic disambiguation with two-pass tokenisation.
Seven weighted signals resolve verb/noun ambiguity for words like
"commit", "run", "test", "check", "file", "build". Confidence scores
flow into imprints for the scoring/comprehension use case.

Co-Authored-By: Virgil <virgil@lethean.io>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-19 15:34:54 +00:00
Claude
20ab172f5b docs: add go-i18n reversal + go-html combined design
Bottom-up approach: grammar reversal (Layers 1-2) first,
then go-html HLCRF rendering on top. Both modules share
grammar tables and compose into the same binary.

Phase 1: go-i18n/reversal/ (tokeniser + imprint + multiplier)
Phase 2: go-html (HLCRF parser + Flexy-heritage rendering)
Phase 3: Integration + WASM
Phase 4: CoreDeno + Web Components

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-16 22:45:18 +00:00
Claude
5877431286 docs: add training data multiplier use case
Grammar engine as zero-cost data augmentation: tense/number/formality
flips across 88K seeds = 528K+ verified training examples with no API
spend. Reversal engine provides automatic QA on transformed variants.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-16 21:07:26 +00:00
Claude
811e8f7502 docs: grammar reversal engine — linguistic hash function concept
Captures the bidirectional grammar engine idea: using go-i18n tables
in reverse as a deterministic parser to extract semantic imprints from
documents without retaining content. Covers TIM/DataNode architecture,
88K seed calibration, Poindexter integration, and privacy properties.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-16 20:11:22 +00:00