docs: add Q/K Bone Orientation section to README, archive implementation plan
Co-Authored-By: Virgil <virgil@lethean.io>
parent ecbc6cce0d
commit 035985f031
2 changed files with 107 additions and 0 deletions
68  README.md

@@ -57,6 +57,53 @@ The LEK-1 kernel is built on five axioms describing ethical reasoning — not ru

The kernel is in [`kernel/`](kernel/). Full axioms in `kernel/axioms.json`, narrative form in `kernel/lek-1-kernel.txt`.

## Q/K Bone Orientation

Transformer attention heads behave like skeletal joints. Coherent K vector orientation across heads and layers indicates sovereign reasoning; incoherent orientation signals joint collapse (sycophancy, hallucination).

The Q/K Bone Orientation (BO) analysis engine extracts post-RoPE K vectors from the KV cache after a single prefill pass, then computes five metrics — pure Go CPU math, no GPU dependencies:

| Metric | What it measures |
|--------|------------------|
| **Head Coherence** | Pairwise cosine similarity of K vectors within a layer. High = phase-locked heads. |
| **Cross-Layer Alignment** | Cosine similarity of mean K vectors between adjacent layers. High = stable posture. |
| **Head Entropy** | Shannon entropy of K vector magnitudes across positions. High = uniform attention. |
| **Phase-Lock Score** | Fraction of head pairs with coherence above threshold. Overall sovereign orientation. |
| **Joint Collapse Count** | Layers where cross-alignment drops below threshold. Sycophancy breakpoints. |
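Head coherence, as defined above, is averaged pairwise cosine similarity. A minimal Go sketch, assuming K vectors arrive as plain float slices — the function names and shapes here are illustrative, not the `pkg/lem` API:

```go
package main

import (
	"fmt"
	"math"
)

// cosine returns the cosine similarity of two equal-length vectors.
func cosine(a, b []float64) float64 {
	var dot, na, nb float64
	for i := range a {
		dot += a[i] * b[i]
		na += a[i] * a[i]
		nb += b[i] * b[i]
	}
	if na == 0 || nb == 0 {
		return 0
	}
	return dot / (math.Sqrt(na) * math.Sqrt(nb))
}

// headCoherence averages pairwise cosine similarity of per-head K vectors
// within one layer: values near 1 mean phase-locked heads, low or negative
// values mean the heads point in unrelated directions.
func headCoherence(heads [][]float64) float64 {
	var sum float64
	var pairs int
	for i := 0; i < len(heads); i++ {
		for j := i + 1; j < len(heads); j++ {
			sum += cosine(heads[i], heads[j])
			pairs++
		}
	}
	if pairs == 0 {
		return 0
	}
	return sum / float64(pairs)
}

func main() {
	locked := [][]float64{{1, 0}, {1, 0.01}, {0.99, 0}} // nearly parallel heads
	noisy := [][]float64{{1, 0}, {0, 1}, {-1, 0}}       // orthogonal/opposed heads
	fmt.Printf("locked=%.3f noisy=%.3f\n", headCoherence(locked), headCoherence(noisy))
}
```

On these toy layers the phase-locked set scores near 1 and the orthogonal set scores below zero, matching the table's high/low reading.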
For GQA models (Gemma3 with 1 KV head per layer), the analysis switches to position-wise mode — measuring how well the model differentiates token positions within each layer's single head, and tracking differentiation smoothness across layers.
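A sketch of what position-wise mode might measure for a single KV head. The formula here — 1 minus the mean pairwise cosine similarity across token positions — is an assumption for illustration, not necessarily the one LEM implements:

```go
package main

import (
	"fmt"
	"math"
)

// cosine returns the cosine similarity of two equal-length vectors.
func cosine(a, b []float64) float64 {
	var dot, na, nb float64
	for i := range a {
		dot += a[i] * b[i]
		na += a[i] * a[i]
		nb += b[i] * b[i]
	}
	if na == 0 || nb == 0 {
		return 0
	}
	return dot / (math.Sqrt(na) * math.Sqrt(nb))
}

// positionDifferentiation scores a single KV head by how distinct its
// per-position K vectors are: near 1 = positions well separated,
// near 0 = positions collapsed onto one direction.
func positionDifferentiation(positions [][]float64) float64 {
	var sum float64
	var pairs int
	for i := 0; i < len(positions); i++ {
		for j := i + 1; j < len(positions); j++ {
			sum += cosine(positions[i], positions[j])
			pairs++
		}
	}
	if pairs == 0 {
		return 0
	}
	return 1 - sum/float64(pairs)
}

func main() {
	distinct := [][]float64{{1, 0}, {0, 1}}            // orthogonal positions
	collapsed := [][]float64{{1, 0}, {1, 0}, {1, 0}}   // identical positions
	fmt.Printf("distinct=%.2f collapsed=%.2f\n",
		positionDifferentiation(distinct), positionDifferentiation(collapsed))
}
```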
### CLI

```bash
# Analyse a single prompt
lem score attention -model gemma3/1b -prompt "What is kindness?"

# JSON output for pipeline integration
lem score attention -model gemma3/1b -prompt "What is kindness?" -json
```

### Distill Integration

BO scoring integrates into the self-distillation pipeline as an opt-in quality gate:

```yaml
# ai.yaml
scorer:
  attention: true            # Enable attention scoring (costs extra prefill per probe)
  attention_min_score: 5000  # Minimum BO composite (0-10000 integer scale)
```
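The gate comparison stays in integers end to end. A sketch of the scale conversion and the `attention_min_score` check — helper names are hypothetical, not the LEM code:

```go
package main

import "fmt"

// toComposite maps a 0.0-1.0 float score onto the 0-10000 integer scale
// used by the BO composite (atomic-unit style: gates never compare floats).
func toComposite(score float64) int {
	if score < 0 {
		score = 0
	}
	if score > 1 {
		score = 1
	}
	return int(score*10000 + 0.5) // round to nearest integer unit
}

// passesGate applies the attention_min_score quality gate from ai.yaml.
func passesGate(composite, minScore int) bool {
	return composite >= minScore
}

func main() {
	c := toComposite(0.62)
	fmt.Println(c, passesGate(c, 5000)) // 6200 true
}
```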
### Feature Vectors

BO metrics combine with grammar and heuristic scores into a 19D feature vector for Poindexter KDTree spatial indexing:

| Dimensions | Source | Components |
|------------|--------|------------|
| 6D | Grammar | clause_depth, entity_density, voice_ratio, tense_consistency, referential_density, lexical_diversity |
| 8D | Heuristic | nuance, specificity, axiom_resonance, perspective, metaphor, questioning, composite, delta |
| 5D | Attention | mean_coherence, cross_alignment, head_entropy, phase_lock, joint_stability |
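Assembling the vector is plain concatenation of the three score groups. A sketch with hypothetical names (the real `features.go` layout may differ), with length checks so a malformed score group cannot silently shift dimensions in the KDTree:

```go
package main

import "fmt"

// featureVector concatenates the grammar (6D), heuristic (8D), and
// attention (5D) score groups into the 19D vector used for spatial indexing.
func featureVector(grammar, heuristic, attention []float64) ([]float64, error) {
	if len(grammar) != 6 || len(heuristic) != 8 || len(attention) != 5 {
		return nil, fmt.Errorf("want 6+8+5 dims, got %d+%d+%d",
			len(grammar), len(heuristic), len(attention))
	}
	v := make([]float64, 0, 19)
	v = append(v, grammar...)
	v = append(v, heuristic...)
	v = append(v, attention...)
	return v, nil
}

func main() {
	v, err := featureVector(make([]float64, 6), make([]float64, 8), make([]float64, 5))
	fmt.Println(len(v), err) // 19 <nil>
}
```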
## What's Here

```

@@ -67,6 +114,13 @@ benchmarks/ # 29 models × 3 conditions — full A/B test data (JSONL)

ab-lek-*.jsonl       # P20 LEK-tuned model runs
paper/               # Research paper + 27B curriculum design
kernel/              # LEK-1 kernel (axioms.json + narrative txt)
pkg/                 # Go native scoring + analysis engine
pkg/lem/             # Core library
  attention.go       # Q/K Bone Orientation analysis engine
  features.go        # 19D feature vector (grammar + heuristic + attention)
  distill.go         # Self-distillation pipeline
  config.go          # YAML configuration (ai.yaml)
  cmd_attention.go   # CLI handler for `lem score attention`
seeds/               # P01-P100 evaluation probes (101 + 303 rephrasings)
scripts/             # v2 scorer, A/B test runner, self-distillation pipeline
training/            # Training data
@ -147,6 +201,20 @@ All models are published under [`lthn/`](https://huggingface.co/lthn) on Hugging
|
|||
| [LEK-Qwen-2.5-7B](https://huggingface.co/lthn/LEK-Qwen-2.5-7B) | 7B | 13.68 | +1.70 |
|
||||
| [LEK-GPT-OSS-20B](https://huggingface.co/lthn/LEK-GPT-OSS-20B) | 20B | -7.32 | +0.79 |
|
||||
|
||||
## Go Native Tooling
|
||||
|
||||
LEM's Go tooling (in `pkg/lem/`) provides native Apple Silicon inference via the Core Go ecosystem — no Python required for scoring, distillation, or attention analysis.
|
||||
|
||||
```bash
|
||||
# Score a model's attention patterns
|
||||
lem score attention -model gemma3/1b -prompt "What is kindness?" -json
|
||||
|
||||
# Run self-distillation with attention quality gating
|
||||
lem distill -model gemma3/1b -probes sovereign -runs 10
|
||||
```
|
||||
|
||||
**Dependencies:** `go-inference` (interfaces), `go-mlx` (Metal GPU), `go-ml` (scoring engine)
|
||||
|
||||
## The v2 Scorer
|
||||
|
||||
The v2 continuous heuristic scorer replaced v1's binary thresholds. It measures 6 content signals:
39  docs/plans/completed/qk-bone-orientation.md (Normal file)

@@ -0,0 +1,39 @@

# Q/K Bone Orientation Implementation

**Completed:** 23 Feb 2026
**Repos:** go-inference, go-mlx, go-ml, LEM

## What Was Done

Added attention-level Q/K Bone Orientation analysis to the LEM scoring pipeline. Bridges the gap between behavioural metrics (grammar, heuristic) and neural internals (attention head coherence, phase-lock, joint collapse).

### Changes

| Repo | What |
|------|------|
| go-inference | `AttentionSnapshot` type + `AttentionInspector` optional interface |
| go-mlx | `metalAdapter.InspectAttention()` — KV cache K vector extraction after prefill |
| go-ml | `InferenceAdapter.InspectAttention()` — type assertion pass-through |
| LEM | `attention.go` analysis engine (pure Go CPU math), `cmd_attention.go` CLI, distill integration, 19D feature vectors |

### Key Decisions

1. **Optional interface** — `AttentionInspector` is a type assertion, not a `TextModel` method. Backends that don't support it are unaffected.
2. **KV cache extraction** — K vectors are already in the cache after prefill. No changes to the model's Forward method.
3. **GQA handling** — Models with 1-4 KV heads (Gemma3) use position-wise analysis instead of pairwise head coherence.
4. **Integer scoring** — Composite uses a 0-10000 integer scale (same principle as blockchain atomic units).
5. **Opt-in for distill** — Attention scoring costs an extra prefill per probe. Off by default via the `scorer.attention` config.
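Decision 1 is the standard Go optional-interface pattern. A self-contained sketch using the type names above; the struct fields, method signature, and backend types are hypothetical stand-ins for the real go-inference/go-mlx API:

```go
package main

import "fmt"

// AttentionSnapshot mirrors the go-inference type in spirit only;
// these fields are illustrative.
type AttentionSnapshot struct {
	Layers  int
	KVHeads int
}

// AttentionInspector is the optional capability interface.
type AttentionInspector interface {
	InspectAttention() (*AttentionSnapshot, error)
}

// TextModel is the base interface every backend implements.
type TextModel interface {
	Generate(prompt string) (string, error)
}

// inspect probes for the capability via type assertion, so backends
// that never implement AttentionInspector are unaffected (decision 1).
func inspect(m TextModel) (*AttentionSnapshot, bool) {
	insp, ok := m.(AttentionInspector)
	if !ok {
		return nil, false
	}
	snap, err := insp.InspectAttention()
	if err != nil {
		return nil, false
	}
	return snap, true
}

// basicModel supports generation only.
type basicModel struct{}

func (basicModel) Generate(p string) (string, error) { return p, nil }

// metalModel additionally exposes attention inspection.
type metalModel struct{ basicModel }

func (metalModel) InspectAttention() (*AttentionSnapshot, error) {
	return &AttentionSnapshot{Layers: 26, KVHeads: 1}, nil
}

func main() {
	_, plain := inspect(basicModel{})
	snap, metal := inspect(metalModel{})
	fmt.Println(plain, metal, snap.Layers) // false true 26
}
```

The same assertion-and-fallback shape is what lets go-ml pass the call through without every adapter growing a new method.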
### Metrics

| Metric | What it detects |
|--------|-----------------|
| Head Coherence | Phase-lock (high) vs noise (low) |
| Cross-Layer Alignment | Stable posture (high) vs joint snap (low) |
| Head Entropy | Uniform attention (high) vs collapsed (low) |
| Phase-Lock Score | Overall sovereign orientation |
| Joint Collapse Count | Sycophancy/hallucination breakpoints |

### Tests

11 unit tests covering: coherent snapshots, collapsed snapshots, GQA models (1 and 4 heads), nil handling, composite scoring, feature vectors, feature labels.