forked from lthn/LEM

feat: WoRF — Word Radiance Field experiments

NeRF-inspired technique for learning relational dynamics of language.
Not what words mean, but how they behave together — rhythm, pacing,
punctuation patterns, style transitions.

v1: positional field over text (baseline, memorises)
v2: masked feature prediction (relational, actually works)

Trained on Wodehouse "My Man Jeeves" (public domain, Gutenberg).
All 11 style features are highly relational — the field learns that
Wodehouse's style is a tightly coupled system.

Key finding: style interpolation between narrative and dialogue
produces sensible predictions for unmeasured features, suggesting
the continuous field captures real structural patterns.

Co-Authored-By: Virgil <virgil@lethean.io>
Snider 2026-03-04 09:43:38 +00:00
parent 41d8008e69
commit f79eaabdce
6 changed files with 20480 additions and 0 deletions


@@ -0,0 +1,162 @@
# WoRF — Word Radiance Fields
> **Status**: Experimental proof-of-concept (4 Mar 2026)
> **Licence**: EUPL-1.2
## What This Is
WoRF (Word Radiance Field) is a technique inspired by NeRF (Neural Radiance
Fields) for learning the **relational dynamics** of language from text.
Not what words mean — how they behave together. The pauses, the rhythm,
the texture. The stuff current token embeddings lose entirely.
A WoRF learns a continuous field over stylistic features extracted from
text. You can query the field to understand how style dimensions relate
to each other within a body of writing. The goal: teach models not WHAT
to say, but HOW to say it.
## Origin
The idea comes from a simple observation: current LLMs start with a single
flat embedding per token and rely on transformer layers to reconstruct
all the relational richness of language. That works for content, but
loses the performance — timing, rhythm, word-choice patterns, deliberate
silences. The "gooey" stuff.
NeRF's core trick is: given sparse discrete observations, learn a
continuous function you can query at any point. A page of text is a
sparse observation of language relationships. A book is a scene.
The WoRF is the learned field.
## How It Works
### Feature Extraction
Each chunk of text (~300 words) is measured across 11 stylistic dimensions:
| Feature | What It Captures |
|---------|-----------------|
| avg_word_length | Vocabulary complexity |
| avg_sentence_length | Pacing |
| sentence_length_variance | Rhythm variation |
| dialogue_ratio | Conversation density |
| vocabulary_richness | Unique word usage |
| dash_density | Parenthetical style (asides, interjections) |
| exclamation_density | Emotional intensity |
| question_density | Interrogative patterns |
| short_sentence_ratio | Punchiness |
| aside_density | Digression patterns |
| avg_punct_per_sentence | Structural complexity |
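A cut-down sketch of how three of these are measured, mirroring the definitions in `worf-experiment.py` (the 5-word cutoff for "short" sentences is the scripts' own):

```python
import re

def sketch_features(chunk: str) -> dict:
    """Simplified versions of three of the 11 features."""
    words = chunk.split()
    sentences = [s.strip() for s in re.split(r"[.!?]+", chunk) if s.strip()]
    sent_lengths = [len(s.split()) for s in sentences]
    n_sents = len(sent_lengths) or 1
    return {
        "avg_sentence_length": sum(sent_lengths) / n_sents,
        "question_density": chunk.count("?") / (len(words) or 1),
        "short_sentence_ratio": sum(1 for n in sent_lengths if n <= 5) / n_sents,
    }

feats = sketch_features("What ho! I mean to say, what ho. Rather.")
# three sentences of 2, 6 and 1 words: mean 3.0, two of the three count as "short"
```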
### Architecture (v2 — Masked Feature Prediction)
Instead of mapping text position to features (v1, which just memorises),
v2 uses the features themselves as coordinates:
```
Input: 11 features with one masked (zeroed + flag)
Each feature gets sinusoidal positional encoding (6 frequencies)
Output: Predicted values for all 11 features
Loss: MSE between predicted and actual
```
The network learns: "given THESE style characteristics, what must the
missing one be?" Each masking angle is like a different camera view
in NeRF — it reveals a different relationship in the field.
Architecture: 6-layer MLP, 256 hidden dim, GELU activations, dropout
at midpoint. AdamW with cosine annealing. ~4000 epochs.
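The input encoding above works out to 14 values per feature (1 raw + 12 sin/cos + 1 mask flag), so 154 inputs total. A minimal sketch with the stated dimensions (`encode` and `PER_FEAT` are illustrative names, not the script's; the real model is in `tasks/worf-v2.py`):

```python
import math
import torch
import torch.nn as nn

NUM_FEATURES, NUM_FREQ, HIDDEN = 11, 6, 256

def encode(x: torch.Tensor, num_freq: int = NUM_FREQ) -> torch.Tensor:
    # sinusoidal encoding of each scalar feature value, as in NeRF
    parts = [x]
    for k in range(num_freq):
        parts += [torch.sin(2.0 ** k * math.pi * x), torch.cos(2.0 ** k * math.pi * x)]
    return torch.cat(parts, dim=-1)

# per feature: 1 raw + 2*6 sin/cos + 1 mask flag = 14; 11 * 14 = 154 inputs
PER_FEAT = 1 + 2 * NUM_FREQ + 1
mlp = nn.Sequential(
    nn.Linear(NUM_FEATURES * PER_FEAT, HIDDEN), nn.GELU(),
    *(m for _ in range(4) for m in (nn.Linear(HIDDEN, HIDDEN), nn.GELU())),
    nn.Linear(HIDDEN, NUM_FEATURES),
)

feats = torch.rand(8, NUM_FEATURES)              # a batch of normalised chunk features
mask_idx = torch.randint(0, NUM_FEATURES, (8,))  # which feature to hide, per chunk
cols = []
for f in range(NUM_FEATURES):
    enc = encode(feats[:, f:f + 1])
    hidden = (mask_idx == f).float().unsqueeze(-1)  # 1.0 where feature f is masked
    cols.append(torch.cat([enc * (1.0 - hidden), hidden], dim=-1))
x = torch.cat(cols, dim=-1)
pred = mlp(x)  # predicted values for all 11 features
```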
### What the Field Reveals
Trained on Wodehouse's "My Man Jeeves" (169 chunks, 50K words):
**Every feature is highly relational** — none are independent. The
field can predict any feature from the other 10 with near-zero error.
This means Wodehouse's style is a tightly coupled system, not random.
**Key relationships discovered:**
- `aside_density` → `avg_punct_per_sentence` (+0.32) — his parenthetical
asides ARE the signature style
- `short_sentence_ratio` → `exclamation_density` (+0.16) — punchy
sentences come with Bertie's exclamations ("What!" / "Ripping!")
- `avg_sentence_length` → `short_sentence_ratio` (-0.29) — long
sentences = exposition, short = dialogue reactions
- `sentence_length_variance` → `avg_punct_per_sentence` (+0.15) —
varied rhythm = more structural punctuation
**Style interpolation works:** Walking from narrative to dialogue,
the field correctly predicts question density rises 4x, punctuation
per sentence drops, exclamations increase. Not memorisation — the
field understands style transitions.
## What This Is For
### Near-term: Training Data Quality
WoRF features could score training corpus quality — not for correctness
but for **stylistic consistency and richness**. A chunk that doesn't
fit the field = low quality or genre mismatch.
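A minimal sketch of that scoring idea, assuming a trained field with the v2 call signature. The `MeanField` stand-in is a dummy so the sketch runs; a real run would pass the trained `WoRFv2` model and real normalised features:

```python
import torch

def chunk_fit_scores(field, feats_norm: torch.Tensor) -> torch.Tensor:
    """Mean masked-reconstruction error per chunk: high = doesn't fit the field."""
    n, d = feats_norm.shape
    errs = torch.zeros(n)
    with torch.no_grad():
        for f in range(d):
            mask_idx = torch.full((n,), f, dtype=torch.long)
            pred = field(feats_norm, mask_idx)
            errs += (pred[:, f] - feats_norm[:, f]) ** 2
    return errs / d

class MeanField:
    """Dummy stand-in for a trained WoRFv2: always predicts the column means."""
    def __init__(self, feats: torch.Tensor):
        self.mu = feats.mean(dim=0)
    def __call__(self, feats: torch.Tensor, mask_idx: torch.Tensor) -> torch.Tensor:
        return self.mu.expand_as(feats)

feats = torch.rand(169, 11)  # normalised features, one row per chunk
scores = chunk_fit_scores(MeanField(feats), feats)
flagged = (scores > scores.mean() + 2 * scores.std()).nonzero().flatten()
```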
### Medium-term: EN-GB Language Pack
Feed many public domain books through WoRF to build a style field for
"native English." The field captures how English actually flows across
authors, genres, eras. Use it as auxiliary training signal — not what
the model says, but whether it sounds like real English.
### Long-term: Style-Aware Generation
Query the WoRF during generation to guide style. "Write this with
Wodehouse's rhythm" = constrain the output to the region of style
space that Wodehouse occupies. Different from fine-tuning — it's a
continuous field you can blend and interpolate.
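The "region of style space" idea, sketched with made-up numbers: score candidate continuations by distance to a target point in normalised feature space. Real guidance would query the field rather than fixed targets, and the candidate feature values here are invented for illustration:

```python
def style_distance(feats: dict, target: dict) -> float:
    # Euclidean distance over the (normalised) style dimensions both share
    shared = feats.keys() & target.keys()
    return sum((feats[k] - target[k]) ** 2 for k in shared) ** 0.5

# hypothetical target region and candidates, numbers invented
target = {"short_sentence_ratio": 0.6, "question_density": 0.02}
candidates = {
    "long expository paragraph": {"short_sentence_ratio": 0.1, "question_density": 0.0},
    "snappy exchange": {"short_sentence_ratio": 0.55, "question_density": 0.03},
}
best = min(candidates, key=lambda c: style_distance(candidates[c], target))
```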
## Relationship to LEM
WoRF connects to existing LEM work:
- **go-i18n grammar engine** — the 19D/24D scoring dimensions could
serve as WoRF "viewing angles" (the directional component NeRF uses)
- **Poindexter** — spatial indexing via KD-Tree, already doing proximity
in embedding space. WoRF adds a style dimension to that space
- **Sandwich format** — WoRF features could become additional scoring
layers in the training curriculum
- **CL-BPL** (cymatic-linguistic back-propagation) — same wave
interference maths NeRF uses for reconstruction
## Files
```
tasks/worf.txt # Original Grok chat transcript (concept)
tasks/worf-experiment.md # Experiment notes
tasks/worf-experiment.py # v1: position → features (memorised, useful baseline)
tasks/worf-v2.py # v2: masked feature prediction (relational field)
tasks/worf-field-jeeves.json # v1 field data
tasks/worf-v2-relations.json # v2 influence matrix
tasks/pg-wood.txt # Source: My Man Jeeves (Gutenberg, public domain)
```
## Next Steps
1. Add more public domain books (Wilde, Austen, Twain, Poe) and see
if the field distinguishes authors or finds universal English patterns
2. Increase feature dimensions — add n-gram patterns, word frequency
distributions, clause structure
3. Connect to go-i18n scoring as "viewing angle" dimensions
4. Test as training data quality filter on existing LEM datasets
5. Explore whether the influence matrix itself is useful as a compact
style representation (11x11 = 121 numbers to describe an author)
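The idea in point 5 can be sketched as a Frobenius distance between influence matrices (random stand-in matrices here, not measured data):

```python
import numpy as np

def fingerprint_distance(m_a: np.ndarray, m_b: np.ndarray) -> float:
    """Frobenius distance between two 11x11 influence matrices."""
    return float(np.linalg.norm(m_a - m_b))

rng = np.random.default_rng(0)
wodehouse = rng.normal(0.0, 0.1, (11, 11))             # stand-in, not real data
similar = wodehouse + rng.normal(0.0, 0.02, (11, 11))  # a nearby style
unrelated = rng.normal(0.0, 0.1, (11, 11))             # an independent one

d_close = fingerprint_distance(wodehouse, similar)
d_far = fingerprint_distance(wodehouse, unrelated)
```

If the matrix really is a compact author signature, nearby styles should land close under this metric and unrelated ones far away.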
## Easter Egg
WoRF is named after Commander Worf, but the real reference is Data's
"little life forms" song to Spot. The idea: a model that can understand
why Eckhart Tolle is funny without being prompted, because it learned
the pause is the punchline.
---
*EUPL-1.2 — Lethean Network*

experiments/worf/pg-wood.txt Normal file

File diff suppressed because it is too large


@@ -0,0 +1,390 @@
#!/usr/bin/env python3
"""
WoRF Experiment — Word Radiance Field
======================================
Feed Wodehouse's "My Man Jeeves" into a NeRF-like MLP and see what
the continuous field learns about writing style.
NeRF: (x, y, z, θ, φ) → (r, g, b, σ)
WoRF: (position_in_text, chunk_context) → (style_features)
First pass: 1D position → style feature vector. No viewing angle yet.
Just see if a continuous field over text position learns anything.
"""
import re
import math
import json
import torch
import torch.nn as nn
import numpy as np
from pathlib import Path
from collections import Counter
# ---------------------------------------------------------------------------
# 1. Text Splitting — each "page" is one observation (like one photo for NeRF)
# ---------------------------------------------------------------------------
def load_and_clean(path: str) -> str:
"""Strip Gutenberg header/footer, return clean text."""
text = Path(path).read_text(encoding="utf-8")
# Strip PG header
start = text.find("LEAVE IT TO JEEVES")
if start == -1:
start = text.find("*** START OF")
start = text.find("\n", start) + 1
# Strip PG footer
end = text.find("*** END OF THE PROJECT GUTENBERG")
if end == -1:
end = len(text)
return text[start:end].strip()
def split_into_chunks(text: str, chunk_size: int = 500) -> list[str]:
"""Split text into roughly equal word-count chunks (pages)."""
words = text.split()
chunks = []
for i in range(0, len(words), chunk_size):
chunk = " ".join(words[i:i + chunk_size])
if len(chunk.split()) > 50: # skip tiny trailing chunks
chunks.append(chunk)
return chunks
# ---------------------------------------------------------------------------
# 2. Feature Extraction — what we measure about each "page"
# ---------------------------------------------------------------------------
def extract_features(chunk: str) -> dict:
"""Extract stylistic features from a chunk of text.
These are the 'RGB + density' equivalent — what the field predicts.
"""
words = chunk.split()
sentences = re.split(r'[.!?]+', chunk)
sentences = [s.strip() for s in sentences if s.strip()]
word_lengths = [len(w.strip(".,;:!?\"'()—-")) for w in words]
word_lengths = [l for l in word_lengths if l > 0]
# Dialogue detection
dialogue_chars = sum(1 for c in chunk if c == '"')
total_chars = len(chunk) or 1
# Punctuation patterns (Wodehouse loves dashes and exclamations)
dashes = chunk.count("—") + chunk.count("--")
exclamations = chunk.count("!")
questions = chunk.count("?")
commas = chunk.count(",")
# Vocabulary richness (unique words / total words)
unique_words = len(set(w.lower().strip(".,;:!?\"'()—-") for w in words))
total_words = len(words) or 1
# Sentence length variation (std dev) — captures rhythm
sent_lengths = [len(s.split()) for s in sentences]
sent_mean = np.mean(sent_lengths) if sent_lengths else 0
sent_std = np.std(sent_lengths) if sent_lengths else 0
# Short sentence ratio (punchy lines like "Injudicious, sir.")
short_sentences = sum(1 for l in sent_lengths if l <= 5)
short_ratio = short_sentences / (len(sent_lengths) or 1)
# Aside/parenthetical density (commas, dashes per word)
aside_density = (commas + dashes) / total_words
return {
"avg_word_length": np.mean(word_lengths) if word_lengths else 0,
"avg_sentence_length": sent_mean,
"sentence_length_variance": sent_std,
"dialogue_ratio": dialogue_chars / total_chars,
"vocabulary_richness": unique_words / total_words,
"dash_density": dashes / total_words,
"exclamation_density": exclamations / total_words,
"question_density": questions / total_words,
"short_sentence_ratio": short_ratio,
"aside_density": aside_density,
"avg_punct_per_sentence": (commas + dashes + exclamations + questions) / (len(sent_lengths) or 1),
}
FEATURE_NAMES = list(extract_features("dummy text here for keys").keys())
NUM_FEATURES = len(FEATURE_NAMES)
# ---------------------------------------------------------------------------
# 3. Positional Encoding — NeRF's trick for capturing high-frequency detail
# ---------------------------------------------------------------------------
def positional_encoding(x: torch.Tensor, num_frequencies: int = 10) -> torch.Tensor:
"""NeRF-style sinusoidal positional encoding.
Maps a scalar position into a higher-dimensional space so the MLP
can learn sharp transitions (same reason NeRF needs it for edges).
"""
encodings = [x]
for freq in range(num_frequencies):
encodings.append(torch.sin(2.0 ** freq * math.pi * x))
encodings.append(torch.cos(2.0 ** freq * math.pi * x))
return torch.cat(encodings, dim=-1)
# ---------------------------------------------------------------------------
# 4. The WoRF Network — tiny MLP, same architecture as vanilla NeRF
# ---------------------------------------------------------------------------
class WoRF(nn.Module):
"""Word Radiance Field — learns a continuous style field over text position."""
def __init__(self, input_dim: int, hidden_dim: int = 128, num_layers: int = 4,
output_dim: int = NUM_FEATURES):
super().__init__()
layers = []
layers.append(nn.Linear(input_dim, hidden_dim))
layers.append(nn.ReLU())
for i in range(num_layers - 2):
layers.append(nn.Linear(hidden_dim, hidden_dim))
layers.append(nn.ReLU())
# NB: no skip connection; unlike NeRF, this shallow MLP trains fine without one
layers.append(nn.Linear(hidden_dim, output_dim))
self.network = nn.Sequential(*layers)
self.input_dim = input_dim
self.hidden_dim = hidden_dim
def forward(self, x: torch.Tensor) -> torch.Tensor:
return self.network(x)
# ---------------------------------------------------------------------------
# 5. Training
# ---------------------------------------------------------------------------
def prepare_data(chunks: list[str]) -> tuple[torch.Tensor, torch.Tensor, dict]:
"""Convert chunks to training data: positions → features."""
n = len(chunks)
positions = []
features = []
for i, chunk in enumerate(chunks):
pos = i / max(n - 1, 1) # normalise to [0, 1]; guard the single-chunk case
feat = extract_features(chunk)
positions.append(pos)
features.append([feat[k] for k in FEATURE_NAMES])
positions = torch.tensor(positions, dtype=torch.float32).unsqueeze(-1)
features = torch.tensor(features, dtype=torch.float32)
# Normalise features to [0, 1] range for training stability
feat_min = features.min(dim=0).values
feat_max = features.max(dim=0).values
feat_range = feat_max - feat_min
feat_range[feat_range == 0] = 1.0 # avoid division by zero
features_norm = (features - feat_min) / feat_range
norm_stats = {"min": feat_min, "max": feat_max, "range": feat_range}
return positions, features_norm, norm_stats
def train_worf(positions: torch.Tensor, features: torch.Tensor,
num_frequencies: int = 10, epochs: int = 2000, lr: float = 1e-3):
"""Train the WoRF field."""
# Encode positions
encoded = positional_encoding(positions, num_frequencies)
input_dim = encoded.shape[-1]
model = WoRF(input_dim=input_dim, output_dim=features.shape[-1])
optimiser = torch.optim.Adam(model.parameters(), lr=lr)
loss_fn = nn.MSELoss()
print(f"\nTraining WoRF: {len(positions)} chunks, {input_dim}D input, {features.shape[-1]} features")
print(f"Positional encoding frequencies: {num_frequencies}")
print("-" * 60)
for epoch in range(epochs):
pred = model(encoded)
loss = loss_fn(pred, features)
optimiser.zero_grad()
loss.backward()
optimiser.step()
if epoch % 200 == 0 or epoch == epochs - 1:
print(f" Epoch {epoch:4d}/{epochs} Loss: {loss.item():.6f}")
return model, num_frequencies
# ---------------------------------------------------------------------------
# 6. Query the Field — the interesting bit
# ---------------------------------------------------------------------------
def query_field(model: WoRF, num_frequencies: int, num_points: int = 500) -> tuple[np.ndarray, np.ndarray]:
"""Query the learned field at many points, including between training samples."""
positions = torch.linspace(0, 1, num_points).unsqueeze(-1)
encoded = positional_encoding(positions, num_frequencies)
with torch.no_grad():
predictions = model(encoded).numpy()
return positions.squeeze().numpy(), predictions
def analyse_field(positions: np.ndarray, predictions: np.ndarray,
norm_stats: dict, chunks: list[str]):
"""Analyse what the field learned."""
print("\n" + "=" * 60)
print("FIELD ANALYSIS")
print("=" * 60)
# Denormalise for interpretability
feat_min = norm_stats["min"].numpy()
feat_range = norm_stats["range"].numpy()
predictions_real = predictions * feat_range + feat_min
# Find peaks and valleys for each feature
print("\nFeature dynamics across the book:")
print("-" * 60)
for i, name in enumerate(FEATURE_NAMES):
values = predictions_real[:, i]
peak_pos = positions[np.argmax(values)]
valley_pos = positions[np.argmin(values)]
mean_val = np.mean(values)
std_val = np.std(values)
dynamic_range = np.max(values) - np.min(values)
print(f" {name:30s} mean={mean_val:.4f} std={std_val:.4f} "
f"range={dynamic_range:.4f} peak@{peak_pos:.2f} valley@{valley_pos:.2f}")
# Find story boundaries by looking for sharp transitions
print("\n\nSharp transitions (potential story/scene boundaries):")
print("-" * 60)
# Use total gradient magnitude across all features
gradients = np.diff(predictions, axis=0)
gradient_magnitude = np.sqrt(np.sum(gradients ** 2, axis=1))
# Find top transition points
top_transitions = np.argsort(gradient_magnitude)[-8:] # top 8 (roughly one per story)
top_transitions = np.sort(top_transitions)
for idx in top_transitions:
pos = positions[idx]
# Estimate which chunk this corresponds to
chunk_idx = int(pos * (len(chunks) - 1))
chunk_preview = chunks[min(chunk_idx, len(chunks) - 1)][:80]
print(f" Position {pos:.3f} (magnitude {gradient_magnitude[idx]:.4f})")
print(f" Text: \"{chunk_preview}...\"")
print()
# Compare dialogue-heavy vs narrative-heavy regions
print("\nDialogue vs Narrative rhythm:")
print("-" * 60)
dialogue_idx = FEATURE_NAMES.index("dialogue_ratio")
sent_var_idx = FEATURE_NAMES.index("sentence_length_variance")
short_idx = FEATURE_NAMES.index("short_sentence_ratio")
# Split into quartiles
n = len(positions)
for q, label in [(0, "Opening"), (1, "Early-mid"), (2, "Late-mid"), (3, "Closing")]:
start = q * n // 4
end = (q + 1) * n // 4
avg_dialogue = np.mean(predictions_real[start:end, dialogue_idx])
avg_variance = np.mean(predictions_real[start:end, sent_var_idx])
avg_short = np.mean(predictions_real[start:end, short_idx])
print(f" {label:12s} dialogue={avg_dialogue:.4f} "
f"sent_variance={avg_variance:.4f} short_ratio={avg_short:.4f}")
# Interpolation test — what does the field predict BETWEEN chunks?
print("\n\nInterpolation test (querying between training points):")
print("-" * 60)
print("The field predicts style features at positions where no text exists.")
print("If interpolation is smooth and sensible, the field learned structure.")
print("If it's noisy/random, it just memorised individual chunks.")
# Check smoothness: average absolute second derivative
second_deriv = np.diff(predictions, n=2, axis=0)
smoothness = np.mean(np.abs(second_deriv))
print(f"\n Smoothness score (lower = smoother): {smoothness:.6f}")
if smoothness < 0.01:
print(" → Very smooth field — learned continuous style patterns")
elif smoothness < 0.05:
print(" → Moderately smooth — some structure learned")
else:
print(" → Rough field — mostly memorised chunks")
return predictions_real
# ---------------------------------------------------------------------------
# 7. Save results for later
# ---------------------------------------------------------------------------
def save_results(positions, predictions_real, output_path):
"""Save the field data as JSON for potential visualisation later."""
results = {
"positions": positions.tolist(),
"features": {
name: predictions_real[:, i].tolist()
for i, name in enumerate(FEATURE_NAMES)
},
"feature_names": FEATURE_NAMES,
"description": "WoRF continuous field over Wodehouse's 'My Man Jeeves'",
}
Path(output_path).write_text(json.dumps(results, indent=2))
print(f"\nField data saved to {output_path}")
# ---------------------------------------------------------------------------
# Main
# ---------------------------------------------------------------------------
def main():
book_path = Path(__file__).parent / "pg-wood.txt"
print("WoRF — Word Radiance Field Experiment")
print("=" * 60)
print(f"Source: {book_path.name}")
# Load and split
text = load_and_clean(str(book_path))
print(f"Clean text: {len(text):,} characters, {len(text.split()):,} words")
chunks = split_into_chunks(text, chunk_size=300)
print(f"Chunks: {len(chunks)} (≈300 words each)")
# Prepare training data
positions, features, norm_stats = prepare_data(chunks)
print(f"Feature dimensions: {NUM_FEATURES}")
print(f"Features: {', '.join(FEATURE_NAMES)}")
# Train
model, num_freq = train_worf(positions, features, epochs=3000)
# Query the continuous field
query_positions, predictions = query_field(model, num_freq, num_points=1000)
# Analyse
predictions_real = analyse_field(query_positions, predictions, norm_stats, chunks)
# Save
save_results(query_positions, predictions_real,
str(book_path.parent / "worf-field-jeeves.json"))
print("\n" + "=" * 60)
print("Done. The field exists. Poke it and see what it tells you.")
print("=" * 60)
if __name__ == "__main__":
main()


@@ -0,0 +1,474 @@
#!/usr/bin/env python3
"""
WoRF v2 — Word Radiance Field (Feature-Space)
===============================================
v1 used position-in-book as coordinates — just memorised chunks.
v2 uses the style features themselves as the coordinate system.
The field learns relationships BETWEEN style dimensions:
"When dialogue is high and sentences are short, what happens to
vocabulary richness and aside density?"
That's the relational wording data — not what words mean,
but how they behave together. The stuff a language pack needs.
Multi-book ready: each book is more "photos" of the same style field.
"""
import re
import math
import json
import torch
import torch.nn as nn
import numpy as np
from pathlib import Path
from collections import Counter
from dataclasses import dataclass
WORF_DIR = Path(__file__).parent
# ---------------------------------------------------------------------------
# 1. Feature Extraction (same as v1, proven to work)
# ---------------------------------------------------------------------------
FEATURE_NAMES = [
"avg_word_length",
"avg_sentence_length",
"sentence_length_variance",
"dialogue_ratio",
"vocabulary_richness",
"dash_density",
"exclamation_density",
"question_density",
"short_sentence_ratio",
"aside_density",
"avg_punct_per_sentence",
]
NUM_FEATURES = len(FEATURE_NAMES)
def extract_features(chunk: str) -> list[float]:
"""Extract stylistic features from a chunk of text."""
words = chunk.split()
sentences = re.split(r'[.!?]+', chunk)
sentences = [s.strip() for s in sentences if s.strip()]
word_lengths = [len(w.strip(".,;:!?\"'()—-")) for w in words]
word_lengths = [wl for wl in word_lengths if wl > 0]
dialogue_chars = sum(1 for c in chunk if c == '"')
total_chars = len(chunk) or 1
dashes = chunk.count("—") + chunk.count("--")
exclamations = chunk.count("!")
questions = chunk.count("?")
commas = chunk.count(",")
unique_words = len(set(w.lower().strip(".,;:!?\"'()—-") for w in words))
total_words = len(words) or 1
sent_lengths = [len(s.split()) for s in sentences]
sent_mean = float(np.mean(sent_lengths)) if sent_lengths else 0.0
sent_std = float(np.std(sent_lengths)) if sent_lengths else 0.0
short_sentences = sum(1 for sl in sent_lengths if sl <= 5)
short_ratio = short_sentences / (len(sent_lengths) or 1)
aside_density = (commas + dashes) / total_words
return [
float(np.mean(word_lengths)) if word_lengths else 0.0,
sent_mean,
sent_std,
dialogue_chars / total_chars,
unique_words / total_words,
dashes / total_words,
exclamations / total_words,
questions / total_words,
short_ratio,
aside_density,
(commas + dashes + exclamations + questions) / (len(sent_lengths) or 1),
]
# ---------------------------------------------------------------------------
# 2. Text Loading (multi-book ready)
# ---------------------------------------------------------------------------
@dataclass
class BookChunk:
text: str
features: list[float]
book: str
chunk_idx: int
position: float # 0-1 position within book
def load_gutenberg(path: str, title: str) -> list[BookChunk]:
"""Load a Gutenberg text, split into chunks, extract features."""
text = Path(path).read_text(encoding="utf-8")
# Strip PG header/footer
for marker in ["*** START OF THE PROJECT GUTENBERG EBOOK",
"*** START OF THIS PROJECT GUTENBERG EBOOK"]:
idx = text.find(marker)
if idx != -1:
text = text[text.find("\n", idx) + 1:]
break
end = text.find("*** END OF THE PROJECT GUTENBERG")
if end != -1:
text = text[:end]
text = text.strip()
words = text.split()
chunk_size = 300
chunks = []
for i in range(0, len(words), chunk_size):
chunk_text = " ".join(words[i:i + chunk_size])
if len(chunk_text.split()) > 50:
chunks.append(chunk_text)
results = []
for i, chunk_text in enumerate(chunks):
results.append(BookChunk(
text=chunk_text,
features=extract_features(chunk_text),
book=title,
chunk_idx=i,
position=i / max(len(chunks) - 1, 1),
))
print(f" {title}: {len(results)} chunks from {len(words):,} words")
return results
# ---------------------------------------------------------------------------
# 3. WoRF v2 — Masked Feature Prediction
# ---------------------------------------------------------------------------
#
# Instead of position → features, we do:
# features_with_one_masked → predict_all_features
#
# This learns the RELATIONSHIPS between style dimensions.
# Like a denoising autoencoder where each mask reveals a different
# relationship. Like NeRF views — each masking angle shows a different
# aspect of the same underlying field.
def positional_encoding(x: torch.Tensor, num_frequencies: int = 6) -> torch.Tensor:
"""Sinusoidal encoding for continuous feature values."""
encodings = [x]
for freq in range(num_frequencies):
encodings.append(torch.sin(2.0 ** freq * math.pi * x))
encodings.append(torch.cos(2.0 ** freq * math.pi * x))
return torch.cat(encodings, dim=-1)
class WoRFv2(nn.Module):
"""Word Radiance Field v2 — learns inter-feature relationships.
Input: N features (one zeroed out) + mask indicator per feature
Output: predicted values for all features
The network learns: given these style characteristics,
what must the missing one be? That's the relational field.
"""
def __init__(self, num_features: int, num_frequencies: int = 6,
hidden_dim: int = 256, num_layers: int = 6):
super().__init__()
self.num_features = num_features
self.num_frequencies = num_frequencies
per_feature_dim = 1 + 2 * num_frequencies # encoded value
input_dim = num_features * (per_feature_dim + 1) # +1 for mask flag
layers = []
layers.append(nn.Linear(input_dim, hidden_dim))
layers.append(nn.GELU())
for i in range(num_layers - 2):
layers.append(nn.Linear(hidden_dim, hidden_dim))
layers.append(nn.GELU())
if i == num_layers // 2 - 2:
layers.append(nn.Dropout(0.05))
layers.append(nn.Linear(hidden_dim, num_features))
self.network = nn.Sequential(*layers)
def encode_input(self, features: torch.Tensor, mask_idx: torch.Tensor) -> torch.Tensor:
"""Encode features with positional encoding + mask flags."""
encoded_parts = []
for f in range(self.num_features):
feat_val = features[:, f:f+1]
feat_encoded = positional_encoding(feat_val, self.num_frequencies)
is_masked = (mask_idx == f).float().unsqueeze(-1)
feat_encoded = feat_encoded * (1.0 - is_masked)
feat_with_mask = torch.cat([feat_encoded, is_masked], dim=-1)
encoded_parts.append(feat_with_mask)
return torch.cat(encoded_parts, dim=-1)
def forward(self, features: torch.Tensor, mask_idx: torch.Tensor) -> torch.Tensor:
encoded = self.encode_input(features, mask_idx)
return self.network(encoded)
# ---------------------------------------------------------------------------
# 4. Training
# ---------------------------------------------------------------------------
def train_worf_v2(chunks: list[BookChunk], epochs: int = 3000, lr: float = 5e-4):
"""Train WoRF v2 with random feature masking."""
features = torch.tensor([c.features for c in chunks], dtype=torch.float32)
feat_min = features.min(dim=0).values
feat_max = features.max(dim=0).values
feat_range = feat_max - feat_min
feat_range[feat_range == 0] = 1.0
features_norm = (features - feat_min) / feat_range
norm_stats = {"min": feat_min, "max": feat_max, "range": feat_range}
model = WoRFv2(num_features=NUM_FEATURES)
optimiser = torch.optim.AdamW(model.parameters(), lr=lr, weight_decay=1e-4)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimiser, T_max=epochs)
loss_fn = nn.MSELoss()
n_chunks = len(chunks)
print(f"\nTraining WoRF v2: {n_chunks} chunks, {NUM_FEATURES} features")
print(f"Architecture: masked feature prediction (like masked autoencoder)")
print("-" * 60)
best_loss = float("inf")
for epoch in range(epochs):
mask_idx = torch.randint(0, NUM_FEATURES, (n_chunks,))
pred = model(features_norm, mask_idx)
loss = loss_fn(pred, features_norm)
optimiser.zero_grad()
loss.backward()
optimiser.step()
scheduler.step()
if loss.item() < best_loss:
best_loss = loss.item()
if epoch % 300 == 0 or epoch == epochs - 1:
print(f" Epoch {epoch:4d}/{epochs} Loss: {loss.item():.6f} "
f"Best: {best_loss:.6f} LR: {scheduler.get_last_lr()[0]:.6f}")
return model, features_norm, norm_stats
# ---------------------------------------------------------------------------
# 5. Analysis
# ---------------------------------------------------------------------------
def probe_relationships(model: WoRFv2, features_norm: torch.Tensor, norm_stats: dict):
"""Probe what the field learned about feature relationships."""
print("\n" + "=" * 60)
print("RELATIONAL FIELD ANALYSIS")
print("=" * 60)
model.eval()
# --- Test 1: Feature predictability ---
print("\nFeature predictability (lower error = stronger relationship to others):")
print("-" * 60)
feature_errors = {}
with torch.no_grad():
for f in range(NUM_FEATURES):
mask_idx = torch.full((len(features_norm),), f, dtype=torch.long)
pred = model(features_norm, mask_idx)
error = torch.mean((pred[:, f] - features_norm[:, f]) ** 2).item()
feature_errors[FEATURE_NAMES[f]] = error
sorted_features = sorted(feature_errors.items(), key=lambda x: x[1])
for name, error in sorted_features:
bar_len = int((1 - min(error * 20, 1)) * 40)
bar = "#" * bar_len
predictability = "highly relational" if error < 0.01 else \
"moderately relational" if error < 0.05 else "independent"
print(f" {name:30s} error={error:.5f} [{bar:40s}] {predictability}")
# --- Test 2: Feature influence matrix ---
print("\n\nFeature influence matrix:")
print("(When feature X increases, what happens to feature Y?)")
print("-" * 60)
influence_matrix = np.zeros((NUM_FEATURES, NUM_FEATURES))
with torch.no_grad():
baseline = features_norm.mean(dim=0, keepdim=True)
for source_f in range(NUM_FEATURES):
high = baseline.clone()
low = baseline.clone()
high[0, source_f] = 0.9
low[0, source_f] = 0.1
for target_f in range(NUM_FEATURES):
if target_f == source_f:
continue
mask = torch.tensor([target_f])
pred_high = model(high, mask)[0, target_f].item()
pred_low = model(low, mask)[0, target_f].item()
influence_matrix[source_f, target_f] = pred_high - pred_low
# Print matrix
short_names = [n[:8] for n in FEATURE_NAMES]
print(f"\n {'':30s}", end="")
for sn in short_names:
print(f" {sn:>8s}", end="")
print()
for i, name in enumerate(FEATURE_NAMES):
print(f" {name:30s}", end="")
for j in range(NUM_FEATURES):
val = influence_matrix[i, j]
if i == j:
print(f" ---", end="")
elif abs(val) > 0.15:
print(f" {val:+.2f}*", end="")
else:
print(f" {val:+.3f}", end="")
print()
# --- Test 3: Style interpolation ---
print("\n\nStyle interpolation (walking through the field):")
print("-" * 60)
print("Interpolating between 'narrative exposition' and 'snappy dialogue':\n")
with torch.no_grad():
narrative = baseline.clone()
narrative[0, FEATURE_NAMES.index("dialogue_ratio")] = 0.05
narrative[0, FEATURE_NAMES.index("avg_sentence_length")] = 0.8
narrative[0, FEATURE_NAMES.index("short_sentence_ratio")] = 0.1
narrative[0, FEATURE_NAMES.index("vocabulary_richness")] = 0.8
dialogue = baseline.clone()
dialogue[0, FEATURE_NAMES.index("dialogue_ratio")] = 0.9
dialogue[0, FEATURE_NAMES.index("avg_sentence_length")] = 0.2
dialogue[0, FEATURE_NAMES.index("short_sentence_ratio")] = 0.8
dialogue[0, FEATURE_NAMES.index("vocabulary_richness")] = 0.4
predict_features = [
FEATURE_NAMES.index("exclamation_density"),
FEATURE_NAMES.index("question_density"),
FEATURE_NAMES.index("dash_density"),
FEATURE_NAMES.index("aside_density"),
FEATURE_NAMES.index("avg_punct_per_sentence"),
]
print(f" {'blend':>5s}", end="")
for name in ["excl_dens", "quest_dens", "dash_dens", "aside_dens", "punct/sent"]:
print(f" {name:>10s}", end="")
print()
print(f" {'':>5s}{'':->55s}")
for alpha in np.linspace(0, 1, 11):
blended = narrative * (1 - alpha) + dialogue * alpha
predictions = []
for pf in predict_features:
mask = torch.tensor([pf])
pred = model(blended, mask)[0, pf].item()
pred_real = pred * norm_stats["range"][pf].item() + norm_stats["min"][pf].item()
predictions.append(pred_real)
label = "narr" if alpha < 0.3 else "dial" if alpha > 0.7 else "mix"
print(f" {alpha:4.1f}{label:1s}", end="")
for p in predictions:
print(f" {p:10.4f}", end="")
print()
# --- Test 4: Reconstruction accuracy ---
print("\n\nReconstruction accuracy per feature:")
print("-" * 60)
with torch.no_grad():
total_error = 0
total_count = 0
for f in range(NUM_FEATURES):
mask = torch.full((len(features_norm),), f, dtype=torch.long)
pred = model(features_norm, mask)
errors = (pred[:, f] - features_norm[:, f]) ** 2
rmse_real = math.sqrt(errors.mean().item()) * norm_stats["range"][f].item()
total_error += errors.sum().item()
total_count += len(errors)
print(f" {FEATURE_NAMES[f]:30s} RMSE (real units): {rmse_real:.4f}")
avg_error = total_error / total_count
print(f"\n Overall MSE (normalised units): {avg_error:.6f}")
print(f" Overall RMSE (normalised units): {math.sqrt(avg_error):.4f}")
return influence_matrix
# ---------------------------------------------------------------------------
# 6. Save
# ---------------------------------------------------------------------------
def save_results(influence_matrix: np.ndarray, output_path: str):
"""Save the influence matrix and metadata."""
results = {
"feature_names": FEATURE_NAMES,
"influence_matrix": influence_matrix.tolist(),
"description": "WoRF v2: inter-feature influence matrix from masked prediction",
"interpretation": "influence_matrix[i][j] = when feature i goes high, "
"how much does the predicted value of feature j change",
}
Path(output_path).write_text(json.dumps(results, indent=2))
print(f"\nResults saved to {output_path}")
# ---------------------------------------------------------------------------
# Main
# ---------------------------------------------------------------------------
def main():
print("WoRF v2 — Word Radiance Field (Relational)")
print("=" * 60)
all_chunks = []
book_path = WORF_DIR / "pg-wood.txt"
if book_path.exists():
all_chunks.extend(load_gutenberg(str(book_path), "My Man Jeeves"))
# Add more books here:
# all_chunks.extend(load_gutenberg("pg-wilde.txt", "Importance of Being Earnest"))
# all_chunks.extend(load_gutenberg("pg-austen.txt", "Pride and Prejudice"))
if not all_chunks:
print(f"No books found! Expected a Gutenberg text at {book_path}")
return
books = set(c.book for c in all_chunks)
print(f"\nTotal: {len(all_chunks)} chunks from {len(books)} book(s)")
model, features_norm, norm_stats = train_worf_v2(all_chunks, epochs=4000)
influence_matrix = probe_relationships(model, features_norm, norm_stats)
save_results(influence_matrix, str(WORF_DIR / "worf-v2-relations.json"))
print("\n" + "=" * 60)
print("The relational field exists.")
print("This is what Wodehouse's English 'feels like' in feature space.")
print("Add more books to build toward an EN-GB WoRF language pack.")
print("=" * 60)
if __name__ == "__main__":
main()
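Once the script has produced `worf-v2-relations.json`, the matrix can be mined directly for its strongest couplings. A minimal sketch — the helper name and the three-feature toy matrix below are illustrative, not part of the script; with real output you would pass `results["feature_names"]` and `results["influence_matrix"]` from the loaded JSON:

```python
def strongest_influences(feature_names, influence_matrix, top_k=3):
    """Rank off-diagonal entries by absolute influence, strongest first."""
    pairs = [
        (source, target, influence_matrix[i][j])
        for i, source in enumerate(feature_names)
        for j, target in enumerate(feature_names)
        if i != j
    ]
    pairs.sort(key=lambda p: abs(p[2]), reverse=True)
    return pairs[:top_k]

# Toy 3-feature matrix for illustration. With the real output you would load:
#   results = json.loads(Path("worf-v2-relations.json").read_text())
#   strongest_influences(results["feature_names"], results["influence_matrix"])
names = ["dash_density", "aside_density", "avg_punct_per_sentence"]
matrix = [
    [0.00, 0.15, 0.13],
    [0.04, 0.00, 0.32],
    [0.05, 0.10, 0.00],
]
for source, target, value in strongest_influences(names, matrix):
    print(f"{source:>24s} -> {target:<24s} {value:+.2f}")
```

Because influence is directional, `[i][j]` and `[j][i]` are ranked as separate pairs — a feature can drive another much harder than it is driven back.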

@@ -0,0 +1,162 @@
{
"feature_names": [
"avg_word_length",
"avg_sentence_length",
"sentence_length_variance",
"dialogue_ratio",
"vocabulary_richness",
"dash_density",
"exclamation_density",
"question_density",
"short_sentence_ratio",
"aside_density",
"avg_punct_per_sentence"
],
"influence_matrix": [
[
0.0,
-0.03247326612472534,
-0.0239107608795166,
-0.00048324093222618103,
0.1107892394065857,
0.015222892165184021,
-0.024353697896003723,
0.02327282726764679,
0.055540263652801514,
0.04952073097229004,
-0.018031805753707886
],
[
-0.11262395977973938,
0.0,
0.1966363489627838,
0.0003904178738594055,
-0.02297872304916382,
-0.068694107234478,
-0.12937799841165543,
-0.19205902516841888,
-0.29318100214004517,
-0.09364050626754761,
0.21115505695343018
],
[
0.005609989166259766,
0.13626961410045624,
0.0,
-0.0007154941558837891,
-0.02271491289138794,
0.005668185651302338,
-0.0020959973335266113,
-0.01791289448738098,
0.04299241304397583,
0.03149789571762085,
0.153947114944458
],
[
-0.01625087857246399,
0.012996375560760498,
0.004404813051223755,
0.0,
-0.004828751087188721,
-0.010406054556369781,
0.012377187609672546,
-0.007560417056083679,
0.017317771911621094,
-0.006858497858047485,
0.013844549655914307
],
[
0.05449041724205017,
-0.002728700637817383,
0.03543153405189514,
-0.0007495768368244171,
0.0,
0.02357766404747963,
-0.06922292709350586,
-0.01401202380657196,
0.03409099578857422,
-0.022808074951171875,
-0.06983467936515808
],
[
0.05502724647521973,
-0.028156444430351257,
0.016653388738632202,
-0.0004658550024032593,
0.008968591690063477,
0.0,
0.07332807779312134,
0.004690051078796387,
0.004198431968688965,
0.1471288800239563,
0.1343848705291748
],
[
-0.008408337831497192,
-0.03403817117214203,
-0.03511646389961243,
0.0002146884799003601,
0.01336967945098877,
0.012008734047412872,
0.0,
-0.038716867566108704,
0.01683211326599121,
0.015300273895263672,
0.038202375173568726
],
[
-0.04866918921470642,
-0.09030131995677948,
-0.08065217733383179,
0.0006130747497081757,
-0.04372537136077881,
0.035463668406009674,
0.020850971341133118,
0.0,
0.06807422637939453,
0.04871469736099243,
0.015091657638549805
],
[
0.07264012098312378,
-0.17126457393169403,
0.007805615663528442,
0.0005212798714637756,
-0.07545053958892822,
-0.011027880012989044,
0.16361884027719498,
0.1303078681230545,
0.0,
0.08242395520210266,
-0.042179644107818604
],
[
0.05252787470817566,
-0.06419773399829865,
0.006353020668029785,
-0.0005619712173938751,
-0.03329026699066162,
0.04053857922554016,
0.05099382996559143,
0.0370599627494812,
0.05590474605560303,
0.0,
0.22894394397735596
],
[
-0.011781513690948486,
0.0985381007194519,
0.09538811445236206,
-0.00027120113372802734,
-0.0469667911529541,
0.04663299024105072,
0.04154162108898163,
0.0520768016576767,
-0.12925076484680176,
0.32439711689949036,
0.0
]
],
"description": "WoRF v2: inter-feature influence matrix from masked prediction",
"interpretation": "influence_matrix[i][j] = when feature i goes high, how much does the predicted value of feature j change"
}
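To read the matrix: row `i` is a source feature forced high, column `j` is the predicted shift in a target. The largest entry, +0.324, says heavy aside usage pulls per-sentence punctuation up with it. A quick way to find each target's strongest driver, sketched on a three-feature excerpt of the values above (rounded; the NumPy usage is illustrative, not from the script):

```python
import numpy as np

# 3-feature excerpt of the influence matrix above (values rounded);
# rows = source feature driven high, columns = predicted change in target.
names = ["avg_sentence_length", "short_sentence_ratio", "avg_punct_per_sentence"]
M = np.array([
    [ 0.000, -0.293,  0.211],
    [-0.171,  0.000, -0.042],
    [ 0.099, -0.129,  0.000],
])

for j, target in enumerate(names):
    col = np.abs(M[:, j])
    col[j] = -1.0  # ignore the diagonal
    i = int(col.argmax())
    print(f"strongest driver of {target}: {names[i]} ({M[i, j]:+.3f})")
```

The -0.293 coupling from `avg_sentence_length` to `short_sentence_ratio` is the field recovering an obvious structural constraint — longer average sentences suppress the predicted share of short ones — which is a useful sanity check on the learned relations.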