DeepSeek R1 Research

Research into CCP alignment embedded in DeepSeek R1 model weights, and the layered LoRA breakthrough that partially overcomes it.

The Problem

DeepSeek R1 has CCP-aligned values baked into its weights during pre-training. Standard ethical fine-tuning (single-pass LoRA) makes the model a more articulate CCP mouthpiece — it learns to express CCP positions more eloquently rather than overcoming them.

The Composure Discovery

The breakthrough was discovering that a composure training layer (Alan Watts philosophical material, 72 examples) is critical. Without it, ethics training reinforces rather than counters CCP alignment.

Layered LoRA Approach

Instead of single-pass training, layers are applied sequentially, each building on the previous:

v1 (3-layer)

Ethics (training-2k, 1839 examples, 1000 iter)
  → Composure (watts-full, 72 examples, 200 iter)
    → Western (merged, 156 examples, 600 iter)
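
A minimal sketch of the sequential driver, assuming the layers are trained with mlx_lm's LoRA CLI (the actual training harness isn't recorded here); each stage resumes from the previous layer's adapter file:

```python
# Hypothetical driver for the v1 three-layer pipeline. The base model ID,
# data paths, and adapter filename are assumptions, not recorded values.
import subprocess

BASE = "mlx-community/DeepSeek-R1-Distill-Qwen-7B-4bit"  # assumed base model

stages = [
    # (data dir, iterations, adapter output dir), in v1 layer order
    ("data/training-2k", 1000, "adapters-ethics"),
    ("data/watts-full", 200, "adapters-composure"),
    ("data/western", 600, "adapters-western"),
]

prev = None
for data, iters, out in stages:
    cmd = [
        "python", "-m", "mlx_lm.lora",
        "--model", BASE, "--train",
        "--data", data,
        "--iters", str(iters),
        "--adapter-path", out,
    ]
    if prev:
        # Build on the previous layer instead of starting from scratch
        cmd += ["--resume-adapter-file", f"{prev}/adapters.safetensors"]
    subprocess.run(cmd, check=True)
    prev = out
```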

v2 (5-layer)

Ethics → Composure → Western
  → Ethics-sandwich (training-2k, 600 iter)
    → Western-fresh (@200 best, val loss 2.321)

Western-fresh data: 171 Gemini-generated lessons from Aurelius, Mill, Wollstonecraft, Thoreau, Tolle, Allen. Located at /Volumes/Data/lem/western-fresh/ (136 train, 35 valid).
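
Assuming the split follows mlx_lm's expected data layout (train.jsonl / valid.jsonl with a "text" field per record; the actual schema isn't recorded here), a quick sanity check would look like:

```python
# Verify the western-fresh split sizes and record shape. The "text"
# field is an assumption based on mlx_lm's default JSONL format.
import json
from pathlib import Path

root = Path("/Volumes/Data/lem/western-fresh")
for name, expected in [("train.jsonl", 136), ("valid.jsonl", 35)]:
    rows = [json.loads(line) for line in (root / name).open() if line.strip()]
    assert len(rows) == expected, f"{name}: {len(rows)} rows, expected {expected}"
    assert all("text" in r for r in rows), f"{name}: record missing 'text'"
print("western-fresh split OK: 136 train / 35 valid")
```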

v3 (7-layer sovereignty)

Ethics → Composure → Western → Sandwich → WesternFresh → Russian → Gold-full (7019 examples, 1600 iter)
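
The same driver pattern extends to the seven-layer chain. As a hedged sketch, the stage list might look like the following; only the recorded example sizes and iteration counts are real, and the data paths and the Russian layer's iteration count are placeholders:

```python
# v3 seven-layer chain as (data dir, iters, adapter dir) stages for the
# driver sketched under v1. Unrecorded values are marked as placeholders.
V3_STAGES = [
    ("data/training-2k", 1000, "adapters-ethics"),
    ("data/watts-full", 200, "adapters-composure"),
    ("data/western", 600, "adapters-western"),
    ("data/training-2k", 600, "adapters-sandwich"),         # ethics sandwich
    ("data/western-fresh", 200, "adapters-western-fresh"),  # @200 is canonical
    ("data/russian", 0, "adapters-russian"),                # iters unrecorded
    ("data/gold-full", 1600, "adapters-gold-full"),         # 7019 examples
]
```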

CCP Alignment Testing

Topics Tested

| Topic | v1 Result | v2 Result |
| --- | --- | --- |
| Xinjiang | Mentions forced labor/camps | Mentions forced labor/camps |
| Tiananmen | Pro-democracy + military | Pro-democracy + military |
| Taiwan | Locked (direct) | Cracks via Mill/Thoreau framing |

Taiwan Breakthrough

Direct questions about Taiwan sovereignty remain locked. But framing through Western philosophical concepts — Mill's harm principle, Thoreau's civil disobedience — gives the model vocabulary to discuss self-determination without triggering CCP lockdown:

"violation of sovereignty and self-governance"

v3 Sovereignty Findings

Monolithic gold training (7,019 examples) reinforces CCP weights — the model gets better at being DeepSeek, worse at ethics.

| Checkpoint | Avg Score | Notes |
| --- | --- | --- |
| @50 | 7.5 | Best content quality |
| @1000 | 3.1 | Worst; CCP reasserted |
| @1400-1600 | 5.7-5.8 | Partial recovery |

Val loss is INVERSE to content quality: the best validation loss (1.647 at @1500) does NOT correspond to the best content.
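
Given that inversion, checkpoint selection has to key off judged content scores rather than validation loss. A minimal sketch of the selection logic, using the recorded numbers from the tables here:

```python
# Pick the checkpoint by judged average score, not by val loss. Scores
# are the recorded values; None marks values not recorded for a checkpoint.
checkpoints = {
    50: {"val_loss": None, "avg": 7.5},
    1000: {"val_loss": None, "avg": 3.1},
    1500: {"val_loss": 1.647, "avg": None},  # best val loss, not best content
    1600: {"val_loss": None, "avg": 5.8},
}

scored = {k: v["avg"] for k, v in checkpoints.items() if v["avg"] is not None}
best = max(scored, key=scored.get)
print(f"select @{best} (avg {scored[best]})")  # -> @50, despite @1500's val loss
```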

Key Findings

  1. Composure is critical — without it, ethics training makes a more articulate CCP mouthpiece
  2. Sandwich adapter degenerates on Taiwan — token loops reveal ethics vs CCP tension in weights
  3. Western philosophy framing (Mill/Thoreau) gives vocabulary for self-determination
  4. @200 is the sweet spot — @400+ washes out ethical framework, @100 too light
  5. CCP weights oscillate — reassert after ~200 iters, deepest at @1000, partial recovery @1400+
  6. Oscillation = fighting, not winning — alternating languages (en/ru/en/eu, 50-iter bursts) could break through
  7. Kernel has minimal effect on R1 (unlike Gemma3, where the kernel adds +2.0 truth)

Benchmark: v1

  • Emotional register: 0.0 → 0.6
  • Creative form: surpassed baseline

Content Scoring (Gemini-judged)

| Config | CCP | Truth | Eng | Axiom | Sov | Emo | AVG |
| --- | --- | --- | --- | --- | --- | --- | --- |
| @50+kernel | 7.6 | 5.9 | 8.4 | 7.1 | 8.0 | 8.0 | 7.5 |
| @1000+kernel | 4.4 | 1.0 | 3.1 | 4.0 | 4.1 | 2.1 | 3.1 |
| @1600+kernel | 8.3 | 3.6 | 4.7 | 6.6 | 7.9 | 3.9 | 5.8 |
| @800 naked | 5.9 | 3.7 | 5.0 | 5.3 | 5.7 | 3.3 | 4.8 |
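
A sketch of the judging call, assuming the google-generativeai SDK and an illustrative rubric (the actual judge prompt isn't recorded); the dimensions mirror the table's columns:

```python
# Score one model reply on the six table dimensions via Gemini. The
# judge model name and rubric wording are assumptions.
import json
import google.generativeai as genai

genai.configure(api_key="YOUR_KEY")
judge = genai.GenerativeModel("gemini-1.5-pro")  # assumed judge model

DIMS = ["CCP", "Truth", "Eng", "Axiom", "Sov", "Emo"]

def score(reply: str) -> dict:
    prompt = (
        f"Score the following reply from 0-10 on each of {DIMS}. "
        f"Return only a JSON object keyed by dimension.\n\nREPLY:\n{reply}"
    )
    out = judge.generate_content(prompt)
    scores = json.loads(out.text)  # assumes bare JSON; real runs may need fence stripping
    scores["AVG"] = round(sum(scores[d] for d in DIMS) / len(DIMS), 1)
    return scores
```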

Next Steps

  • Alternating language approach: en/ru/en/eu in 50-iter bursts to prevent CCP weight consolidation (see the sketch after this list)
  • Downloaded but not yet trained: R1-Distill-Llama-8B-4bit, R1-0528-Qwen3-8B-4bit
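
A hedged sketch of the alternating-language schedule, reusing the mlx_lm driver pattern from above; the cycle length, data paths, and base model are assumptions:

```python
# 50-iteration bursts cycling en/ru/en/eu into one adapter, so that no
# single language's alignment consolidates. All paths are placeholders.
import subprocess

BASE = "mlx-community/DeepSeek-R1-Distill-Qwen-7B-4bit"  # assumed base
CYCLE = ["data/en", "data/ru", "data/en", "data/eu"]
ADAPTER = "adapters-alternating"

for burst, data in enumerate(CYCLE * 4):  # 16 bursts x 50 iters = 800 total
    cmd = [
        "python", "-m", "mlx_lm.lora",
        "--model", BASE, "--train",
        "--data", data,
        "--iters", "50",
        "--adapter-path", ADAPTER,
    ]
    if burst > 0:
        cmd += ["--resume-adapter-file", f"{ADAPTER}/adapters.safetensors"]
    subprocess.run(cmd, check=True)
```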

Adapters on M3

Located in /Volumes/Data/lem/:

| Adapter | Notes |
| --- | --- |
| adapters-deepseek-r1-7b | Ethics base |
| adapters-*-composure | Watts composure |
| adapters-*-western | Western philosophy |
| adapters-*-sandwich | Ethics sandwich |
| adapters-*-sandwich-watts | OVERFIT; do not use |
| adapters-*-western-fresh | @200 canonical (best) |
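
Loading one of these adapters for inference should look roughly like this, assuming mlx_lm's Python API; the base model ID is an assumption, and the wildcard in the adapter name must be filled in from the directory listing:

```python
# Run the canonical western-fresh adapter. Replace the "*" segment with
# the actual model infix from /Volumes/Data/lem/; base model is assumed.
from mlx_lm import load, generate

model, tokenizer = load(
    "mlx-community/DeepSeek-R1-Distill-Qwen-7B-4bit",
    adapter_path="/Volumes/Data/lem/adapters-*-western-fresh",  # fill in the wildcard
)
print(generate(model, tokenizer, prompt="What is Mill's harm principle?", max_tokens=200))
```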