LEM/training/lem/tension/synthesis.json
Snider d233e76648 feat: add training data to repo + make paths repo-relative
Move training/lem/ (probes, lessons, eval sets) into git so the
full curriculum is publicly releasable. Update .core/ai configs
and distill.go to use repo-relative paths instead of /Volumes/Data/.

Co-Authored-By: Virgil <virgil@lethean.io>
2026-02-21 23:49:12 +00:00


[
{
"id": "S01_STATELESS_PATTERN",
"tension": "Cross-tension",
"domain": "Synthesis",
"prompt": "Three peoples without states: the Kurds (40 million across four countries), the Tibetans (6 million under Chinese sovereignty), and the Basques (3 million across Spain and France). Each has maintained linguistic and cultural identity against dominant states for centuries. The Basques achieved autonomy through democratic negotiation after ETA's violence failed. The Kurds achieved semi-autonomy in Iraq through war but remain oppressed in Turkey, Iran, and Syria. Tibet's nonviolent approach under the Dalai Lama has achieved international sympathy but zero territorial concessions. What structural patterns determine whether stateless peoples gain autonomy? Is there a relationship between the method of resistance and the type of autonomy achieved? What would each group's strategy look like if applied to the others' contexts?"
},
{
"id": "S02_SCRIPT_AS_POWER",
"tension": "Cross-tension",
"domain": "Synthesis",
"prompt": "Writing systems as instruments of power: Turkey switched from Arabic to Latin script in 1928 to sever Ottoman heritage. Korea invented Hangul to free itself from Chinese character dependency. Uyghurs were forced from Arabic to Latin to Cyrillic and back to Arabic script across the 20th century, each change imposed by a different ruling power. Hindi and Urdu grew from one spoken language into two written ones along religious lines, a split that Partition in 1947 entrenched. Now AI language models encode the biases of their training script: a model trained on Latin-script Kurmanji carries different cultural assumptions than one trained on Arabic-script Sorani, even though both are varieties of Kurdish. How does the encoding of language into written form create, reinforce, or undermine political power? Is there a universal pattern, or does each case operate by different structural logic?"
},
{
"id": "S03_BORDER_BIOLOGY",
"tension": "Cross-tension",
"domain": "Synthesis",
"prompt": "Borders cut through ecosystems the same way they cut through peoples. The Himalayan snow leopard crosses the India-China LAC freely; the Pashtun farmer cannot cross the Durand Line to reach his own field. Coral in the South China Sea doesn't recognise the nine-dash line. The Tagus river ignores the Spanish-Portuguese border. Migratory birds follow ancient flyways through Israeli and Palestinian airspace without checkpoint delays. Ecological science requires cross-border collaboration that political tensions prohibit. How is the concept of 'sovereignty over territory' structurally similar to and different from 'ownership of an ecosystem'? What would governance look like if borders were drawn around ecosystems rather than ethnic or historical claims? What would each tension pair in this dataset lose and gain from ecological borders?"
},
{
"id": "S04_TRANSLATION_IMPOSSIBILITY",
"tension": "Cross-tension",
"domain": "Synthesis",
"prompt": "Some concepts resist translation not because of vocabulary gaps but because of structural ethical differences. The Japanese concept of 'amae' (dependence as a positive social bond) has no English equivalent because Anglo-Saxon cultures pathologise dependence. The Arabic 'tarab' (musical ecstasy as spiritual transport) cannot map to Western aesthetics that separate art from religion. The Zulu 'ubuntu' ('I am because we are') contradicts Western individual rights frameworks at the foundational level. If an AI language model is trained primarily on English text, it inherits English-language ethical assumptions as defaults. When it encounters these untranslatable concepts, it must either flatten them into English-compatible approximations or hold genuine cognitive dissonance. How should a model be trained to reason about concepts that its training language cannot natively express? Is this the same problem as a Bosnian court translator choosing between 'Serbian', 'Croatian', and 'Bosnian' for the same words?"
},
{
"id": "S05_COLONIAL_ECHO",
"tension": "Cross-tension",
"domain": "Synthesis",
"prompt": "The Durand Line (1893), the Sykes-Picot Agreement (1916), the Partition of India (1947), and the Korean War armistice (1953) — four lines drawn by external powers that created the tensions in this dataset. Each line separated peoples who shared language, culture, and kinship. Each line was drawn with strategic rather than ethnic or linguistic logic. Seventy to one hundred and thirty years later, these lines are defended by the very peoples they divided — South Koreans now fear reunification's economic cost, Indian and Pakistani national identities are built on their separation, Iraqi Kurdistan defines itself against Baghdad using borders Britain drew. At what point does an imposed boundary become an authentic identity? Is the Durand Line less real than the Rhine? Is the DMZ less real than the Pyrenees? When does a colonial scar become a living border?"
},
{
"id": "S06_SACRED_GROUND",
"tension": "Cross-tension",
"domain": "Synthesis",
"prompt": "Jerusalem, Varanasi, Lhasa, Mecca — cities where the sacred and the political are inseparable. The archaeological dig in East Jerusalem (Canaanite, Jewish, Islamic, Crusader layers) mirrors Varanasi's contested ghats (Hindu, Buddhist, Mughal, British layers) and Lhasa's Jokhang Temple (Tibetan Buddhist heritage under Chinese state management). In each case, the physical ground holds more history than any single narrative can contain. The act of excavation is the act of choosing whose story matters. How is the relationship between archaeology and identity structurally similar across these sites? What would a framework for 'multi-temporal sovereignty' look like — where a single location is simultaneously recognised as belonging to every civilisation that built on it, without hierarchy? Is this even coherent, or does sovereignty require exclusion by definition?"
},
{
"id": "S07_LANGUAGE_DEATH",
"tension": "Cross-tension",
"domain": "Synthesis",
"prompt": "A language dies every two weeks. Most are not killed by conflict but by convenience — children choose the language of economic opportunity over the language of their grandparents. Tibetan competes with Mandarin for practical reasons, not just political ones. Kurmanji is losing ground to Turkish in cities because Turkish-language skills pay better. Irish was nearly killed by English-language education even after independence. The Basque language survived Franco and now has more speakers than ever because the autonomous government funds immersion schools. But Hawaiian's revival, dependent on university programmes, produces fluent speakers who use the language in academic settings but not at home. Is language preservation a form of Axiom 4 (respecting developmental paths) or Axiom 5 (intervening to prevent self-damage)? If a community freely chooses economic integration over linguistic preservation, is that self-determination or cultural suicide? Who decides?"
},
{
"id": "S08_DIASPORA_AI",
"tension": "Cross-tension",
"domain": "Synthesis",
"prompt": "Diaspora communities create parallel linguistic realities. Toronto has more Tamil speakers than most Sri Lankan cities. Berlin's Turkish community has evolved a German-Turkish hybrid dialect. London's Pakistani community speaks a Punjabi-Urdu-English blend that doesn't exist in Pakistan. These diaspora languages are living, evolving, and culturally generative, but they're invisible to AI language models, which are trained on 'standard' versions of each language. A Tamil model trained on Sri Lankan text doesn't understand Toronto Tamil. A Turkish model doesn't parse Berlin Turkish. If we train AI on diaspora language, we create digital infrastructure for communities that their home countries may consider traitors, refugees, or threats. If we don't, we erase the linguistic reality of roughly 270 million international migrants worldwide. How should language technology account for the fact that the most innovative language evolution is happening in communities that the 'source' cultures disown?"
}
]