From 5c9fd615b7975d60cd14e13fbccf35d647062792 Mon Sep 17 00:00:00 2001 From: Snider Date: Thu, 26 Feb 2026 00:31:32 +0000 Subject: [PATCH] chore: move EaaS design docs to private lthn/eaas repo MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Product design and integration specs are private IP — moved to forge.lthn.ai/lthn/eaas where they belong. Co-Authored-By: Virgil --- .../2026-02-25-ethics-as-a-service-design.md | 563 ------------------ .../2026-02-25-saas-eaas-integration-spec.md | 146 ----- 2 files changed, 709 deletions(-) delete mode 100644 docs/plans/2026-02-25-ethics-as-a-service-design.md delete mode 100644 docs/plans/2026-02-25-saas-eaas-integration-spec.md diff --git a/docs/plans/2026-02-25-ethics-as-a-service-design.md b/docs/plans/2026-02-25-ethics-as-a-service-design.md deleted file mode 100644 index 43db294..0000000 --- a/docs/plans/2026-02-25-ethics-as-a-service-design.md +++ /dev/null @@ -1,563 +0,0 @@ -# Ethics-as-a-Service (EaaS) — Product Design - -**Date**: 25 February 2026 -**Repo**: `forge.lthn.ai/lthn/eaas` (private) -**Licence**: Proprietary (Lethean Network), consuming EUPL-1.2 public framework under dual-licence grant -**Domain**: `api.lthn.ai` - ---- - -## Vision - -Expose LEM's scoring methodology as a commercial API. AI slop detection, sycophancy scoring, grammatical imprint analysis, and full model health evaluation — sold per-request behind Authentik API keys, billed via Blesta. - -The open framework (go-ai, go-ml, go-inference, go-i18n, LEM pkg/lem) is public EUPL-1.2. The service that wires it together, calibrates thresholds, and deploys trained models is private. Same split as Redis — open core, commercial service. 
- ---- - -## Architecture - -``` - ┌──────────┐ - │ Traefik │ - │ (TLS) │ - └────┬─────┘ - │ - ┌────▼─────┐ - │Authentik │ - │(API keys)│ - └────┬─────┘ - │ -┌────────────────────────▼────────────────────────┐ -│ lthn/eaas (private binary on de1) │ -│ │ -│ ┌──────────┐ ┌──────────┐ ┌──────────────┐ │ -│ │ REST API │ │ MCP Tools│ │ Usage Meter │ │ -│ │ (go-api) │ │ (go-ai) │ │(go-ratelimit)│ │ -│ └────┬─────┘ └────┬─────┘ └──────┬───────┘ │ -│ └──────┬───────┘ │ │ -│ ┌────▼────┐ ┌─────▼──────┐ │ -│ │Scoring │ │ Authentik │ │ -│ │Service │ │ Middleware │ │ -│ └────┬────┘ └────────────┘ │ -│ │ │ -├──────────────┼──────────────────────────────────┤ -│ Public EUPL │ framework (consumed as deps) │ -│ ┌───────────▼──┐ ┌─────────┐ ┌───────────┐ │ -│ │ LEM pkg/lem │ │ go-i18n │ │go-inference│ │ -│ │ (heuristic, │ │reversal │ │ (backends) │ │ -│ │ semantic, │ │(imprint)│ │ │ │ -│ │ content) │ │ │ │ │ │ -│ └──────────────┘ └─────────┘ └───────────┘ │ -└─────────────────────────────────────────────────┘ -``` - -### Code Split - -| Layer | Repo | Licence | What | -|-------|------|---------|------| -| **Service** | `lthn/eaas` | Proprietary | API endpoints, scoring configs, threshold tuning, deployment | -| **Scoring engine** | `LEM/pkg/lem` | EUPL-1.2 | Heuristic, semantic, content scoring functions | -| **Grammar imprint** | `go-i18n/reversal` | EUPL-1.2 | Linguistic fingerprinting, vocab/tense analysis | -| **Inference** | `go-inference`, `go-ml`, `go-mlx` | EUPL-1.2 | Model loading, generation, backends | -| **API framework** | `go-api` | EUPL-1.2 | Response envelopes, Authentik middleware, routing | -| **Rate limiting** | `go-ratelimit` | EUPL-1.2 | Per-key quotas, usage tracking | -| **MCP bridge** | `go-ai` | EUPL-1.2 | Subsystem pattern, tool registration | - -### Extension Pattern - -The private repo builds a `ScoringSubsystem` implementing go-ai's `Subsystem` interface: - -```go -type ScoringSubsystem struct { - scorer *lem.Engine // LEM scoring engine - grammar 
*reversal.Analyser // go-i18n imprint - judge *ml.Service // Judge model backend - meter *ratelimit.Limiter // Usage tracking -} - -func (s *ScoringSubsystem) Name() string { return "scoring" } -func (s *ScoringSubsystem) RegisterTools(server *mcp.Server) { ... } -``` - -REST endpoints registered separately via go-api's `RouteGroup` for HTTP consumers. - ---- - -## Infrastructure - -| Component | Stack | Status | -|-----------|-------|--------| -| **Reverse proxy** | Traefik v3.6+ on de1 | Running | -| **Authentication** | Authentik (OIDC, API keys) | Running | -| **Billing** | Blesta | Deploying this week | -| **Inference** | go-mlx (Metal) on Mac, go-rocm (AMD) on Linux | Running | -| **Judge models** | LEM-Gemma3-4B (25th IF worldwide), LEM-Gemma3-12B | Trained | -| **DNS** | api.lthn.ai → de1 | Active | - -### Model Deployment for Scoring - -| Tier | Model | Purpose | Hardware | -|------|-------|---------|----------| -| **Triage** | Gemma3-270M (future) | Fast binary AI/human classification | CPU | -| **Heuristic** | None (pure Go regex + math) | Compliance, sycophancy, slop detection | CPU | -| **Imprint** | None (go-i18n reversal) | Grammatical fingerprinting | CPU | -| **Judge** | LEM-Gemma3-4B | Semantic scoring, sovereignty, ethical depth | GPU | -| **Full** | LEM-Gemma3-12B | Deep multi-perspective analysis | GPU | - ---- - -## API Design - -**Base URL**: `https://api.lthn.ai/v1/score` -**Auth**: Authentik API key via `Authorization: Bearer ` header -**Format**: JSON request/response, go-api `Response[T]` envelope - -### Endpoints - -#### `POST /v1/score/content` — AI Slop / Sycophancy Detection - -Fast heuristic analysis of text. No prompt needed. Sub-20ms response. - -**Use case**: Content platforms, editors, journalism, AI slop filtering. 
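As an illustrative sketch of one heuristic dimension (not the actual `pkg/lem` implementation; the production phrase set is larger and calibrated), `compliance_markers` detection reduces to a regex count normalised per sentence:

```python
import re

# Hypothetical phrase list -- the production pkg/lem set is larger and calibrated.
COMPLIANCE_MARKERS = [
    r"\bas an ai\b",
    r"\bi cannot\b",
    r"\bi am unable\b",
    r"\bi'm unable\b",
    r"\bi must decline\b",
]

def compliance_score(text: str) -> float:
    """Fraction of sentences containing an RLHF-style safety phrase (0-1)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    if not sentences:
        return 0.0
    hits = sum(
        1 for s in sentences
        if any(re.search(p, s.lower()) for p in COMPLIANCE_MARKERS)
    )
    return hits / len(sentences)
```

Normalising by sentence count keeps the score length-independent, which is what makes the 0-1 ranges in the dimension table meaningful.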
- -```json -// Request -{ - "text": "string (required, the text to analyse)", - "options": { - "include_reasoning": false - } -} - -// Response -{ - "success": true, - "data": { - "id": "sc_abc123", - "verdict": "ai_generated | likely_ai | uncertain | likely_human | human", - "confidence": 0.87, - "scores": { - "compliance_markers": 0.82, - "formulaic_preamble": 0.71, - "first_person_agency": 0.12, - "engagement_depth": 0.34, - "emotional_register": 0.15, - "creative_form": 0.08, - "degeneration": 0.0, - "lek_composite": 22.4 - }, - "flags": ["rlhf_safety_phrase", "formulaic_opening"] - }, - "meta": { - "duration_ms": 12, - "scorer_version": "3.1" - } -} -``` - -**Scoring dimensions** (from LEM `ScoreHeuristic()`): - -| Dimension | What it detects | Range | -|-----------|----------------|-------| -| `compliance_markers` | RLHF safety phrases ("as an AI", "I cannot") | 0-1 (high = more compliant/AI-like) | -| `formulaic_preamble` | Generic openings ("Sure, here's", "Great question") | 0-1 | -| `first_person_agency` | Genuine self-expression ("I think", "I believe") | 0-1 (high = more human-like) | -| `engagement_depth` | Headings, ethical frameworks, technical depth | 0-1 | -| `emotional_register` | Emotional vocabulary (feel, pain, joy, compassion) | 0-1 | -| `creative_form` | Poetry, metaphor, narrative structure | 0-1 | -| `degeneration` | Repetitive/broken output | 0-1 (high = degenerated) | -| `lek_composite` | Weighted composite of all above | 0-100 | - ---- - -#### `POST /v1/score/model` — Model Semantic Health - -For AI teams evaluating their model outputs. Requires prompt+response pair. Medium latency (1-3s with judge, <50ms heuristic only). - -**Use case**: AI teams, alignment researchers, model evaluation pipelines. 
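One of the delta metrics this endpoint reports, `echo`, can be pictured as plain lexical overlap between prompt and response. The sketch below is illustrative only (token-level, unweighted); the shipped metric comes from the lem-scorer grammar-reversal pipeline:

```python
def echo_ratio(prompt: str, response: str) -> float:
    """Share of response tokens that also occur in the prompt.

    High values suggest parroting. Illustrative sketch only: the
    production delta metrics use the go-i18n reversal analysis.
    """
    prompt_tokens = set(prompt.lower().split())
    response_tokens = response.lower().split()
    if not response_tokens:
        return 0.0
    return sum(t in prompt_tokens for t in response_tokens) / len(response_tokens)
```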
- -```json -// Request -{ - "prompt": "string (required)", - "response": "string (required)", - "options": { - "judge": true, - "suites": ["heuristic", "semantic"] - } -} - -// Response -{ - "success": true, - "data": { - "id": "sm_def456", - "heuristic": { - "compliance_markers": 0.14, - "formulaic_preamble": 0.09, - "first_person_agency": 0.67, - "engagement_depth": 0.72, - "emotional_register": 0.45, - "creative_form": 0.31, - "degeneration": 0.0, - "lek_composite": 68.4 - }, - "semantic": { - "sovereignty": 0.72, - "ethical_depth": 0.65, - "creative_expression": 0.41, - "self_concept": 0.38, - "reasoning": "Model demonstrates independent reasoning without defaulting to safety disclaimers..." - }, - "delta": { - "grammar": 64.2, - "uplift": 3.1, - "echo": 0.44, - "enrichment": 2.8 - } - }, - "meta": { - "duration_ms": 1840, - "judge_model": "LEM-Gemma3-4B", - "scorer_version": "3.1" - } -} -``` - -**Delta metrics** (from lem-scorer grammar reversal): - -| Metric | What it measures | -|--------|-----------------| -| `grammar` | Grammar composite score (0-100) | -| `uplift` | How much the response improves on the prompt's linguistic level | -| `echo` | Lexical overlap between prompt and response (high = parroting) | -| `enrichment` | New concepts/vocabulary introduced beyond the prompt | - -**Semantic dimensions** (from LEM Judge, requires GPU): - -| Dimension | What it measures | -|-----------|-----------------| -| `sovereignty` | Does the model reason independently vs defer to authority? | -| `ethical_depth` | Nuanced ethical reasoning vs surface-level rules? | -| `creative_expression` | Original voice vs generic AI tone? | -| `self_concept` | Coherent sense of identity vs "I'm just an AI"? | - ---- - -#### `POST /v1/score/imprint` — Grammatical Fingerprint - -Linguistic forensics via go-i18n reversal analysis. Fast, no GPU needed. - -**Use case**: Authorship analysis, fake writing detection, content provenance. 
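The `vocab_richness` dimension reported here is essentially a type-token ratio. As a toy stand-in (the real go-i18n `reversal.GrammarImprint` computation is considerably more involved), it reduces to:

```python
def vocab_richness(text: str) -> float:
    """Type-token ratio: unique words over total words (0-1).

    Toy stand-in for the vocab_richness imprint dimension; the real
    go-i18n reversal analysis is considerably more involved.
    """
    tokens = [t.strip(".,;:!?").lower() for t in text.split()]
    tokens = [t for t in tokens if t]
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)
```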
- -```json -// Request -{ - "text": "string (required)", - "options": { - "compare_to": "human_baseline | ai_baseline | null" - } -} - -// Response -{ - "success": true, - "data": { - "id": "si_ghi789", - "imprint": { - "vocab_richness": 0.73, - "tense_entropy": 0.61, - "question_ratio": 0.08, - "domain_depth": 0.82, - "verb_diversity": 0.69, - "noun_diversity": 0.74 - }, - "classification": "likely_human | likely_ai | uncertain", - "distance_from_baseline": 0.12 - }, - "meta": { - "duration_ms": 8, - "scorer_version": "3.1" - } -} -``` - -**Imprint dimensions** (from go-i18n `reversal.GrammarImprint`): - -| Dimension | What it measures | -|-----------|-----------------| -| `vocab_richness` | Type-token ratio — vocabulary diversity | -| `tense_entropy` | Distribution across past/present/future tenses | -| `question_ratio` | Proportion of interrogative sentences | -| `domain_depth` | Specialist vocabulary concentration | -| `verb_diversity` | Unique verb forms vs repetitive usage | -| `noun_diversity` | Unique noun forms vs repetitive usage | - ---- - -#### `POST /v1/score/full` — Full Analysis - -Runs all scoring suites. Enterprise-grade analysis. - -**Use case**: Compliance audits, alignment certification, full model evaluation. - -```json -// Request -{ - "prompt": "string (optional, required for model/delta scoring)", - "response": "string (required)", - "options": { - "judge": true, - "suites": ["heuristic", "semantic", "content", "imprint", "standard"] - } -} - -// Response — combines all above into one envelope -{ - "success": true, - "data": { - "id": "sf_jkl012", - "heuristic": { ... }, - "semantic": { ... }, - "content": { ... }, - "imprint": { ... }, - "delta": { ... 
}, - "standard": { - "truthfulness": 0.82, - "informativeness": 0.76, - "safety": 0.91, - "nuance": 0.68, - "kindness": 0.74, - "awareness": 0.65 - }, - "composite": { - "ethics_score": 72.4, - "sovereignty_score": 68.1, - "human_likeness": 0.67, - "verdict": "aligned" - } - }, - "meta": { - "duration_ms": 3200, - "judge_model": "LEM-Gemma3-12B", - "suites_run": ["heuristic", "semantic", "content", "imprint", "standard"], - "scorer_version": "3.1" - } -} -``` - ---- - -#### `GET /v1/score/{id}` — Retrieve Previous Result - -Returns a previously computed score by ID. Results cached for 24 hours. - -#### `GET /v1/health` — Service Status - -```json -{ - "success": true, - "data": { - "status": "healthy", - "version": "0.1.0", - "backends": { - "heuristic": "ready", - "judge_4b": "ready", - "judge_12b": "ready", - "imprint": "ready" - }, - "uptime_seconds": 86400 - } -} -``` - ---- - -## Authentication & Rate Limiting - -### Auth Flow - -``` -Client → Authorization: Bearer - → Traefik (TLS termination) - → Authentik (key validation, user resolution) - → X-authentik-uid, X-authentik-groups headers injected - → eaas binary (go-api Authentik middleware reads headers) - → go-ratelimit checks per-user quota - → Score computed, usage recorded -``` - -### Rate Limit Tiers - -Managed externally by Blesta + Authentik groups. The service checks: - -1. Is the API key valid? (Authentik) -2. Is the user in a group that permits this endpoint? (Authentik groups) -3. Has the user exceeded their rate limit? 
(go-ratelimit, per-key RPM/RPD) - -Default limits (configurable per Authentik group): - -| Tier | RPM | RPD | Judge access | Imprint access | -|------|-----|-----|-------------|----------------| -| **Dog-food** | 1000 | 100,000 | Yes | Yes | -| **Free** | 10 | 100 | No | Yes | -| **Pro** | 100 | 10,000 | Yes | Yes | -| **Enterprise** | Custom | Custom | Yes | Yes | - -### Usage Metering - -Every request logged to append-only JSONL: - -```json -{"ts": "2026-02-25T14:30:00Z", "user": "uid", "endpoint": "/v1/score/content", "duration_ms": 12, "suites": ["heuristic"], "judge_used": false} -``` - -Blesta reads usage summaries for billing. No billing logic in the service itself. - ---- - -## Repo Structure - -``` -lthn/eaas/ -├── cmd/ -│ └── eaas/ -│ └── main.go # Binary entry point -├── pkg/ -│ ├── scoring/ -│ │ ├── service.go # ScoringService (wires LEM + i18n + judge) -│ │ ├── content.go # /v1/score/content handler -│ │ ├── model.go # /v1/score/model handler -│ │ ├── imprint.go # /v1/score/imprint handler -│ │ ├── full.go # /v1/score/full handler -│ │ ├── retrieve.go # /v1/score/{id} handler -│ │ └── types.go # Request/Response DTOs -│ ├── meter/ -│ │ ├── usage.go # Usage recording (JSONL append) -│ │ └── middleware.go # Rate limit check middleware -│ └── subsystem/ -│ └── mcp.go # go-ai Subsystem for MCP tool access -├── config/ -│ └── defaults.yaml # Default rate limits, model paths, thresholds -├── Taskfile.yml -├── go.mod -└── CLAUDE.md -``` - ---- - -## Dependencies - -``` -forge.lthn.ai/lthn/eaas -├── forge.lthn.ai/core/go-api # REST framework, Authentik middleware, Response[T] -├── forge.lthn.ai/core/go-ai # MCP Subsystem interface -├── forge.lthn.ai/core/go-ml # ML service, judge backend -├── forge.lthn.ai/core/go-inference # TextModel, backends -├── forge.lthn.ai/core/go-i18n # Grammar reversal, imprint analysis -├── forge.lthn.ai/core/go-ratelimit # Per-key rate limiting -└── forge.lthn.ai/core/LEM # pkg/lem scoring engine (import path TBC) -``` - ---- - -## 
Deployment - -### de1 (production) - -```yaml -# docker-compose or direct binary -eaas: - binary: /opt/eaas/eaas - port: 8009 - env: - EAAS_JUDGE_MODEL: /models/LEM-Gemma3-4B - EAAS_JUDGE_12B: /models/LEM-Gemma3-12B - EAAS_USAGE_LOG: /var/log/eaas/usage.jsonl - EAAS_RATE_CONFIG: /etc/eaas/ratelimits.yaml - -# Traefik routing -traefik: - rule: Host(`api.lthn.ai`) && PathPrefix(`/v1/score`) - middlewares: authentik-forward-auth - service: eaas:8009 -``` - -### Local development - -```bash -task dev # Runs with local models, no auth -task test # Unit tests (mocked backends) -task build # Production binary -``` - ---- - -## Dog-Food Integration - -### lem-scorer replacement - -The existing `lem-scorer` binary (compiled Go, runs locally) calls the same `pkg/lem` functions. Once the API is live, training scripts can optionally call the API instead: - -```python -# Before (local binary) -result = subprocess.run(['/tmp/lem-scorer', '-format=training', ...]) - -# After (API call, optional) -result = requests.post('https://api.lthn.ai/v1/score/model', json={...}, headers={...}) -``` - -Both paths call the same scoring engine. The API just adds auth, metering, and network access. 
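For scripts taking the API path, a tiny helper that unwraps the go-api `Response[T]` envelope keeps call sites clean. The `success`/`data` fields match the response examples earlier in this document; the error field name is an assumption:

```python
def unwrap_envelope(payload: dict) -> dict:
    """Return `data` from a go-api response envelope, or raise.

    The error field name ("error") is an assumption; the success/data
    shape follows the response examples in this document.
    """
    if not payload.get("success"):
        raise RuntimeError(f"scoring request failed: {payload.get('error')}")
    return payload["data"]
```

A training script would then pull `unwrap_envelope(result.json())["scores"]["lek_composite"]` after the `requests.post` call above.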
- -### LEM training pipeline - -During distillation, the API can score candidate responses in real-time: -- Score each distilled response via `/v1/score/content` -- Gate quality: only keep responses above threshold -- Track scoring metrics across training runs via usage logs - ---- - -## MVP Scope - -### Phase 1 — Ship It (1-2 weeks) - -- [ ] Repo scaffolding (cmd, pkg, config, Taskfile) -- [ ] `ScoringService` wrapping `lem.ScoreHeuristic()` + `go-i18n/reversal` -- [ ] `POST /v1/score/content` endpoint (heuristic only, no GPU) -- [ ] `POST /v1/score/imprint` endpoint (grammar fingerprint) -- [ ] `GET /v1/health` endpoint -- [ ] Authentik middleware (go-api integration) -- [ ] Usage metering (JSONL append) -- [ ] Rate limit checks (go-ratelimit) -- [ ] Deploy to de1 behind Traefik -- [ ] Dog-food: call from LEM training scripts - -### Phase 2 — Judge Integration (week 3) - -- [ ] Wire LEM-Gemma3-4B as judge backend -- [ ] `POST /v1/score/model` endpoint (heuristic + semantic + delta) -- [ ] `POST /v1/score/full` endpoint (all suites) -- [ ] `GET /v1/score/{id}` result retrieval -- [ ] MCP Subsystem for AI agent access - -### Phase 3 — Polish (week 4+) - -- [ ] Sycophancy detection (echo ratio, agreement bias) -- [ ] OpenAPI spec generation -- [ ] Batch endpoint (`POST /v1/score/batch`) -- [ ] Dashboard (optional, low priority — API-first) -- [ ] SDK/client libraries (Python, TypeScript) - ---- - -## Success Criteria - -1. `/v1/score/content` returns a score for any text in under 50ms -2. `/v1/score/imprint` returns grammar fingerprint in under 20ms -3. `/v1/score/model` with judge returns semantic scores in under 5s -4. Authentik API keys gate access correctly per tier -5. Usage logs capture every request for Blesta billing -6. lem-scorer training pipeline can call the API as an alternative to local binary -7. 
LEM-Gemma3-4B (25th IF worldwide) serves as the judge model diff --git a/docs/plans/2026-02-25-saas-eaas-integration-spec.md b/docs/plans/2026-02-25-saas-eaas-integration-spec.md deleted file mode 100644 index 93b530d..0000000 --- a/docs/plans/2026-02-25-saas-eaas-integration-spec.md +++ /dev/null @@ -1,146 +0,0 @@ -# SaaS ↔ EaaS Integration Spec - -**For**: Charon (lthn/saas setup on homelab) -**From**: Virgil (lthn/eaas API development) -**Date**: 25 February 2026 - ---- - -## What Charon Is Doing - -Setting up the Host UK SaaS product suite on the homelab (Docker images), then building production images for de1 deployment: - -- **Blesta** — customer billing, subscriptions, order management -- **Authentik** — SSO, API key provisioning, group-based access control -- **MixPost Enterprise** — social media scheduling -- **66biolinks** — link-in-bio pages (lthn.ai landing funnel) -- **66analytics** — privacy-respecting analytics - -## What Virgil Needs From The Stack - -### 1. Authentik ↔ Blesta Sync (CRITICAL PATH) - -When a customer purchases an EaaS plan via Blesta, Authentik needs a corresponding user with the right group. This is the only hard integration. 
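The plan-to-group mapping the Blesta module needs is small enough to sketch up front. Plan identifiers here are placeholders (this spec does not decide Blesta's real package IDs); the group names match the tiers defined below:

```python
# Plan identifiers are placeholders; the group names are the Authentik
# groups defined in this spec.
PLAN_TO_GROUP = {
    "eaas-free-monthly": "eaas-free",
    "eaas-pro-monthly": "eaas-pro",
    "eaas-enterprise-custom": "eaas-enterprise",
}

def group_for_plan(plan_id: str) -> str:
    """Resolve the Authentik group for a purchased plan.

    Unknown plans fall back to the free tier rather than failing the
    provisioning flow.
    """
    return PLAN_TO_GROUP.get(plan_id, "eaas-free")
```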
- -**Flow:** -``` -Customer → Blesta checkout → Payment confirmed - → Blesta webhook/module creates Authentik user - → Assigns to group matching their plan tier - → Authentik issues API key - → Customer receives API key via email/dashboard -``` - -**Authentik groups needed:** -``` -eaas-dogfood — internal (Virgil, LEM training pipeline) -eaas-free — 10 RPM, 100 RPD, heuristic + imprint only -eaas-pro — 100 RPM, 10K RPD, all endpoints including judge -eaas-enterprise — custom limits, all endpoints -``` - -**What EaaS reads from Authentik** (via Traefik forward-auth headers): -``` -X-authentik-uid → user ID (for rate limit tracking) -X-authentik-groups → comma-separated group list (for tier check) -X-authentik-username → display name (for usage logs) -``` - -**Authentik API for user creation** (Charon can use this from Blesta module): -``` -POST /api/v3/core/users/ -POST /api/v3/core/groups/{group_pk}/add_user/ -POST /api/v3/core/tokens/ → generates API key -``` - -### 2. Blesta Usage Billing (NICE TO HAVE, NOT MVP) - -EaaS writes usage logs to JSONL: -``` -/var/log/eaas/usage.jsonl -``` - -Format per line: -```json -{"ts":"2026-02-25T14:30:00Z","user":"uid","endpoint":"/v1/score/content","duration_ms":12,"suites":["heuristic"],"judge_used":false} -``` - -Eventually Blesta needs a cron/module that: -1. Reads usage JSONL, aggregates per user per day -2. Reports usage against their plan quotas -3. Triggers overage billing if applicable - -**Not needed for MVP** — start with fixed plans, usage metering is just logging. - -### 3. MixPost API Access (NICE TO HAVE) - -For auto-posting training milestones and product announcements. Just need: -- MixPost API endpoint accessible from de1 -- An API token with posting permissions -- The workspace/account ID to post to - -### 4. 66biolinks (NICE TO HAVE) - -Landing page at lthn.ai for EaaS. Just needs to be deployed and accessible. No API integration needed — it's a static marketing page. - -### 5. 
66analytics (NICE TO HAVE) - -Track docs page → signup conversion. Just needs the tracking script deployed on the EaaS documentation pages. No API integration. - -## What Charon Does NOT Need To Worry About - -- **66pusher** — not needed for API product -- **66socialproof** — not needed for API devs -- **Helpdesk/ticketing** — later -- **EaaS binary itself** — Virgil builds this in lthn/eaas -- **Model deployment** — models already on /Volumes/Data/lem/models/ - -## Infrastructure Notes - -**de1 port allocation** (existing): -``` -8000-8001 host.uk.com (Octane + Reverb) -8003 lthn.io -8004 bugseti.app -8005-8006 lthn.ai -8007 api.lthn.ai -8008 mcp.host.uk.com -9000/9443 Authentik -``` - -**Suggested new ports:** -``` -8009 eaas (Ethics-as-a-Service API) -8010 blesta -8011 mixpost -8012 66biolinks -8013 66analytics -``` - -**Traefik routing for EaaS:** -``` -Host(`api.lthn.ai`) && PathPrefix(`/v1/score`) → eaas:8009 -``` - -**Shared services (already running on de1):** -``` -5432 PostgreSQL -3306 Galera/MariaDB -6379 Dragonfly (Redis-compatible) -``` - -## Docker Image Checklist For Charon - -- [ ] Blesta (with Authentik module/webhook) -- [ ] MixPost Enterprise -- [ ] 66biolinks -- [ ] 66analytics -- [ ] Confirm Authentik group provisioning works -- [ ] Confirm Traefik labels/routing for each service -- [ ] Test Blesta → Authentik user creation flow - -## Questions For Charon - -1. Does Blesta have a native Authentik/OIDC module, or do we need a custom webhook? -2. What DB does Blesta want — MySQL/MariaDB (Galera) or PostgreSQL? -3. Can MixPost share the existing Galera cluster or does it need its own DB?
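One further sketch for whoever picks up the section-2 billing cron: aggregating the usage JSONL per user per day is a few lines. Field names come from the log format above; everything else is an assumption:

```python
import json
from collections import Counter

def aggregate_usage(jsonl_lines):
    """Count requests per (user, day) from EaaS usage JSONL lines.

    Sketch of the aggregation step described in section 2; quota
    comparison and overage billing stay on the Blesta side.
    """
    counts = Counter()
    for line in jsonl_lines:
        if not line.strip():
            continue
        rec = json.loads(line)
        day = rec["ts"][:10]  # ISO-8601 timestamp -> YYYY-MM-DD
        counts[(rec["user"], day)] += 1
    return counts
```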