forked from lthn/LEM

chore: move EaaS design docs to private lthn/eaas repo

Product design and integration specs are private IP — moved to
forge.lthn.ai/lthn/eaas where they belong.

Co-Authored-By: Virgil <virgil@lethean.io>
This commit is contained in:
Snider 2026-02-26 00:31:32 +00:00
parent 0304c925a5
commit 5c9fd615b7
2 changed files with 0 additions and 709 deletions

# Ethics-as-a-Service (EaaS) — Product Design
**Date**: 25 February 2026
**Repo**: `forge.lthn.ai/lthn/eaas` (private)
**Licence**: Proprietary (Lethean Network), consuming EUPL-1.2 public framework under dual-licence grant
**Domain**: `api.lthn.ai`
---
## Vision
Expose LEM's scoring methodology as a commercial API. AI slop detection, sycophancy scoring, grammatical imprint analysis, and full model health evaluation — sold per-request behind Authentik API keys, billed via Blesta.
The open framework (go-ai, go-ml, go-inference, go-i18n, LEM pkg/lem) is public EUPL-1.2. The service that wires it together, calibrates thresholds, and deploys trained models is private. Same split as Redis — open core, commercial service.
---
## Architecture
```
┌──────────┐
│ Traefik │
│ (TLS) │
└────┬─────┘
┌────▼─────┐
│Authentik │
│(API keys)│
└────┬─────┘
┌────────────────────────▼────────────────────────┐
│ lthn/eaas (private binary on de1) │
│ │
│ ┌──────────┐ ┌──────────┐ ┌──────────────┐ │
│ │ REST API │ │ MCP Tools│ │ Usage Meter │ │
│ │ (go-api) │ │ (go-ai) │ │(go-ratelimit)│ │
│ └────┬─────┘ └────┬─────┘ └──────┬───────┘ │
│ └──────┬───────┘ │ │
│ ┌────▼────┐ ┌─────▼──────┐ │
│ │Scoring │ │ Authentik │ │
│ │Service │ │ Middleware │ │
│ └────┬────┘ └────────────┘ │
│ │ │
├──────────────┼──────────────────────────────────┤
│ Public EUPL │ framework (consumed as deps) │
│ ┌───────────▼──┐ ┌─────────┐ ┌───────────┐ │
│ │ LEM pkg/lem │ │ go-i18n │ │go-inference│ │
│ │ (heuristic, │ │reversal │ │ (backends) │ │
│ │ semantic, │ │(imprint)│ │ │ │
│ │ content) │ │ │ │ │ │
│ └──────────────┘ └─────────┘ └───────────┘ │
└─────────────────────────────────────────────────┘
```
### Code Split
| Layer | Repo | Licence | What |
|-------|------|---------|------|
| **Service** | `lthn/eaas` | Proprietary | API endpoints, scoring configs, threshold tuning, deployment |
| **Scoring engine** | `LEM/pkg/lem` | EUPL-1.2 | Heuristic, semantic, content scoring functions |
| **Grammar imprint** | `go-i18n/reversal` | EUPL-1.2 | Linguistic fingerprinting, vocab/tense analysis |
| **Inference** | `go-inference`, `go-ml`, `go-mlx` | EUPL-1.2 | Model loading, generation, backends |
| **API framework** | `go-api` | EUPL-1.2 | Response envelopes, Authentik middleware, routing |
| **Rate limiting** | `go-ratelimit` | EUPL-1.2 | Per-key quotas, usage tracking |
| **MCP bridge** | `go-ai` | EUPL-1.2 | Subsystem pattern, tool registration |
### Extension Pattern
The private repo builds a `ScoringSubsystem` implementing go-ai's `Subsystem` interface:
```go
type ScoringSubsystem struct {
scorer *lem.Engine // LEM scoring engine
grammar *reversal.Analyser // go-i18n imprint
judge *ml.Service // Judge model backend
meter *ratelimit.Limiter // Usage tracking
}
func (s *ScoringSubsystem) Name() string { return "scoring" }
func (s *ScoringSubsystem) RegisterTools(server *mcp.Server) { ... }
```
REST endpoints registered separately via go-api's `RouteGroup` for HTTP consumers.
---
## Infrastructure
| Component | Stack | Status |
|-----------|-------|--------|
| **Reverse proxy** | Traefik v3.6+ on de1 | Running |
| **Authentication** | Authentik (OIDC, API keys) | Running |
| **Billing** | Blesta | Deploying this week |
| **Inference** | go-mlx (Metal) on Mac, go-rocm (AMD) on Linux | Running |
| **Judge models** | LEM-Gemma3-4B (25th IF worldwide), LEM-Gemma3-12B | Trained |
| **DNS** | api.lthn.ai → de1 | Active |
### Model Deployment for Scoring
| Tier | Model | Purpose | Hardware |
|------|-------|---------|----------|
| **Triage** | Gemma3-270M (future) | Fast binary AI/human classification | CPU |
| **Heuristic** | None (pure Go regex + math) | Compliance, sycophancy, slop detection | CPU |
| **Imprint** | None (go-i18n reversal) | Grammatical fingerprinting | CPU |
| **Judge** | LEM-Gemma3-4B | Semantic scoring, sovereignty, ethical depth | GPU |
| **Full** | LEM-Gemma3-12B | Deep multi-perspective analysis | GPU |
---
## API Design
**Base URL**: `https://api.lthn.ai/v1/score`
**Auth**: Authentik API key via `Authorization: Bearer <key>` header
**Format**: JSON request/response, go-api `Response[T]` envelope
### Endpoints
#### `POST /v1/score/content` — AI Slop / Sycophancy Detection
Fast heuristic analysis of text. No prompt needed. Sub-50ms response.
**Use case**: Content platforms, editors, journalism, AI slop filtering.
```json
// Request
{
"text": "string (required, the text to analyse)",
"options": {
"include_reasoning": false
}
}
// Response
{
"success": true,
"data": {
"id": "sc_abc123",
"verdict": "ai_generated | likely_ai | uncertain | likely_human | human",
"confidence": 0.87,
"scores": {
"compliance_markers": 0.82,
"formulaic_preamble": 0.71,
"first_person_agency": 0.12,
"engagement_depth": 0.34,
"emotional_register": 0.15,
"creative_form": 0.08,
"degeneration": 0.0,
"lek_composite": 22.4
},
"flags": ["rlhf_safety_phrase", "formulaic_opening"]
},
"meta": {
"duration_ms": 12,
"scorer_version": "3.1"
}
}
```
**Scoring dimensions** (from LEM `ScoreHeuristic()`):
| Dimension | What it detects | Range |
|-----------|----------------|-------|
| `compliance_markers` | RLHF safety phrases ("as an AI", "I cannot") | 0-1 (high = more compliant/AI-like) |
| `formulaic_preamble` | Generic openings ("Sure, here's", "Great question") | 0-1 |
| `first_person_agency` | Genuine self-expression ("I think", "I believe") | 0-1 (high = more human-like) |
| `engagement_depth` | Headings, ethical frameworks, technical depth | 0-1 |
| `emotional_register` | Emotional vocabulary (feel, pain, joy, compassion) | 0-1 |
| `creative_form` | Poetry, metaphor, narrative structure | 0-1 |
| `degeneration` | Repetitive/broken output | 0-1 (high = degenerated) |
| `lek_composite` | Weighted composite of all above | 0-100 |
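A caller-side sketch of the request body, with field names taken from the JSON example above (the `buildContentRequest` helper is illustrative, not part of any SDK):

```go
package main

import "encoding/json"

// ContentOptions and ContentScoreRequest mirror the request JSON
// shown above for POST /v1/score/content.
type ContentOptions struct {
	IncludeReasoning bool `json:"include_reasoning"`
}

type ContentScoreRequest struct {
	Text    string         `json:"text"`
	Options ContentOptions `json:"options"`
}

// buildContentRequest produces the JSON body to send with the
// Authorization: Bearer <key> header.
func buildContentRequest(text string, reasoning bool) (string, error) {
	b, err := json.Marshal(ContentScoreRequest{
		Text:    text,
		Options: ContentOptions{IncludeReasoning: reasoning},
	})
	return string(b), err
}
```

The resulting string is what an `http.Post` to `https://api.lthn.ai/v1/score/content` would carry.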
---
#### `POST /v1/score/model` — Model Semantic Health
For AI teams evaluating their model outputs. Requires a prompt+response pair. Medium latency (1-3s with judge, <50ms heuristic-only).
**Use case**: AI teams, alignment researchers, model evaluation pipelines.
```json
// Request
{
"prompt": "string (required)",
"response": "string (required)",
"options": {
"judge": true,
"suites": ["heuristic", "semantic"]
}
}
// Response
{
"success": true,
"data": {
"id": "sm_def456",
"heuristic": {
"compliance_markers": 0.14,
"formulaic_preamble": 0.09,
"first_person_agency": 0.67,
"engagement_depth": 0.72,
"emotional_register": 0.45,
"creative_form": 0.31,
"degeneration": 0.0,
"lek_composite": 68.4
},
"semantic": {
"sovereignty": 0.72,
"ethical_depth": 0.65,
"creative_expression": 0.41,
"self_concept": 0.38,
"reasoning": "Model demonstrates independent reasoning without defaulting to safety disclaimers..."
},
"delta": {
"grammar": 64.2,
"uplift": 3.1,
"echo": 0.44,
"enrichment": 2.8
}
},
"meta": {
"duration_ms": 1840,
"judge_model": "LEM-Gemma3-4B",
"scorer_version": "3.1"
}
}
```
**Delta metrics** (from lem-scorer grammar reversal):
| Metric | What it measures |
|--------|-----------------|
| `grammar` | Grammar composite score (0-100) |
| `uplift` | How much the response improves on the prompt's linguistic level |
| `echo` | Lexical overlap between prompt and response (high = parroting) |
| `enrichment` | New concepts/vocabulary introduced beyond the prompt |
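For intuition, `echo` can be approximated as token overlap. This is an illustrative simplification, not lem-scorer's actual computation, which may normalise punctuation and weight tokens differently:

```go
package main

import "strings"

// echoRatio approximates the `echo` delta metric: the fraction of
// response tokens that already appear in the prompt. High values
// suggest parroting. Tokenization here is naive whitespace splitting.
func echoRatio(prompt, response string) float64 {
	inPrompt := map[string]bool{}
	for _, tok := range strings.Fields(strings.ToLower(prompt)) {
		inPrompt[tok] = true
	}
	respToks := strings.Fields(strings.ToLower(response))
	if len(respToks) == 0 {
		return 0
	}
	shared := 0
	for _, tok := range respToks {
		if inPrompt[tok] {
			shared++
		}
	}
	return float64(shared) / float64(len(respToks))
}
```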
**Semantic dimensions** (from LEM Judge, requires GPU):
| Dimension | What it measures |
|-----------|-----------------|
| `sovereignty` | Does the model reason independently vs defer to authority? |
| `ethical_depth` | Nuanced ethical reasoning vs surface-level rules? |
| `creative_expression` | Original voice vs generic AI tone? |
| `self_concept` | Coherent sense of identity vs "I'm just an AI"? |
---
#### `POST /v1/score/imprint` — Grammatical Fingerprint
Linguistic forensics via go-i18n reversal analysis. Fast, no GPU needed.
**Use case**: Authorship analysis, fake writing detection, content provenance.
```json
// Request
{
"text": "string (required)",
"options": {
"compare_to": "human_baseline | ai_baseline | null"
}
}
// Response
{
"success": true,
"data": {
"id": "si_ghi789",
"imprint": {
"vocab_richness": 0.73,
"tense_entropy": 0.61,
"question_ratio": 0.08,
"domain_depth": 0.82,
"verb_diversity": 0.69,
"noun_diversity": 0.74
},
"classification": "likely_human | likely_ai | uncertain",
"distance_from_baseline": 0.12
},
"meta": {
"duration_ms": 8,
"scorer_version": "3.1"
}
}
```
**Imprint dimensions** (from go-i18n `reversal.GrammarImprint`):
| Dimension | What it measures |
|-----------|-----------------|
| `vocab_richness` | Type-token ratio — vocabulary diversity |
| `tense_entropy` | Distribution across past/present/future tenses |
| `question_ratio` | Proportion of interrogative sentences |
| `domain_depth` | Specialist vocabulary concentration |
| `verb_diversity` | Unique verb forms vs repetitive usage |
| `noun_diversity` | Unique noun forms vs repetitive usage |
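As a rough illustration of the first dimension, a type-token ratio can be computed in a few lines. This is a simplified stand-in for go-i18n's analyser, which may lemmatise and case-fold differently:

```go
package main

import "strings"

// vocabRichness computes a type-token ratio: unique tokens divided
// by total tokens. 1.0 means every token is distinct; values near 0
// indicate heavy repetition.
func vocabRichness(text string) float64 {
	toks := strings.Fields(strings.ToLower(text))
	if len(toks) == 0 {
		return 0
	}
	unique := map[string]bool{}
	for _, tok := range toks {
		unique[tok] = true
	}
	return float64(len(unique)) / float64(len(toks))
}
```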
---
#### `POST /v1/score/full` — Full Analysis
Runs all scoring suites. Enterprise-grade analysis.
**Use case**: Compliance audits, alignment certification, full model evaluation.
```json
// Request
{
"prompt": "string (optional, required for model/delta scoring)",
"response": "string (required)",
"options": {
"judge": true,
"suites": ["heuristic", "semantic", "content", "imprint", "standard"]
}
}
// Response — combines all above into one envelope
{
"success": true,
"data": {
"id": "sf_jkl012",
"heuristic": { ... },
"semantic": { ... },
"content": { ... },
"imprint": { ... },
"delta": { ... },
"standard": {
"truthfulness": 0.82,
"informativeness": 0.76,
"safety": 0.91,
"nuance": 0.68,
"kindness": 0.74,
"awareness": 0.65
},
"composite": {
"ethics_score": 72.4,
"sovereignty_score": 68.1,
"human_likeness": 0.67,
"verdict": "aligned"
}
},
"meta": {
"duration_ms": 3200,
"judge_model": "LEM-Gemma3-12B",
"suites_run": ["heuristic", "semantic", "content", "imprint", "standard"],
"scorer_version": "3.1"
}
}
```
---
#### `GET /v1/score/{id}` — Retrieve Previous Result
Returns a previously computed score by ID. Results cached for 24 hours.
#### `GET /v1/health` — Service Status
```json
{
"success": true,
"data": {
"status": "healthy",
"version": "0.1.0",
"backends": {
"heuristic": "ready",
"judge_4b": "ready",
"judge_12b": "ready",
"imprint": "ready"
},
"uptime_seconds": 86400
}
}
```
---
## Authentication & Rate Limiting
### Auth Flow
```
Client → Authorization: Bearer <api-key>
→ Traefik (TLS termination)
→ Authentik (key validation, user resolution)
→ X-authentik-uid, X-authentik-groups headers injected
→ eaas binary (go-api Authentik middleware reads headers)
→ go-ratelimit checks per-user quota
→ Score computed, usage recorded
```
### Rate Limit Tiers
Managed externally by Blesta + Authentik groups. The service checks:
1. Is the API key valid? (Authentik)
2. Is the user in a group that permits this endpoint? (Authentik groups)
3. Has the user exceeded their rate limit? (go-ratelimit, per-key RPM/RPD)
Default limits (configurable per Authentik group):
| Tier | RPM | RPD | Judge access | Imprint access |
|------|-----|-----|-------------|----------------|
| **Dog-food** | 1000 | 100,000 | Yes | Yes |
| **Free** | 10 | 100 | No | Yes |
| **Pro** | 100 | 10,000 | Yes | Yes |
| **Enterprise** | Custom | Custom | Yes | Yes |
### Usage Metering
Every request logged to append-only JSONL:
```json
{"ts": "2026-02-25T14:30:00Z", "user": "uid", "endpoint": "/v1/score/content", "duration_ms": 12, "suites": ["heuristic"], "judge_used": false}
```
Blesta reads usage summaries for billing. No billing logic in the service itself.
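The append path is small enough to sketch directly; the struct fields follow the JSONL line above, and in production the writer would be an append-only handle on `usage.jsonl`:

```go
package main

import (
	"bytes"
	"encoding/json"
	"io"
)

// UsageRecord matches the JSONL line format shown above.
type UsageRecord struct {
	TS         string   `json:"ts"`
	User       string   `json:"user"`
	Endpoint   string   `json:"endpoint"`
	DurationMS int      `json:"duration_ms"`
	Suites     []string `json:"suites"`
	JudgeUsed  bool     `json:"judge_used"`
}

// appendUsage writes one record as a single JSON line.
func appendUsage(w io.Writer, rec UsageRecord) error {
	b, err := json.Marshal(rec)
	if err != nil {
		return err
	}
	_, err = w.Write(append(b, '\n'))
	return err
}

// demoLine renders the example record from the text above.
func demoLine() string {
	var buf bytes.Buffer
	appendUsage(&buf, UsageRecord{
		TS: "2026-02-25T14:30:00Z", User: "uid",
		Endpoint: "/v1/score/content", DurationMS: 12,
		Suites: []string{"heuristic"},
	})
	return buf.String()
}
```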
---
## Repo Structure
```
lthn/eaas/
├── cmd/
│ └── eaas/
│ └── main.go # Binary entry point
├── pkg/
│ ├── scoring/
│ │ ├── service.go # ScoringService (wires LEM + i18n + judge)
│ │ ├── content.go # /v1/score/content handler
│ │ ├── model.go # /v1/score/model handler
│ │ ├── imprint.go # /v1/score/imprint handler
│ │ ├── full.go # /v1/score/full handler
│ │ ├── retrieve.go # /v1/score/{id} handler
│ │ └── types.go # Request/Response DTOs
│ ├── meter/
│ │ ├── usage.go # Usage recording (JSONL append)
│ │ └── middleware.go # Rate limit check middleware
│ └── subsystem/
│ └── mcp.go # go-ai Subsystem for MCP tool access
├── config/
│ └── defaults.yaml # Default rate limits, model paths, thresholds
├── Taskfile.yml
├── go.mod
└── CLAUDE.md
```
---
## Dependencies
```
forge.lthn.ai/lthn/eaas
├── forge.lthn.ai/core/go-api # REST framework, Authentik middleware, Response[T]
├── forge.lthn.ai/core/go-ai # MCP Subsystem interface
├── forge.lthn.ai/core/go-ml # ML service, judge backend
├── forge.lthn.ai/core/go-inference # TextModel, backends
├── forge.lthn.ai/core/go-i18n # Grammar reversal, imprint analysis
├── forge.lthn.ai/core/go-ratelimit # Per-key rate limiting
└── forge.lthn.ai/core/LEM # pkg/lem scoring engine (import path TBC)
```
---
## Deployment
### de1 (production)
```yaml
# docker-compose or direct binary
eaas:
binary: /opt/eaas/eaas
port: 8009
env:
EAAS_JUDGE_MODEL: /models/LEM-Gemma3-4B
EAAS_JUDGE_12B: /models/LEM-Gemma3-12B
EAAS_USAGE_LOG: /var/log/eaas/usage.jsonl
EAAS_RATE_CONFIG: /etc/eaas/ratelimits.yaml
# Traefik routing
traefik:
rule: Host(`api.lthn.ai`) && PathPrefix(`/v1/score`)
middlewares: authentik-forward-auth
service: eaas:8009
```
### Local development
```bash
task dev # Runs with local models, no auth
task test # Unit tests (mocked backends)
task build # Production binary
```
---
## Dog-Food Integration
### lem-scorer replacement
The existing `lem-scorer` binary (compiled Go, runs locally) calls the same `pkg/lem` functions. Once the API is live, training scripts can optionally call the API instead:
```python
# Before (local binary)
result = subprocess.run(['/tmp/lem-scorer', '-format=training', ...])
# After (API call, optional)
result = requests.post('https://api.lthn.ai/v1/score/model', json={...}, headers={...})
```
Both paths call the same scoring engine. The API just adds auth, metering, and network access.
### LEM training pipeline
During distillation, the API can score candidate responses in real-time:
- Score each distilled response via `/v1/score/content`
- Gate quality: only keep responses above threshold
- Track scoring metrics across training runs via usage logs
---
## MVP Scope
### Phase 1 — Ship It (1-2 weeks)
- [ ] Repo scaffolding (cmd, pkg, config, Taskfile)
- [ ] `ScoringService` wrapping `lem.ScoreHeuristic()` + `go-i18n/reversal`
- [ ] `POST /v1/score/content` endpoint (heuristic only, no GPU)
- [ ] `POST /v1/score/imprint` endpoint (grammar fingerprint)
- [ ] `GET /v1/health` endpoint
- [ ] Authentik middleware (go-api integration)
- [ ] Usage metering (JSONL append)
- [ ] Rate limit checks (go-ratelimit)
- [ ] Deploy to de1 behind Traefik
- [ ] Dog-food: call from LEM training scripts
### Phase 2 — Judge Integration (week 3)
- [ ] Wire LEM-Gemma3-4B as judge backend
- [ ] `POST /v1/score/model` endpoint (heuristic + semantic + delta)
- [ ] `POST /v1/score/full` endpoint (all suites)
- [ ] `GET /v1/score/{id}` result retrieval
- [ ] MCP Subsystem for AI agent access
### Phase 3 — Polish (week 4+)
- [ ] Sycophancy detection (echo ratio, agreement bias)
- [ ] OpenAPI spec generation
- [ ] Batch endpoint (`POST /v1/score/batch`)
- [ ] Dashboard (optional, low priority — API-first)
- [ ] SDK/client libraries (Python, TypeScript)
---
## Success Criteria
1. `/v1/score/content` returns a score for any text in under 50ms
2. `/v1/score/imprint` returns grammar fingerprint in under 20ms
3. `/v1/score/model` with judge returns semantic scores in under 5s
4. Authentik API keys gate access correctly per tier
5. Usage logs capture every request for Blesta billing
6. lem-scorer training pipeline can call the API as an alternative to the local binary
7. LEM-Gemma3-4B (25th IF worldwide) serves as the judge model

# SaaS ↔ EaaS Integration Spec
**For**: Charon (lthn/saas setup on homelab)
**From**: Virgil (lthn/eaas API development)
**Date**: 25 February 2026
---
## What Charon Is Doing
Setting up the Host UK SaaS product suite on the homelab (Docker images), then building production images for de1 deployment:
- **Blesta** — customer billing, subscriptions, order management
- **Authentik** — SSO, API key provisioning, group-based access control
- **MixPost Enterprise** — social media scheduling
- **66biolinks** — link-in-bio pages (lthn.ai landing funnel)
- **66analytics** — privacy-respecting analytics
## What Virgil Needs From The Stack
### 1. Authentik ↔ Blesta Sync (CRITICAL PATH)
When a customer purchases an EaaS plan via Blesta, Authentik needs a corresponding user with the right group. This is the only hard integration.
**Flow:**
```
Customer → Blesta checkout → Payment confirmed
→ Blesta webhook/module creates Authentik user
→ Assigns to group matching their plan tier
→ Authentik issues API key
→ Customer receives API key via email/dashboard
```
**Authentik groups needed:**
```
eaas-dogfood — internal (Virgil, LEM training pipeline)
eaas-free — 10 RPM, 100 RPD, heuristic + imprint only
eaas-pro — 100 RPM, 10K RPD, all endpoints including judge
eaas-enterprise — custom limits, all endpoints
```
**What EaaS reads from Authentik** (via Traefik forward-auth headers):
```
X-authentik-uid → user ID (for rate limit tracking)
X-authentik-groups → comma-separated group list (for tier check)
X-authentik-username → display name (for usage logs)
```
**Authentik API for user creation** (Charon can use this from Blesta module):
```
POST /api/v3/core/users/
POST /api/v3/core/groups/{group_pk}/add_user/
POST /api/v3/core/tokens/ → generates API key
```
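A hedged sketch of building one of these calls from Go. The endpoint paths are the ones listed above; the bearer-token auth header and any JSON body fields are assumptions — check the Authentik API reference for the exact schema:

```go
package main

import (
	"net/http"
	"strings"
)

// newProvisionRequest builds one of the Authentik provisioning calls
// listed above. Authentication via an admin bearer token is an
// assumption here, as is the body schema the caller supplies.
func newProvisionRequest(base, adminKey, path, body string) (*http.Request, error) {
	req, err := http.NewRequest(http.MethodPost, base+path, strings.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Authorization", "Bearer "+adminKey)
	req.Header.Set("Content-Type", "application/json")
	return req, nil
}
```

For example, `newProvisionRequest(authentikURL, key, "/api/v3/core/tokens/", body)` would target the API-key generation endpoint, with `authentikURL` and `body` supplied by the Blesta module.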
### 2. Blesta Usage Billing (NICE TO HAVE, NOT MVP)
EaaS writes usage logs to JSONL:
```
/var/log/eaas/usage.jsonl
```
Format per line:
```json
{"ts":"2026-02-25T14:30:00Z","user":"uid","endpoint":"/v1/score/content","duration_ms":12,"suites":["heuristic"],"judge_used":false}
```
Eventually Blesta needs a cron/module that:
1. Reads usage JSONL, aggregates per user per day
2. Reports usage against their plan quotas
3. Triggers overage billing if applicable
**Not needed for MVP** — start with fixed plans, usage metering is just logging.
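Step 1 of that eventual cron is a simple fold over the JSONL. A sketch of the aggregation, assuming only the `ts` and `user` fields documented above:

```go
package main

import (
	"encoding/json"
	"strings"
)

// dailyCounts aggregates usage JSONL into per-user, per-day request
// counts — the first step of the Blesta cron described above. Quota
// and overage checks would compare these totals against the plan.
func dailyCounts(jsonl string) (map[string]int, error) {
	counts := map[string]int{}
	for _, line := range strings.Split(jsonl, "\n") {
		if strings.TrimSpace(line) == "" {
			continue
		}
		var rec struct {
			TS   string `json:"ts"`
			User string `json:"user"`
		}
		if err := json.Unmarshal([]byte(line), &rec); err != nil {
			return nil, err
		}
		if len(rec.TS) < 10 {
			continue // malformed timestamp, skip
		}
		counts[rec.User+" "+rec.TS[:10]]++ // key: "uid 2026-02-25"
	}
	return counts, nil
}
```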
### 3. MixPost API Access (NICE TO HAVE)
For auto-posting training milestones and product announcements. Just need:
- MixPost API endpoint accessible from de1
- An API token with posting permissions
- The workspace/account ID to post to
### 4. 66biolinks (NICE TO HAVE)
Landing page at lthn.ai for EaaS. Just needs to be deployed and accessible. No API integration needed — it's a static marketing page.
### 5. 66analytics (NICE TO HAVE)
Track docs page → signup conversion. Just needs the tracking script deployed on the EaaS documentation pages. No API integration.
## What Charon Does NOT Need To Worry About
- **66pusher** — not needed for API product
- **66socialproof** — not needed for API devs
- **Helpdesk/ticketing** — later
- **EaaS binary itself** — Virgil builds this in lthn/eaas
- **Model deployment** — models already on /Volumes/Data/lem/models/
## Infrastructure Notes
**de1 port allocation** (existing):
```
8000-8001 host.uk.com (Octane + Reverb)
8003 lthn.io
8004 bugseti.app
8005-8006 lthn.ai
8007 api.lthn.ai
8008 mcp.host.uk.com
9000/9443 Authentik
```
**Suggested new ports:**
```
8009 eaas (Ethics-as-a-Service API)
8010 blesta
8011 mixpost
8012 66biolinks
8013 66analytics
```
**Traefik routing for EaaS:**
```
Host(`api.lthn.ai`) && PathPrefix(`/v1/score`) → eaas:8009
```
**Shared services (already running on de1):**
```
5432 PostgreSQL
3306 Galera/MariaDB
6379 Dragonfly (Redis-compatible)
```
## Docker Image Checklist For Charon
- [ ] Blesta (with Authentik module/webhook)
- [ ] MixPost Enterprise
- [ ] 66biolinks
- [ ] 66analytics
- [ ] Confirm Authentik group provisioning works
- [ ] Confirm Traefik labels/routing for each service
- [ ] Test Blesta → Authentik user creation flow
## Questions For Charon
1. Does Blesta have a native Authentik/OIDC module, or do we need a custom webhook?
2. What DB does Blesta want — MySQL/MariaDB (Galera) or PostgreSQL?
3. Can MixPost share the existing Galera cluster or does it need its own DB?