docs: add Studio multimedia pipeline design
Smart/dumb architecture: LEM produces JSON manifests (creative decisions), ffmpeg executes mechanically. Remote-first GPU on homelab. Five-phase delivery targeting April demo for OF agency video remixing use case.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
parent f3854f1077
commit 243ab4ebbe
1 changed file with 158 additions and 0 deletions

docs/plans/2026-03-08-studio-multimedia-pipeline-design.md (normal file)
@@ -0,0 +1,158 @@

# Studio: Multimedia Pipeline Design

**Date:** 8 March 2026
**Status:** Approved

## Goal

Local AI multimedia pipeline for video remixing, content creation, and voice interaction. Runs as a CorePHP service (Studio) dispatching GPU work to homelab infrastructure. First client: OF agency remixing existing footage into TikTok-ready variants.

## Architecture

Studio is a job orchestrator. LEM handles creative decisions (smart layer); ffmpeg and GPU services handle execution (dumb layer). LEM never touches video frames: it produces JSON manifests that the execution layer consumes mechanically.

```
Studio (CorePHP, lthn.ai/lthn.sh)
  ├── Livewire UI (studio.lthn.ai)
  ├── Artisan Commands (CLI)
  └── API Routes (/api/studio/*)
        │
        ▼
  Studio Actions (RemixVideo, GenerateManifest, etc.)
        │
  Redis Job Queue
        │
        ├── Ollama (LEM fleet) ─── Creative decisions, scripts, captions
        ├── Whisper Service ────── Transcribe source footage, STT
        ├── TTS Service ────────── Voiceover generation
        ├── ffmpeg Worker ──────── Render manifests to video
        └── ComfyUI (Phase 4) ─── Image gen, thumbnails, overlays
```

All GPU services are Docker containers on the homelab (or any GPU server). Studio dispatches over HTTP. No local GPU dependency: remote-first from day one.

## Library & Cataloguing

Source material is catalogued across four stores:

- **PG** (`studio_assets`): Metadata, including filename, duration, resolution, tags (season/theme/mood), workspace
- **Qdrant**: Vector embeddings from Whisper transcripts, plus CLIP image embeddings (phase 2). Used for semantic search
- **Filesystem**: Raw files on homelab storage; PG rows reference their paths
- **.md catalogue files**: Human-readable collection descriptions, style guides, brand notes. LEM reads them as context

Query flow:
```
Brief ("summer lollipop TikTok, 15s, upbeat")
  → LEM queries PG for tagged assets
  → LEM queries Qdrant for semantic matches
  → LEM reads collection .md for style context
  → LEM outputs manifest JSON
```

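The first two steps of the query flow (tag filter, then semantic ranking) can be sketched roughly as below. This is a minimal illustration in Python, not the CorePHP implementation: the in-memory `ASSETS` list stands in for both the `studio_assets` table and the Qdrant collection, and the `Asset` fields, the toy two-dimensional embeddings, and the `select_candidates` helper are all hypothetical.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass
class Asset:
    id: int
    tags: set[str]
    embedding: list[float]  # toy stand-in for a transcript embedding


# Hypothetical in-memory stand-in for studio_assets (PG) plus Qdrant vectors.
ASSETS = [
    Asset(42, {"summer", "upbeat"}, [0.9, 0.1]),
    Asset(17, {"summer", "calm"}, [0.6, 0.4]),
    Asset(8, {"winter", "upbeat"}, [0.1, 0.9]),
]


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b)))


def select_candidates(tags: set[str], query_vec: list[float], limit: int = 2) -> list[int]:
    """Filter by tag (the PG step), then rank by similarity (the Qdrant step)."""
    tagged = [a for a in ASSETS if tags & a.tags]
    ranked = sorted(tagged, key=lambda a: cosine(a.embedding, query_vec), reverse=True)
    return [a.id for a in ranked[:limit]]


candidates = select_candidates({"summer"}, [1.0, 0.0])
print(candidates)
```

The resulting asset IDs, together with the collection's `.md` notes, would form the context LEM uses to write the manifest.
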
## Manifest Format

LEM produces, ffmpeg consumes. No AI in execution.

```json
{
  "template": "tiktok-15s",
  "clips": [
    {"asset_id": 42, "start": 3.2, "end": 8.1, "order": 1},
    {"asset_id": 17, "start": 0.0, "end": 5.5, "order": 2}
  ],
  "captions": [
    {"text": "Summer vibes only", "at": 0.5, "duration": 3, "style": "bold-center"}
  ],
  "audio": {"track": "original", "fade_in": 0.5},
  "output": {"format": "mp4", "resolution": "1080x1920", "fps": 30}
}
```

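Because the manifest is the only contract between LEM and the execution layer, a worker would plausibly validate it before rendering. The sketch below is Python for illustration only; the specific rules (positive clip length, unique `order` values, `WIDTHxHEIGHT` resolution) are assumptions, not constraints stated in this design.

```python
import json
import re

MANIFEST_JSON = """
{
  "template": "tiktok-15s",
  "clips": [
    {"asset_id": 42, "start": 3.2, "end": 8.1, "order": 1},
    {"asset_id": 17, "start": 0.0, "end": 5.5, "order": 2}
  ],
  "captions": [
    {"text": "Summer vibes only", "at": 0.5, "duration": 3, "style": "bold-center"}
  ],
  "audio": {"track": "original", "fade_in": 0.5},
  "output": {"format": "mp4", "resolution": "1080x1920", "fps": 30}
}
"""


def validate_manifest(manifest: dict) -> list[str]:
    """Return a list of problems; an empty list means the manifest looks renderable."""
    problems = []
    clips = manifest.get("clips", [])
    if not clips:
        problems.append("manifest has no clips")
    for clip in clips:
        if clip["end"] <= clip["start"]:
            problems.append(f"clip {clip['asset_id']}: end must be after start")
    orders = [clip["order"] for clip in clips]
    if len(orders) != len(set(orders)):
        problems.append("clip order values must be unique")
    resolution = manifest.get("output", {}).get("resolution", "")
    if not re.fullmatch(r"\d+x\d+", resolution):
        problems.append(f"bad resolution: {resolution!r}")
    return problems


manifest = json.loads(MANIFEST_JSON)
total_seconds = sum(c["end"] - c["start"] for c in manifest["clips"])
print(validate_manifest(manifest), round(total_seconds, 1))
```

Rejecting a bad manifest before it reaches ffmpeg keeps failures on the creative side of the contract, where LEM can be asked to regenerate.
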
Variants: LEM produces multiple manifests from the same brief, and the worker renders each one independently.

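To make the mechanical execution step concrete, the sketch below turns each manifest's clips into an ffmpeg argument list using the `trim`, `atrim`, `setpts`, `asetpts`, `concat`, `scale`, and `fps` filters. It is an illustration in Python, not the worker's actual code: the `ASSET_PATHS` lookup and all file paths are hypothetical, captions and audio fades are omitted, and it assumes every source clip has an audio stream and the same frame size (which `concat` requires).

```python
# Hypothetical asset_id -> path lookup; in Studio this would come from studio_assets.
ASSET_PATHS = {42: "/library/beach.mp4", 17: "/library/lollipop.mp4"}


def build_ffmpeg_args(manifest: dict, out_path: str) -> list[str]:
    """Translate clips and output settings into an ffmpeg argv list (no shell involved)."""
    clips = sorted(manifest["clips"], key=lambda c: c["order"])
    args = ["ffmpeg", "-y"]
    for clip in clips:
        args += ["-i", ASSET_PATHS[clip["asset_id"]]]

    chains, labels = [], ""
    for i, clip in enumerate(clips):
        span = f"start={clip['start']}:end={clip['end']}"
        # Cut the segment and reset timestamps so segments concatenate cleanly.
        chains.append(f"[{i}:v]trim={span},setpts=PTS-STARTPTS[v{i}]")
        chains.append(f"[{i}:a]atrim={span},asetpts=PTS-STARTPTS[a{i}]")
        labels += f"[v{i}][a{i}]"
    chains.append(f"{labels}concat=n={len(clips)}:v=1:a=1[vcat][a]")

    out = manifest["output"]
    width, height = out["resolution"].split("x")
    chains.append(f"[vcat]scale={width}:{height},fps={out['fps']}[v]")

    args += ["-filter_complex", ";".join(chains), "-map", "[v]", "-map", "[a]", out_path]
    return args


output = {"format": "mp4", "resolution": "1080x1920", "fps": 30}
variant_a = {
    "clips": [
        {"asset_id": 17, "start": 0.0, "end": 5.5, "order": 2},
        {"asset_id": 42, "start": 3.2, "end": 8.1, "order": 1},
    ],
    "output": output,
}
variant_b = {"clips": [{"asset_id": 17, "start": 1.0, "end": 4.0, "order": 1}], "output": output}

# Each variant becomes its own command, so renders are independent of one another.
commands = [
    build_ffmpeg_args(m, f"/renders/variant-{n}.mp4")
    for n, m in enumerate([variant_a, variant_b], start=1)
]
```

Each entry in `commands` could then be handed to `subprocess.run` by a queue consumer; nothing in this step needs a model.
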
## GPU Services (Homelab)

| Service | Container | Port | Model | Purpose |
|---------|-----------|------|-------|---------|
| Ollama | studio-ollama | 11434 | LEM fleet | Creative decisions, scripts, captions |
| Whisper | studio-whisper | 9100 | whisper-large-v3-turbo | Transcribe footage, STT |
| TTS | studio-tts | 9200 | Kokoro/Parler | Voiceover generation |
| ffmpeg Worker | studio-worker | n/a | n/a | Queue consumer, renders manifests |
| ComfyUI | studio-comfyui | 8188 | Flux/SD3.5 | Image gen, thumbnails (Phase 4) |

Shared with existing homelab: noc-net Docker network, Traefik, PG, Qdrant. Each service exposes a REST API; Studio POSTs work to it and receives callbacks.

Deployment: one Ansible playbook per service, with ROCm Docker images for the GPU services.

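The POST-plus-callback pattern suggests a small endpoint registry on the Studio side. The sketch below is Python for illustration; the ports come from the table, while the `gpu.example.internal` host, the `/jobs` path, the payload fields, and the callback route are assumptions, since this design does not specify the request format. Ollama and ComfyUI are left out because they ship their own HTTP APIs.

```python
import json
from urllib.request import Request

GPU_HOST = "gpu.example.internal"  # hypothetical homelab hostname

# Ports taken from the services table above.
SERVICE_PORTS = {"whisper": 9100, "tts": 9200}


def build_dispatch(service: str, job_id: str, payload: dict) -> Request:
    """Build (but do not send) a POST carrying the work item and a callback URL."""
    url = f"http://{GPU_HOST}:{SERVICE_PORTS[service]}/jobs"  # '/jobs' is an assumed path
    body = {
        "job_id": job_id,
        # Assumed callback route; the design only says Studio "gets callbacks".
        "callback_url": f"https://studio.lthn.ai/api/studio/callbacks/{job_id}",
        **payload,
    }
    return Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_dispatch("whisper", "job-1", {"asset_id": 42})
print(request.get_method(), request.full_url)
```

Keeping host and ports in one place is what makes the remote-first choice cheap: pointing Studio at a different GPU server changes configuration, not code.
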
## CorePHP Module

`app/Mod/Studio/`: same patterns as the LEM module.

**Actions:**
- `CatalogueAsset::run()`: ingest, extract metadata, generate embeddings
- `GenerateManifest::run()`: brief + library → LEM → manifest JSON
- `RenderManifest::run()`: dispatch to ffmpeg worker
- `TranscribeAsset::run()`: send to Whisper, store transcript
- `SynthesiseSpeech::run()`: send to TTS, return audio

**Artisan commands:**
- `studio:catalogue`: batch ingest directory
- `studio:remix`: brief in, rendered videos out
- `studio:transcribe`: batch transcribe library

**API routes** (`/api/studio/*`):
- `POST /remix`: submit brief, get job ID
- `GET /remix/{id}`: poll status, get output URLs
- `POST /assets`: upload/catalogue
- `GET /assets`: search library

**Livewire UI:**
- Asset browser with tag/search
- Remix form: pick assets or let LEM choose, enter brief, select template
- Job status + preview
- Download/share

**Config:** `config/studio.php` (GPU endpoints, templates, Qdrant collection, storage paths).

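The `POST /remix` and `GET /remix/{id}` pair implies a submit-then-poll lifecycle. A minimal in-memory sketch in Python follows; the status names (`queued`, `done`), the `JobStore` class, and the output URL are hypothetical, and a real deployment would keep this state in Redis or PG rather than a dict.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class RemixJob:
    brief: str
    status: str = "queued"
    outputs: list[str] = field(default_factory=list)


class JobStore:
    def __init__(self) -> None:
        self._jobs: dict[str, RemixJob] = {}

    def submit(self, brief: str) -> str:
        """POST /remix: record the brief and hand back a job ID."""
        job_id = uuid.uuid4().hex
        self._jobs[job_id] = RemixJob(brief)
        return job_id

    def complete(self, job_id: str, outputs: list[str]) -> None:
        """Called when the render worker reports back."""
        job = self._jobs[job_id]
        job.status, job.outputs = "done", outputs

    def poll(self, job_id: str) -> dict:
        """GET /remix/{id}: current status plus any output URLs."""
        job = self._jobs[job_id]
        return {"id": job_id, "status": job.status, "outputs": job.outputs}


store = JobStore()
job_id = store.submit("summer lollipop TikTok, 15s, upbeat")
before = store.poll(job_id)
store.complete(job_id, ["https://studio.lthn.ai/renders/variant-1.mp4"])
after = store.poll(job_id)
print(before["status"], "->", after["status"])
```
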
## Phased Delivery

### Phase 1: Foundation (before April)
- Studio module scaffolding (actions, routes, commands)
- Asset cataloguing (upload, PG metadata, Whisper transcripts)
- Whisper service on homelab
- `studio:transcribe` end to end
- Basic Livewire asset browser

### Phase 2: Remix Pipeline
- Manifest format finalised
- LEM integration via Ollama (brief → manifest)
- ffmpeg worker on homelab
- `studio:remix` CLI + API
- Livewire remix form + job status

### Phase 3: Voice & TTS
- TTS service on homelab (Kokoro)
- Voice interface: Whisper STT → LEM → TTS
- Voiceover generation for scripts

### Phase 4: Visual Generation
- ComfyUI on homelab with Flux/SD3.5
- Thumbnail generation
- Image overlays in manifests
- Video generation via Wan2.1 (experimental)

### Phase 5: Production
- Full library from agency
- Authentik account for client
- studio.lthn.ai live
- Usage tracking via 66analytics

Phase 1 + 2 = April demo. Upload videos, enter brief, get remixed TikToks back.

## Key Decisions

- **Smart/dumb separation**: LEM produces prompts and manifests (creative), ffmpeg executes (mechanical). Value is in the creative layer.
- **Remote-first GPU**: All inference on homelab/GPU server, never local. Easy to scale to cloud later.
- **Manifest-driven**: JSON contract between LEM and execution. Either side can evolve independently.
- **Same Action pattern**: CLI and API call identical actions. UI is just a thin Livewire layer.
- **Existing infra**: PG, Redis, Qdrant, Ollama, Traefik, and Authentik are all already deployed.

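The "same Action pattern" decision can be illustrated with a small sketch: one action function and two thin entry points. This is Python for illustration only (the real module is CorePHP); `generate_manifest`, `cli_remix`, and `api_remix` are hypothetical names, and the action body is a stub rather than a call to LEM.

```python
def generate_manifest(brief: str, template: str = "tiktok-15s") -> dict:
    """The action: the single place where the work happens (stubbed, no LEM call)."""
    return {"template": template, "brief": brief, "clips": []}


def cli_remix(argv: list[str]) -> dict:
    """CLI entry point: join the arguments into a brief, then delegate to the action."""
    return generate_manifest(" ".join(argv))


def api_remix(request_body: dict) -> dict:
    """API entry point: unpack the request, then delegate to the same action."""
    return generate_manifest(request_body["brief"])


from_cli = cli_remix(["summer", "lollipop", "TikTok"])
from_api = api_remix({"brief": "summer lollipop TikTok"})
print(from_cli == from_api)
```

Because both entry points only translate input, behaviour cannot drift between the CLI and the API.
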