docs/docs/php/features/studio.md

315 lines
9.9 KiB
Markdown
Raw Normal View History

# Studio Multimedia Pipeline
Studio is a CorePHP module that orchestrates video remixing, transcription, voice synthesis, and image generation by dispatching GPU work to remote services. It separates creative decisions (LEM/Ollama) from mechanical execution (ffmpeg, Whisper, TTS, ComfyUI).
## Architecture
Studio is a job orchestrator, not a renderer. All GPU-intensive work runs on remote Docker services accessed over HTTP.
```
Studio Module (CorePHP)
├── Livewire UI (asset browser, remix form, voice, thumbnails)
├── Artisan Commands (CLI)
└── API Routes (/api/studio/*)
Actions (CatalogueAsset, GenerateManifest, RenderManifest, etc.)
Redis Job Queue
├── Ollama (LEM) ─────── Creative decisions, scripts, manifests
├── Whisper ───────────── Speech-to-text transcription
├── Kokoro TTS ────────── Voiceover generation
├── ffmpeg Worker ─────── Video rendering from manifests
└── ComfyUI ──────────── Image generation, thumbnails
```
### Smart/Dumb Separation
LEM produces JSON manifests (the creative layer). ffmpeg and GPU services consume them mechanically (the execution layer). Neither side knows about the other's internals — the manifest format is the contract.
## Module Structure
The Studio module lives at `app/Mod/Studio/` and follows standard CorePHP patterns:
```
app/Mod/Studio/
├── Boot.php # Lifecycle events (API, Console, Web)
├── Actions/
│ ├── CatalogueAsset.php # Ingest files, extract metadata
│ ├── TranscribeAsset.php # Send to Whisper, store transcript
│ ├── GenerateManifest.php # Brief + library → LEM → manifest JSON
│ ├── RenderManifest.php # Dispatch manifest to ffmpeg worker
│ ├── SynthesiseSpeech.php # Text → TTS → audio file
│ ├── GenerateVoiceover.php # Script → voiced audio for remix
│ ├── GenerateImage.php # Prompt → ComfyUI → image
│ ├── GenerateThumbnail.php # Asset → thumbnail image
│ └── BatchRemix.php # Queue multiple remix jobs
├── Console/
│ ├── Catalogue.php # studio:catalogue — batch ingest
│ ├── Transcribe.php # studio:transcribe — batch transcription
│ ├── Remix.php # studio:remix — brief in, video out
│ ├── Voice.php # studio:voice — text-to-speech
│ ├── Thumbnail.php # studio:thumbnail — generate thumbnails
│ └── BatchRemixCommand.php # studio:batch-remix — queue batch jobs
├── Controllers/Api/
│ ├── AssetController.php # GET/POST /api/studio/assets
│ ├── RemixController.php # POST /api/studio/remix
│ ├── VoiceController.php # POST /api/studio/voice
│ └── ImageController.php # POST /api/studio/images/thumbnail
├── Models/
│ ├── StudioAsset.php # Multimedia asset with metadata
│ └── StudioJob.php # Job tracking (status, manifest, output)
├── Livewire/
│ ├── AssetBrowserPage.php # Browse/search/tag assets
│ ├── RemixPage.php # Remix form + job status
│ ├── VoicePage.php # Voice synthesis interface
│ └── ThumbnailPage.php # Thumbnail generator
└── Routes/
├── api.php # REST API endpoints
└── web.php # Livewire page routes
```
## Asset Cataloguing
Assets are multimedia files (video, image, audio) tracked in the `studio_assets` table with metadata including duration, resolution, tags, and transcripts.
### Ingesting Assets
```php
use Mod\Studio\Actions\CatalogueAsset;
// From an uploaded file
$asset = CatalogueAsset::run($uploadedFile, ['summer', 'beach']);
// From an existing storage path
$asset = CatalogueAsset::run('studio/raw/clip-001.mp4', ['interview']);
```
Only `video/*`, `image/*`, and `audio/*` MIME types are accepted.
### CLI Batch Ingest
```bash
php artisan studio:catalogue /path/to/media --tags=summer,promo
```
### Querying Assets
```php
use Mod\Studio\Models\StudioAsset;
// By type
$videos = StudioAsset::videos()->get();
$images = StudioAsset::images()->get();
$audio = StudioAsset::audio()->get();
// By tag
$summer = StudioAsset::tagged('summer')->get();
```
## Transcription
Transcription sends assets to a Whisper service and stores the returned text and detected language.
```php
use Mod\Studio\Actions\TranscribeAsset;
$asset = TranscribeAsset::run($asset);
echo $asset->transcript; // "Hello and welcome..."
echo $asset->transcript_language; // "en"
```
The action handles missing files and API failures gracefully — it returns the asset unchanged without throwing.
### CLI Batch Transcription
```bash
php artisan studio:transcribe
```
## Manifest-Driven Remixing
The remix pipeline has two stages: manifest generation (creative) and rendering (mechanical).
### Generating Manifests
```php
use Mod\Studio\Actions\GenerateManifest;
$job = GenerateManifest::run(
brief: 'Create a 15-second upbeat TikTok from the summer footage',
template: 'tiktok-15s',
);
// $job->manifest contains the JSON manifest
```
The action collects all video assets from the library, sends them as context to Ollama along with the brief, and parses the returned JSON manifest.
### Manifest Format
```json
{
"clips": [
{"asset_id": 42, "start_ms": 3200, "end_ms": 8100, "effects": ["fade_in"]},
{"asset_id": 17, "start_ms": 0, "end_ms": 5500, "effects": ["crossfade"]}
],
"audio": {"track": "original"},
"voiceover": {"script": "Summer vibes only", "voice": "default", "volume": 0.8},
"overlays": [
{"type": "image", "asset_id": 5, "at": 0.5, "duration": 3.0, "position": "bottom-right", "opacity": 0.8}
]
}
```
### Rendering
```php
use Mod\Studio\Actions\RenderManifest;
$job = RenderManifest::run($job);
```
This dispatches the manifest to the ffmpeg worker service, which renders the video and calls back when complete.
### CLI Remix
```bash
php artisan studio:remix "Create a relaxing travel montage" --template=tiktok-30s
```
## Voice & TTS
```php
use Mod\Studio\Actions\SynthesiseSpeech;
$audio = SynthesiseSpeech::run(
text: 'Welcome to our channel',
voice: 'default',
);
```
### CLI
```bash
php artisan studio:voice "Welcome to our channel" --voice=default
```
## Image Generation
Thumbnails and image overlays use ComfyUI:
```php
use Mod\Studio\Actions\GenerateThumbnail;
$thumbnail = GenerateThumbnail::run($asset);
```
### CLI
```bash
php artisan studio:thumbnail --asset=42
```
## API Endpoints
| Method | Endpoint | Description |
|---|---|---|
| `GET` | `/api/studio/assets` | List assets |
| `GET` | `/api/studio/assets/{id}` | Show asset details |
| `POST` | `/api/studio/assets` | Upload/catalogue asset |
| `POST` | `/api/studio/remix` | Submit remix brief |
| `GET` | `/api/studio/remix/{id}` | Poll job status |
| `POST` | `/api/studio/remix/{id}/callback` | Worker completion callback |
| `POST` | `/api/studio/voice` | Submit voice synthesis |
| `GET` | `/api/studio/voice/{id}` | Poll voice job status |
| `POST` | `/api/studio/images/thumbnail` | Generate thumbnail |
## GPU Services
All GPU services run as Docker containers, accessed over HTTP. Configuration is in `config/studio.php`:
| Service | Default Endpoint | Purpose |
|---|---|---|
| Ollama | `http://studio-ollama:11434` | Creative decisions via LEM |
| Whisper | `http://studio-whisper:9100` | Speech-to-text |
| Kokoro TTS | `http://studio-tts:9200` | Text-to-speech |
| ffmpeg Worker | `http://studio-worker:9300` | Video rendering |
| ComfyUI | `http://studio-comfyui:8188` | Image generation |
## Configuration
```php
// config/studio.php
return [
'ollama' => [
'url' => env('STUDIO_OLLAMA_URL', 'http://studio-ollama:11434'),
'model' => env('STUDIO_OLLAMA_MODEL', 'lem-4b'),
'timeout' => 60,
],
'whisper' => [
'url' => env('STUDIO_WHISPER_URL', 'http://studio-whisper:9100'),
'model' => 'large-v3-turbo',
'timeout' => 120,
],
'worker' => [
'url' => env('STUDIO_WORKER_URL', 'http://studio-worker:9300'),
'timeout' => 300,
],
'storage' => [
'disk' => 'local',
'assets_path' => 'studio/assets',
],
'templates' => [
'tiktok-15s' => ['duration' => 15, 'width' => 1080, 'height' => 1920, 'fps' => 30],
'tiktok-30s' => ['duration' => 30, 'width' => 1080, 'height' => 1920, 'fps' => 30],
'youtube-60s' => ['duration' => 60, 'width' => 1920, 'height' => 1080, 'fps' => 30],
],
];
```
## Livewire UI
Studio provides four Livewire page components:
- **Asset Browser** — browse, search, and tag multimedia assets
- **Remix Page** — enter a creative brief, select template, view job progress
- **Voice Page** — text-to-speech interface
- **Thumbnail Page** — generate thumbnails from assets
Components are registered via the module's Boot class and available under `mod.studio.livewire.*`.
## Testing
All actions are testable with `Http::fake()`:
```php
use Illuminate\Support\Facades\Http;
use Mod\Studio\Actions\TranscribeAsset;
use Mod\Studio\Models\StudioAsset;
it('transcribes an asset via Whisper', function () {
Storage::fake('local');
Storage::disk('local')->put('studio/test.mp4', 'fake-video');
Http::fake([
'*/transcribe' => Http::response([
'text' => 'Hello world',
'language' => 'en',
]),
]);
$asset = StudioAsset::factory()->create(['path' => 'studio/test.mp4']);
$result = TranscribeAsset::run($asset);
expect($result->transcript)->toBe('Hello world');
expect($result->transcript_language)->toBe('en');
});
```
## Learn More
- [Actions Pattern](actions.md)
- [Lifecycle Events](/php/framework/events)