docs/docs/php/features/studio.md

Studio Multimedia Pipeline

Studio is a CorePHP module that orchestrates video remixing, transcription, voice synthesis, and image generation by dispatching GPU work to remote services. It separates creative decisions (LEM/Ollama) from mechanical execution (ffmpeg, Whisper, TTS, ComfyUI).

Architecture

Studio is a job orchestrator, not a renderer. All GPU-intensive work runs on remote Docker services accessed over HTTP.

Studio Module (CorePHP)
  ├── Livewire UI (asset browser, remix form, voice, thumbnails)
  ├── Artisan Commands (CLI)
  └── API Routes (/api/studio/*)
        │
  Actions (CatalogueAsset, GenerateManifest, RenderManifest, etc.)
        │
  Redis Job Queue
        │
        ├── Ollama (LEM) ─────── Creative decisions, scripts, manifests
        ├── Whisper ───────────── Speech-to-text transcription
        ├── Kokoro TTS ────────── Voiceover generation
        ├── ffmpeg Worker ─────── Video rendering from manifests
        └── ComfyUI ──────────── Image generation, thumbnails

Smart/Dumb Separation

LEM produces JSON manifests (the creative layer). ffmpeg and GPU services consume them mechanically (the execution layer). Neither side knows about the other's internals — the manifest format is the contract.

Module Structure

The Studio module lives at app/Mod/Studio/ and follows standard CorePHP patterns:

app/Mod/Studio/
├── Boot.php                    # Lifecycle events (API, Console, Web)
├── Actions/
│   ├── CatalogueAsset.php      # Ingest files, extract metadata
│   ├── TranscribeAsset.php     # Send to Whisper, store transcript
│   ├── GenerateManifest.php    # Brief + library → LEM → manifest JSON
│   ├── RenderManifest.php      # Dispatch manifest to ffmpeg worker
│   ├── SynthesiseSpeech.php    # Text → TTS → audio file
│   ├── GenerateVoiceover.php   # Script → voiced audio for remix
│   ├── GenerateImage.php       # Prompt → ComfyUI → image
│   ├── GenerateThumbnail.php   # Asset → thumbnail image
│   └── BatchRemix.php          # Queue multiple remix jobs
├── Console/
│   ├── Catalogue.php           # studio:catalogue — batch ingest
│   ├── Transcribe.php          # studio:transcribe — batch transcription
│   ├── Remix.php               # studio:remix — brief in, video out
│   ├── Voice.php               # studio:voice — text-to-speech
│   ├── Thumbnail.php           # studio:thumbnail — generate thumbnails
│   └── BatchRemixCommand.php   # studio:batch-remix — queue batch jobs
├── Controllers/Api/
│   ├── AssetController.php     # GET/POST /api/studio/assets
│   ├── RemixController.php     # POST /api/studio/remix
│   ├── VoiceController.php     # POST /api/studio/voice
│   └── ImageController.php     # POST /api/studio/images/thumbnail
├── Models/
│   ├── StudioAsset.php         # Multimedia asset with metadata
│   └── StudioJob.php           # Job tracking (status, manifest, output)
├── Livewire/
│   ├── AssetBrowserPage.php    # Browse/search/tag assets
│   ├── RemixPage.php           # Remix form + job status
│   ├── VoicePage.php           # Voice synthesis interface
│   └── ThumbnailPage.php       # Thumbnail generator
└── Routes/
    ├── api.php                 # REST API endpoints
    └── web.php                 # Livewire page routes

Asset Cataloguing

Assets are multimedia files (video, image, audio) tracked in the studio_assets table with metadata including duration, resolution, tags, and transcripts.

Ingesting Assets

use Mod\Studio\Actions\CatalogueAsset;

// From an uploaded file
$asset = CatalogueAsset::run($uploadedFile, ['summer', 'beach']);

// From an existing storage path
$asset = CatalogueAsset::run('studio/raw/clip-001.mp4', ['interview']);

Only video/*, image/*, and audio/* MIME types are accepted.
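
The MIME rule above can be illustrated with a small standalone check. This is only a sketch: `isAcceptedMimeType` is a hypothetical helper, not part of the Studio module, and the real action may detect or validate types differently.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: returns true when a MIME type belongs to one of the
 * three families the cataloguer accepts (video/*, image/*, audio/*).
 */
function isAcceptedMimeType(string $mimeType): bool
{
    // Normalise case and drop parameters such as "; charset=binary".
    $type = strtolower(trim(explode(';', $mimeType, 2)[0]));

    foreach (['video/', 'image/', 'audio/'] as $prefix) {
        // Require a subtype after the slash, so a bare "video/" is rejected.
        if (str_starts_with($type, $prefix) && strlen($type) > strlen($prefix)) {
            return true;
        }
    }

    return false;
}
```

A caller could run this kind of check before handing a file to the action, so unsupported uploads are rejected early with a clear message.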

CLI Batch Ingest

php artisan studio:catalogue /path/to/media --tags=summer,promo

Querying Assets

use Mod\Studio\Models\StudioAsset;

// By type
$videos = StudioAsset::videos()->get();
$images = StudioAsset::images()->get();
$audio = StudioAsset::audio()->get();

// By tag
$summer = StudioAsset::tagged('summer')->get();

Transcription

Transcription sends assets to a Whisper service and stores the returned text and detected language.

use Mod\Studio\Actions\TranscribeAsset;

$asset = TranscribeAsset::run($asset);

echo $asset->transcript;           // "Hello and welcome..."
echo $asset->transcript_language;  // "en"

The action does not throw on missing files or API failures. In either case it returns the asset unchanged, so callers should check whether a transcript was actually stored.

CLI Batch Transcription

php artisan studio:transcribe

Manifest-Driven Remixing

The remix pipeline has two stages: manifest generation (creative) and rendering (mechanical).

Generating Manifests

use Mod\Studio\Actions\GenerateManifest;

$job = GenerateManifest::run(
    brief: 'Create a 15-second upbeat TikTok from the summer footage',
    template: 'tiktok-15s',
);

// $job->manifest contains the JSON manifest

The action collects all video assets from the library, sends them as context to Ollama along with the brief, and parses the returned JSON manifest.
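
Language models often wrap JSON in prose or a Markdown code fence, so the parsing step usually needs to be tolerant. The sketch below shows one possible approach; `extractManifest` is a hypothetical helper and is not taken from the Studio source, which may parse the reply differently.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: extract a JSON object from a model reply that may
 * surround it with prose or a code fence.
 *
 * Throws InvalidArgumentException when no object is present and
 * JsonException when the extracted text is not valid JSON.
 */
function extractManifest(string $reply): array
{
    // Take everything from the first "{" to the last "}".
    $start = strpos($reply, '{');
    $end = strrpos($reply, '}');

    if ($start === false || $end === false || $end < $start) {
        throw new InvalidArgumentException('No JSON object found in model reply.');
    }

    $json = substr($reply, $start, $end - $start + 1);

    // Decode to an associative array; invalid JSON raises JsonException.
    return json_decode($json, true, 512, JSON_THROW_ON_ERROR);
}
```

The first-brace-to-last-brace heuristic is deliberately simple. It fails if the surrounding prose itself contains braces, in which case asking the model for JSON-only output is the more robust fix.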

Manifest Format

{
  "clips": [
    {"asset_id": 42, "start_ms": 3200, "end_ms": 8100, "effects": ["fade_in"]},
    {"asset_id": 17, "start_ms": 0, "end_ms": 5500, "effects": ["crossfade"]}
  ],
  "audio": {"track": "original"},
  "voiceover": {"script": "Summer vibes only", "voice": "default", "volume": 0.8},
  "overlays": [
    {"type": "image", "asset_id": 5, "at": 0.5, "duration": 3.0, "position": "bottom-right", "opacity": 0.8}
  ]
}
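
Because the manifest is the contract between the creative and execution layers, it can be worth checking its shape before dispatching it to the worker. The following is a minimal sketch of such a check. `validateManifest` is a hypothetical function, and the rules it enforces (integer millisecond offsets, a volume between 0 and 1) are assumptions inferred from the example above rather than a documented schema.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical validator: returns a list of problems found in a decoded
 * manifest. An empty list means every check passed.
 */
function validateManifest(array $manifest): array
{
    $clips = $manifest['clips'] ?? null;

    if (!is_array($clips) || $clips === []) {
        return ['manifest must contain at least one clip'];
    }

    $errors = [];

    foreach ($clips as $i => $clip) {
        if (!is_array($clip)) {
            $errors[] = "clip {$i}: must be an object";
            continue;
        }

        if (!is_int($clip['asset_id'] ?? null)) {
            $errors[] = "clip {$i}: asset_id must be an integer";
        }

        $start = $clip['start_ms'] ?? null;
        $end = $clip['end_ms'] ?? null;

        if (!is_int($start) || !is_int($end) || $start < 0 || $end <= $start) {
            $errors[] = "clip {$i}: need 0 <= start_ms < end_ms";
        }
    }

    // Assumed range: the example uses 0.8, so volume is treated as 0..1.
    $volume = $manifest['voiceover']['volume'] ?? null;
    if ($volume !== null && (!is_numeric($volume) || $volume < 0 || $volume > 1)) {
        $errors[] = 'voiceover.volume must be between 0 and 1';
    }

    return $errors;
}
```

Returning a list of messages rather than throwing on the first problem makes it easier to feed all the issues back to the model in a single retry prompt.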

Rendering

use Mod\Studio\Actions\RenderManifest;

$job = RenderManifest::run($job);

This dispatches the manifest to the ffmpeg worker service, which renders the video and reports completion through the worker callback endpoint (POST /api/studio/remix/{id}/callback).

CLI Remix

php artisan studio:remix "Create a relaxing travel montage" --template=tiktok-30s

Voice & TTS

use Mod\Studio\Actions\SynthesiseSpeech;

$audio = SynthesiseSpeech::run(
    text: 'Welcome to our channel',
    voice: 'default',
);

CLI

php artisan studio:voice "Welcome to our channel" --voice=default

Image Generation

Thumbnails and image overlays use ComfyUI:

use Mod\Studio\Actions\GenerateThumbnail;

$thumbnail = GenerateThumbnail::run($asset);

CLI

php artisan studio:thumbnail --asset=42

API Endpoints

| Method | Endpoint | Description |
| --- | --- | --- |
| GET | /api/studio/assets | List assets |
| GET | /api/studio/assets/{id} | Show asset details |
| POST | /api/studio/assets | Upload/catalogue asset |
| POST | /api/studio/remix | Submit remix brief |
| GET | /api/studio/remix/{id} | Poll job status |
| POST | /api/studio/remix/{id}/callback | Worker completion callback |
| POST | /api/studio/voice | Submit voice synthesis |
| GET | /api/studio/voice/{id} | Poll voice job status |
| POST | /api/studio/images/thumbnail | Generate thumbnail |
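
Remix and voice jobs are asynchronous, so a client submits the request and then polls the status endpoint. The sketch below shows one way to structure that loop. `pollJob` is a hypothetical helper, and the `status` field with its `completed` and `failed` values is an assumption; check the actual response payload before relying on it.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical polling helper. $fetchStatus is any callable that returns the
 * decoded job payload, for example a wrapper around GET /api/studio/remix/{id}.
 * The 'status' key and its terminal values are assumptions.
 */
function pollJob(callable $fetchStatus, int $maxAttempts = 30, int $delayMs = 2000): array
{
    for ($attempt = 1; $attempt <= $maxAttempts; $attempt++) {
        $job = $fetchStatus();

        // Stop as soon as the job reaches a terminal state.
        if (in_array($job['status'] ?? null, ['completed', 'failed'], true)) {
            return $job;
        }

        // Wait between attempts, but not after the final one.
        if ($attempt < $maxAttempts) {
            usleep($delayMs * 1000);
        }
    }

    throw new RuntimeException("Job still pending after {$maxAttempts} attempts.");
}
```

Injecting the fetcher as a callable keeps the loop independent of any particular HTTP client and makes it straightforward to test with canned responses.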

GPU Services

All GPU services run as Docker containers, accessed over HTTP. Configuration is in config/studio.php:

| Service | Default Endpoint | Purpose |
| --- | --- | --- |
| Ollama | http://studio-ollama:11434 | Creative decisions via LEM |
| Whisper | http://studio-whisper:9100 | Speech-to-text |
| Kokoro TTS | http://studio-tts:9200 | Text-to-speech |
| ffmpeg Worker | http://studio-worker:9300 | Video rendering |
| ComfyUI | http://studio-comfyui:8188 | Image generation |

Configuration

// config/studio.php
return [
    'ollama' => [
        'url' => env('STUDIO_OLLAMA_URL', 'http://studio-ollama:11434'),
        'model' => env('STUDIO_OLLAMA_MODEL', 'lem-4b'),
        'timeout' => 60,
    ],
    'whisper' => [
        'url' => env('STUDIO_WHISPER_URL', 'http://studio-whisper:9100'),
        'model' => 'large-v3-turbo',
        'timeout' => 120,
    ],
    'worker' => [
        'url' => env('STUDIO_WORKER_URL', 'http://studio-worker:9300'),
        'timeout' => 300,
    ],
    'storage' => [
        'disk' => 'local',
        'assets_path' => 'studio/assets',
    ],
    'templates' => [
        'tiktok-15s' => ['duration' => 15, 'width' => 1080, 'height' => 1920, 'fps' => 30],
        'tiktok-30s' => ['duration' => 30, 'width' => 1080, 'height' => 1920, 'fps' => 30],
        'youtube-60s' => ['duration' => 60, 'width' => 1920, 'height' => 1080, 'fps' => 30],
    ],
];
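
Templates give each remix a target duration, which suggests a simple sanity check on generated manifests. The sketch below compares the summed clip lengths against a template entry. Both functions are hypothetical, and it assumes that `duration` is in seconds and that transitions such as crossfades do not shorten the timeline.

```php
<?php
declare(strict_types=1);

/** Sum of clip lengths in milliseconds, ignoring overlap from transitions. */
function manifestDurationMs(array $manifest): int
{
    $total = 0;

    foreach ($manifest['clips'] ?? [] as $clip) {
        $total += (int) $clip['end_ms'] - (int) $clip['start_ms'];
    }

    return $total;
}

/** True when the clips fit inside the template's duration (assumed seconds). */
function fitsTemplate(array $manifest, array $template): bool
{
    return manifestDurationMs($manifest) <= $template['duration'] * 1000;
}
```

For the example manifest shown earlier, the two clips last 4900 ms and 5500 ms, for a total of 10400 ms, which fits within the 15-second `tiktok-15s` template.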

Livewire UI

Studio provides four Livewire page components:

  • Asset Browser — browse, search, and tag multimedia assets
  • Remix Page — enter a creative brief, select template, view job progress
  • Voice Page — text-to-speech interface
  • Thumbnail Page — generate thumbnails from assets

Components are registered via the module's Boot class and available under mod.studio.livewire.*.

Testing

All actions are testable with Http::fake():

use Illuminate\Support\Facades\Http;
use Illuminate\Support\Facades\Storage;
use Mod\Studio\Actions\TranscribeAsset;
use Mod\Studio\Models\StudioAsset;

it('transcribes an asset via Whisper', function () {
    Storage::fake('local');
    Storage::disk('local')->put('studio/test.mp4', 'fake-video');

    Http::fake([
        '*/transcribe' => Http::response([
            'text' => 'Hello world',
            'language' => 'en',
        ]),
    ]);

    $asset = StudioAsset::factory()->create(['path' => 'studio/test.mp4']);
    $result = TranscribeAsset::run($asset);

    expect($result->transcript)->toBe('Hello world');
    expect($result->transcript_language)->toBe('en');
});

Learn More