Architecture
Deep dive into the OpenReels codebase — directory structure, key abstractions, and data flow.
OpenReels is a 6-stage AI pipeline that transforms a text topic into a rendered YouTube Short. This page explains the codebase structure, key abstractions, and how data flows through the system.
High-Level Architecture
Topic (text)
     │
     ▼
┌──────────┐   ┌──────────┐   ┌─────┐   ┌─────────┐   ┌──────────┐   ┌────────┐
│ Research │──►│ Director │──►│ TTS │──►│ Visuals │──►│ Assembly │──►│ Critic │
└──────────┘   └──────────┘   └─────┘   └─────────┘   └──────────┘   └────────┘
     │              │            │           │             │             │
     │        DirectorScore   Audio +   Images,        final.mp4      Score +
     │                        word      videos,                       review
     │                        stamps    stock clips
     ▼
Research data
(facts, mood)

The pipeline is orchestrated by Mastra (a workflow framework) in src/pipeline/orchestrator.ts. Each stage is a Mastra step that receives the output of previous stages.
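As a rough illustration of how a stage can read the output of earlier stages, the sketch below chains stages in plain TypeScript. It does not use Mastra's API, and the names `StageSketch`, `runStages`, and the stage bodies are hypothetical. Real stages would be asynchronous; the sketch is synchronous to keep it short.

```typescript
// Hypothetical sketch: sequential stages that each see earlier outputs.
// This does NOT use Mastra; it only illustrates the data-passing pattern.
type StageContext = Record<string, unknown>;

interface StageSketch {
  name: string;
  run(ctx: StageContext): unknown;
}

function runStages(stages: StageSketch[], initial: StageContext): StageContext {
  let ctx: StageContext = { ...initial };
  for (const stage of stages) {
    // Store each stage's output under its name so later stages can read it.
    ctx = { ...ctx, [stage.name]: stage.run(ctx) };
  }
  return ctx;
}

const stages: StageSketch[] = [
  {
    name: "research",
    run: (ctx) => ({ facts: [`fact about ${String(ctx["topic"])}`] }),
  },
  {
    name: "director",
    run: (ctx) => {
      // The director stage reads what the research stage produced.
      const research = ctx["research"] as { facts: string[] };
      return { sceneCount: research.facts.length };
    },
  },
];

const result = runStages(stages, { topic: "black holes" });
```

After the run, `result` holds the original topic plus one entry per stage, which is the property the later stages rely on.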
Directory Structure
src/agents/
AI agent functions that make LLM calls with structured output. Each agent has a system prompt (in prompts/) and returns typed data via Zod schemas.
| File | Agent | Input | Output |
|---|---|---|---|
| research.ts | Research | topic | summary, key facts, mood |
| creative-director.ts | Creative Director | research data + archetype | DirectorScore |
| image-prompter.ts | Image Prompter | raw visual prompt | optimized prompt for image gen |
| music-prompter.ts | Music Prompter | music mood | prompt string for AI music gen |
| critic.ts | Critic | DirectorScore + topic | score 0-10, strengths, weaknesses |
src/pipeline/
The orchestrator and supporting utilities.
| File | Purpose |
|---|---|
| orchestrator.ts | Mastra workflow definition with 6 stages |
| utils.ts | Stage names, callback interfaces, pipeline options/result types |
| music-resolver.ts | Resolves music via bundled library or AI generation (Lyria) |
| scene-assets.ts | Maps DirectorScore scenes to resolved asset file paths |
src/providers/
Provider implementations behind stable interfaces. Each category has a base interface and one or more concrete implementations.
providers/
  factory.ts # createProviders(), wires everything together
  llm/
    base.ts # BaseLLM abstract class (search injection, parametric fallback)
    anthropic.ts # AnthropicLLM (Claude, native web search)
    openai.ts # OpenAILLM (GPT, native web search)
    gemini.ts # GeminiLLM (Gemini, native Google Search)
    openrouter.ts # OpenRouterLLM (300+ models via OpenRouter)
    openai-compatible.ts # OpenAICompatibleLLM (Ollama, Together, Groq, etc.)
    search/
      tavily.ts # Tavily search tools for providers without native search
  tts/
    elevenlabs.ts # ElevenLabsTTS (native word timestamps)
    inworld.ts # InworldTTS (native word timestamps)
    kokoro.ts # KokoroTTS (local, needs alignment)
    gemini.ts # GeminiTTS (needs alignment)
    openai.ts # OpenAITTS (needs alignment)
    aligned-tts-provider.ts # Decorator that adds Whisper alignment
    whisper-aligner.ts # Whisper-based word timestamp alignment
  image/
    gemini.ts # GeminiImage
    openai.ts # OpenAIImage
  stock/
    pexels.ts # PexelsStock
    pixabay.ts # PixabayStock
    adaptive-resolver.ts # Multi-provider fallback with query reformulation
    query-reformer.ts # Rewrites prompts for better stock search results
    stock-verifier.ts # VLM-based relevance verification
  video/
    gemini.ts # GeminiVideo (Veo)
    fal.ts # FalVideo (Kling, Wan)
    video-resolver.ts # Multi-provider fallback for video generation
  music/
    bundled.ts # Bundled music library manifest + validation
    bundled-adapter.ts # BundledMusic provider (selects from library)
    lyria.ts # LyriaMusic (Google Lyria 3 Pro generation)

src/schema/
Zod schemas that define the data contracts.
| File | Schemas |
|---|---|
| director-score.ts | DirectorScore, Scene, MusicMood, VisualType, Motion, TransitionType |
| providers.ts | Provider key types, LLMProvider, TTSProvider, ImageProvider, StockProvider, VideoProvider, MusicProvider interfaces |
| archetype.ts | ArchetypeConfig schema |
src/config/
Configuration and registration.
| File | Purpose |
|---|---|
| archetype-registry.ts | Loads and registers the 14 archetype JSON configs |
| archetypes/*.json | Individual archetype definitions (colors, pacing, caption style, etc.) |
| playbook.ts | Pipeline behavior configuration |
| platforms.ts | Platform specs (YouTube, TikTok, etc.) with resolution, FPS, duration |
src/remotion/
The Remotion video rendering layer.
| Directory | Purpose |
|---|---|
| compositions/ | OpenReelsVideo.tsx, the main Remotion composition |
| beats/ | Beat components: AIImageBeat, StockImageBeat, StockVideoBeat, TextCardBeat |
| captions/ | 6 caption styles + timing utilities for word-level sync |
| audio/ | MusicTrack (with ducking) and VoiceoverTrack components |
| lib/ | score-to-props.ts (maps DirectorScore to Remotion props), font loading |
src/cli/
CLI-specific code (not used by the web UI).
| File | Purpose |
|---|---|
| args.ts | Argument parser (topic, flags, provider selection) |
| progress.ts | Terminal progress display with stage indicators |
| cost-estimator.ts | Estimates and tracks actual costs across providers |
| validate-env.ts | Validates required environment variables |
web/
React SPA with Tailwind CSS, built with Vite.
| Directory | Purpose |
|---|---|
| pages/ | HomePage, JobPage, GalleryPage, SettingsPage |
| hooks/ | useApi (REST calls), useSSE (real-time events) |
| components/ | Layout, stage cards, pipeline visualization, shadcn/ui components |
| lib/ | Scene asset utilities, general helpers |
Key Abstractions
LLMProvider Interface
All LLM providers implement this interface via the BaseLLM abstract class:
interface LLMProvider {
  readonly id: LLMProviderKey;
  generate<T extends z.ZodType>(opts: {
    systemPrompt: string;
    userMessage: string;
    schema: T;
    enableWebSearch?: boolean;
  }): Promise<LLMResult<z.infer<T>>>;
}

BaseLLM provides two generation paths:
- Direct structured output — single LLM call with Zod schema
- Two-pass web search — first pass uses search tools, second pass structures the results
Subclasses implement createLanguageModel() and createSearchTools().
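The two paths can be pictured with a simplified abstract class. This is a sketch under assumptions, not the real BaseLLM: a plain `parse` function stands in for the Zod schema, and a single hypothetical `callModel` hook replaces createLanguageModel() and createSearchTools(). The class and option names are invented for illustration.

```typescript
// Hypothetical sketch of the two generation paths. A plain `parse` function
// stands in for the Zod schema, and `callModel` stands in for a real LLM call.
interface GenerateOptionsSketch<T> {
  systemPrompt: string;
  userMessage: string;
  parse: (raw: string) => T;
  enableWebSearch?: boolean;
}

abstract class SketchLLM {
  // Hook (assumed shape): subclasses supply the actual model call.
  protected abstract callModel(prompt: string, useSearch: boolean): Promise<string>;

  async generate<T>(opts: GenerateOptionsSketch<T>): Promise<T> {
    const prompt = `${opts.systemPrompt}\n\n${opts.userMessage}`;
    if (!opts.enableWebSearch) {
      // Path 1: a single call whose output is parsed directly.
      return opts.parse(await this.callModel(prompt, false));
    }
    // Path 2: a search-enabled pass gathers notes, then a second pass
    // without search turns those notes into structured output.
    const notes = await this.callModel(prompt, true);
    const structured = await this.callModel(
      `${opts.systemPrompt}\n\nNotes:\n${notes}`,
      false,
    );
    return opts.parse(structured);
  }
}

// Fake subclass for demonstration: records calls and returns canned text.
class FakeLLM extends SketchLLM {
  calls: boolean[] = [];

  protected async callModel(_prompt: string, useSearch: boolean): Promise<string> {
    this.calls.push(useSearch);
    return useSearch ? "free-form research notes" : '{"mood":"calm"}';
  }
}
```

With `enableWebSearch` unset, `FakeLLM` records one call; with it set to `true`, it records a search call followed by a structuring call.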
TTSProvider Interface
interface TTSProvider {
  generate(text: string): Promise<TTSResult>;
}

interface TTSResult {
  audio: Buffer;
  words: WordTimestamp[]; // { word, start, end } for each word
}

Providers that return native word timestamps (ElevenLabs, Inworld) implement this directly. Providers without native timestamps (Kokoro, Gemini TTS, OpenAI TTS) are wrapped in AlignedTTSProvider, which uses WhisperAligner to add word-level timing after generation.
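The wrapping described above is a decorator: the wrapper exposes the same `generate` method and fills in `words` after the inner provider returns. The sketch below shows one possible shape. The `AlignerSketch` interface and the class name are assumptions (the real WhisperAligner API may differ), and `Uint8Array` is used in place of `Buffer` so the block has no Node-specific types.

```typescript
// Hypothetical sketch of the decorator pattern; names and shapes are assumed.
interface WordTimestamp { word: string; start: number; end: number; }
interface TTSResultSketch { audio: Uint8Array; words: WordTimestamp[]; }
interface TTSProviderSketch { generate(text: string): Promise<TTSResultSketch>; }

// Assumed aligner shape: audio plus the known text in, word timings out.
interface AlignerSketch {
  align(audio: Uint8Array, text: string): Promise<WordTimestamp[]>;
}

class AlignedTTSSketch implements TTSProviderSketch {
  private readonly inner: TTSProviderSketch;
  private readonly aligner: AlignerSketch;

  constructor(inner: TTSProviderSketch, aligner: AlignerSketch) {
    this.inner = inner;
    this.aligner = aligner;
  }

  async generate(text: string): Promise<TTSResultSketch> {
    // Synthesize first, then derive word timings from the audio.
    const result = await this.inner.generate(text);
    const words = await this.aligner.align(result.audio, text);
    return { audio: result.audio, words };
  }
}

// Stand-ins for demonstration: a provider that returns no timestamps and a
// fake aligner that gives every word a fixed 0.5 second slot.
const silentProvider: TTSProviderSketch = {
  generate: async () => ({ audio: new Uint8Array(4), words: [] }),
};
const evenAligner: AlignerSketch = {
  align: async (_audio, text) =>
    text.split(/\s+/).map((word, i) => ({ word, start: i * 0.5, end: (i + 1) * 0.5 })),
};
const alignedTts = new AlignedTTSSketch(silentProvider, evenAligner);
```

Because the wrapper satisfies the same interface as the provider it wraps, callers do not need to know whether timestamps were native or aligned.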
ImageProvider Interface
interface ImageProvider {
  generate(prompt: string, style?: string): Promise<Buffer>;
}

StockProvider Interface
interface StockProvider {
  searchVideo(query: string): Promise<StockCandidate[]>;
  searchImage(query: string): Promise<StockCandidate[]>;
  download(candidate: StockCandidate): Promise<StockAsset>;
}

Stock resolution uses AdaptiveResolver which tries multiple providers with query reformulation and VLM-based verification.
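A fallback loop of this kind might look like the sketch below. It is not the actual AdaptiveResolver: the iteration order (each query tried against every provider before moving to the next reformulation), the `SearcherSketch` interface, and the `isRelevant` callback standing in for VLM verification are all assumptions.

```typescript
// Hypothetical sketch of multi-provider fallback with query reformulation.
interface CandidateSketch { url: string; }
interface SearcherSketch {
  name: string;
  searchImage(query: string): Promise<CandidateSketch[]>;
}
interface ResolvedSketch { provider: string; query: string; candidate: CandidateSketch; }

async function resolveWithFallback(
  providers: SearcherSketch[],
  queries: string[], // original query first, then reformulated variants
  isRelevant: (c: CandidateSketch) => Promise<boolean>, // stands in for VLM checks
): Promise<ResolvedSketch | null> {
  for (const query of queries) {
    for (const provider of providers) {
      const candidates = await provider.searchImage(query);
      for (const candidate of candidates) {
        // Accept the first candidate that passes verification.
        if (await isRelevant(candidate)) {
          return { provider: provider.name, query, candidate };
        }
      }
    }
  }
  return null; // nothing passed; the caller decides how to degrade
}

// Fake providers for demonstration.
const emptyProvider: SearcherSketch = {
  name: "first",
  searchImage: async () => [],
};
const pickyProvider: SearcherSketch = {
  name: "second",
  searchImage: async (query) =>
    query === "ocean waves" ? [{ url: "https://example.com/waves.jpg" }] : [],
};
```

Calling `resolveWithFallback([emptyProvider, pickyProvider], ["the sea in motion", "ocean waves"], async () => true)` only succeeds on the reformulated query with the second provider, which is the situation the fallback is meant to cover.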
VideoProvider Interface
interface VideoProvider {
  readonly supportedDurations: number[];
  generate(opts: {
    sourceImage: Buffer;
    prompt: string;
    durationSeconds?: number;
    aspectRatio?: string;
  }): Promise<VideoResult>;
}

MusicProvider Interface
interface MusicProvider {
  generate(prompt: string, mood: MusicMood): Promise<MusicResult>;
}

Two implementations: BundledMusic (selects from a pre-packaged library) and LyriaMusic (generates via Google Lyria 3 Pro).
Provider Factory
src/providers/factory.ts exports createProviders(config) which wires together all provider instances based on the configuration. It handles:
- Provider selection by key (e.g., llm: "anthropic")
- BYOK (Bring Your Own Key): per-job API keys passed via the keys map
- Multi-provider arrays for stock and video (primary + fallback)
- TTS alignment wrapping for providers without native timestamps
Data Flow
- Research: LLM call with web search produces facts and mood
- Director: LLM call takes research + archetype config and produces a DirectorScore
- TTS: Each scene's script_line is synthesized to audio with word timestamps
- Visuals: Each scene's visual_type determines the generation strategy (AI image, AI video, stock, text card). Music is resolved in parallel.
- Assembly: score-to-props.ts maps the DirectorScore + assets into Remotion props. Remotion renders the final video with a headless Chrome instance.
- Critic: VLM evaluates the rendered video and produces a quality score.
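The Visuals step amounts to a dispatch on each scene's visual_type. The sketch below shows that dispatch in isolation. The literal visual_type values and the strategy labels are illustrative guesses and may not match the real VisualType schema.

```typescript
// Hypothetical sketch: the visual_type values below are illustrative and may
// not match the real VisualType schema.
type VisualTypeSketch =
  | "ai_image"
  | "ai_video"
  | "stock_image"
  | "stock_video"
  | "text_card";

interface SceneSketch {
  script_line: string;
  visual_type: VisualTypeSketch;
}

type StrategySketch = "generate-image" | "generate-video" | "search-stock" | "render-text";

function pickStrategy(scene: SceneSketch): StrategySketch {
  // The switch is exhaustive over the union, so adding a new visual type
  // without handling it becomes a compile-time error.
  switch (scene.visual_type) {
    case "ai_image":
      return "generate-image";
    case "ai_video":
      return "generate-video";
    case "stock_image":
    case "stock_video":
      return "search-stock";
    case "text_card":
      return "render-text";
  }
}

const sceneList: SceneSketch[] = [
  { script_line: "Black holes bend light.", visual_type: "ai_image" },
  { script_line: "Telescopes caught one in 2019.", visual_type: "stock_video" },
  { script_line: "Nothing escapes.", visual_type: "text_card" },
];
const plan = sceneList.map(pickStrategy);
```

For the three sample scenes, `plan` is `["generate-image", "search-stock", "render-text"]`.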
Pipeline Callbacks
The pipeline communicates progress via PipelineCallbacks:
interface PipelineCallbacks {
  onStageStart?(stage: StageName): void;
  onStageComplete?(stage: StageName, detail: string, durationSec: number): void;
  onStageSkip?(stage: StageName, reason: string): void;
  onStageError?(stage: StageName, error: string): void;
  onProgress?(stage: StageName, data: Record<string, unknown>): void;
  onCostEstimate?(estimate: CostBreakdown, imageProvider: ImageProviderKey): Promise<boolean>;
  onActualCost?(cost: ActualCostBreakdown): void;
  onLog?(message: string): void;
  onRunDir?(runDir: string): void;
  isCancelled?(): boolean;
}

The CLI uses callbacks to drive the terminal progress display. The worker uses callbacks to emit BullMQ progress events and write meta.json updates.
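Since every callback is optional, a consumer only implements the ones it cares about, and the pipeline side calls them with optional chaining. The sketch below uses a trimmed-down copy of the interface. The stage name values and the `runStageSketch` helper are hypothetical, and the work function is synchronous for brevity.

```typescript
// Hypothetical sketch: a trimmed-down callbacks object that records events.
// The stage name values are assumed; the real ones live in utils.ts.
type StageNameSketch = "research" | "director" | "tts" | "visuals" | "assembly" | "critic";

interface CallbacksSketch {
  onStageStart?(stage: StageNameSketch): void;
  onStageComplete?(stage: StageNameSketch, detail: string, durationSec: number): void;
  isCancelled?(): boolean;
}

function runStageSketch(
  stage: StageNameSketch,
  work: () => string,
  callbacks: CallbacksSketch,
): boolean {
  // Every callback is optional, so each call site uses optional chaining.
  if (callbacks.isCancelled?.()) return false;
  callbacks.onStageStart?.(stage);
  const startedAt = Date.now();
  const detail = work();
  callbacks.onStageComplete?.(stage, detail, (Date.now() - startedAt) / 1000);
  return true;
}

// A consumer that only records start and completion events.
const events: string[] = [];
const recorder: CallbacksSketch = {
  onStageStart: (stage) => {
    events.push(`start:${stage}`);
  },
  onStageComplete: (stage, detail) => {
    events.push(`done:${stage}:${detail}`);
  },
};

const ran = runStageSketch("research", () => "5 facts", recorder);
```

A CLI-style consumer would print these events, while a worker-style consumer would forward them to a queue; both can share the same pipeline code because only the callbacks object differs.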