Music Providers
Configure background music: Google Lyria 3 Pro for AI-generated scores, or the free bundled library of royalty-free tracks.
The music provider generates or selects a background score that plays under the voiceover. Music is scene-synced to match the video's emotional arc, with automatic audio ducking during narration.
Supported providers
| Provider | Env var | Flag value | Cost |
|---|---|---|---|
| Bundled library | none | bundled | Free |
| Google Lyria 3 Pro | GOOGLE_API_KEY | lyria | $0.08/track |
Usage
# Free bundled tracks (default)
pnpm start "topic" --music-provider bundled
# AI-generated music
pnpm start "topic" --music-provider lyria
The --provider google shortcut sets music to Lyria automatically. To disable music entirely:
pnpm start "topic" --no-music
Bundled library
The bundled library ships with 25 royalty-free tracks organized by mood. No API key required, no network calls, zero cost.
Track selection works by mood matching:
- The Creative Director's DirectorScore includes a music_mood field
- The bundled adapter filters tracks by that mood
- A random track from the matching mood pool is selected
- If no tracks match the exact mood, a random track from any mood is used as fallback
The tracks are stored in assets/music/ with metadata in assets/music-manifest.json. Each track entry includes mood classification, duration, source attribution, and license info.
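The mood-matching steps above can be sketched roughly as follows. This is a minimal illustration, not the bundled adapter's actual code: the `TrackEntry` field names, the `selectTrack` helper, and the mood labels in the sample manifest are all assumptions made for the example.

```typescript
// Hypothetical shape of one entry in assets/music-manifest.json.
// The real field names may differ; these are assumed for illustration.
interface TrackEntry {
  file: string;
  mood: string;
  durationSeconds: number;
  source: string;
  license: string;
}

// Pick a random track whose mood matches music_mood. If no track
// matches, fall back to a random track from the whole library.
// `random` is injectable so the choice can be made deterministic in tests.
function selectTrack(
  tracks: TrackEntry[],
  musicMood: string,
  random: () => number = Math.random,
): TrackEntry {
  if (tracks.length === 0) {
    throw new Error("music manifest contains no tracks");
  }
  const matching = tracks.filter((t) => t.mood === musicMood);
  const pool = matching.length > 0 ? matching : tracks;
  return pool[Math.floor(random() * pool.length)];
}

// Sample manifest with made-up entries.
const manifest: TrackEntry[] = [
  { file: "calm-01.mp3", mood: "calm", durationSeconds: 120, source: "example", license: "CC0" },
  { file: "calm-02.mp3", mood: "calm", durationSeconds: 95, source: "example", license: "CC0" },
  { file: "uplifting-01.mp3", mood: "uplifting", durationSeconds: 140, source: "example", license: "CC0" },
];

const chosen = selectTrack(manifest, "calm");
console.log(`Selected ${chosen.file} (${chosen.mood})`);
```

Passing a fixed `random` function, such as `() => 0`, makes the selection reproducible, which is convenient when testing the fallback path.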
Google Lyria 3 Pro
Lyria generates a unique AI-composed background score tailored to your video's content. The Music Prompter agent writes a detailed prompt describing instruments, tempo, sections, and emotional arc, which Lyria uses to generate the audio.
- Model: lyria-3-pro-preview
- Output format: MP3
- Cost: $0.08 per generated track
- Env var: GOOGLE_API_KEY (same key used for Gemini LLM, Imagen, Veo, and Gemini TTS)
How it works
- The Music Prompter agent generates a detailed prompt based on the DirectorScore's mood, tempo, and scene structure
- The prompt is sent to the Lyria 3 Pro API via the Gemini generateContent endpoint with audio response modality
- Lyria returns base64-encoded audio (typically MP3) plus optional text metadata describing sections and BPM
- The audio is written to a temp file and passed to the Remotion assembly stage
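The last two steps, decoding the response and writing a temp file, might look something like the sketch below. It assumes the response follows the usual Gemini generateContent layout (candidates, then content parts carrying either `text` or base64 `inlineData`). The `saveLyriaAudio` helper, the temp directory prefix, and the file name are hypothetical, and the request itself is omitted.

```typescript
import { mkdtempSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Only the subset of the generateContent response that this sketch reads.
interface ResponsePart {
  text?: string;
  inlineData?: { mimeType: string; data: string }; // data is base64
}
interface GenerateContentResponse {
  candidates?: { content?: { parts?: ResponsePart[] } }[];
}

// Hypothetical helper: find the audio part, decode it, and write it to
// a fresh temp directory. Text parts are joined and returned as notes.
function saveLyriaAudio(response: GenerateContentResponse): {
  audioPath: string;
  byteLength: number;
  notes: string;
} {
  const parts = response.candidates?.[0]?.content?.parts ?? [];
  const inline = parts
    .map((p) => p.inlineData)
    .find((d) => d?.mimeType.startsWith("audio/"));
  if (!inline) {
    throw new Error("response contained no audio part");
  }
  const bytes = Buffer.from(inline.data, "base64");
  // Assumed mapping: treat MPEG audio as .mp3, anything else as opaque.
  const isMp3 = inline.mimeType === "audio/mpeg" || inline.mimeType === "audio/mp3";
  const dir = mkdtempSync(join(tmpdir(), "lyria-"));
  const audioPath = join(dir, isMp3 ? "score.mp3" : "score.bin");
  writeFileSync(audioPath, bytes);
  const notes = parts.map((p) => p.text ?? "").join("").trim();
  return { audioPath, byteLength: bytes.length, notes };
}
```

The returned `audioPath` is what would then be handed to the Remotion assembly stage.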
Safety filter handling
Lyria has content safety filters that may reject prompts with intense or triggering language. If a prompt is rejected:
- The provider automatically sanitizes the prompt by replacing intense adjectives (e.g., "aggressive", "menacing", "violent") with "restrained"
- The sanitized prompt is retried once
- If it fails again, the error propagates and the pipeline falls back to no music for that run
The API may also return finishReason: OTHER, which is an opaque catch-all from the Gemini API. This is not retried since the cause cannot be determined from the response.
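The sanitize-and-retry behavior described above can be approximated as follows. This is a sketch under stated assumptions, not the provider's real implementation: the adjective list contains only the three examples quoted above, and `SafetyRejectionError`, `sanitizePrompt`, and `generateWithSafetyRetry` are hypothetical names. The `generate` callback stands in for the actual Lyria call.

```typescript
// Only the adjectives named in this page; the real list is likely longer.
const INTENSE_ADJECTIVES = ["aggressive", "menacing", "violent"];

// Replace each intense adjective (whole word, case-insensitive)
// with "restrained".
function sanitizePrompt(prompt: string): string {
  const pattern = new RegExp(`\\b(${INTENSE_ADJECTIVES.join("|")})\\b`, "gi");
  return prompt.replace(pattern, "restrained");
}

// Hypothetical error type for a safety-filter rejection. Any other
// failure, such as finishReason OTHER, would surface as a different error.
class SafetyRejectionError extends Error {}

// Try the original prompt. On a safety rejection, retry exactly once
// with the sanitized prompt. Anything else is rethrown without a retry.
async function generateWithSafetyRetry(
  prompt: string,
  generate: (p: string) => Promise<string>,
): Promise<string> {
  try {
    return await generate(prompt);
  } catch (err) {
    if (!(err instanceof SafetyRejectionError)) {
      throw err; // opaque failures are not retried
    }
    // A second failure propagates to the caller, which can then
    // fall back to running without music.
    return generate(sanitizePrompt(prompt));
  }
}
```

Keeping the retry count fixed at one bounds the extra cost of a rejected prompt to a single additional request.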
Cost comparison
| Provider | Per track | Quality |
|---|---|---|
| Bundled | Free | Good — curated royalty-free library |
| Lyria 3 Pro | $0.08 | Excellent — AI-composed, unique per video, matched to emotional arc |
For most use cases, the bundled library works well. Lyria is worth the $0.08 when you want music that is specifically composed to match your video's content and mood.