Short Video Agent (Full 1-Minute Short-Film Setup)
If what you want isn't just "a few images and a few clips" but a complete one-minute short film — script breakdown, character design, storyboard tables, per-shot Seedance video generation, and final merged output — then Short Video Agent is much more complete than the basic chapters in this guide.
It wraps the whole production flow into a 5-stage agent. Paste a story script, wait 15–20 minutes, and you get a complete short film + 13 files (character reference sheets, environment floor plans, storyboard tables, per-Part videos, merged final output).
GitHub: snoopyrain/vpick-short-video-agent
Why You Need It
vpick itself is a node-based canvas + MCP toolset. Making a complete short film requires the right sequence:
- Define style (realistic / anime / cartoon), aspect ratio, number of characters
- Break the script into 4–6 Parts, each becoming a 6–10s clip
- Generate "four-view reference sheets" for every character — so faces stay consistent across shots
- Generate environment floor plans + prop images for each scene
- Write zh-TW storyboard tables (one per Part, with camera moves / dialogue / actions)
- Generate clips with Seedance 2 (one per Part, with start frame + dialogue)
- Merge into final film
Skip any step and you get: drifting character faces, inconsistent shot style, dialogue out of sync, or clips that don't cut together.
Short Video Agent wraps these 7 steps into 5 stages. Each stage waits for your "all confirmed" before moving on — so you don't burn time on a broken storyboard.
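The Part budget in the list above can be sanity-checked before anything is generated. Below is a minimal sketch, assuming only the limits stated in this guide (4–6 Parts, 6–10 s per clip). The helper name `check_part_plan` is hypothetical and is not part of the agent.

```python
# Hypothetical helper, not part of Short Video Agent: checks a list of
# per-Part clip durations (in seconds) against the limits stated in this guide.
MIN_PARTS, MAX_PARTS = 4, 6
MIN_SECONDS, MAX_SECONDS = 6, 10


def check_part_plan(durations):
    """Return (total_seconds, problems) for a list of Part durations."""
    problems = []
    if not MIN_PARTS <= len(durations) <= MAX_PARTS:
        problems.append(
            f"expected {MIN_PARTS}-{MAX_PARTS} Parts, got {len(durations)}"
        )
    for index, seconds in enumerate(durations, start=1):
        if not MIN_SECONDS <= seconds <= MAX_SECONDS:
            problems.append(
                f"Part {index}: {seconds}s is outside {MIN_SECONDS}-{MAX_SECONDS}s"
            )
    return sum(durations), problems


total, problems = check_part_plan([10, 10, 10, 10, 10, 10])
print(total, problems)  # 60 []
```

Under these limits the merged film runs between 4 × 6 = 24 s and 6 × 10 = 60 s, so a full minute takes six 10-second Parts.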
Two Ways to Use It
The two folders contain identical content — just different packaging.
| Mode | For | Install |
|---|---|---|
| Claude.ai Project | Web Claude.ai users | Upload 7 files to a Project, paste 00-INSTRUCTIONS.md into Instructions |
| Claude Code skill | Terminal Claude Code CLI users | Copy VPick-Short-Video-Skill/ to ~/.claude/skills/vpick-storyboard/ |
Detailed install steps live in each folder's README in the GitHub repo.
Required Prerequisite
The VPick MCP connector must be connected to Claude first. Every image/video node in the agent calls mcp__vpick__* tools — without the connector, nothing runs.
See Connect MCP for setup.
The 5 Stages
| Stage | Action | Parallel | Gate |
|---|---|---|---|
| 1 | Confirm video setup (style, aspect, character count, whether to use reference images / voices) | — | 1 JSON |
| 2 | Break script → JSON + zh-TW table | — | 1 JSON |
| 3 | create_project → character four-view sheets + environment plans + prop images | ✓ | All images confirmed |
| 4 | zh-TW storyboard table per Part | ✓ | All images confirmed |
| 5 | Seedance 2 video (with dialogue) → merge → file list | ✓ | "Start generation" + final file list |
Each stage waits for your "all confirmed" before proceeding. Stage 5 will not call run_video_generator until you explicitly say "start generation" — so the agent never burns 5 minutes on a wrong storyboard.
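The gating behavior can be pictured as a loop that stops at the first unconfirmed stage. The sketch below is an illustration only: the stage names paraphrase the table above, while `run_gated` and its control flow are assumptions, not the agent's actual implementation.

```python
# Hypothetical sketch of confirmation gating. Stage names follow the table
# above; the control flow is an assumption, not the agent's actual code.
STAGES = [
    "confirm video setup",
    "break script",
    "generate character, environment, and prop images",
    "write storyboard tables",
    "generate and merge video",
]


def run_gated(stages, confirm):
    """Walk the stages in order and stop at the first unconfirmed one.

    `confirm` is a callable that takes a stage name and returns True or False.
    Returns the list of stages the user confirmed.
    """
    completed = []
    for stage in stages:
        if not confirm(stage):
            break  # wait here instead of spending time on later stages
        completed.append(stage)
    return completed


# Example: the user approves everything except the storyboard tables.
done = run_gated(STAGES, lambda stage: stage != "write storyboard tables")
print(len(done))  # 3
```

Because the loop breaks at the rejected stage, the video stage is never reached until the storyboard is approved.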
Quick Try
1. Pick one of the two modes above and install it.
2. Paste a story script into chat, e.g.:
   A rainy night. Anna walks through a narrow alley holding a red umbrella. Bryan suddenly steps out of the shadows: "I've been waiting a long time." She steps back, tightening her grip on the handle…
3. Claude enters Stage 1, confirming the setup. Walk through the prompts.
4. 15–20 minutes later: complete short film + all assets.
Customizing Characters and Voices
The repo ships with an Andy male-lead sample (headshot + voice) so you can run the whole flow without preparing assets.
⚠️ It's just a sample. Use your own headshots / voices for real productions.
Three ways to swap in your own:
Route 1: Tell Claude in Stage 1
Use https://your.com/me.png as character_b's reference_image_url
Use https://your.com/me.mp3 as character_b's voice_reference_url
Route 2: Edit the Stage 2 JSON directly
{
  "id": "character_b",
  "reference_image_url": "https://your.com/me.png",
  "voice_reference_url": "https://your.com/me.mp3"
}
Route 3: Skip references entirely
Set the URL fields to null. Stage 3 generates characters from text only; Stage 5 Seedance auto-generates dialogue audio.
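If you prepare the Stage 2 JSON outside of chat, a small script can build the character entry for either Route 2 or Route 3. This is a minimal sketch: the three field names come from the JSON shown in Route 2, while the helper `character_entry` and its URL check are illustrative assumptions.

```python
import json


# Hypothetical helper: the field names come from the JSON in Route 2;
# everything else here is an illustrative assumption.
def character_entry(char_id, image_url=None, voice_url=None):
    """Build one character entry; None is serialized as JSON null (Route 3)."""
    for url in (image_url, voice_url):
        if url is not None and not url.startswith(("http://", "https://")):
            raise ValueError(f"not a fetchable URL: {url!r}")
    return {
        "id": char_id,
        "reference_image_url": image_url,
        "voice_reference_url": voice_url,
    }


# Route 3: no references, so both URL fields come out as null.
print(json.dumps(character_entry("character_b"), indent=2))
```

Passing `image_url` and `voice_url` produces the Route 2 form instead.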
Recommended Asset Specs
| Type | Recommendation |
|---|---|
| Headshot | png / jpg, ≥ 512×512, ideally 4 views (front / side / back / closeup) |
| Voice mp3 | 10–30s clean voice, single speaker, no background music, neutral tone |
URLs can be anything publicly fetchable: Google Drive public share (direct-download form), GCS / S3 / Dropbox public links, your own image host, or any direct png / jpg / mp3.
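The recommendations in the table above translate into a simple pre-flight check. The sketch below assumes you have already measured the image size and audio length with a tool of your choice; the function names are hypothetical and nothing here is part of the agent.

```python
# Hypothetical pre-flight check based on the table above. It takes already
# measured values, so reading the files themselves is left to the caller.
ALLOWED_IMAGE_EXTENSIONS = (".png", ".jpg")


def headshot_ok(filename, width, height):
    """True if the headshot is png/jpg and at least 512x512 pixels."""
    return (
        filename.lower().endswith(ALLOWED_IMAGE_EXTENSIONS)
        and width >= 512
        and height >= 512
    )


def voice_ok(filename, duration_seconds):
    """True if the voice sample is an mp3 lasting 10 to 30 seconds."""
    return filename.lower().endswith(".mp3") and 10 <= duration_seconds <= 30


print(headshot_ok("me.png", 1024, 1024))  # True
print(voice_ok("me.mp3", 45))             # False
```

The checks cover only the measurable parts of the table. Single speaker, no background music, and neutral tone still need a listen.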
Compared to the Basic Chapters
| Aspect | Basic Chapters | Short Video Agent |
|---|---|---|
| Use case | Single image / single clip / simple batch | Complete one-minute short film |
| Flow | Free-form node wiring | 5 fixed stages, gated by user confirm |
| Character consistency | Manually wire reference images | Built-in four-view character sheet |
| Storyboard | Write prompts yourself | zh-TW storyboard table auto-generated |
| Video merge | Wire a Combine node yourself | Auto-merged at end of flow |
| Time to result | A few minutes | 15–20 minutes |
Use the basic chapters for simple needs. For a real one-minute short film, go straight to the agent.