vpick User Guide

Short Video Agent (Full 1-Minute Short-Film Setup)

If you want more than "a few images and a few clips", that is, a complete one-minute short film with script breakdown, character design, storyboard tables, per-shot Seedance video generation, and a merged final output, Short Video Agent covers far more than the basic chapters in this guide.

It wraps the whole production flow into a 5-stage agent. Paste a story script, wait 15–20 minutes, and you get a complete short film plus 13 files: character reference sheets, environment floor plans, storyboard tables, per-Part videos, and the merged final output.

GitHub: snoopyrain/vpick-short-video-agent


Why You Need It

vpick itself is a node-based canvas + MCP toolset. Making a complete short film requires the right sequence:

  1. Define style (realistic / anime / cartoon), aspect ratio, number of characters
  2. Break the script into 4–6 Parts, each becoming a 6–10s clip
  3. Generate "four-view reference sheets" for every character — so faces stay consistent across shots
  4. Generate environment floor plans + prop images for each scene
  5. Write zh-TW storyboard tables (one per Part, with camera moves / dialogue / actions)
  6. Generate clips with Seedance 2 (one per Part, with start frame + dialogue)
  7. Merge into final film

Skip any step and you get: drifting character faces, inconsistent shot style, dialogue out of sync, or clips that don't cut together.
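
As a quick sanity check on step 2, the sketch below works out the total runtime that the stated ranges (4–6 Parts, 6–10s per clip) allow. The function name is made up for illustration; only the numeric bounds come from the list above.

```python
# Planning bounds taken from the step list above: 4-6 Parts, 6-10 s per clip.
MIN_PARTS, MAX_PARTS = 4, 6
MIN_CLIP_S, MAX_CLIP_S = 6, 10


def runtime_range(parts: int) -> tuple[int, int]:
    """Return the (shortest, longest) total runtime in seconds for `parts` clips."""
    if not MIN_PARTS <= parts <= MAX_PARTS:
        raise ValueError(f"expected {MIN_PARTS}-{MAX_PARTS} Parts, got {parts}")
    return parts * MIN_CLIP_S, parts * MAX_CLIP_S


for n in range(MIN_PARTS, MAX_PARTS + 1):
    low, high = runtime_range(n)
    print(f"{n} Parts: {low}-{high} s")
# 4 Parts: 24-40 s
# 5 Parts: 30-50 s
# 6 Parts: 36-60 s
```

Under these bounds, only 6 Parts at the full 10s each reach a 60-second runtime, so a script broken into fewer Parts will come out shorter than one minute.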

Short Video Agent wraps these 7 steps into 5 stages. Each stage waits for your "all confirmed" before moving on — so you don't burn time on a broken storyboard.


Two Ways to Use It

The repo ships two folders, one per mode. Their content is identical; only the packaging differs.

| Mode | For | Install |
| --- | --- | --- |
| Claude.ai Project | Web Claude.ai users | Upload 7 files to a Project, paste 00-INSTRUCTIONS.md into Instructions |
| Claude Code skill | Terminal Claude Code CLI users | Copy VPick-Short-Video-Skill/ to ~/.claude/skills/vpick-storyboard/ |

Detailed install steps live in each folder's README in the GitHub repo.
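
For the Claude Code skill mode, the copy step can be scripted. The following is a minimal sketch, assuming you have already cloned the repo locally; `install_skill` and its parameters are hypothetical names, and only the source folder and target path come from the install instructions above. The folder READMEs remain the authoritative steps.

```python
import shutil
from pathlib import Path
from typing import Optional

SOURCE_FOLDER = "VPick-Short-Video-Skill"  # folder name from the install instructions
TARGET_NAME = "vpick-storyboard"           # skill directory name from the install instructions


def install_skill(repo_dir: Path, skills_root: Optional[Path] = None) -> Path:
    """Copy the skill folder from a local clone into the Claude Code skills directory."""
    source = repo_dir / SOURCE_FOLDER
    if not source.is_dir():
        raise FileNotFoundError(f"{source} not found; is repo_dir a clone of the repo?")
    # Default to ~/.claude/skills unless the caller supplies another root.
    root = skills_root if skills_root is not None else Path.home() / ".claude" / "skills"
    target = root / TARGET_NAME
    # dirs_exist_ok=True lets a re-run refresh an existing install (Python 3.8+).
    shutil.copytree(source, target, dirs_exist_ok=True)
    return target
```

Calling `install_skill(Path("vpick-short-video-agent"))` from the directory that holds your clone would copy the folder into `~/.claude/skills/vpick-storyboard/`; the clone directory name here is an assumption based on the repo name.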

Required Prerequisite

The VPick MCP connector must be connected to Claude first. Every image/video node in the agent calls mcp__vpick__* tools — without the connector, nothing runs.

See Connect MCP for setup.


The 5 Stages

| Stage | Action | Parallel | Gate |
| --- | --- | --- | --- |
| 1 | Confirm video setup (style, aspect, character count, whether to use reference images / voices) | | 1 JSON |
| 2 | Break script → JSON + zh-TW table | | 1 JSON |
| 3 | create_project → character four-view sheets + environment plans + prop images | | All images confirmed |
| 4 | zh-TW storyboard table per Part | | All images confirmed |
| 5 | Seedance 2 video (with dialogue) → merge → file list | | "Start generation" + final file list |

Each stage waits for your "all confirmed" before proceeding. Stage 5 will not call run_video_generator until you explicitly say "start generation" — so the agent never burns 5 minutes on a wrong storyboard.
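
The gating behaviour can be pictured as a small state machine. This is an illustrative sketch only, not the agent's actual implementation: the `GatedFlow` class is hypothetical, the stage labels paraphrase the stage list, and exact-phrase matching is a simplification of how Claude interprets a confirmation.

```python
from dataclasses import dataclass

# (stage label, phrase that opens the gate). Labels paraphrase the stage list;
# exact phrase matching is an assumption made to keep the sketch small.
STAGES = [
    ("Confirm video setup", "all confirmed"),
    ("Break script into Parts", "all confirmed"),
    ("Generate character, environment, and prop images", "all confirmed"),
    ("Write storyboard tables", "all confirmed"),
    ("Generate and merge video", "start generation"),
]


@dataclass
class GatedFlow:
    index: int = 0  # position in STAGES of the gate currently waiting

    @property
    def finished(self) -> bool:
        return self.index >= len(STAGES)

    def reply(self, message: str) -> bool:
        """Open the current gate only if the message matches its phrase."""
        if self.finished:
            return False
        _, gate = STAGES[self.index]
        if message.strip().lower() != gate:
            return False  # stay put; no later (expensive) stage is triggered
        self.index += 1
        return True


flow = GatedFlow()
print(flow.reply("looks fine, I guess"))  # False: gate stays closed
print(flow.reply("All confirmed"))        # True: stage 1 gate opens
```

The point of the design is that video generation, the slow step, sits behind the last gate, so a vague or mistaken reply earlier in the flow costs nothing.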


Quick Try

  1. Pick one of the two modes above and install it

  2. Paste a story script into chat, e.g.:

    A rainy night. Anna walks through a narrow alley holding a red umbrella. Bryan suddenly steps out of the shadows: "I've been waiting a long time." She steps back, tightening her grip on the handle…

  3. Claude enters Stage 1, confirming the setup. Walk through the prompts.

  4. 15–20 minutes later: complete short film + all assets


Customizing Characters and Voices

The repo ships with an Andy male-lead sample (headshot + voice) so you can run the whole flow without preparing assets.

⚠️ It's just a sample. Use your own headshots / voices for real productions.

Three ways to swap in your own:

Route 1: Tell Claude in Stage 1

Use https://your.com/me.png as character_b's reference_image_url
Use https://your.com/me.mp3 as character_b's voice_reference_url

Route 2: Edit the Stage 2 JSON directly

{
  "id": "character_b",
  "reference_image_url": "https://your.com/me.png",
  "voice_reference_url": "https://your.com/me.mp3"
}

Route 3: Skip references entirely

Set the URL fields to null. Stage 3 generates characters from text only; Stage 5 Seedance auto-generates dialogue audio.
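
If you prepare the character records in a script before pasting them into the Stage 2 JSON, Routes 2 and 3 differ only in whether the URL fields hold a string or null. A minimal sketch: `character_entry` is a hypothetical helper, and `character_a` is a made-up id used only to show the text-only case; the three field names come from the Route 2 example.

```python
import json
from typing import Optional


def character_entry(char_id: str,
                    image_url: Optional[str] = None,
                    voice_url: Optional[str] = None) -> dict:
    """Build one character record with the three fields shown in the Route 2 JSON."""
    return {
        "id": char_id,
        "reference_image_url": image_url,  # None serializes to JSON null (Route 3)
        "voice_reference_url": voice_url,
    }


# Route 2: explicit references.
with_refs = character_entry("character_b",
                            "https://your.com/me.png",
                            "https://your.com/me.mp3")
# Route 3: no references; "character_a" is a hypothetical id.
text_only = character_entry("character_a")

print(json.dumps(with_refs, indent=2))
print(json.dumps(text_only, indent=2))
```

Python's `None` is written as `null` by `json.dumps`, which is the value Route 3 asks for.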

Recommended Asset Specs

| Type | Recommendation |
| --- | --- |
| Headshot | png / jpg, ≥ 512×512, ideally 4 views (front / side / back / closeup) |
| Voice | mp3, 10–30s clean voice, single speaker, no background music, neutral tone |

URLs can be anything publicly fetchable: Google Drive public share (direct-download form), GCS / S3 / Dropbox public links, your own image host, or any direct png / jpg / mp3.
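
Before handing a URL to the agent, a rough local check can catch obvious mistakes such as pasting an image link into the voice field. The sketch below inspects only the URL's scheme and file extension; it does not fetch anything, and the helper name and the inclusion of `.jpeg` are assumptions rather than documented behaviour.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Extensions from the recommended asset specs; ".jpeg" is assumed to be
# accepted because it is the same format as ".jpg".
ALLOWED = {
    "headshot": {".png", ".jpg", ".jpeg"},
    "voice": {".mp3"},
}


def looks_like_direct_asset(url: str, kind: str) -> bool:
    """Return True if `url` is http(s) and its path ends in an extension allowed for `kind`."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        return False
    suffix = PurePosixPath(parsed.path).suffix.lower()
    return suffix in ALLOWED[kind]


print(looks_like_direct_asset("https://your.com/me.png", "headshot"))  # True
print(looks_like_direct_asset("https://your.com/me.png", "voice"))     # False
```

Treat a `False` result as a hint, not a verdict: some share links that are publicly fetchable (for example direct-download links from a file host) carry no file extension in the URL path.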


Compared to the Basic Chapters

| Aspect | Basic Chapters | Short Video Agent |
| --- | --- | --- |
| Use case | Single image / single clip / simple batch | Complete one-minute short film |
| Flow | Free-form node wiring | 5 fixed stages, gated by user confirm |
| Character consistency | Manually wire reference images | Built-in four-view character sheet |
| Storyboard | Write prompts yourself | zh-TW storyboard table auto-generated |
| Video merge | Wire a Combine node yourself | Auto-merged at end of flow |
| Time to result | A few minutes | 15–20 minutes |

Use the basic chapters for simple needs. For a real one-minute short film, go straight to the agent.