What Is vpick

vpick is a visual workflow canvas where creators and AI Agents collaborate to make videos.

The Pain of Making Videos

The most time-consuming part of making a great video isn't the creativity itself; it's the execution:

Step	What You Do	Time Spent
Storyboarding	Plan each shot's composition, framing, and description	A lot
Visual Generation	Generate key frames for each scene, select and adjust styles	A lot
Animation	Set start/end frames, duration, and transitions; wait for generation	A lot
Creative Decisions	Decide style direction, pick favorites, give feedback	A little

Roughly 80% of your time goes to repetitive execution, and only about 20% to the creative decisions that truly matter.

How vpick Solves This

Hand that 80% to an AI Agent, and focus only on the critical 20%.

What You Handle (20%)

The video's theme and style direction
Reviewing results and picking the best ones
Telling the Agent what to adjust

What the Agent Handles (80%)

Planning the storyboard based on your direction
Batch generating all scene key frames
Turning static images into animated videos
Immediately regenerating based on your feedback

Collaboration Flow

You: "Make a coffee brand ad, 6 scenes, warm tones"
        |
Agent: Plans 6 scene descriptions
Agent: Batch generates 6 key frames
Agent: Creates 6 short videos using start/end frames
        |
You: Browse the results
You: "Change scene 2 to a top-down angle, make scene 4 warmer"
        |
Agent: Adjusts immediately, regenerates those two shots
        |
You: Satisfied, download all videos

The canvas is your shared workspace. You can see every step the Agent takes, and you can stop, modify, or take over at any time.
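The revision step in the flow above (regenerate only the shots your feedback names, leave the approved ones alone) can be sketched as follows. This is a hypothetical illustration, not vpick's API: the `Scene` class, the `plan_storyboard` and `regenerate` functions, and the feedback dictionary are invented names, and actual image/video generation is stubbed out with a version counter.

```python
from dataclasses import dataclass, field


@dataclass
class Scene:
    """One storyboard shot (hypothetical model, not a vpick type)."""
    number: int
    description: str
    version: int = 1  # bumped each time the shot is regenerated
    notes: list[str] = field(default_factory=list)


def plan_storyboard(brief: str, count: int) -> list[Scene]:
    # Stand-in for the Agent's planning step: one placeholder description per scene.
    return [Scene(n, f"{brief}, scene {n}") for n in range(1, count + 1)]


def regenerate(scenes: list[Scene], feedback: dict[int, str]) -> list[int]:
    """Apply feedback only to the named scenes and return the numbers that changed.

    Scenes without feedback keep their current version, so approved shots
    are never redone.
    """
    changed = []
    for scene in scenes:
        note = feedback.get(scene.number)
        if note is None:
            continue
        scene.notes.append(note)
        scene.version += 1  # stub for "generate a new key frame and clip"
        changed.append(scene.number)
    return changed


scenes = plan_storyboard("Coffee brand ad, warm tones", 6)
changed = regenerate(scenes, {2: "top-down angle", 4: "warmer"})
print(changed)                      # [2, 4]
print([s.version for s in scenes])  # [1, 2, 1, 2, 1, 1]
```

Keying feedback by scene number keeps each revision round cheap: only the two flagged shots are redone, which matches the "regenerates those two shots" step in the flow.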

Supported Generation Types

Type	Models	Description
Image	Nano Banana 2, Grok Imagine, Seedream	Scene key frames, product shots, generated in seconds
Video	Veo 3.1, Kling 3.0, Grok Video, Runway	3-15 second clips with start/end frame control and sound
Voice	ElevenLabs V3	Multi-voice, multilingual text-to-speech (TTS)
Music	Suno V4.5	AI music generation with vocal and instrumental modes
Lipsync	Kling Avatar	Static portrait + voice = talking video
Vocal Separation	Demucs	Separate audio into vocals and accompaniment
Voice Changer	ElevenLabs STS	Speech-to-speech conversion: re-voice a recording in a different voice
Text	Gemini	Storyboard scripts, scene descriptions, copy

Start/End Frame Control

This is the most practical feature for making videos. You can:

Upload a product photo as the start frame -> Agent generates a product rotation animation
Generate two images at different angles as the start/end frames -> Agent creates a camera movement effect
Provide only a start frame -> Agent freely creates the rest of the animation based on the prompt

[Start Frame] -> [Video Generator] <- [End Frame]
                       |
              3-10 second animated video
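The three options above reduce to two input shapes: a start frame alone (the product rotation and free-form cases), or a start frame paired with an end frame (the camera-move case). The sketch below models that as a hypothetical request type; `FrameRequest`, `describe`, and the file names are illustrative assumptions, not part of vpick or any model's API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FrameRequest:
    """Hypothetical video-generation request (illustrative only)."""
    prompt: str
    start_frame: str                 # path or ID of the start image (required)
    end_frame: Optional[str] = None  # optional end image


def describe(req: FrameRequest) -> str:
    # Both ends fixed: the clip should begin on one image and finish on the other.
    if req.end_frame is not None:
        return f"constrained: {req.start_frame} -> {req.end_frame}"
    # Only the opening image is fixed; the prompt drives how the clip ends.
    return f"open-ended from {req.start_frame}"


rotation = FrameRequest("slow 360-degree product turn", "product.png")
camera_move = FrameRequest("dolly from front to side", "front.png", "side.png")
print(describe(rotation))     # open-ended from product.png
print(describe(camera_move))  # constrained: front.png -> side.png
```

Making the end frame optional rather than a separate request type mirrors the diagram: the same generator accepts either shape, and supplying an end frame simply adds a second constraint.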

Just Talk to Make It Happen

You don't need to learn any controls first. Just tell the Agent what you want:

Plan the storyboard: "Plan a 30-second product ad for me, 8 scenes"
Generate visuals: "Create key frames for all 8 scenes"
Make videos: "Turn these into videos, 5 seconds each"
Adjust: "Change scene 3 to warm tones and redo it"

From idea to finished video, everything happens through natural-language conversation.