What is vpick
vpick is a visual workflow canvas where creators and AI Agents collaborate to make videos.
The Pain of Making Videos
The most time-consuming part of making a great video isn't the creativity itself; it's the execution:
| Step | What You Do | Time Spent |
|---|---|---|
| Storyboarding | Plan each shot's composition, framing, description | A lot |
| Visual Generation | Generate key frames for each scene, select and adjust styles | A lot |
| Animation | Set start/end frames, duration, transitions, wait for generation | A lot |
| Creative Decisions | Decide style direction, pick favorites, give feedback | A little |
You spend 80% of your time on repetitive execution, and only 20% on the creative decisions that truly matter.
How vpick Solves This
Hand that 80% to an AI Agent, and focus only on the critical 20%.
What You Handle (20%)
- The video's theme and style direction
- Reviewing results and picking the best ones
- Telling the Agent what to adjust
What the Agent Handles (80%)
- Planning the storyboard based on your direction
- Batch generating all scene key frames
- Turning static images into animated videos
- Immediately regenerating based on your feedback
Collaboration Flow
You: "Make a coffee brand ad, 6 scenes, warm tones"
|
Agent: Plans 6 scene descriptions
Agent: Batch generates 6 key frames
Agent: Creates 6 short videos using start/end frames
|
You: Browse the results
You: "Change scene 2 to a top-down angle, make scene 4 warmer"
|
Agent: Adjusts immediately, regenerates those two shots
|
You: Satisfied, download all videos
The canvas is your shared workspace. You can see every step the Agent takes, and you can stop, modify, or take over at any time.
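The loop above can be sketched in a few lines of Python. This is an illustration only, not vpick's API: `Scene`, `plan_scenes`, `generate`, and `apply_feedback` are hypothetical names, and "generation" here just bumps a revision counter. The point it shows is that feedback touches only the scenes you name, so the other scenes keep their first result.

```python
from dataclasses import dataclass


@dataclass
class Scene:
    index: int          # 1-based scene number, as in "scene 2"
    description: str    # what the Agent planned for this shot
    revision: int = 0   # how many times this scene has been generated


def plan_scenes(brief: str, count: int) -> list[Scene]:
    """Hypothetical stand-in for the Agent's storyboard planning step."""
    return [Scene(i, f"{brief} (scene {i})") for i in range(1, count + 1)]


def generate(scenes: list[Scene]) -> None:
    """Hypothetical stand-in for key frame and video generation."""
    for scene in scenes:
        scene.revision += 1


def apply_feedback(scenes: list[Scene], feedback: dict[int, str]) -> list[Scene]:
    """Update and regenerate only the scenes named in the feedback."""
    changed = [s for s in scenes if s.index in feedback]
    for scene in changed:
        scene.description += f"; {feedback[scene.index]}"
    generate(changed)
    return changed


scenes = plan_scenes("coffee brand ad, warm tones", 6)
generate(scenes)  # first pass over all 6 scenes
changed = apply_feedback(scenes, {2: "top-down angle", 4: "warmer"})

print([s.index for s in changed])    # scenes regenerated in the second pass
print([s.revision for s in scenes])  # generation count per scene
```

After the feedback step, scenes 2 and 4 have been generated twice and the other four scenes once.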
Supported Generation Types
| Type | Models | Description |
|---|---|---|
| Image | Nano Banana 2, Grok Imagine, Seedream | Scene key frames, product shots, generated in seconds |
| Video | Veo 3.1, Kling 3.0, Grok Video, Runway | 3-15 second clips with start/end frame control and sound |
| Voice | ElevenLabs V3 | Multi-voice, multi-language TTS |
| Music | Suno V4.5 | AI music generation with vocal and instrumental modes |
| Lipsync | Kling Avatar | Static portrait + voice = talking video |
| Vocal Separation | Demucs | Separate audio into vocals and accompaniment |
| Voice Changer | ElevenLabs STS | Voice style transformation |
| Text | Gemini | Storyboard scripts, scene descriptions, copy |
Start/End Frame Control
This is the most practical feature for making videos. You can:
- Upload a product photo as the start frame -> Agent generates a product rotation animation
- Generate two images at different angles as start/end frames -> Agent creates a camera movement effect
- Provide only a start frame -> Agent freely creates the rest of the animation based on the prompt
[Start Frame] -> [Video Generator] <- [End Frame]
|
3-10 second animated video
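One way to picture the difference between these cases is as a request with an optional end frame. The sketch below is a hypothetical model, not vpick's actual request format: `VideoRequest`, `frame_mode`, and the mode names are made up for illustration. The source does not describe requests without a start frame, so this sketch simply rejects them.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class VideoRequest:
    prompt: str
    start_frame: Optional[str] = None  # hypothetical: path or ID of an image
    end_frame: Optional[str] = None    # hypothetical: path or ID of an image


def frame_mode(req: VideoRequest) -> str:
    """Classify a request by which frames were supplied."""
    if req.start_frame and req.end_frame:
        return "start_and_end"  # animate from one image to the other
    if req.start_frame:
        return "start_only"     # the prompt decides how the clip continues
    # Assumption of this sketch: a start frame is always required.
    raise ValueError("this sketch requires a start frame")


rotation = VideoRequest("rotate the product slowly", start_frame="product.png")
camera = VideoRequest(
    "move the camera between the two angles",
    start_frame="angle_a.png",
    end_frame="angle_b.png",
)

print(frame_mode(rotation))  # start_only
print(frame_mode(camera))    # start_and_end
```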
Just Talk to Make It Happen
You don't need to learn any tools or commands. Just tell the Agent:
- Plan storyboard: "Plan a 30-second product ad for me, 8 scenes"
- Generate visuals: "Create key frames for all 8 scenes"
- Make videos: "Turn these into videos, 5 seconds each"
- Adjust: "Change scene 3 to warm tones and redo it"
From idea to finished video, all through natural language conversation.