Understanding Nodes
Nodes are the basic building blocks on the vpick canvas. Each node handles one specific task.
Node Types Overview
| Node | What It Does |
|---|---|
| Text | Stores a piece of text |
| AI Assistant | Calls AI to generate copy |
| Image Generator | Generates images with AI |
| Video Generator | Generates short videos with AI |
| Audio Generator | Generates voice-over with AI (TTS) |
| Music Generator | Generates music with AI (BGM) |
| Lipsync | Makes a portrait talk in sync with audio |
| Vocal Separator | Separates audio into vocals, accompaniment, and original |
| Voice Changer | Transforms voice style with ElevenLabs |
| Audio Combine | Mixes multiple audio tracks into one |
| Combine | Merges multiple video clips into one |
| List | Stores multiple items for batch processing |
| Upload | Uploads your own images or files |
| Group | Visually groups multiple nodes |
AI Assistant
Enter a prompt and the AI generates text content for you.
Common uses:
- Generate product descriptions
- List creative ideas
- Write social media copy
You can enable "Export as List" to automatically split the AI response into multiple items, making it easy to feed into an Image Generator for batch processing.
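As a rough illustration of what splitting a response into items can look like, here is a minimal Python sketch. The splitting rule (one item per non-empty line, with leading bullets or numbering removed) and the function name `split_into_items` are assumptions made for this example; vpick's actual behavior may differ.

```python
import re


def split_into_items(response: str) -> list[str]:
    """Split a multi-line AI response into items (hypothetical rule, not vpick's)."""
    items = []
    for line in response.splitlines():
        # Remove a leading bullet ("-", "*", "•") or numbering ("1.", "2)").
        cleaned = re.sub(r"^\s*(?:[-*•]|\d+[.)])\s*", "", line).strip()
        if cleaned:  # skip blank lines
            items.append(cleaned)
    return items


response = (
    "1. A red sneaker on a beach\n"
    "2. A red sneaker in the snow\n"
    "\n"
    "- A red sneaker at night"
)
items = split_into_items(response)
```

Each resulting item could then serve as one prompt for a connected Image Generator.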
Image Generator
Supports multiple models (Nano Banana 2, Grok Imagine, Seedream, etc.) to generate images from text descriptions.
- Aspect ratios: 1:1 (square), 16:9 (landscape), 9:16 (portrait), etc.
- Supports multiple reference image inputs
- Connect a List to generate multiple images at once
- See AI Models Guide for detailed model comparisons
Video Generator
Supports multiple models (Veo 3.1, Kling 3.0, Grok Video, Runway, etc.) to generate short videos from text descriptions.
- Duration varies by model, from 3 to 15 seconds
- Some models support sound generation (Kling, Grok, Veo)
- Start/end frame support: Upload images as the starting or ending frame of the video
- See AI Models Guide for detailed model comparisons
Audio Generator (Voice Over)
Uses the ElevenLabs V3 model to convert text to speech.
- 9 voices available (Roger, Sarah, Brian, etc.), each with a demo preview
- Supports 10 languages
- Adjustable Stability: affects the expressiveness of the voice
- Output audio can connect to the Combine node to overlay on video, or to the Lipsync node
Music Generator
Uses the Suno V4.5 model to generate complete music from text descriptions.
- Simple mode: Enter a description (e.g., "an upbeat jazz piano piece") and the AI generates the music automatically
- Custom mode: Specify music style and song title
- Instrumental: Enable Instrumental mode to generate vocal-free background music
- Great for video background music: connect the output to the Combine node's audio-in port
Lipsync
Uses the Kling Avatar model to turn a static portrait photo into a talking video.
- Connect a portrait photo (image-in) + an audio clip (audio-in)
- AI syncs the person's lip movements to the audio
- Two modes: Standard ($0.12/sec), Pro ($0.24/sec)
- Best results: Use a clear, front-facing photo with the mouth closed
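The per-second rates above make it easy to estimate cost before running a job. The sketch below assumes the charge is simply the rate multiplied by the clip length in seconds, with no minimum and no rounding up to whole seconds; that billing rule and the function name `estimate_lipsync_cost` are assumptions for illustration only.

```python
# Rates taken from the mode list above (dollars per second of output).
RATES_PER_SECOND = {"standard": 0.12, "pro": 0.24}


def estimate_lipsync_cost(seconds: float, mode: str = "standard") -> float:
    """Estimate Lipsync cost, assuming simple per-second billing."""
    if mode not in RATES_PER_SECOND:
        raise ValueError(f"unknown mode: {mode!r}")
    return round(seconds * RATES_PER_SECOND[mode], 2)


standard_cost = estimate_lipsync_cost(10, "standard")  # 10-second clip
pro_cost = estimate_lipsync_cost(10, "pro")
```

Under this assumption, a 10-second clip costs about $1.20 in Standard mode and $2.40 in Pro mode.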
Vocal Separator
Uses the Demucs model to separate audio into three independent tracks.
- Input: video (video-in) or audio (audio-in)
- Output: vocals (vocals-out), accompaniment (accompaniment-out), original audio (origin-out)
- Automatically creates 3 Upload nodes to store the results after separation
- Perfect for removing background music or extracting vocals
Voice Changer
Uses ElevenLabs Speech-to-Speech to transform voice style.
- Input: audio (audio-in)
- Output: transformed audio (audio-out)
- Uses your own ElevenLabs API Key (set in Settings -> ElevenLabs)
- Choose from built-in voices or clone your own voice
- Option to remove background noise
- No vpick credits consumed (uses your own ElevenLabs quota)
Audio Combine
Mixes multiple audio tracks into one.
- Connect multiple audio sources to the audio-in port
- Outputs the mixed audio (audio-out)
- Perfect for mixing vocals with background music
Combine
Merges multiple video clips in order into one complete video.
- Connect multiple Video Generator / Lipsync / Upload nodes to the videos-in port
- Optionally connect audio (audio-in) as background music
- Audio mixing: If the video already has sound, it will be mixed with the background music (not replaced)
- Automatically handles videos with different resolutions (re-encodes to a unified format)
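To make "mixed, not replaced" concrete, here is a minimal Python sketch of mixing two mono tracks by adding their samples. This is a generic mixing technique, not a description of how vpick implements it; the function name `mix_tracks`, the `music_gain` parameter, and the float sample format are all assumptions for this example.

```python
def mix_tracks(
    video_audio: list[float],
    music: list[float],
    music_gain: float = 0.5,
) -> list[float]:
    """Mix two mono tracks sample by sample (samples are floats in [-1.0, 1.0])."""
    length = max(len(video_audio), len(music))
    mixed = []
    for i in range(length):
        # Treat the shorter track as silence once it runs out.
        a = video_audio[i] if i < len(video_audio) else 0.0
        b = music[i] if i < len(music) else 0.0
        # Add the signals, then clamp so the result stays in the valid range.
        mixed.append(max(-1.0, min(1.0, a + music_gain * b)))
    return mixed


mixed = mix_tracks([0.5, -0.5], [1.0, 1.0, 1.0])
```

Replacing the audio would instead discard `video_audio` entirely; mixing keeps both signals audible in the output.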
List
The key node for batch generation. Store multiple items in a List, connect it to an Image or Video Generator, and it will automatically generate one output for each item.
For example, a List with 5 items connected to an Image Generator will produce 5 images.
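Conceptually, this is a fan-out: the generator runs once per item. The Python sketch below illustrates the idea with a stand-in generator; `run_batch` and `fake_image_generator` are hypothetical names, and the real nodes call AI models rather than returning strings.

```python
from typing import Callable


def run_batch(items: list[str], generate: Callable[[str], str]) -> list[str]:
    """Run the generator once per list item and collect the outputs in order."""
    return [generate(item) for item in items]


def fake_image_generator(prompt: str) -> str:
    # Stand-in for an Image Generator node; returns a label instead of an image.
    return f"image for: {prompt}"


prompts = ["cat", "dog", "fox", "owl", "bee"]
outputs = run_batch(prompts, fake_image_generator)
```

Here `outputs` holds five results, one per item in `prompts`.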
Upload
Upload images or other files from your computer to the canvas. Common uses:
- As reference images for Image Generator
- As start frames or end frames for Video Generator
- As portrait photos for Lipsync
Group
Visually group multiple nodes for easier management.
- Select multiple nodes and press Ctrl+G to create a group
- Dragging a group moves all member nodes together
- Customize group color and label
Connections
Nodes connect to each other with lines, and data flows along these connections:
[AI Assistant] -> [List] -> [Image Generator]
This way, the AI-generated text enters the List, and each item in the List generates an image separately.
Advanced Example: Lipsync Video
[Upload (Portrait Photo)] -> image-in -> [Lipsync]
[Audio Generator] -> audio-in -> [Lipsync]
[Lipsync] -> videos-in -> [Combine]
[Music Generator] -> audio-in -> [Combine]
In this workflow:
- Audio Generator creates the voice-over
- Lipsync makes the portrait talk
- Music Generator creates background music
- Combine merges the lipsync video + background music into the final video
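One way to reason about this workflow is as a dependency graph: a node can run only after every node connected to its inputs has finished. The Python sketch below encodes the four connections above and derives a valid run order with the standard library's `graphlib`. How vpick actually schedules nodes is not documented here, so treat the ordering logic and the name `execution_order` as assumptions.

```python
from graphlib import TopologicalSorter

# Each connection: (source node, target port, target node).
EDGES = [
    ("Upload (Portrait Photo)", "image-in", "Lipsync"),
    ("Audio Generator", "audio-in", "Lipsync"),
    ("Lipsync", "videos-in", "Combine"),
    ("Music Generator", "audio-in", "Combine"),
]


def execution_order(edges: list[tuple[str, str, str]]) -> list[str]:
    """Return one valid run order in which every source precedes its target."""
    sorter = TopologicalSorter()
    for source, _port, target in edges:
        sorter.add(target, source)  # target depends on source
    return list(sorter.static_order())


order = execution_order(EDGES)
```

Any valid order places Audio Generator and the Upload node before Lipsync, and Combine last.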