WanGP: Local AI Video Generation

Table of Contents
- What is WanGP?
- Table Overview of WanGP
- Test Setup and Plan
- Installation Options
- Step-by-Step: Docker Installation
- First Launch and Interface Overview
- First Test: Wan 2.1 (1.3B) Text-to-Video
- First-Run Downloads
- VRAM Monitoring and Steps
- Result Summary
- Second Test: Larger Variant (13B) Text-to-Video
- VRAM Usage: Quick Comparison
- Using the Interface: A Short Guide
- Notes on Downloads, Precision, and Decoding
- Controls and Settings to Explore
- Practical Tips for Low-VRAM GPUs
- Troubleshooting and Stability Notes
- Key Features of WanGP
- Model Choice and Quality Notes
- Step-by-Step: From Zero to First Video
- Observations and Takeaways
- Conclusion
AI video generation is exciting, but the hardware barrier is high. Many popular models need large amounts of VRAM and modern GPUs, which makes local experimentation difficult.
I’ve been testing video models for a while, often on data center GPUs with 80 GB of VRAM, and even then the process can be slow. That’s why WanGP caught my attention: it claims you can generate videos locally with as little as 6–8 GB of VRAM, even on older GPUs.
In this article, I install WanGP, run it locally, and measure real GPU memory usage while generating videos with different model sizes. The goal is to see how far we can go with lower VRAM, and to outline what you can expect from this tool in practice.
What is WanGP?
WanGP is a local tool that provides a browser interface for running text-to-video models on your own machine. It supports multiple model families, including Wan 2.1 and 2.2, and others such as Hunyuan.
The project focuses on making video generation feasible on low to mid-range GPUs. It offers multiple installation paths, including Docker and a one-click option via Pinokio, and exposes settings to control model size and generation parameters.
The promise is simple: run modern video models locally and keep VRAM usage low by selecting smaller model variants and efficient inference settings.
Table Overview of WanGP
| Item | Summary |
| --- | --- |
| Purpose | Local AI video generation with lower VRAM requirements |
| Typical VRAM target | 6–8 GB for smaller models |
| Model support | Wan 2.1, Wan 2.2, Hunyuan, and others |
| Installation options | Docker, Pinokio (one-click), manual (git clone + requirements) |
| Interface | Web UI on localhost (default port 7860) |
| Tested OS in this walkthrough | Ubuntu |
| GPU used for testing | NVIDIA RTX 6000 (48 GB VRAM), with VRAM monitoring during generation |
| Notable features | Multiple model sizes, LoRA support, frequent updates |
Test Setup and Plan
I ran WanGP on Ubuntu with an NVIDIA RTX 6000 (48 GB) to track VRAM usage during generation. The intention was not to judge model quality, but to verify VRAM consumption and stability across different model sizes.
I used Docker for the installation. The server exposed a browser interface on port 7860, where I selected models, entered prompts, and initiated generation.
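Before the first run, it is worth confirming what the driver reports for the card. This one-liner only assumes `nvidia-smi` is installed on the host:

```bash
# Report the GPU model, total VRAM, and driver version the host sees.
nvidia-smi --query-gpu=name,memory.total,driver_version --format=csv
```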
Installation Options
WanGP offers several ways to get started:
- Docker: a containerized setup that detects your GPU and installs dependencies automatically.
- Pinokio: a one-click desktop application that can fetch and run the project.
- Manual: clone the repository and install requirements directly on your system.
For this walkthrough, I chose Docker to keep the environment clean and reproducible.
Step-by-Step: Docker Installation
Follow these steps to install and launch WanGP with Docker:
- Install Docker and the NVIDIA Container Toolkit.
  - Ensure your NVIDIA drivers are installed and `nvidia-smi` works.
  - Verify Docker is recent and GPU support is enabled.
- Clone the WanGP repository.
  - Use `git clone` to fetch the repo and change into its directory.
- Run the provided Docker script from the repo root.
  - The script detects your GPU and sets up the container.
  - The first run can take several minutes as images and dependencies are pulled.
- Wait for the server to start.
  - On success, the app serves at http://localhost:7860.
- Open the interface in your browser.
  - You’ll see model selectors on the left and configuration options on the right.
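For reference, here is roughly what those steps look like on the command line. The verification commands are standard; the repository URL is an assumption based on the project's GitHub page, and the helper script name varies between releases, so check the repo's README for the exact file your checkout ships:

```bash
# Host-level checks: driver visible, and Docker can pass the GPU through
# (with the NVIDIA Container Toolkit configured, nvidia-smi is available inside
# the container).
nvidia-smi
docker run --rm --gpus all ubuntu nvidia-smi

# Fetch the code (repository URL assumed from the project's GitHub page).
git clone https://github.com/deepbeepmeep/Wan2GP.git
cd Wan2GP

# Launch via the Docker helper script in the repo root. The exact filename can
# change between releases, so list the shell scripts and run the Docker one.
ls *.sh
./<docker-script>.sh   # placeholder name; use the script your checkout provides
```

Once the script reports that the server is up, the UI should answer at http://localhost:7860.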
First Launch and Interface Overview
After the container finished building, the app started on port 7860. The interface loaded as a clean, single-page web app.
- Left side: a catalog of supported models, including Wan 2.1, Wan 2.2, Hunyuan, and others.
- Right side: model variants and sizes (e.g., 1.3B, 13B), along with generation settings.
- Note on VRAM: larger variants (such as 14B and above) increase VRAM demand.
I kept the defaults initially and focused on monitoring GPU memory usage during the first two runs.
First Test: Wan 2.1 (1.3B) Text-to-Video
For the initial benchmark, I selected the Wan 2.1 1.3B text-to-video model. The prompt was straightforward: “A baby kangaroo going into the pouch of mother kangaroo.”
I did not change any settings. I clicked Generate and let the tool handle model loading and any required downloads.
First-Run Downloads
The first run downloaded several components:
- The model weights in half precision.
- CLIP and text encoder models for prompt processing.
- Supporting assets required by the inference pipeline.
This is normal and only happens on first launch for each model.
VRAM Monitoring and Steps
Once the model loaded, I tracked VRAM usage through the steps. It stayed close to 5 GB through the diffusion steps and decoding.
- Steps completed: 30
- Peak VRAM observed: just above 5 GB
- Final phase: the variational autoencoder (VAE) decoded from latent space to pixel space
The generation completed without memory spikes or instability.
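If you want to watch the numbers yourself during the diffusion steps, a one-second polling loop with `nvidia-smi` is enough:

```bash
# Print used/total VRAM and GPU utilization every second until interrupted.
nvidia-smi --query-gpu=memory.used,memory.total,utilization.gpu --format=csv -l 1
```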
Result Summary
The output included a baby kangaroo and a mother kangaroo. The motion did not perfectly match the prompt action, but the focus of this test was the tool’s operation and memory behavior, not prompt engineering or model benchmarking.
Second Test: Larger Variant (13B) Text-to-Video
Next, I moved to a larger text-to-video variant with 13B parameters using the same prompt. This required a new model download on the first run.
During generation, VRAM usage increased in line with the model size. The process completed successfully, and I monitored the consumption closely.
- Peak VRAM observed: about 17 GB
- Steps completed: full cycle without errors
The output did not exactly match the directional nuance of the prompt, but again the goal was to measure memory and verify stability.
VRAM Usage: Quick Comparison
Here’s a concise comparison of the two runs:
| Model | Steps | Peak VRAM (approx.) | Outcome |
| --- | --- | --- | --- |
| Wan 2.1 (1.3B) | 30 | ~5 GB | Completed; output rendered |
| Text-to-video (13B) | 30 | ~17 GB | Completed; output rendered |
Observations:
- Smaller models fit comfortably within a 6–8 GB budget.
- Larger variants scale VRAM usage alongside parameter count.
Using the Interface: A Short Guide
Here is a simple workflow that mirrors what I did:
- Open the app at http://localhost:7860.
- Choose a model family on the left (e.g., Wan 2.1).
- Pick a model size on the right (e.g., 1.3B for low VRAM).
- Enter your text prompt in the provided field.
- Leave defaults as-is for a first run.
- Click Generate and wait for downloads and processing.
- Monitor VRAM usage with `nvidia-smi` if you want to track memory in real time.
- Review the output video once the steps complete.
If you want higher fidelity, consider trying other model families such as Hunyuan. Be aware that larger models will consume more memory.
Notes on Downloads, Precision, and Decoding
- First runs trigger downloads for each chosen model. This can include half-precision weights, CLIP encoders, and additional assets.
- Inference runs in half precision, which helps control VRAM usage while preserving output quality.
- After the diffusion steps, the VAE decodes the latent representation into the final video frames.
These behaviors are expected and appeared consistent during testing.
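To put the half-precision point in perspective, weight storage alone is roughly parameter count × 2 bytes at fp16. The snippet below prints that rough estimate for the two sizes I tested; it covers weights only and says nothing about activations, the text encoders, or the VAE:

```bash
# Back-of-the-envelope fp16 weight footprint: parameter count * 2 bytes.
# Weights only -- activations, text encoders, and VAE decoding add to the peak.
awk 'BEGIN {
  printf "1.3B params at fp16 ~= %.1f GiB of weights\n", 1.3e9 * 2 / 1024^3
  printf "13B  params at fp16 ~= %.1f GiB of weights\n", 13e9 * 2 / 1024^3
}'
```

These are estimates, not measurements; the peaks I observed also reflect how WanGP stages memory during inference, which I did not instrument in detail.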
Controls and Settings to Explore
While I kept defaults for the measurements, the interface exposes useful controls:
- Model size: start with smaller sizes (e.g., 1.3B) for a low VRAM setup.
- Prompt and negative prompt fields: adjust wording to guide motion and composition.
- Generation parameters: sampling steps, guidance scales, and related fields (vary by model).
- LoRA support: apply low-rank adaptation modules that the tool lists in the UI.
- Model catalog updates: new models and LoRAs are added and updated frequently.
These options give you flexibility to trade off VRAM usage, speed, and output characteristics.
Practical Tips for Low-VRAM GPUs
- Pick smaller variants: begin with 1.3B or similar model sizes to stay near or below 6–8 GB.
- Be patient on first runs: model downloads and caching can take time, especially on slower connections.
- Keep settings conservative: defaults are a good starting point while you verify stability.
- Monitor memory: use `nvidia-smi` to watch VRAM and GPU load; adjust model size if you hit limits.
- Scale up gradually: once a small model runs well, move cautiously to larger variants.
This approach helps you stay within your GPU’s limits while getting reproducible results.
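To keep a record across an entire run rather than watching live, you can log the same query to a file; this assumes nothing beyond `nvidia-smi` and standard shell tools:

```bash
# Sample timestamped VRAM usage once per second and save it; stop with Ctrl+C
# when the generation finishes, then inspect or plot vram_log.csv.
nvidia-smi --query-gpu=timestamp,memory.used,memory.total --format=csv -l 1 | tee vram_log.csv
```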
Troubleshooting and Stability Notes
- If the container does not see your GPU, confirm your NVIDIA drivers are installed and the NVIDIA Container Toolkit is configured.
- If the web UI does not load, check that the container is running and that port 7860 is not blocked.
- If downloads fail, verify your network connection and available disk space.
- If you get out-of-memory errors, step down to a smaller model or reduce the batch/temporal settings if the model exposes them.
These checks cover the most common early blockers.
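Most of these checks map to one or two commands. The container name below is a placeholder, so substitute whatever `docker ps` reports:

```bash
# Is the WanGP container running, and is anything answering on port 7860?
docker ps
curl -I http://localhost:7860

# Can the container see the GPU? Replace <container> with the name from `docker ps`.
docker exec <container> nvidia-smi

# Is there enough free disk space for model downloads?
df -h
```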
Key Features of WanGP
WanGP brings together several practical features for local video generation:
- Local-first workflow
  - Run models on your own machine without sending prompts or outputs to remote servers.
  - Keep control over downloads, caching, and storage.
- Multi-model support
  - Access multiple families such as Wan 2.1, Wan 2.2, and Hunyuan from one interface.
  - Swap model sizes to match your GPU’s VRAM budget.
- Low-VRAM operation
  - Smaller variants can run under 6–8 GB in my testing, with the 1.3B run staying near 5 GB.
  - Larger variants scale predictably, with the 13B run near 17 GB.
- LoRA integration
  - Load LoRAs directly in the UI to adapt model behavior without switching tools.
  - Take advantage of frequent updates that add new LoRAs and model options.
- Straightforward installation paths
  - Docker script for a contained setup.
  - Pinokio for a one-click experience.
  - Manual install via git clone and requirements if you prefer full control.
These features make the project approachable for experimentation on a range of GPUs.
Model Choice and Quality Notes
In my tests, I kept the prompt constant and focused on VRAM and successful completion. The smaller Wan 2.1 1.3B model produced a basic result within a tight memory footprint. The larger 13B model ran smoothly at higher VRAM and rendered the clip as expected.
If you want higher quality or different motion characteristics, try other model families such as Hunyuan and experiment with prompts and settings. This is where the tool’s model catalog and LoRA support will help you find an approach that fits your goals and hardware.
Step-by-Step: From Zero to First Video
To recap the full process I followed:
- Prepare the system
  - Install NVIDIA drivers and verify with `nvidia-smi`.
  - Install Docker and enable GPU support via the NVIDIA Container Toolkit.
- Fetch the code
  - Run `git clone` to download the WanGP repository.
  - Change into the project directory.
- Launch with Docker
  - Execute the provided script to build and run the container.
  - Wait as images are pulled and dependencies are installed.
- Open the UI
  - Visit http://localhost:7860 in your browser.
- Select a model family
  - Choose Wan 2.1 or another supported family in the left panel.
- Pick a model size
  - Start with 1.3B for low VRAM, or go larger if your GPU can handle it.
- Enter your prompt
  - Use a clear description of the motion and scene.
- Generate
  - Click Generate and let the model download and process the video.
- Monitor and review
  - Watch VRAM usage during the steps.
  - Play the output and assess the result.
- Iterate
  - Try alternate models (e.g., Hunyuan) or sizes.
  - Adjust prompts and settings based on your goals.
Observations and Takeaways
- The tool ran reliably through multiple generations.
- The first run overhead is front-loaded into model downloads; subsequent runs start faster.
- VRAM usage tracked closely with model size, which makes memory planning straightforward.
- Running small models locally for video generation is realistic on a mid-range GPU.
These points align with the project’s aim of making video generation practical for users who do not have access to high-end data center GPUs.
Conclusion
WanGP delivered on its promise to make local AI video generation accessible on lower VRAM. The 1.3B test peaked at roughly 5 GB, and the 13B run completed at around 17 GB. Installation with Docker was clear, and the interface made it easy to switch between models and sizes.
If you’re exploring video generation on a commodity GPU, start with a small model, confirm stable runs, and scale thoughtfully. The built-in model catalog, LoRA support, and frequent updates give you room to grow without changing your setup.