WanGP: Local AI Video Generation

Table of Contents
- What is WanGP?
- Table Overview of WanGP
- Test Setup and Plan
- Installation Options
- Step-by-Step: Docker Installation
- First Launch and Interface Overview
- First Test: Wan 2.1 (1.3B) Text-to-Video
- First-Run Downloads
- VRAM Monitoring and Steps
- Result Summary
- Second Test: Larger Variant (13B) Text-to-Video
- VRAM Usage: Quick Comparison
- Using the Interface: A Short Guide
- Notes on Downloads, Precision, and Decoding
- Controls and Settings to Explore
- Practical Tips for Low-VRAM GPUs
- Troubleshooting and Stability Notes
- Key Features of WanGP
- Model Choice and Quality Notes
- Step-by-Step: From Zero to First Video
- Observations and Takeaways
- Conclusion
AI video generation is exciting, but the hardware barrier is high. Many popular models need large amounts of VRAM and modern GPUs, which makes local experimentation difficult.
I’ve been testing video models for a while, often on data center GPUs with 80 GB of VRAM, and even then the process can be slow. That’s why WanGP caught my attention: it claims you can generate videos locally with as little as 6–8 GB of VRAM, even on older GPUs.
In this article, I install WanGP, run it locally, and measure real GPU memory usage while generating videos with different model sizes. The goal is to see how far we can go with lower VRAM, and to outline what you can expect from this tool in practice.
What is WanGP?
WanGP is a local tool that provides a browser interface for running text-to-video models on your own machine. It supports multiple model families, including Wan 2.1 and 2.2, and others such as Hunyuan.
The project focuses on making video generation feasible on low to mid-range GPUs. It offers multiple installation paths, including Docker and a one-click option via Pinokio, and exposes settings to control model size and generation parameters.
The promise is simple: run modern video models locally and keep VRAM usage low by selecting smaller model variants and efficient inference settings.
Table Overview of WanGP
| Item | Summary |
| --- | --- |
| Purpose | Local AI video generation with lower VRAM requirements |
| Typical VRAM target | 6–8 GB for smaller models |
| Model support | Wan 2.1, Wan 2.2, Hunyuan, and others |
| Installation options | Docker, Pinokio (one-click), manual (git clone + requirements) |
| Interface | Web UI on localhost (default port 7860) |
| Tested OS in this walkthrough | Ubuntu |
| GPU used for testing | NVIDIA RTX 6000 (48 GB VRAM), with VRAM monitoring during generation |
| Notable features | Multiple model sizes, LoRA support, frequent updates |
Test Setup and Plan
I ran WanGP on Ubuntu with an NVIDIA RTX 6000 (48 GB) to track VRAM usage during generation. The intention was not to judge model quality, but to verify VRAM consumption and stability across different model sizes.
I used Docker for the installation. The server exposed a browser interface on port 7860, where I selected models, entered prompts, and initiated generation.
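Before the first run, it is worth confirming what the driver reports for the card. This one-liner only assumes `nvidia-smi` is installed on the host:

```bash
# Report the GPU model, total VRAM, and driver version the host sees.
nvidia-smi --query-gpu=name,memory.total,driver_version --format=csv
```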
Installation Options
WanGP offers several ways to get started:
- Docker: a containerized setup that detects your GPU and installs dependencies automatically.
- Pinokio: a one-click desktop application that can fetch and run the project.
- Manual: clone the repository and install requirements directly on your system.
For this walkthrough, I chose Docker to keep the environment clean and reproducible.
Step-by-Step: Docker Installation
Follow these steps to install and launch WanGP with Docker:
- Install Docker and the NVIDIA Container Toolkit.
  - Ensure your NVIDIA drivers are installed and `nvidia-smi` works.
  - Verify Docker is recent and GPU support is enabled.
- Clone the WanGP repository.
  - Use `git clone` to fetch the repo and change into its directory.
- Run the provided Docker script from the repo root.
  - The script detects your GPU and sets up the container.
  - The first run can take several minutes as images and dependencies are pulled.
- Wait for the server to start.
  - On success, the app serves at http://localhost:7860.
- Open the interface in your browser.
  - You’ll see model selectors on the left and configuration options on the right.
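For reference, here is roughly what those steps look like on the command line. The verification commands are standard; the repository URL is an assumption based on the project's GitHub page, and the helper script name varies between releases, so check the repo's README for the exact file your checkout ships:

```bash
# Host-level checks: driver visible, and Docker can pass the GPU through
# (with the NVIDIA Container Toolkit configured, nvidia-smi is available inside
# the container).
nvidia-smi
docker run --rm --gpus all ubuntu nvidia-smi

# Fetch the code (repository URL assumed from the project's GitHub page).
git clone https://github.com/deepbeepmeep/Wan2GP.git
cd Wan2GP

# Launch via the Docker helper script in the repo root. The exact filename can
# change between releases, so list the shell scripts and run the Docker one.
ls *.sh
./<docker-script>.sh   # placeholder name; use the script your checkout provides
```

Once the script reports that the server is up, the UI should answer at http://localhost:7860.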
First Launch and Interface Overview
After the container finished building, the app started on port 7860. The interface loaded as a clean, single-page web app.
- Left side: a catalog of supported models, including Wan 2.1, Wan 2.2, Hunyuan, and others.
- Right side: model variants and sizes (e.g., 1.3B, 13B), along with generation settings.
- Note on VRAM: larger variants (such as 14B and above) increase VRAM demand.
I kept the defaults initially and focused on monitoring GPU memory usage during the first two runs.
First Test: Wan 2.1 (1.3B) Text-to-Video
For the initial benchmark, I selected the Wan 2.1 1.3B text-to-video model. The prompt was straightforward: “A baby kangaroo going into the pouch of mother kangaroo.”
I did not change any settings. I clicked Generate and let the tool handle model loading and any required downloads.
First-Run Downloads
The first run downloaded several components:
- The model weights in half precision.
- CLIP and text encoder models for prompt processing.
- Supporting assets required by the inference pipeline.
This is normal and only happens on first launch for each model.
VRAM Monitoring and Steps
Once the model loaded, I tracked VRAM usage through the steps. It stayed close to 5 GB through the diffusion steps and decoding.
- Steps completed: 30
- Peak VRAM observed: just above 5 GB
- Final phase: the variational autoencoder (VAE) decoded from latent space to pixel space
The generation completed without memory spikes or instability.
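If you want to watch the numbers yourself during the diffusion steps, a one-second polling loop with `nvidia-smi` is enough:

```bash
# Print used/total VRAM and GPU utilization every second until interrupted.
nvidia-smi --query-gpu=memory.used,memory.total,utilization.gpu --format=csv -l 1
```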
Result Summary
The output included a baby kangaroo and a mother kangaroo. The motion did not perfectly match the prompt action, but the focus of this test was the tool’s operation and memory behavior, not prompt engineering or model benchmarking.
Second Test: Larger Variant (13B) Text-to-Video
Next, I moved to a larger text-to-video variant with 13B parameters using the same prompt. This required a new model download on the first run.
During generation, VRAM usage increased in line with the model size. The process completed successfully, and I monitored the consumption closely.
- Peak VRAM observed: about 17 GB
- Steps completed: full cycle without errors
The output did not exactly match the directional nuance of the prompt, but again the goal was to measure memory and verify stability.
VRAM Usage: Quick Comparison
Here’s a concise comparison of the two runs:
| Model | Steps | Peak VRAM (approx.) | Outcome |
| --- | --- | --- | --- |
| Wan 2.1 (1.3B) | 30 | ~5 GB | Completed; output rendered |
| Text-to-video (13B) | 30 | ~17 GB | Completed; output rendered |
Observations:
- Smaller models fit comfortably within a 6–8 GB budget.
- Larger variants scale VRAM usage alongside parameter count.
Using the Interface: A Short Guide
Here is a simple workflow that mirrors what I did:
- Open the app at http://localhost:7860.
- Choose a model family on the left (e.g., Wan 2.1).
- Pick a model size on the right (e.g., 1.3B for low VRAM).
- Enter your text prompt in the provided field.
- Leave defaults as-is for a first run.
- Click Generate and wait for downloads and processing.
- Monitor VRAM usage with `nvidia-smi` if you want to track memory in real time.
- Review the output video once the steps complete.
If you want higher fidelity, consider trying other model families such as Hunyuan. Be aware that larger models will consume more memory.
Notes on Downloads, Precision, and Decoding
- First runs trigger downloads for each chosen model. This can include half-precision weights, CLIP encoders, and additional assets.
- Inference runs in half precision, which helps control VRAM usage while preserving output quality.
- After the diffusion steps, the VAE decodes the latent representation into the final video frames.
These behaviors are expected and appeared consistent during testing.
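To put the half-precision point in perspective, weight storage alone is roughly parameter count × 2 bytes at fp16. The snippet below prints that rough estimate for the two sizes I tested; it covers weights only and says nothing about activations, the text encoders, or the VAE:

```bash
# Back-of-the-envelope fp16 weight footprint: parameter count * 2 bytes.
# Weights only -- activations, text encoders, and VAE decoding add to the peak.
awk 'BEGIN {
  printf "1.3B params at fp16 ~= %.1f GiB of weights\n", 1.3e9 * 2 / 1024^3
  printf "13B  params at fp16 ~= %.1f GiB of weights\n", 13e9 * 2 / 1024^3
}'
```

These are estimates, not measurements; the peaks I observed also reflect how WanGP stages memory during inference, which I did not instrument in detail.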
Controls and Settings to Explore
While I kept defaults for the measurements, the interface exposes useful controls:
- Model size: start with smaller sizes (e.g., 1.3B) for a low VRAM setup.
- Prompt and negative prompt fields: adjust wording to guide motion and composition.
- Generation parameters: sampling steps, guidance scales, and related fields (vary by model).
- LoRA support: apply low-rank adaptation modules that the tool lists in the UI.
- Model catalog updates: new models and LoRAs are added and updated frequently.
These options give you flexibility to trade off VRAM usage, speed, and output characteristics.
Practical Tips for Low-VRAM GPUs
- Pick smaller variants: begin with 1.3B or similar model sizes to stay near or below 6–8 GB.
- Be patient on first runs: model downloads and caching can take time, especially on slower connections.
- Keep settings conservative: defaults are a good starting point while you verify stability.
- Monitor memory: use `nvidia-smi` to watch VRAM and GPU load; adjust model size if you hit limits.
- Scale up gradually: once a small model runs well, move cautiously to larger variants.
This approach helps you stay within your GPU’s limits while getting reproducible results.
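To keep a record across an entire run rather than watching live, you can log the same query to a file; this assumes nothing beyond `nvidia-smi` and standard shell tools:

```bash
# Sample timestamped VRAM usage once per second and save it; stop with Ctrl+C
# when the generation finishes, then inspect or plot vram_log.csv.
nvidia-smi --query-gpu=timestamp,memory.used,memory.total --format=csv -l 1 | tee vram_log.csv
```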
Troubleshooting and Stability Notes
- If the container does not see your GPU, confirm your NVIDIA drivers are installed and the NVIDIA Container Toolkit is configured.
- If the web UI does not load, check that the container is running and that port 7860 is not blocked.
- If downloads fail, verify your network connection and available disk space.
- If you get out-of-memory errors, step down to a smaller model or reduce the batch/temporal settings if the model exposes them.
These checks cover the most common early blockers.
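Most of these checks map to one or two commands. The container name below is a placeholder, so substitute whatever `docker ps` reports:

```bash
# Is the WanGP container running, and is anything answering on port 7860?
docker ps
curl -I http://localhost:7860

# Can the container see the GPU? Replace <container> with the name from `docker ps`.
docker exec <container> nvidia-smi

# Is there enough free disk space for model downloads?
df -h
```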
Key Features of WanGP
WanGP brings together several practical features for local video generation:
- Local-first workflow
  - Run models on your own machine without sending prompts or outputs to remote servers.
  - Keep control over downloads, caching, and storage.
- Multi-model support
  - Access multiple families such as Wan 2.1, Wan 2.2, and Hunyuan from one interface.
  - Swap model sizes to match your GPU’s VRAM budget.
- Low-VRAM operation
  - Smaller variants can run under 6–8 GB in my testing, with the 1.3B run staying near 5 GB.
  - Larger variants scale predictably, with the 13B run near 17 GB.
- LoRA integration
  - Load LoRAs directly in the UI to adapt model behavior without switching tools.
  - Take advantage of frequent updates that add new LoRAs and model options.
- Straightforward installation paths
  - Docker script for a contained setup.
  - Pinokio for a one-click experience.
  - Manual install via git clone and requirements if you prefer full control.
These features make the project approachable for experimentation on a range of GPUs.
Model Choice and Quality Notes
In my tests, I kept the prompt constant and focused on VRAM and successful completion. The smaller Wan 2.1 1.3B model produced a basic result within a tight memory footprint. The larger 13B model ran smoothly at higher VRAM and rendered the clip as expected.
If you want higher quality or different motion characteristics, try other model families such as Hunyuan and experiment with prompts and settings. This is where the tool’s model catalog and LoRA support will help you find an approach that fits your goals and hardware.
Step-by-Step: From Zero to First Video
To recap the full process I followed:
- Prepare the system
  - Install NVIDIA drivers and verify with `nvidia-smi`.
  - Install Docker and enable GPU support via the NVIDIA Container Toolkit.
- Fetch the code
  - Run `git clone` to download the WanGP repository.
  - Change into the project directory.
- Launch with Docker
  - Execute the provided script to build and run the container.
  - Wait as images are pulled and dependencies are installed.
- Open the UI
  - Visit http://localhost:7860 in your browser.
- Select a model family
  - Choose Wan 2.1 or another supported family in the left panel.
- Pick a model size
  - Start with 1.3B for low VRAM, or go larger if your GPU can handle it.
- Enter your prompt
  - Use a clear description of the motion and scene.
- Generate
  - Click Generate and let the model download and process the video.
- Monitor and review
  - Watch VRAM usage during the steps.
  - Play the output and assess the result.
- Iterate
  - Try alternate models (e.g., Hunyuan) or sizes.
  - Adjust prompts and settings based on your goals.
Observations and Takeaways
- The tool ran reliably through multiple generations.
- The first run overhead is front-loaded into model downloads; subsequent runs start faster.
- VRAM usage tracked closely with model size, which makes memory planning straightforward.
- Running small models locally for video generation is realistic on a mid-range GPU.
These points align with the project’s aim of making video generation practical for users who do not have access to high-end data center GPUs.
Conclusion
WanGP delivered on its promise to make local AI video generation accessible on lower VRAM. The 1.3B test peaked at roughly 5 GB, and the 13B run completed at around 17 GB. Installation with Docker was clear, and the interface made it easy to switch between models and sizes.
If you’re exploring video generation on a commodity GPU, start with a small model, confirm stable runs, and scale thoughtfully. The built-in model catalog, LoRA support, and frequent updates give you room to grow without changing your setup.