Tencent HunyuanVideo 1.5: 8B Low‑VRAM ComfyUI workflow

I tested Tencent’s HunyuanVideo-1.5 and the results are solid. Motion feels expressive, details are sharp, and the model holds up well given its size.

The big appeal is practical: it’s an 8B-parameter video model that runs on lower-end GPUs and still outputs up to full HD. It’s open source, easy to pull from common model hubs, and already packaged for ComfyUI.

Below, I cover what it is, what it needs, how to set it up in ComfyUI, and the settings that worked for me.

What Is HunyuanVideo-1.5?

HunyuanVideo-1.5 is a text-and-image-to-video model from Tencent. It targets local use with modest VRAM, yet supports 480p, 720p, and 1080p outputs.

It ships in multiple variants and precisions, including FP16 and FP8, and is available for both text-to-video (T2V) and image-to-video (I2V) pipelines. The FP8 variant is especially light and works well in ComfyUI with the official workflow.

Practical Focus

  • 8B parameters: compact for local setups.
  • Full HD output: up to 1920×1080.
  • Two pipelines: T2V and I2V.
  • Multiple precision options: FP16 and FP8.

Expressive Motion and Detail

In my runs, the model produced expressive poses and stable motion. Anatomy consistency and hands were handled better than expected for this size. Edges looked crisp, and textures stayed clean.

Open Source and Ready for ComfyUI

The model and its required components are hosted openly. There’s a ready-to-use ComfyUI workflow that prompts you to fetch everything you need. Installation is straightforward once you know which files go where.

Table Overview: HunyuanVideo-1.5

| Attribute | Details |
| --- | --- |
| Parameters | 8B |
| Pipelines | Text-to-Video (T2V), Image-to-Video (I2V) |
| Output Resolutions | 480p, 720p, 1080p |
| Precisions Available | FP16, FP8 |
| Inference Target | Local GPU with lower VRAM |
| Required Components | Text encoders, CLIP Vision, diffusion model, VAE |
| Text Encoders | Qwen2.5-VL-7B FP8; ByT5-small (names may vary slightly by package) |
| Vision Backbone | CLIP Vision |
| Model Distribution | Open source; available on common model hubs |
| ComfyUI Support | Official workflow; prompts you to download missing assets |
| Tested Variant | I2V FP8 |
| Steps Guidance | 20+ steps for this pipeline (worked reliably for me) |
| CFG Guidance | 1.0 for this distilled setup (worked reliably for me) |
| Motion Control | “Shift” set to 5 (worked reliably for me) |

Key Features of HunyuanVideo-1.5

  • Compact and efficient: 8B parameters, suitable for lower-end cards.
  • High-resolution output: up to full HD.
  • Strong motion and detail: expressive results with solid anatomy and hands.
  • Versatile pipelines: text-to-video and image-to-video options.
  • Open source with ComfyUI workflow: easy to fetch and run.

Availability and Packaging

HunyuanVideo-1.5 is open source. You can fetch it from common hubs and run it locally. There’s an official ComfyUI workflow that bundles the graph and its dependency references, so you can import it and let the Manager fetch what’s missing.

There’s also a “lightx2v” variant mentioned in the same ecosystem, but I couldn’t get it to render (it produced only black frames in my tests). If you want reliable output, stick to the HunyuanVideo-1.5 packages described below.

Prepare ComfyUI

Before importing the workflow, update ComfyUI so the nodes and loaders are current. This ensures the workflow can resolve dependencies correctly.

Update Steps

  1. Open ComfyUI.
  2. Go to Manager.
  3. Click Update All.
  4. Restart ComfyUI if prompted.
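
If you run ComfyUI from a git clone rather than the portable build, you can do the equivalent update from a script instead of the Manager. A minimal sketch, assuming the repo lives at ~/ComfyUI (adjust the path to your install):

```python
# Minimal sketch: manually update a git-based ComfyUI install.
# Alternative to Manager's "Update All"; the install path is an assumption.
import subprocess
from pathlib import Path

COMFY_DIR = Path.home() / "ComfyUI"  # adjust to where ComfyUI is cloned

subprocess.run(["git", "pull"], cwd=COMFY_DIR, check=True)
subprocess.run(
    ["python", "-m", "pip", "install", "-r", "requirements.txt"],
    cwd=COMFY_DIR,
    check=True,
)
print("ComfyUI updated; restart it so the refreshed nodes are loaded.")
```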

Import the Official Workflow

  • Drag and drop the official HunyuanVideo-1.5 workflow JSON into ComfyUI.
  • A popup will list required models and components not currently installed.
  • Confirm and let the Manager fetch or guide you to download them.

Dependencies You Need

The workflow will reference several assets. Fetch each of the following and place them in the correct folders.

Text Encoders

  • Qwen2.5-VL-7B FP8
  • ByT5-small (exact naming can differ slightly in different repos)

These are required for text understanding and alignment in the pipeline.

Vision Backbone

  • CLIP Vision

This is used for visual encoding where needed in the workflow.

Diffusion Model Variants

  • FP16 and FP8 variants are provided.
  • Resolution-specific variants: 480p, 720p, 1080p.
  • Pipeline-specific variants: Text-to-Video and Image-to-Video.

For my test, I used the Image-to-Video FP8 model. It balanced speed and VRAM well.

VAE

  • Download the VAE specified by the workflow.
  • This handles encoding and decoding of latents to pixel space.
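
If you prefer scripted downloads over the Manager popup, huggingface_hub can fetch the same assets. This is only a sketch: the repo IDs and file names below are placeholders, not the real package names, so substitute the exact entries the workflow popup lists for you.

```python
# Sketch: scripted downloads via huggingface_hub.
# repo_id and filename values are PLACEHOLDERS -- replace them with the exact
# names the HunyuanVideo-1.5 workflow popup asks for.
from huggingface_hub import hf_hub_download

assets = [
    ("<publisher>/hunyuanvideo-1.5-comfyui", "hunyuanvideo_1.5_i2v_fp8.safetensors", "checkpoints"),
    ("<publisher>/hunyuanvideo-1.5-comfyui", "hunyuanvideo_1.5_vae.safetensors", "vae"),
    ("<publisher>/text-encoders", "qwen2.5_vl_7b_fp8.safetensors", "text_encoders"),
    ("<publisher>/text-encoders", "byt5_small.safetensors", "text_encoders"),
]

for repo_id, filename, subfolder in assets:
    local_path = hf_hub_download(repo_id=repo_id, filename=filename)
    print(f"{filename} downloaded to {local_path}; move it into comfyui/models/{subfolder}")
```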

Folder Placement

Once downloads finish, place files in the following ComfyUI folders.

Folder Map

  • Text encoders → comfyui/models/text_encoders
  • CLIP Vision → comfyui/models/clip_vision
  • Diffusion model → comfyui/models/checkpoints (or the workflow’s specified folder)
  • VAE → comfyui/models/vae

If your ComfyUI install uses different paths, match the structure your setup expects. The Manager popup will usually point to exact destinations.
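
A quick way to confirm placement is to list what ComfyUI actually sees in each folder before the first run. A small sketch, assuming a default layout under ComfyUI/models:

```python
# Sketch: list the contents of the model folders this workflow relies on,
# so missing or misplaced files are obvious before the first render.
from pathlib import Path

MODELS = Path("ComfyUI/models")  # adjust if your install uses a different root

for folder in ["text_encoders", "clip_vision", "checkpoints", "vae"]:
    path = MODELS / folder
    files = sorted(p.name for p in path.glob("*")) if path.exists() else []
    status = ", ".join(files) if files else "nothing found"
    print(f"{folder}: {status}")
```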

My Test Setup

I ran the Image-to-Video FP8 variant. The goal was to animate a still image with a short descriptive prompt, then measure quality and speed at 480p.

Source and Prompt

  • Source: a still image I generated earlier.
  • Prompt: “A young Japanese girl stands looking to the horizon. The wind blows her long black hair and traditional Japanese clothes while she firmly holds her bow.”

The prompt focuses on stance, wind, attire, and an object in hand. That kept motion and pose coherent.

Key Settings That Worked

  • CFG: 1.0 (this model is distilled; a low CFG worked well in my runs)
  • Motion “Shift”: 5
  • Steps: at least 20 for this pipeline

The “lightx2v” variant mentioned in related repos is supposed to run with fewer steps (around 4–8), but I couldn’t get it to output anything other than black frames. I wouldn’t recommend it at the moment.
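
If you drive ComfyUI from a script, the same settings (CFG 1.0, “Shift” 5, 20+ steps) can be applied to an API-format export of the workflow before queueing it. This is a sketch, assuming a default local server on port 8188 and a graph saved via “Save (API Format)”; the file name and the input keys steps/cfg/shift are assumptions that depend on which sampler nodes your workflow version uses.

```python
# Sketch: patch steps/CFG/shift in an API-format workflow export and queue it.
# The file name and the exact input keys are assumptions -- check your export.
import json
import urllib.request

with open("hunyuanvideo_1.5_i2v_api.json") as f:  # exported via "Save (API Format)"
    workflow = json.load(f)

for node in workflow.values():
    inputs = node.get("inputs", {})
    if "steps" in inputs:
        inputs["steps"] = 20    # 20+ worked reliably for this pipeline
    if "cfg" in inputs:
        inputs["cfg"] = 1.0     # distilled model: keep CFG low
    if "shift" in inputs:
        inputs["shift"] = 5     # motion "Shift"

req = urllib.request.Request(
    "http://127.0.0.1:8188/prompt",
    data=json.dumps({"prompt": workflow}).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(resp.read().decode())  # response includes the prompt_id for this job
```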

Results and Quality

At 480p, the render looked clean with stable motion and good edge fidelity. The model handled anatomy and hands better than expected for an 8B checkpoint. For scenes with subtle wind or posture, it preserved intent without turning faces or limbs into noise.

The run was fast for the step count used. Scaling to 720p or 1080p is supported by the model variants, though compute needs will rise with resolution.

Step-by-Step: From Zero to First Video

Follow this process to get a working I2V render:

  1. Update ComfyUI

    • Open ComfyUI Manager.
    • Click Update All.
    • Restart ComfyUI.
  2. Import the Workflow

    • Drag the official HunyuanVideo-1.5 workflow JSON into ComfyUI.
    • Accept the popup and note the missing assets list.
  3. Download Components

    • Text encoders: Qwen2.5-VL-7B FP8 and ByT5-small.
    • CLIP Vision.
    • Diffusion model: choose your pipeline (I2V) and precision (FP8 or FP16), and preferred resolution.
    • VAE specified by the workflow.
  4. Place Files in Folders

    • Text encoders → models/text_encoders.
    • CLIP Vision → models/clip_vision.
    • Diffusion model → models/checkpoints (or as specified).
    • VAE → models/vae.
  5. Select the Model Variant

    • In the workflow, point the loader to the I2V FP8 model.
    • Confirm the VAE path.
  6. Load Your Source Image (for I2V)

    • Add your reference image into the image loader node.
  7. Set Prompt and Parameters

    • Write a clear, concise prompt.
    • CFG: 1.0 worked well for me with this distilled model.
    • Motion “Shift”: 5 worked well for me.
    • Steps: set 20 or more.
  8. Choose Resolution

    • For a quick test, start with 480p.
    • Move up to 720p or 1080p after verifying your setup.
  9. Render

    • Start the workflow and monitor logs for missing assets or path errors.
    • Save the output when complete.
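
If you queued the job through the server API rather than the UI, you can also watch for completion from a script. A sketch, assuming the default local server and the prompt_id returned by the /prompt call shown earlier:

```python
# Sketch: poll ComfyUI's history endpoint until the queued render is done.
import json
import time
import urllib.request

SERVER = "http://127.0.0.1:8188"
prompt_id = "<prompt_id from the /prompt response>"  # placeholder

while True:
    with urllib.request.urlopen(f"{SERVER}/history/{prompt_id}") as resp:
        history = json.loads(resp.read())
    if prompt_id in history:  # the entry appears once the job has finished
        outputs = history[prompt_id].get("outputs", {})
        print(json.dumps(outputs, indent=2))  # filenames of the saved frames/video
        break
    time.sleep(5)
```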

Choosing Variants and Precision

Select a combination that fits your hardware and goals.

Pipeline Choice

  • Image-to-Video (I2V): animate a still image from a short prompt.
  • Text-to-Video (T2V): generate a full video from text alone.

I used I2V FP8 for a lightweight, fast run.

Precision Choice

  • FP8: lower VRAM use and faster throughput; ideal for modest GPUs.
  • FP16: higher precision; larger VRAM footprint.

For first runs on a smaller GPU, FP8 is the safer option.

Resolution Choice

  • 480p: best for initial tests and quick iteration.
  • 720p: balance between speed and detail.
  • 1080p: highest fidelity; heavier on memory and time.
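
When scripting runs, keeping the three tiers as presets makes the trade-off explicit. A sketch; the 16:9 width/height values are assumptions, so match them to whatever your workflow’s image or latent nodes expect.

```python
# Sketch: resolution presets for the three tiers discussed above.
# The 16:9 width/height values are assumptions; align them with your workflow.
PRESETS = {
    "480p":  {"width": 854,  "height": 480},   # quick tests and iteration
    "720p":  {"width": 1280, "height": 720},   # speed/detail balance
    "1080p": {"width": 1920, "height": 1080},  # highest fidelity, heaviest
}

tier = "480p"  # start small, then scale up once the setup is verified
print(tier, PRESETS[tier])
```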

Prompting and Control Settings

Keep prompts focused and specific. Describe the subject, motion, and any key visual elements that matter for your shot.

Controls That Mattered

  • CFG at 1.0
    • Worked well for this distilled setup.
    • Helps avoid over-saturation or unwanted artifacts.
  • Motion “Shift” at 5
    • Gave smooth transitions in my test.
  • Steps at 20+
    • Below 20, I saw less stable output with this pipeline.

If you’re experimenting with the “lightx2v” variant mentioned in related repos, be aware that it produced only black frames for me. Stick with the HunyuanVideo-1.5 packages for now.

Performance Notes

On 480p, runs completed quickly for the step count used. The model’s 8B size keeps memory needs modest. Scaling to higher resolutions works, but plan for longer renders and more memory.

If you hit slowdowns or memory warnings, try these:

  • Use FP8 rather than FP16.
  • Lower resolution from 1080p to 720p or 480p.
  • Keep steps in a reasonable range (20–30 for I2V).

File Organization Tips

Keeping assets organized reduces load failures and path issues.

  • Keep a dedicated folder for HunyuanVideo-1.5 models.
  • Confirm that filenames match those expected by the workflow.
  • If you rename files, update the ComfyUI nodes accordingly.

Troubleshooting

If your render is blank or black:

  • Verify you’re loading the HunyuanVideo-1.5 model, not the “lightx2v” variant.
  • Check that the VAE is present and correctly linked.
  • Make sure the text encoders and CLIP Vision are installed in the right folders.
  • Update ComfyUI and restart to clear node cache issues.
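
To confirm you actually hit the black-frame failure rather than a preview or player issue, you can inspect the saved frames directly. A sketch, assuming the render was saved as PNG frames under the default output folder (the path and glob pattern are assumptions); it needs Pillow and NumPy.

```python
# Sketch: measure frame brightness to detect the all-black failure mode.
# The output path and file pattern are assumptions -- adjust to your setup.
from pathlib import Path

import numpy as np
from PIL import Image

frames = sorted(Path("ComfyUI/output").glob("*.png"))[:8]  # sample a few frames

for frame in frames:
    arr = np.asarray(Image.open(frame).convert("L"), dtype=np.float32)
    print(f"{frame.name}: mean brightness {arr.mean():.1f}")
    # values near 0 on every frame usually point to a wrong model variant
    # or a missing/mismatched VAE rather than a display problem
```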

If the output looks noisy or unstable:

  • Increase steps slightly (e.g., from 20 to 24–28).
  • Keep CFG at 1.0 for this distilled setup.
  • Ensure the prompt is specific and not contradictory.

If the model won’t load:

  • Confirm precision compatibility (FP8 vs FP16) with your build.
  • Check VRAM usage and reduce resolution or switch to FP8.

What Stood Out in Testing

  • Expressive motion: the model produced convincing movement from a still image.
  • Clean details: sharp edges and stable textures.
  • Better hands and anatomy than expected for 8B.

These traits carried through even at 480p. With higher resolutions, expect greater detail but budget for the extra compute.

Here’s a reliable starting point for quick validation:

  • Variant: HunyuanVideo-1.5 I2V FP8
  • Resolution: 480p
  • Steps: 20
  • CFG: 1.0
  • Motion “Shift”: 5
  • Input: one clear still image
  • Prompt: short, focused description of pose, motion, and attire

Once this is working, scale to 720p or 1080p and refine prompts or timing as needed.

Notes on the “lightx2v” Mention

The ecosystem references a lightx2v version. In my tests, it returned only black frames. It’s said to run at lower steps (around 4–8), but I couldn’t validate that. For stable work, I recommend the HunyuanVideo-1.5 packages outlined above.

Maintenance and Updates

When a new model or workflow revision appears:

  • Update ComfyUI Manager first.
  • Replace the workflow JSON with the latest official version.
  • Review model naming changes and re-link nodes if needed.
  • Keep archives of known-good setups to roll back if something breaks.

Summary

HunyuanVideo-1.5 delivers a practical video generation setup that runs locally on modest hardware. With just 8B parameters, it supports up to full HD, handles motion well, and keeps details sharp. The open-source release, multiple variants (FP8/FP16; 480p/720p/1080p; T2V/I2V), and an official ComfyUI workflow make it easy to adopt.

For a quick, reliable start, use the I2V FP8 model at 480p with 20 steps, CFG 1.0, and Motion “Shift” at 5. Once validated, scale resolution and tune settings to match your content and hardware.
