Tencent HunyuanVideo 1.5: 8B Low‑VRAM ComfyUI workflow
Table Of Content
- What Is HunyuanVideo-1.5?
- Practical Focus
- Expressive Motion and Detail
- Open Source and Ready for ComfyUI
- Table Overview: HunyuanVideo-1.5
- Key Features of HunyuanVideo-1.5
- Availability and Packaging
- Prepare ComfyUI
- Update Steps
- Import the Official Workflow
- Dependencies You Need
- Text Encoders
- Vision Backbone
- Diffusion Model Variants
- VAE
- Folder Placement
- Folder Map
- My Test Setup
- Source and Prompt
- Key Settings That Worked
- Results and Quality
- Step-by-Step: From Zero to First Video
- Choosing Variants and Precision
- Pipeline Choice
- Precision Choice
- Resolution Choice
- Prompting and Control Settings
- Controls That Mattered
- Performance Notes
- File Organization Tips
- Troubleshooting
- What Stood Out in Testing
- Recommended Starting Recipe
- Notes on the “lightex 2V” Mention
- Maintenance and Updates
- Summary
I tested Tencent’s HunyuanVideo-1.5 and the results are solid. Motion feels expressive, details are sharp, and the model holds up well given its size.
The big appeal is practical: it’s an 8B-parameter video model that runs on lower-end GPUs and still outputs up to full HD. It’s open source, easy to pull from common model hubs, and already packaged for ComfyUI.
Below, I cover what it is, what it needs, how to set it up in ComfyUI, and the settings that worked for me.
What Is HunyuanVideo-1.5?
HunyuanVideo-1.5 is a text-and-image-to-video model from Tencent. It targets local use with modest VRAM, yet supports 480p, 720p, and 1080p outputs.
It ships in multiple variants and precisions, including FP16 and FP8, and is available for both text-to-video (T2V) and image-to-video (I2V) pipelines. The FP8 variant is especially light and works well in ComfyUI with the official workflow.
Practical Focus
- 8B parameters: compact for local setups.
- Full HD output: up to 1920×1080.
- Two pipelines: T2V and I2V.
- Multiple precision options: FP16 and FP8.
Expressive Motion and Detail
In my runs, the model produced expressive poses and stable motion. Anatomy consistency and hands were handled better than expected for this size. Edges looked crisp, and textures stayed clean.
Open Source and Ready for ComfyUI
The model and its required components are hosted openly. There’s a ready-to-use ComfyUI workflow that prompts you to fetch everything you need. Installation is straightforward once you know which files go where.
Table Overview: HunyuanVideo-1.5
| Attribute | Details |
|---|---|
| Parameters | 8B |
| Pipelines | Text-to-Video (T2V), Image-to-Video (I2V) |
| Output Resolutions | 480p, 720p, 1080p |
| Precisions Available | FP16, FP8 |
| Inference Target | Local GPU with lower VRAM |
| Required Components | Text encoders, CLIP Vision, diffusion model, VAE |
| Text Encoders | Qwen2.5-VL-7B FP8; ByT5-small (names may vary slightly by package) |
| Vision Backbone | CLIP Vision |
| Model Distribution | Open source; available on common model hubs |
| ComfyUI Support | Official workflow available; prompts you to fetch missing assets |
| Tested Variant | I2V FP8 |
| Steps Guidance | 20+ steps for this pipeline (worked reliably for me) |
| CFG Guidance | 1.0 for this distilled setup (worked reliably for me) |
| Motion Control | “Shift” set to 5 (worked reliably for me) |
Key Features of HunyuanVideo-1.5
- Compact and efficient: 8B parameters, suitable for lower-end cards.
- High-resolution output: up to full HD.
- Strong motion and detail: expressive results with solid anatomy and hands.
- Versatile pipelines: text-to-video and image-to-video options.
- Open source with ComfyUI workflow: easy to fetch and run.
Availability and Packaging
HunyuanVideo-1.5 is open source. You can fetch it from common hubs and run it locally. There’s an official ComfyUI workflow that bundles the graph and dependency logic, so you can import it and let the Manager fetch what’s missing.
There’s also a “lightex 2V” variant mentioned in the same ecosystem, but I couldn’t get it to render (only black frames in my tests). If you want reliable output, stick to the HunyuanVideo-1.5 packages described below.
Prepare ComfyUI
Before importing the workflow, update ComfyUI so the nodes and loaders are current. This ensures the workflow can resolve dependencies correctly.
Update Steps
- Open ComfyUI.
- Go to Manager.
- Click Update All.
- Restart ComfyUI if prompted.
Import the Official Workflow
- Drag and drop the official HunyuanVideo-1.5 workflow JSON into ComfyUI.
- A popup will list required models and components not currently installed.
- Confirm and let the Manager fetch or guide you to download them.
Dependencies You Need
The workflow will reference several assets. Fetch each of the following and place them in the correct folders.
Text Encoders
- Qwen2.5-VL-7B FP8
- ByT5-small (exact naming can differ slightly in different repos)
These are required for text understanding and alignment in the pipeline.
Vision Backbone
- CLIP Vision
This is used for visual encoding where needed in the workflow.
Diffusion Model Variants
- FP16 and FP8 variants are provided.
- Resolution-specific variants: 480p, 720p, 1080p.
- Pipeline-specific variants: Text-to-Video and Image-to-Video.
For my test, I used the Image-to-Video FP8 model. It balanced speed and VRAM well.
VAE
- Download the VAE specified by the workflow.
- This handles encoding and decoding of latents to pixel space.
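If you prefer scripting the downloads instead of clicking through the Manager popup, here is a minimal Python sketch using huggingface_hub. The repo ID and every filename below are placeholders, since exact names vary by package; substitute the names the workflow popup lists.

```python
# Minimal download sketch with huggingface_hub.
# NOTE: the repo ID and filenames below are placeholders, not the exact
# distribution names. Use the names listed in the ComfyUI popup.
from huggingface_hub import hf_hub_download

REPO_ID = "tencent/HunyuanVideo-1.5"  # hypothetical repo ID; verify on the hub

ASSETS = [
    # (filename, destination subfolder under ComfyUI/models)
    ("qwen2.5-vl-7b-fp8.safetensors", "text_encoders"),
    ("byt5-small.safetensors", "text_encoders"),
    ("clip_vision.safetensors", "clip_vision"),
    ("hunyuanvideo1.5_i2v_480p_fp8.safetensors", "checkpoints"),
    ("hunyuanvideo1.5_vae.safetensors", "vae"),
]

for filename, subfolder in ASSETS:
    local_path = hf_hub_download(repo_id=REPO_ID, filename=filename)
    print(f"{filename} -> {local_path} (move into ComfyUI/models/{subfolder})")
```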
Folder Placement
Once downloads finish, place files in the following ComfyUI folders.
Folder Map
- Text encoders → comfyui/models/text_encoders
- CLIP Vision → comfyui/models/clip_vision
- Diffusion model → comfyui/models/checkpoints (or the workflow’s specified folder)
- VAE → comfyui/models/vae
If your ComfyUI install uses different paths, match the structure your setup expects. The Manager popup will usually point to exact destinations.
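After placing everything, a quick script can confirm nothing is missing before you hit render. This is a sketch: adjust MODELS_DIR and the filenames to whatever your downloads are actually called.

```python
# Sanity check: confirm each required file landed in the right folder.
# The filenames here are illustrative, not the exact distribution names.
from pathlib import Path

MODELS_DIR = Path("ComfyUI/models")  # change to your ComfyUI install path

EXPECTED = {
    "text_encoders": ["qwen2.5-vl-7b-fp8.safetensors", "byt5-small.safetensors"],
    "clip_vision":   ["clip_vision.safetensors"],
    "checkpoints":   ["hunyuanvideo1.5_i2v_480p_fp8.safetensors"],
    "vae":           ["hunyuanvideo1.5_vae.safetensors"],
}

for folder, files in EXPECTED.items():
    for name in files:
        target = MODELS_DIR / folder / name
        status = "ok" if target.is_file() else "MISSING"
        print(f"[{status}] {target}")
```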
My Test Setup
I ran the Image-to-Video FP8 variant. The goal was to animate a still image with a short descriptive prompt, then measure quality and speed at 480p.
Source and Prompt
- Source: a still image I generated earlier.
- Prompt: “A young Japanese girl stands looking to the horizon. The wind blows her long black hair and traditional Japanese clothes while she firmly holds her bow.”
The prompt focuses on stance, wind, attire, and an object in hand. That kept motion and pose coherent.
Key Settings That Worked
- CFG: 1.0 (this model is distilled; a low CFG worked well in my runs)
- Motion “Shift”: 5
- Steps: at least 20 for this pipeline
The “lightex 2V” variant mentioned in related repos is supposed to run with fewer steps (around 4–8), but I couldn’t get it to output anything but black frames. I wouldn’t recommend it at the moment.
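For reference, here is how those three values typically map onto a ComfyUI API-format graph. This is a sketch, not the official workflow’s exact node layout: KSampler and ModelSamplingSD3 are shown because they are the usual homes for steps/CFG and the shift control, but your graph may use different nodes.

```python
# Sketch: where the key settings live in a ComfyUI API-format graph.
# The official workflow may use other node types; these are common ones.
settings = {
    "sampler": {
        "class_type": "KSampler",
        "inputs": {
            "steps": 20,   # at least 20 for this pipeline
            "cfg": 1.0,    # low CFG suits the distilled model
            "seed": 42,
            # model/positive/negative/latent_image links omitted for brevity
        },
    },
    "model_sampling": {
        "class_type": "ModelSamplingSD3",  # node exposing the "shift" control
        "inputs": {"shift": 5.0},          # motion "Shift" = 5
    },
}
```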
Results and Quality
At 480p, the render looked clean with stable motion and good edge fidelity. The model handled anatomy and hands better than expected for an 8B checkpoint. For scenes with subtle wind or posture, it preserved intent without turning faces or limbs into noise.
The run was fast for the step count used. Scaling to 720p or 1080p is supported by the model variants, though compute needs will rise with resolution.
Step-by-Step: From Zero to First Video
Follow this process to get a working I2V render:
1. Update ComfyUI
   - Open ComfyUI Manager.
   - Click Update All.
   - Restart ComfyUI.
2. Import the Workflow
   - Drag the official HunyuanVideo-1.5 workflow JSON into ComfyUI.
   - Accept the popup and note the missing assets list.
3. Download Components
   - Text encoders: Qwen2.5-VL-7B FP8 and ByT5-small.
   - CLIP Vision.
   - Diffusion model: choose your pipeline (I2V) and precision (FP8 or FP16), plus your preferred resolution.
   - VAE specified by the workflow.
4. Place Files in Folders
   - Text encoders → models/text_encoders.
   - CLIP Vision → models/clip_vision.
   - Diffusion model → models/checkpoints (or as specified).
   - VAE → models/vae.
5. Select the Model Variant
   - In the workflow, point the loader to the I2V FP8 model.
   - Confirm the VAE path.
6. Load Your Source Image (for I2V)
   - Add your reference image to the image loader node.
7. Set Prompt and Parameters
   - Write a clear, concise prompt.
   - CFG: 1.0 worked well for me with this distilled model.
   - Motion “Shift”: 5 worked well for me.
   - Steps: set 20 or more.
8. Choose Resolution
   - For a quick test, start with 480p.
   - Move up to 720p or 1080p after verifying your setup.
9. Render
   - Start the workflow and monitor the logs for missing assets or path errors.
   - Save the output when complete.
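Once the graph renders from the UI, you can also queue runs headlessly through ComfyUI’s HTTP API. The sketch below assumes the default local server on port 8188 and a graph exported with “Save (API Format)” as workflow_api.json.

```python
# Queue a render through ComfyUI's HTTP API (sketch).
# Assumes a local server at 127.0.0.1:8188 and an API-format export.
import json
import urllib.request

with open("workflow_api.json", "r", encoding="utf-8") as f:
    prompt_graph = json.load(f)

payload = json.dumps({"prompt": prompt_graph}).encode("utf-8")
req = urllib.request.Request(
    "http://127.0.0.1:8188/prompt",
    data=payload,
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read()))  # server returns the queued prompt_id
```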
Choosing Variants and Precision
Select a combination that fits your hardware and goals.
Pipeline Choice
- Image-to-Video (I2V): animate a still image from a short prompt.
- Text-to-Video (T2V): generate a full video from text alone.
I used I2V FP8 for a lightweight, fast run.
Precision Choice
- FP8: lower VRAM use and faster throughput; ideal for modest GPUs.
- FP16: higher precision; larger VRAM footprint.
For first runs on a smaller GPU, FP8 is the safer option.
Resolution Choice
- 480p: best for initial tests and quick iteration.
- 720p: balance between speed and detail.
- 1080p: highest fidelity; heavier on memory and time.
Prompting and Control Settings
Keep prompts focused and specific. Describe the subject, motion, and any key visual elements that matter for your shot.
Controls That Mattered
- CFG at 1.0
- Worked well for this distilled setup.
- Helps avoid over-saturation or unwanted artifacts.
- Motion “Shift” at 5
- Gave smooth transitions in my test.
- Steps at 20+
- Below 20, I saw less stable output with this pipeline.
If you’re experimenting with the “lightex 2V” variant mentioned in related repos, be aware that it produced black images for me. Stick with HunyuanVideo-1.5 for now.
Performance Notes
On 480p, runs completed quickly for the step count used. The model’s 8B size keeps memory needs modest. Scaling to higher resolutions works, but plan for longer renders and more memory.
If you hit slowdowns or memory warnings, try these:
- Use FP8 rather than FP16.
- Lower resolution from 1080p to 720p or 480p.
- Keep steps in a reasonable range (20–30 for I2V).
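Before stepping up in resolution, it’s worth checking how much headroom you actually have. A minimal check with PyTorch, run inside the ComfyUI environment:

```python
# Quick VRAM headroom check via PyTorch's CUDA introspection.
import torch

if torch.cuda.is_available():
    free, total = torch.cuda.mem_get_info()  # bytes on the current device
    print(f"Free VRAM: {free / 1024**3:.1f} GiB of {total / 1024**3:.1f} GiB")
else:
    print("No CUDA device visible to PyTorch.")
```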
File Organization Tips
Keeping assets organized reduces load failures and path issues.
- Keep a dedicated folder for HunyuanVideo-1.5 models.
- Confirm that filenames match those expected by the workflow.
- If you rename files, update the ComfyUI nodes accordingly.
Troubleshooting
If your render is blank or black:
- Verify you’re loading the HunyuanVideo-1.5 model, not the “lightex 2V” variant.
- Check that the VAE is present and correctly linked.
- Make sure the text encoders and CLIP Vision are installed in the right folders.
- Update ComfyUI and restart to clear node cache issues.
If the output looks noisy or unstable:
- Increase steps slightly (e.g., from 20 to 24–28).
- Keep CFG at 1.0 for this distilled setup.
- Ensure the prompt is specific and not contradictory.
If the model won’t load:
- Confirm precision compatibility (FP8 vs FP16) with your build.
- Check VRAM usage and reduce resolution or switch to FP8.
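When a load fails and you suspect a precision mismatch, you can inspect what the checkpoint actually contains. A sketch with the safetensors library (the path is a placeholder):

```python
# Inspect a checkpoint's tensor dtypes without loading the weights.
from safetensors import safe_open

path = "ComfyUI/models/checkpoints/hunyuanvideo1.5_i2v_480p_fp8.safetensors"  # adjust

with safe_open(path, framework="pt") as f:
    dtypes = {}
    for key in f.keys():
        dt = str(f.get_slice(key).get_dtype())  # reads metadata only
        dtypes[dt] = dtypes.get(dt, 0) + 1
    print(dtypes)  # e.g. a count of 'F8_E4M3' entries indicates FP8 weights
```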
What Stood Out in Testing
- Expressive motion: the model produced convincing movement from a still image.
- Clean details: sharp edges and stable textures.
- Better hands and anatomy than expected for 8B.
These traits carried through even at 480p. With higher resolutions, expect greater detail but budget for the extra compute.
Recommended Starting Recipe
Here’s a reliable starting point for quick validation:
- Variant: HunyuanVideo-1.5 I2V FP8
- Resolution: 480p
- Steps: 20
- CFG: 1.0
- Motion “Shift”: 5
- Input: one clear still image
- Prompt: short, focused description of pose, motion, and attire
Once this is working, scale to 720p or 1080p and refine prompts or timing as needed.
Notes on the “lightex 2V” Mention
The ecosystem references a “lightex 2V” version. In my tests, it returned only black frames. It is said to run at lower step counts (around 4–8), but I couldn’t validate this. For stable work, I recommend the HunyuanVideo-1.5 packages outlined above.
Maintenance and Updates
When a new model or workflow revision appears:
- Update ComfyUI Manager first.
- Replace the workflow JSON with the latest official version.
- Review model naming changes and re-link nodes if needed.
- Keep archives of known-good setups to roll back if something breaks.
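One simple way to keep those archives is a dated copy of the workflow JSON. A sketch (the filename is a placeholder):

```python
# Archive a known-good workflow JSON under a dated folder.
import shutil
from datetime import date
from pathlib import Path

archive = Path("workflow_archive") / str(date.today())
archive.mkdir(parents=True, exist_ok=True)
shutil.copy("hunyuanvideo15_i2v_workflow.json", archive)  # placeholder filename
print(f"Archived workflow to {archive}")
```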
Summary
HunyuanVideo-1.5 delivers a practical video generation setup that runs locally on modest hardware. With just 8B parameters, it supports up to full HD, handles motion well, and keeps details sharp. The open-source release, multiple variants (FP8/FP16; 480p/720p/1080p; T2V/I2V), and an official ComfyUI workflow make it easy to adopt.
For a quick, reliable start, use the I2V FP8 model at 480p with 20 steps, CFG 1.0, and Motion “Shift” at 5. Once validated, scale resolution and tune settings to match your content and hardware.