Wan 2.2 Animate V2 Update: Fix Blurry Hands and Faces


This article covers the Wan 2.2 Animate V2 update, how it improves motion and facial expression fidelity, and a practical method to fix blurry hands, faces, and masking issues with a second-pass video enhancer workflow. I’ll walk through setup, model requirements, pre-processing changes, recommended settings, and a step-by-step guide to achieve cleaner, sharper, and more consistent outputs.

The flow follows the same order as my working process: installing the new model in the One Video wrapper, configuring detections, running the updated example workflow, comparing the results to the previous version, and applying a second-pass enhancer to fix remaining artifacts.


What Is Wan 2.2 Animate V2?

Wan 2.2 Animate V2 is a model update focused on improved motion handling, more accurate facial expression transfer, stronger color consistency, and simpler pre-processing within ComfyUI. It works especially well with the One Video wrapper and supports automated detection for faces and body pose, removing the need for manual point editing.

The update also pairs well with a video-to-video enhancer pass. That second pass can fix masking artifacts, dark shadows during transitions, soft detail on faces and hands, and overall clarity in the scene.


Wan Animate V2 Overview:

  • Model: Wan 2.2 Animate V2 (FP8)
  • Wrapper: One Video wrapper (ComfyUI)
  • Improvements: Better motion, more accurate facial expressions, stronger color consistency, cleaner mask edits when keeping source video backgrounds
  • Pre-processing: Automated segmentation with bounding box detection and VIT pose for body and face; no manual point editor needed
  • Required models: VIT pose models, YOLO face/body detection models (YOLOv10m, ONNX), Wan 2.2 Animate V2 (FP8)
  • Storage paths: ComfyUI/models/detections for pose and YOLO models
  • Workflow length: Long-length generation triggers automatically when total frames exceed the frame window size
  • Alternative context: One Video Context Options node (more stable, but often lower quality)
  • Post-enhancement: Second-pass V2V enhancer using VAE encode + KSampler with LoRAs for clarity and identity retention
  • Recommended resolution: 720p for meaningful gains; keep hardware limits in mind

Key Features of Wan 2.2 Animate V2

  • Stronger motion replication and more accurate facial expression transfer from reference video to target character.
  • Improved color consistency on mask edits, particularly when keeping source video backgrounds.
  • Automated bounding box and pose detection that removes the manual point editor step.
  • Better character identity consistency across camera cuts and angle changes.

Model Availability and Compatibility

You can get Wan 2.2 Animate V2 from the same Hugging Face repository previously used for the FP8 model. In the One Video Comfy FP8 repository, the V2 model resides under the Wan 2.2 Animate directory. The file size is smaller than before, and it is fully compatible with the One Video wrapper.

I haven’t tested it inside the native ComfyUI node yet, but it should be compatible in that environment as well. The pre-processing nodes used here were authored by the same developer behind the One Video wrapper and can be installed as a separate pre-processor plugin.


Pre-processing Updates That Matter

Pre-processing is much simpler compared to earlier versions. There’s no need to open a point editor, run an initial frame, and place manual dots for pose extraction. Instead, the workflow uses a bounding box detector to auto-locate the subject and VIT pose to track the body and face.

These nodes include:

  • Bounding box detector for subject localization.
  • VIT pose for body and face landmark detection.
  • Pose and face detection combined with an ONNX detection model loader.

All of these connect within the example workflow provided for the One Video wrapper. The pre-processing now runs automatically across the entire clip, producing consistent segmentation.


What You Need to Download

  • Wan 2.2 Animate V2 (FP8) model from the original repository.
  • VIT pose models and face detection models from the Vit Pose Comfy repository on Hugging Face.
  • YOLOv10m ONNX detection model files from the official Wan 2.2 Animate repository.

Place the VIT pose and YOLO model files in ComfyUI/models/detections. Keep all detection models in this folder for the example workflow to find them automatically.
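
As a quick sanity check before opening the workflow, you can confirm the detection files actually landed in that folder. Below is a minimal Python sketch using only the standard library; the keywords are assumptions for illustration, since the exact filenames depend on which pose and YOLO files you downloaded.

```python
from pathlib import Path

# Adjust this to your actual ComfyUI install location.
detections_dir = Path("ComfyUI/models/detections")

# Keywords are assumptions for illustration; the real filenames depend on
# which VIT pose and YOLO detection files you downloaded.
expected_keywords = ["vitpose", "yolo"]

if not detections_dir.is_dir():
    print(f"Missing folder: {detections_dir} - create it and add the detection models.")
else:
    files = [f.name for f in detections_dir.iterdir() if f.is_file()]
    for keyword in expected_keywords:
        hits = [name for name in files if keyword in name.lower()]
        print(f"{keyword}: {', '.join(hits) if hits else 'NOT FOUND'}")
```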


Basic Setup Steps

Follow this quick setup before running the workflow:

  1. Download Wan 2.2 Animate V2 (FP8) and place it in your ComfyUI models directory according to the wrapper’s guidance.
  2. Download VIT pose and YOLO detection models and put them in ComfyUI/models/detections.
  3. Install the Wan Animate pre-processing custom nodes via the ComfyUI Manager or the linked repository.
  4. Open the One Video wrapper example workflow for Wan 2.2 Animate V2.

Once these pieces are in place, you can begin testing with your source video and reference image.


Running the Example Workflow

I tested with a DJ reference video set to 500 frames to produce a longer clip. The target character comes from a reference image, which is swapped onto the person in the source video. This transfer includes detailed facial expressions, the overall look, and clothing cues.

The model automatically performs segmentation, draws a bounding box, and detects pose and facial landmarks with high accuracy. The pre-processing preview shows a stable segmentation process throughout the clip.


Long-Length Video Handling

If the total number of frames exceeds the frame window size, the wrapper automatically triggers long-length video generation. This helps with extended sequences without manual adjustments.

As an alternative, you can add the One Video Context Options node, set total frames and frame window size to the same value, and plug it into the One Video sampler. This tends to run more stably but may reduce quality compared to using the frame window size directly in the One Video Animate node. I prefer the default approach because it consistently looks better.
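
To make the trigger condition concrete, here is a small pure-Python sketch of the decision described above. It is illustrative only, not the wrapper's actual code, and the default window value in the example is an assumption.

```python
def plan_generation(total_frames: int, frame_window: int) -> str:
    """Sketch of the long-length decision: a window smaller than the clip
    means the wrapper switches to long-length (windowed) generation."""
    if total_frames > frame_window:
        return "long-length (windowed) generation"
    return "single pass"

# 500 requested frames with a smaller default window triggers long-length handling.
print(plan_generation(total_frames=500, frame_window=81))   # window value is an example

# The Context Options alternative sets both values equal, so the clip
# passes this check as a single window.
print(plan_generation(total_frames=500, frame_window=500))
```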


Swapping Characters With a New Reference Image

When I switched to another reference character, the workflow auto-detected the subject without the point editor. There’s no need to run a frame first, place dots, and generate pose on a per-frame basis. The automated detection reduces setup time and keeps results consistent.

In the new clip, the character’s hairstyle, shirt, and jacket came through clearly. It also avoided copying accessories from the source video that weren’t present on the reference image.


Consistency Through Camera Cuts

A common pain point with model-based video is maintaining identity across angle changes and editing transitions. With Wan 2.2 Animate V2, the character remained consistent across switching shots and different camera angles. The visual style stayed coherent through the full sequence.

This behavior is stronger than prior versions and helps keep identity stable across simple edits and cuts.


Known Issue: Masking Shadows at Transitions

One area that may still need attention is masking around fast transitions. In my test, a dark shadow appeared on the subject at around the 7-second mark as they stepped back. This was tied to masked regions combined with a low sampling step count (four steps), which is the common starting configuration.

When masked areas aren’t fully resolved at low steps, shadows or soft patches can appear during transitions. The fix is to apply a second pass with a V2V enhancer workflow that re-samples the clip.


Fixing Blurry Hands, Faces, and Masking With a V2V Enhancer

A second-pass enhancer can clean up soft details, reinforce identity, and fix masking artifacts. The method is straightforward:

  • Feed the generated video frames into a VAE encoder.
  • Add a KSampler with proper LoRA support.
  • Set a denoise value based on how much change you want.
  • Optionally use a captioning model to auto-generate prompts from the input video.

This approach isn’t limited to Wan 2.2 Animate outputs. It works on other AI-generated videos as well.
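
The knobs involved can be summarized in a small configuration sketch. This is plain Python shorthand for the settings discussed in this article, not actual node or file names from ComfyUI or the One Video wrapper, and the default dimensions are my assumption for a 720p landscape pass.

```python
from dataclasses import dataclass, field

@dataclass
class EnhancerPassConfig:
    """Shorthand for the second-pass (V2V enhancer) settings."""
    denoise: float = 0.2            # ~0.2 light refinement, ~0.5 stronger cleanup
    width: int = 1280               # assumed 720p landscape; swap with height for portrait
    height: int = 720
    use_auto_caption: bool = True   # MiniCPM-V caption instead of a manual positive prompt
    loras: list[str] = field(default_factory=lambda: [
        "wan2.2 low-noise",          # fine detail improvement
        "light x2v",                 # latest build, connected to key nodes
        "realism boost (optional)",  # from Wan 2.1, swappable for a character LoRA
        "reward model (optional)",   # HPS/MPS scoring behavior
    ])

# Example: a light refinement pass that keeps identity intact.
print(EnhancerPassConfig(denoise=0.2))
```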


What Improves After the Second Pass

After the enhancer pass, I saw:

  • Sharper music equipment, buttons, stage elements, and lighting.
  • Clearer wall textures and scene details.
  • Stronger face definition and identity retention.
  • Distinct knuckles and fingers with fewer distortions.
  • Masking shadows removed during transitions, including the problem moment at 7 seconds.

Hands and faces in particular gain clarity, and the enhanced character identity feels consistent across frames.


Inside the Enhancer Workflow

The enhancer uses a text-to-video setup applied to existing frames. It includes:

  • Wan 2.2 low-noise LoRA for fine detail improvement.
  • Light X2V LoRA (latest Wan 2.2 build, dated 250928) connected to key nodes.
  • Realism boost LoRA from Wan 2.1 (Fusion X repo), optional and swappable for your own character LoRAs.
  • Reward model LoRA for HPS/MPS scoring behavior, optional but helpful.

These LoRAs plug into the sampler group, with the low-noise model attached to the sampling steps as well.


Sampler Settings and Denoise Guidance

The core is a single KSampler:

  • Load frames through the correct VAE encoder for your model.
  • Add latent noise.
  • Set the denoise value based on your input quality.

Guidelines:

  • If the first pass looks soft or has shadows/artifacts, set denoise around 0.5.
  • If the first pass is solid and you just want to refine, set it lower. I used 0.2 to fix shadows and sharpen faces without altering identity.
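
Expressed as a rule of thumb in Python, the choice looks like this. The two values are the ones used above; the function itself is just my shorthand, not part of any workflow.

```python
def pick_denoise(first_pass_has_artifacts: bool) -> float:
    """Rule of thumb for the second-pass denoise value."""
    if first_pass_has_artifacts:
        return 0.5  # stronger cleanup for shadows, masking patches, heavy softness
    return 0.2      # light refinement: sharpen faces and hands without altering identity

print(pick_denoise(first_pass_has_artifacts=False))  # 0.2, as used in this article
```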

Context Window and Hardware Reality

The enhancer can include a context window node to help with longer outputs. Keep expectations realistic. Actual limits depend on your GPU memory and speed. Plan your frame counts based on available resources rather than aiming for very long sequences that may stall or degrade quality.

For my system, 500–600 frames are usually fine with this setup.


Prompting: Manual or Auto-Captioned

You can write your own positive prompt or use auto-captioning:

  • A MiniCPM-V caption node can analyze the input video and generate a caption.
  • Connect the caption string output to the positive prompt node to override manual text.
  • This helps the model focus on what’s actually present in the frames.

This captioning trick reduces guesswork and aligns prompts with the visual content.
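
Conceptually, the override is a simple precedence rule: when an auto-generated caption exists, it replaces the manual text. Here is a minimal sketch with illustrative names (these are not node names), and example captions matching the DJ test clip.

```python
def build_positive_prompt(manual_prompt: str, auto_caption: str | None = None) -> str:
    """The auto-caption, when present, overrides the manually written prompt."""
    if auto_caption and auto_caption.strip():
        return auto_caption.strip()
    return manual_prompt

# The caption generated from the input frames wins over the manual text.
print(build_positive_prompt(
    manual_prompt="a DJ performing on stage",
    auto_caption="a person in a dark jacket mixing music at a DJ booth under stage lights",
))
```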


Resolution Settings That Matter

I recommend 720p for meaningful improvements with the Light X2V LoRA. I default to 720p width and height and switch orientation by swapping those values to match portrait or a wider format.

Sticking to 480p works but usually won’t produce a big quality jump. If you keep the same resolution across both passes, that’s fine, but you can also apply the classic two-step approach: generate at a lower resolution, then refine at a higher one for cleaner results.
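
If you script your resolution choices, orientation is just a swap of the two values. A small sketch follows, assuming 1280×720 as the landscape "720p" pair; the exact landscape width is my assumption, since the article only specifies 720p and swapping the values for portrait.

```python
def pick_resolution(orientation: str, long_side: int = 1280, short_side: int = 720) -> tuple[int, int]:
    """Return (width, height); swap the pair to switch between landscape and portrait."""
    if orientation == "portrait":
        return short_side, long_side
    return long_side, short_side

print(pick_resolution("landscape"))  # (1280, 720)
print(pick_resolution("portrait"))   # (720, 1280)
```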


Step-by-Step: Installing and Preparing Models

Use this checklist to get started:

  1. Download Wan 2.2 Animate V2 (FP8) and place it under your ComfyUI models directory.
  2. Get VIT pose and face models from the Vit Pose Comfy repository on Hugging Face.
  3. Get YOLOv10m ONNX detection model files from the official Wan 2.2 Animate repository.
  4. Create ComfyUI/models/detections and put all VIT pose and YOLO models inside.
  5. Install the Wan Animate pre-processing custom nodes via the ComfyUI Manager or the linked repo.
  6. Open the One Video wrapper example for Wan 2.2 Animate V2.
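
If you prefer scripting the downloads, huggingface_hub can fetch everything into the right folders. The repository IDs below are placeholders (the real ones are linked from the wrapper's documentation), and the target subfolder for the main model should follow the wrapper's guidance for your setup.

```python
from pathlib import Path
from huggingface_hub import snapshot_download

# Placeholder repo IDs - replace with the actual repositories referenced above.
MODEL_REPO = "your-org/wan22-animate-v2-fp8"           # hypothetical
DETECTION_REPO = "your-org/vitpose-detection-models"   # hypothetical

comfy_root = Path("ComfyUI")

# Wan 2.2 Animate V2 (FP8) into the models directory the wrapper expects
# (adjust the subfolder if your install uses a different layout).
snapshot_download(repo_id=MODEL_REPO, local_dir=comfy_root / "models" / "diffusion_models")

# VIT pose and YOLO detection files into the detections folder used by the workflow.
snapshot_download(repo_id=DETECTION_REPO, local_dir=comfy_root / "models" / "detections")
```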

Step-by-Step: Generating With Wan 2.2 Animate V2

  • Choose a source video and a reference image for the target character.
  • Set frame count (e.g., 500 for longer output) and resolution.
  • Keep the frame window size at the default in One Video Animate for the best quality.
  • Load the detections and confirm the pre-processing preview shows clean subject detection and pose/face tracking.
  • Run the generation and review the output.

If you notice shadows during transitions or soft faces/hands, plan a second pass.


Step-by-Step: Second-Pass V2V Enhancer

  • Load the first-pass video into a VAE encoder node compatible with your model.
  • Add a KSampler and connect:
    • Wan 2.2 low-noise LoRA.
    • Light X2V LoRA (latest build).
    • Optional realism boost (Wan 2.1) and reward model LoRA.
  • Choose denoise:
    • 0.5 for stronger cleanup.
    • 0.2 for light refinement and identity preservation.
  • Optionally attach MiniCPM-V captioning and connect the caption string to the positive prompt.
  • Set your target resolution (720p recommended) and run the enhancer.

Tips for Better Consistency

  • Use the same or higher resolution for the enhancer pass to reveal additional detail.
  • Keep denoise conservative if identity looks good; raise it only when correcting clear artifacts.
  • Favor the default frame window method over the Context Options node for better visual quality.

This keeps the character’s look consistent while improving clarity.


Modular Use and Workflow Weight

The enhancer block is modular. You can disconnect the “load video” part and attach the VAE encode + KSampler + LoRAs to other workflows for image resizing and refinement. Be aware that adding these components increases memory use and processing time. For practical work, keep a dedicated enhancer workflow for second-pass cleanup and detail boosts.


Results Summary

Compared to the earlier version of Wan 2.2 Animate:

  • Motion and facial expressions are more accurate.
  • Color rendering is stronger during mask edits with source backgrounds.
  • Automated segmentation and pose detection reduce setup time and eliminate manual point editing.
  • Identity consistency holds up across camera cuts better.
  • Applying a second-pass V2V enhancer fixes remaining masks, shadows, and soft detail, producing sharper hands, clearer faces, and richer scene textures.

Troubleshooting and Best Practices

  • If bounding box or pose detection looks off, confirm your detections folder paths and model files.
  • For flicker or shadow artifacts, lower sampling steps may be the cause; plan for an enhancer pass.
  • If the Context Options node reduces visual quality, revert to using the frame window size in the One Video Animate node.
  • Keep denoise values modest when you want subtle refinement; only raise them when you need stronger corrections.

Conclusion

Wan 2.2 Animate V2 delivers cleaner motion, better facial expression transfer, and simplified pre-processing in ComfyUI through the One Video wrapper. The updated workflow minimizes manual steps, improves color and identity stability across edits, and supports reliable long-length processing.

For persistent artifacts—especially masked shadows, soft faces, and unclear hands—a second-pass V2V enhancer adds the finishing touches. With VAE encoding, a focused KSampler setup, and the right LoRAs, you can restore clarity, sharpen fine details, and reinforce identity without heavy rework. The result is a sharper, more coherent video that fixes the usual weak spots while staying faithful to your reference character.
