Table of Contents
- What is CausalCine: Real Time Video Narratives with Autoregression
- Overview
- Key Features
- Use Cases
- Performance and Showcases
- How CausalCine Works
- The Technology Behind It
- Getting Started
- Tips for Better Results
- FAQ
- What makes CausalCine special for longer videos?
- How fast does it run?
- Can I change the story during generation?
- What kind of hardware does the team use in the demo?
- Where can I watch more examples?

CausalCine: Real-Time Video Narratives with Autoregression
What is CausalCine: Real Time Video Narratives with Autoregression
CausalCine is a research project that makes longer videos by building them shot by shot in real time. It lets you add a new prompt at any time and keeps the story clear across many shots.

It streams the video as it is generated, so you can direct while it runs. A content aware memory recalls past shots by meaning, so new shots stay in the same world and match the story. The team shows this with an interactive demo and a large gallery of samples.
Overview
Here is a quick look at the project.
| Item | Details |
|---|---|
| Type | Research project for real time multi shot video generation |
| Purpose | Create long video stories that can be directed live with new prompts |
| Main features | Real time directing, causal multi shot generation, content aware memory, prompt anytime |
| Speed | 16 FPS streaming generation reported |
| Hardware | Demo runs on 8 NVIDIA H200 GPUs |
| Input | Text prompts given per shot and during streaming |
| Output | Multi shot videos with stable story and shared context |
| Memory | Content aware KV memory to recall earlier shots by meaning |
| Demo | Interactive demo and a large video gallery on the project page |
| Paper | Research paper linked on the project site |
| Authors | Yihao Meng, Zichen Liu, Hao Ouyang, Qiuyu Wang, Ka Leong Cheng, Yue Yu, Hanlin Wang, Haobo Li, Jiapeng Zhu, Yanhong Zeng, Xing Zhu, Yujun Shen, Qifeng Chen, Huamin Qu |
| Institutions | HKUST, Ant Group, SJTU |
| Project page | See the CausalCine project page to learn more |
Key Features
- Real time directing: The system streams video at 16 FPS in the team demo. You can see each shot grow frame by frame while you add new ideas.
- Causal multi shot: It builds a story across many shots without breaking the flow. New content fits into the next shot while the past still guides it.
- Content aware memory: It can recall past shots that matter by meaning. This helps keep faces, places, and style in line over time.
- Prompt anytime: You can add fresh directions at any point. Past shots do not need to be recomputed.
- Story focus: It keeps long range context so the same people and places make sense across shots. This helps keep the story stable from start to end.
Use Cases
- Live storyboarding: Writers and directors can sketch scenes and change prompts on the fly. This helps shape tone and timing before full production.
- Pre viz for ads and short films: Teams can test ideas fast with quick prompts per shot. It helps pick camera moves and scene beats.
- Content creation: Creators can build a series with the same cast and world over many shots. The memory feature helps keep identity and style steady.
- Education and research: Students can learn how prompts affect story flow shot by shot. Labs can test new ways to keep context in long videos.
Performance and Showcases
Below are short notes from the project page demos. Each clip shows a part of how CausalCine works and what the team measured.
- Showcase 1, Interactive Demo: You can add prompts during generation and see how each new shot follows the earlier ones. The demo streams at a steady rate.
- Showcase 2, Sample 1: A clean story flow across shots. Note how the scene and look stay steady.
- Showcase 3, Sample 2: The setting stays the same while prompts shift the focus. The memory keeps the style and identity stable.
- Showcase 4, Sample 3: The video moves through new shots while holding on to the story. Small prompt changes guide each new beat.
- Showcase 5, Sample 4: The model keeps context over time, and the video does not drift away from the story.
- Showcase 6, Sample 5: Fresh prompts are added and the cast and place still stay consistent. The story stays clear.
How CausalCine Works
CausalCine makes videos in a chain. Each new frame and shot depends on what came before. This is called autoregression.
You can type a prompt for the next shot while the video is still running. The system takes your new text and blends it into the next part without going back to redo the past.
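To make the idea concrete, here is a minimal toy sketch of an autoregressive stream that accepts a new prompt mid-run. It is not the CausalCine model. The names `Frame`, `next_frame`, and `stream` are made up for illustration, and a small integer stands in for real frame data. The point is only that each frame is computed from the history plus the current prompt, and that earlier frames are never recomputed.

```python
from dataclasses import dataclass


@dataclass
class Frame:
    index: int
    prompt: str
    value: int  # toy stand-in for pixel data (assumption, not the real format)


def next_frame(history: list[Frame], prompt: str) -> Frame:
    """Toy generator: the new frame depends on the previous frame and the prompt."""
    prev = history[-1].value if history else 0
    value = (prev * 31 + sum(prompt.encode())) % 1000
    return Frame(index=len(history), prompt=prompt, value=value)


def stream(initial_prompt: str, prompt_updates: dict[int, str], total_frames: int) -> list[Frame]:
    """Generate frames one by one; a new prompt takes effect at its frame index."""
    history: list[Frame] = []
    prompt = initial_prompt
    for i in range(total_frames):
        prompt = prompt_updates.get(i, prompt)  # switch prompts without touching the past
        history.append(next_frame(history, prompt))
    return history


# Hypothetical prompts: the second one arrives at frame 3.
without_update = stream("a quiet harbor at dawn", {}, 6)
with_update = stream("a quiet harbor at dawn", {3: "a storm rolls in"}, 6)
```

In this toy, the first three frames of `with_update` are identical to those of `without_update`, which mirrors the claim that a new prompt only affects what comes next.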
To keep long stories steady, it stores compact notes from past shots in a key value memory. When it starts a new shot, it pulls the notes that match the meaning of your new prompt.
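The recall step can be pictured with a small toy example. The source does not describe how the real memory scores relevance, so everything here is an assumption: `ShotMemory`, `embed`, and `cosine` are hypothetical names, and plain word overlap stands in for whatever learned representation the actual key value memory uses.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: a bag of lowercase words (assumption, not the real encoder)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class ShotMemory:
    """Stores one (key, note) pair per past shot and recalls notes by similarity."""

    def __init__(self) -> None:
        self.entries: list[tuple[Counter, str]] = []

    def store(self, description: str, note: str) -> None:
        self.entries.append((embed(description), note))

    def recall(self, prompt: str, k: int = 2) -> list[str]:
        query = embed(prompt)
        ranked = sorted(self.entries, key=lambda e: cosine(query, e[0]), reverse=True)
        return [note for _, note in ranked[:k]]


memory = ShotMemory()
memory.store("detective walks rainy harbor at night", "shot 1")
memory.store("chef cooks in bright kitchen", "shot 2")
memory.store("detective opens office door", "shot 3")

print(memory.recall("detective returns to the harbor", k=1))  # ['shot 1']
```

The kitchen shot shares no words with the new prompt, so it is ranked last. A real system would compare learned features rather than words, but the lookup pattern is the same: score stored keys against the query and keep the best matches.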
The Technology Behind It
Autoregression means the next part is built on the last part. This keeps motion and look steady over time.
The content aware memory acts like a library of past shots. When you give a new prompt, it finds the most related notes so the next shot fits the story.
The team reports 16 FPS streaming on a strong setup with 8 NVIDIA H200 GPUs. That is how they show real time directing in the demo.
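A quick back-of-the-envelope calculation shows what 16 FPS implies. Only the frame rate comes from the project. The 5 second shot length below is an assumed example value, and `frames_for` is a hypothetical helper.

```python
FPS = 16  # streaming rate reported for the demo

# Time available to produce each frame if the stream is to keep up.
frame_budget_ms = 1000 / FPS


def frames_for(seconds: float, fps: int = FPS) -> int:
    """Number of frames needed to fill a shot of the given length."""
    return round(seconds * fps)


print(frame_budget_ms)  # 62.5
print(frames_for(5))    # 80
```

So the whole setup has about 62.5 ms per frame on average, and an assumed 5 second shot needs 80 frames.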
Getting Started
You can try the project through the interactive demo on the official page.
- Open the project website in a new tab.
- Find the Interactive Demo section on the page.
- Enter a short text prompt for the next shot, then watch the stream as it builds.
If you plan a test workflow, here is a simple path.
- Start with a base shot prompt like a place and time.
- Add a new prompt for the next shot when the stream reaches the cut.
- Repeat and adjust as the story grows to keep tone and pacing.
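If you like to plan before opening the demo, the path above can be written down as a simple shot list. This is only a planning aid, not an API for the demo. `Shot` and `cut_schedule` are hypothetical names, and the prompts and shot lengths are invented examples. The 16 FPS figure is the rate reported for the demo.

```python
from dataclasses import dataclass

FPS = 16  # streaming rate reported for the demo


@dataclass
class Shot:
    prompt: str
    seconds: float  # assumed shot length, chosen by you


def cut_schedule(shots: list[Shot], fps: int = FPS) -> list[tuple[int, str]]:
    """Return one (start_frame, prompt) pair per shot, in order."""
    schedule = []
    frame = 0
    for shot in shots:
        schedule.append((frame, shot.prompt))
        frame += round(shot.seconds * fps)
    return schedule


plan = [
    Shot("A quiet harbor at dawn", 4),                         # base shot: place and time
    Shot("Mara, a fisher in a red coat, unties her boat", 5),  # next shot at the first cut
    Shot("Mara steers the boat out past the lighthouse", 5),   # repeat and adjust
]

for start_frame, prompt in cut_schedule(plan):
    print(start_frame, prompt)
```

With these example lengths the cuts fall at frames 0, 64, and 144, which tells you roughly when to have the next prompt ready.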
Tips for Better Results
- Keep prompts short and clear. Name the place, subject, and action.
- Reuse key names for the same person or object so the memory ties them across shots.
- Change only one or two parts per new prompt to guide the story smoothly.
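One way to follow these tips is to keep each prompt as three named parts and check how many parts change between shots. The sketch below is a hypothetical helper for your own notes. `ShotPrompt` and `changed_parts` are not part of CausalCine, and the sentence template in `text` is just one possible phrasing.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class ShotPrompt:
    place: str
    subject: str
    action: str

    def text(self) -> str:
        """Join the parts into one short prompt (template is an assumption)."""
        return f"{self.subject} {self.action} in {self.place}"


def changed_parts(prev: ShotPrompt, new: ShotPrompt) -> list[str]:
    """List the fields that differ between two consecutive prompts."""
    return [f.name for f in fields(prev) if getattr(prev, f.name) != getattr(new, f.name)]


first = ShotPrompt(place="a rainy harbor", subject="Mara", action="unties her boat")
second = ShotPrompt(place="a rainy harbor", subject="Mara", action="steers out to sea")

print(second.text())                 # Mara steers out to sea in a rainy harbor
print(changed_parts(first, second))  # ['action']
```

Here the subject name and place are reused word for word and only the action changes, which is the kind of small step the tips recommend.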
FAQ
What makes CausalCine special for longer videos?
It builds videos shot by shot and keeps context with a content aware memory that recalls earlier shots by meaning. You can add new prompts at any time and the story stays on track.
How fast does it run?
The team reports 16 FPS streaming in their demo on 8 NVIDIA H200 GPUs. That speed lets you direct while it runs.
Can I change the story during generation?
Yes. You can add a new prompt at any point. The system does not redo past shots and keeps the new shots linked to the old ones.
What kind of hardware does the team use in the demo?
They note an example run on 8 NVIDIA H200 GPUs. This helps reach steady streaming for the demo.
Where can I watch more examples?
There is a large video gallery on the project page. You can see many samples and a comparison section.
Image source: CausalCine: Real-Time Video Narratives with Autoregression
