CausalCine: Real-Time Video Narratives with Autoregression

What Is CausalCine: Real-Time Video Narratives with Autoregression?

CausalCine is a research project that generates long videos shot by shot in real time. You can add a new prompt at any time, and it keeps the story coherent across many shots.

It streams the video as it is generated, so you can direct while it runs. It also uses a content-aware memory to recall relevant past shots, so new shots stay in the same world and match the story. The team shows this with an interactive demo and a large gallery of samples.

Overview

Here is a quick look at the project.

Item	Details
Type	Research project for real time multi shot video generation
Purpose	Create long video stories that can be directed live with new prompts
Main features	Real time directing, causal multi shot generation, content aware memory, prompt anytime
Speed	16 FPS streaming generation reported
Hardware	Demo runs on 8 NVIDIA H200 GPUs
Input	Text prompts given per shot and during streaming
Output	Multi shot videos with stable story and shared context
Memory	Content aware KV memory to recall earlier shots by meaning
Demo	Interactive demo and a large video gallery on the project page
Paper	Research paper linked on the project site
Authors	Yihao Meng, Zichen Liu, Hao Ouyang, Qiuyu Wang, Ka Leong Cheng, Yue Yu, Hanlin Wang, Haobo Li, Jiapeng Zhu, Yanhong Zeng, Xing Zhu, Yujun Shen, Qifeng Chen, Huamin Qu
Institutions	HKUST, Ant Group, SJTU
Project page	Visit CausalCine site to learn more

Key Features

Real time directing
The system streams video at 16 FPS in the team demo. You can see each shot grow frame by frame while you add new ideas.
Causal multi shot
It builds a story across many shots without breaking the flow. New content fits into the next shot while the past still guides it.
Content aware memory
It can recall past shots that matter by meaning. This helps keep faces, places, and style in line over time.
Prompt anytime
You can add fresh directions at any point. Past shots do not need to be recomputed.
Story focus
It keeps long range context so the same people and places make sense across shots. This helps keep the story stable from start to end.

If you are curious about fast audio models that react in real time, check out our note on Voxtral Mini in real time.

Use Cases

Live storyboarding
Writers and directors can sketch scenes and change prompts on the fly. This helps shape tone and timing before full production.
Previz for ads and short films
Teams can test ideas fast with quick prompts per shot. It helps pick camera moves and scene beats.
Content creation
Creators can build a series with the same cast and world over many shots. The memory feature helps keep identity and style steady.
Education and research
Students can learn how prompts affect story flow shot by shot. Labs can test new ways to keep context in long videos.

Looking for tools that turn a single image into motion? If so, see our short guide on image to video models.

Performance and Showcases

Below are short notes from the project page demos. Each clip shows a part of how CausalCine works and what the team measured.

Showcase 1: Interactive Demo. You can add prompts during generation and see how each new shot follows the past shots. The demo streams at a steady rate.

Showcase 2: Sample 1. It shows a clean story flow across shots. Note how the scene and look stay steady.

Showcase 3: Sample 2. It keeps the same setting while prompts shift the focus. The memory keeps the style and identity stable.

Showcase 4: Sample 3. It moves through new shots while holding on to the story. Small prompt changes guide each new beat.

Showcase 5: Sample 4. It shows how the model keeps context over time. The video does not drift away from the story.

Showcase 6: Sample 5. It adds fresh prompts and still keeps the cast and place consistent. The story stays clear.

How CausalCine Works

CausalCine makes videos in a chain. Each new frame and shot depends on what came before. This is called autoregression.

You can type a prompt for the next shot while the video is still running. The system takes your new text and blends it into the next part without going back to redo the past.
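To make the "prompt anytime" idea concrete, here is a toy director loop in Python. It is only a sketch under stated assumptions: the fixed shot length, the rule that the latest typed prompt wins at the next cut, and the name stream_shots are all hypothetical and do not come from the CausalCine paper or code.

```python
def stream_shots(initial_prompt, frames_per_shot, total_frames, prompt_updates):
    """Toy loop: prompt_updates maps a frame index to newly typed prompt text.

    Assumption: a new prompt takes effect at the next shot boundary,
    and frames that were already emitted are never regenerated.
    """
    active = initial_prompt   # prompt driving the current shot
    pending = None            # latest prompt typed since the last cut
    shot = 0
    emitted = []              # (frame_index, shot_index, prompt) records
    for t in range(total_frames):
        if t in prompt_updates:
            pending = prompt_updates[t]
        if t > 0 and t % frames_per_shot == 0:   # a cut starts a new shot
            shot += 1
            if pending is not None:
                active, pending = pending, None
        emitted.append((t, shot, active))        # past records stay untouched
    return emitted


frames = stream_shots(
    "a harbor at dawn",
    frames_per_shot=4,
    total_frames=12,
    prompt_updates={2: "a sailor walks the pier", 6: "the sailor boards a boat"},
)
for record in frames:
    print(record)
```

The list of emitted frames only ever grows, which mirrors the point above: new text steers the next shot and nothing already streamed is redone.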

To keep long stories steady, it stores compact notes from past shots in a key value memory. When it starts a new shot, it pulls the notes that match the meaning of your new prompt.
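The recall step can be pictured with a small Python sketch. Everything here is an illustrative assumption: the real system stores transformer key/value states, while this toy uses tiny hand-written vectors, and the cosine top-k rule, the ShotMemory class, and its method names are hypothetical stand-ins rather than the team's implementation.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ShotMemory:
    """Toy key-value store: one (key vector, note) entry per past shot."""

    def __init__(self):
        self.entries = []  # list of (shot_id, key_vector, note)

    def write(self, shot_id, key, note):
        self.entries.append((shot_id, key, note))

    def recall(self, query, top_k=2):
        """Return the top_k past shots whose keys best match the query."""
        ranked = sorted(self.entries, key=lambda e: cosine(query, e[1]), reverse=True)
        return [(shot_id, note) for shot_id, _, note in ranked[:top_k]]


memory = ShotMemory()
memory.write(0, [1.0, 0.0, 0.0], "harbor at dawn")
memory.write(1, [0.0, 1.0, 0.0], "city street at night")
memory.write(2, [0.9, 0.1, 0.0], "sailor on the pier")

# A new prompt about the harbor should pull shots 0 and 2, not the city shot.
print(memory.recall([1.0, 0.05, 0.0], top_k=2))
```

Ranking by meaning rather than by recency is the design point: an early shot can still guide a late one if it matches the new prompt.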

The Technology Behind It

Autoregression means the next part is built on the last part. This keeps motion and look steady over time.
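A minimal Python sketch of an autoregressive loop is shown here. It assumes a stand-in step function and uses plain numbers in place of video frames; generate_autoregressive, toy_step, and context_len are hypothetical names chosen for illustration, not parts of CausalCine.

```python
def generate_autoregressive(first_frame, num_frames, step_fn, context_len=3):
    """Build a sequence where each new item is computed from recent items."""
    frames = [first_frame]
    while len(frames) < num_frames:
        context = frames[-context_len:]   # only the past is visible
        frames.append(step_fn(context))   # the next item depends on that past
    return frames


def toy_step(context):
    """Stand-in for a video model: average the context and drift forward."""
    return sum(context) / len(context) + 1.0


print(generate_autoregressive(0.0, 4, toy_step))
```

Because each step reads only earlier outputs, the sequence can be emitted as it is produced, which is what makes streaming possible.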

The content aware memory acts like a library of past shots. When you give a new prompt, it finds the most related notes so the next shot fits the story.

The team reports 16 FPS streaming on a strong setup with 8 NVIDIA H200 GPUs. That is how they show real time directing in the demo.
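To see what 16 FPS means in practice, the arithmetic can be sketched in Python. The 16 FPS figure is the one reported above; the 5-second shot length is a hypothetical example, not a number from the project.

```python
FPS = 16  # streaming rate reported for the demo


def frame_budget_ms(fps):
    """Time available to produce each frame, in milliseconds."""
    return 1000.0 / fps


def frames_for(seconds, fps):
    """Number of frames needed to fill a shot of the given length."""
    return int(round(seconds * fps))


print(frame_budget_ms(FPS))  # 62.5
print(frames_for(5, FPS))    # 80
```

In other words, to keep up with playback the system has to deliver a new frame about every 62.5 ms on average.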

If you work with edits that remove things from a clip, you may also like our quick read on how to erase objects in video with AI.

Getting Started

You can try the project through the interactive demo on the official page.

Open the project website in a new tab.
Find the Interactive Demo section on the page.
Enter a short text prompt for the next shot, then watch the stream as it builds.

If you plan a test workflow, here is a simple path.

Start with a base shot prompt like a place and time.
Add a new prompt for the next shot when the stream reaches the cut.
Repeat and adjust as the story grows to keep tone and pacing.

Tips for Better Results

Keep prompts short and clear. Name the place, subject, and action.
Reuse key names for the same person or object so the memory ties them across shots.
Change only one or two parts per new prompt to guide the story smoothly.
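One way to follow these tips is to keep each prompt in a small structure and check how much changes between shots. This Python sketch is a hypothetical helper for your own notes; ShotPrompt and changed_fields are made-up names, and the demo itself takes plain text.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class ShotPrompt:
    place: str
    subject: str
    action: str

    def text(self):
        """Render a short, clear prompt naming subject, action, and place."""
        return f"{self.subject} {self.action} in {self.place}"


def changed_fields(prev, new):
    """List which parts differ between two consecutive shot prompts."""
    return [f.name for f in fields(ShotPrompt)
            if getattr(prev, f.name) != getattr(new, f.name)]


shot_1 = ShotPrompt("a foggy harbor", "Captain Mara", "checks the ropes")
shot_2 = ShotPrompt("a foggy harbor", "Captain Mara", "steps onto the boat")

print(shot_2.text())
print(changed_fields(shot_1, shot_2))  # ['action']
```

Reusing the same subject string across shots keeps the name consistent, and a short list from changed_fields confirms that only one or two parts moved.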

FAQ

What makes CausalCine special for longer videos?

It builds videos shot by shot and keeps context with a content-aware memory. You can add new prompts at any time and the story stays on track.

How fast does it run?

The team reports 16 FPS streaming in their demo on a strong GPU setup. That speed lets you direct while it runs.

Can I change the story during generation?

Yes. You can add a new prompt at any point. The system does not redo past shots and keeps the new shots linked to the old ones.

What kind of hardware does the team use in the demo?

They note an example run on 8 NVIDIA H200 GPUs. This helps reach steady streaming for the demo.

Where can I watch more examples?

There is a large video gallery on the project page. You can see many samples and a comparison section.

Image source: CausalCine: Real-Time Video Narratives with Autoregression


Sonu Sahani
