Act2Cut — Comparison & Benchmark

ACT Ⅴ

Qualitative Comparison

Text-to-video results, shown side by side against open-source baselines.

Models compared: HoloCine-14B, MSM-14B, ★ Act2Cut

ID	Task	Prompt
T01	Tying hair	Action: [Character 1] is tying up her hair. Shot 1: MCU on hand. Shot 2: OTS on face.
T02	Walk toward	Action: [Character 1] walks toward [Character 2]. Shot 1: Wide. Shot 2: Mid-Shot, dolly-in.
T03	Dance move	Action: [Character 1] performs a dance move. Shot 1: Wide. Shot 2: Low-Angle Mid-Shot.
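
The prompts above share a simple layout: one `Action:` sentence followed by numbered `Shot N:` descriptions. As a minimal sketch of how such a prompt could be split into its parts, assuming that layout holds for every prompt, the Python below uses a regular expression. The names `MultiShotPrompt` and `parse_prompt` are hypothetical and are not taken from the Act2Cut code.

```python
import re
from dataclasses import dataclass, field

# Matches the "Action:" and "Shot N:" labels used in the prompts above.
_LABEL = re.compile(r"(Action|Shot\s+\d+):\s*")
# Matches character placeholders such as "[Character 1]".
_CHARACTER = re.compile(r"\[Character \d+\]")


@dataclass
class MultiShotPrompt:  # hypothetical container, for illustration only
    action: str = ""
    shots: list = field(default_factory=list)
    characters: list = field(default_factory=list)


def parse_prompt(text):
    """Split one prompt string into its action, shots, and characters."""
    # re.split with a capturing group keeps the labels in the result:
    # ['', 'Action', '<action text>', 'Shot 1', '<shot text>', ...]
    parts = _LABEL.split(text.strip())
    prompt = MultiShotPrompt()
    for label, body in zip(parts[1::2], parts[2::2]):
        body = body.strip().rstrip(".")
        if label == "Action":
            prompt.action = body
            prompt.characters = _CHARACTER.findall(body)
        else:
            prompt.shots.append(body)
    return prompt


if __name__ == "__main__":
    t02 = ("Action: [Character 1] walks toward [Character 2]. "
           "Shot 1: Wide. Shot 2: Mid-Shot, dolly-in.")
    print(parse_prompt(t02))
```

Keeping the shot descriptions as free text avoids guessing at a vocabulary of shot types (MCU, OTS, Wide, and so on) that the page does not enumerate.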

ACT Ⅳ

Benchmark Results

Evaluation on ActCutBench, VistoryBench, and VBench. ★ = Act2Cut (Ours).

ActCutBench — Global Content Sub-Bench

Evaluates narrative consistency (NC), visual coherence (VC), action continuity (AC), subject correctness (SC), and aesthetics (Aes) across generated multi-shot sequences. Higher is better (↑) for all metrics.

Method	Base Model	NC-Env ↑	NC-Sce ↑	NC-Sty ↑	NC-CS ↑	NC-CA/OM ↑	VC-Env/Sce ↑	VC-Sty ↑	VC-CS ↑	VC-CA/OM ↑	AC-CA ↑	AC-OM ↑	SC-CC ↑	SC-CID ↑	Aes ↑
Open-Source Methods — ActCutBench
HoloCine-14B	Wan2.2-T2V-14B	—	—	—	—	—	—	—	—	—	—	—	—	—
MSM-1.3B	Wan2.1-TI2V-1.3B	—	—	—	—	—	—	—	—	—	—	—	—	—
MSM-14B	Wan2.1-T2V-14B	—	—	—	—	—	—	—	—	—	—	—	—	—
★ Act2Cut (Ours)	Wan2.2-TI2V-5B	—	—	—	—	—	—	—	—	—	—	—	—	—
ActCutBench-Lite — Ablation
★ Act2Cut (full)	Wan2.2-TI2V-5B	—	—	—	—	—	—	—	—	—	—	—	—	—
w/o GLoS-RoPE	Wan2.2-TI2V-5B	—	—	—	—	—	—	—	—	—	—	—	—	—
w/o TBR	Wan2.2-TI2V-5B	—	—	—	—	—	—	—	—	—	—	—	—	—
w/o SCM	Wan2.2-TI2V-5B	—	—	—	—	—	—	—	—	—	—	—	—	—
w/o HCM	Wan2.2-TI2V-5B	—	—	—	—	—	—	—	—	—	—	—	—	—
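
The Global Content table groups its sub-metric columns under NC, VC, AC, SC, and Aes. As a hedged sketch of how that column layout could be represented when loading a row of this tab-separated table, the Python below maps each group to its sub-metrics and reads placeholder cells as missing values. The names `GLOBAL_CONTENT_COLUMNS`, `flat_columns`, and `parse_row` are hypothetical, and no aggregation across sub-metrics is assumed because the page does not describe one.

```python
# Column groups in the order they appear in the table header.
GLOBAL_CONTENT_COLUMNS = {
    "NC": ["Env", "Sce", "Sty", "CS", "CA/OM"],
    "VC": ["Env/Sce", "Sty", "CS", "CA/OM"],
    "AC": ["CA", "OM"],
    "SC": ["CC", "CID"],
    "Aes": ["Aes"],
}

# The dash character the tables use for results that are not yet available.
PLACEHOLDER = "\u2014"


def flat_columns(groups):
    """Return one flat column name per sub-metric, e.g. 'NC-Env'."""
    names = []
    for group, subs in groups.items():
        for sub in subs:
            names.append(group if sub == group else f"{group}-{sub}")
    return names


def parse_row(line, groups):
    """Parse 'method<TAB>base model<TAB>score...' into (method, base, scores)."""
    cells = line.rstrip("\n").split("\t")
    method, base_model, values = cells[0], cells[1], cells[2:]
    # Every column starts as missing; a short row leaves trailing columns unset.
    scores = {name: None for name in flat_columns(groups)}
    for name, cell in zip(scores, values):
        cell = cell.strip()
        if cell and cell != PLACEHOLDER:
            scores[name] = float(cell)
    return method, base_model, scores


if __name__ == "__main__":
    row = "MSM-14B\tWan2.1-T2V-14B\t" + "\t".join([PLACEHOLDER] * 13)
    print(parse_row(row, GLOBAL_CONTENT_COLUMNS))
```

Treating placeholders as `None` rather than zero keeps unfilled cells from being mistaken for real scores once results are added.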

ActCutBench — Cinematography Sub-Bench

Evaluates adherence to static cinematography (SC), dynamic cinematography (DC), and relation cinematography (RC) specifications per generated shot.

Method	Base Model	SC-SS ↑	SC-LT ↑	SC-FF ↑	SC-FV ↑	SC-AE ↑	SC-AA ↑	SC-AD ↑	SC-DoF ↑	DC-CM ↑	DC-CZ ↑	DC-CR ↑	RC-ST ↑	RC-Cont. ↑	RC-Narr. ↑
Open-Source Methods — ActCutBench
HoloCine-14B	Wan2.2-T2V-14B	—	—	—	—	—	—	—	—	—	—	—	—	—	—
MSM-1.3B	Wan2.1-TI2V-1.3B	—	—	—	—	—	—	—	—	—	—	—	—	—	—
MSM-14B	Wan2.1-T2V-14B	—	—	—	—	—	—	—	—	—	—	—	—	—	—
★ Act2Cut (Ours)	Wan2.2-TI2V-5B	—	—	—	—	—	—	—	—	—	—	—	—	—	—

VistoryBench — Story Visualization

Comprehensive benchmark for story visualization across character consistency, scene coherence, and narrative fidelity metrics.

Method	Base Model	Char. Consist. ↑	Scene Coher. ↑	Style Consist. ↑	Narrative ↑	Overall ↑
HoloCine-14B	Wan2.2-T2V-14B	—	—	—	—	—
MSM-14B	Wan2.1-T2V-14B	—	—	—	—	—
★ Act2Cut (Ours)	Wan2.2-TI2V-5B	—	—	—	—	—

VBench — Video Quality

Holistic evaluation covering video quality, semantic alignment, temporal consistency, and perceptual fidelity for generated video sequences.

Method	Base Model	Quality ↑	Semantic ↑	Temporal ↑	Aesthetic ↑	Total ↑
HoloCine-14B	Wan2.2-T2V-14B	—	—	—	—	—
MSM-14B	Wan2.1-T2V-14B	—	—	—	—	—
★ Act2Cut (Ours)	Wan2.2-TI2V-5B	—	—	—	—	—

[Metric visualization chart: radar / bar chart to be added when data is available.]

ACT Ⅵ

BibTeX

@article{zhuang2026act2cut,
  title={Act2Cut: Continuous Next-Shot Video Narrative Match on Action-Cut},
  author={Zhuang, Cailin and Hu, Yaoqi and Dong, Zheng and Zhang, Shiwen and Huang, Haibin and Zhang, Chi and Li, Xuelong},
  journal={ACM Transactions on Graphics (TOG)},
  volume={1},
  number={1},
  year={2026},
  publisher={ACM New York, NY, USA}
}