Pyramidal Flow Matching for Efficient Video Generative Modeling

¹Peking University, ²Kuaishou Technology, ³Beijing University of Posts and Telecommunications

The following videos were generated by a training-efficient autoregressive video generation model based on flow matching. The model was trained only on open-source datasets, using 20.7k A100 GPU hours.



Qualitative Results

Text-to-Video Generation (1280x768, 10s, 24fps)

Text-to-Video Generation (1280x768, 5s, 24fps)


Text-conditioned Image-to-Video Generation (1280x768, 5s, 24fps)

Quantitative Results

VBench (Huang et al., 2024)


VBench performance

User Preference


User study