MarkTechPost@AI, August 27, 2024
Pyramid Attention Broadcast: The Breakthrough Making Real-Time AI Videos Possible

Pyramid Attention Broadcast (PAB) is a new method that enables real-time, high-quality video generation without compromising output quality, addressing the heavy computational cost and long inference times of existing models.

🧠 PAB targets redundancy in attention computation during the diffusion process: attention differences between adjacent diffusion steps follow a U-shaped pattern, with the middle 70% of steps remaining notably stable, which PAB exploits to improve efficiency.

🎯 PAB applies a different broadcast range to each attention type according to its stability. Spatial attention, which varies the most due to high-frequency visual elements, gets the smallest range; temporal attention, with mid-frequency variation tied to motion, gets a medium range; cross-attention, the most stable, gets the largest range.

🚀 PAB performs strongly across three DiT-based video generation models, achieving real-time generation of videos up to 720p resolution with speedups of up to 10.5x over baselines while preserving output quality, and it requires no training.

The field of video generation has seen remarkable progress with the advent of diffusion transformer (DiT) models, which have demonstrated superior quality compared to traditional convolutional neural network approaches. However, this improved quality comes at a significant cost in terms of computational resources and inference time, limiting the practical applications of these models. In response to this challenge, researchers have developed a novel method called Pyramid Attention Broadcast (PAB) to achieve real-time, high-quality video generation without compromising output quality.

Current acceleration methods for diffusion models often focus on reducing sampling steps or optimizing network architectures. These approaches, however, frequently require additional training or compromise output quality. Some recent techniques have revisited the concept of caching to speed up diffusion models. Still, these methods are primarily designed for image generation or convolutional architectures, making them less suitable for video DiTs. The unique challenges posed by video generation, including the need for temporal coherence and the interaction of multiple attention mechanisms, necessitate a new approach.

PAB addresses these challenges by targeting redundancy in attention computations during diffusion. The method is based on a key observation: attention differences between adjacent diffusion steps exhibit a U-shaped pattern, with significant stability in the middle 70% of steps. This indicates considerable redundancy in attention computations, which PAB exploits to improve efficiency. 
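The stable-segment observation can be illustrated with a short sketch. The `stable_segment` helper and the 0.05 threshold below are hypothetical illustrations, not the paper's actual detection procedure; the toy trajectory simply mimics the U-shaped pattern, with large changes at the start and end of the run and a nearly flat middle.

```python
import math
import random

def stable_segment(attn_outputs, threshold=0.05):
    """Return indices t where step t's attention output differs from
    step t-1 by less than `threshold` (relative L2 difference).  These
    are the steps whose computation could be skipped via broadcasting."""
    def l2(v):
        return math.sqrt(sum(x * x for x in v))
    stable = []
    for t in range(1, len(attn_outputs)):
        prev, cur = attn_outputs[t - 1], attn_outputs[t]
        diff = l2([a - b for a, b in zip(cur, prev)]) / (l2(prev) + 1e-8)
        if diff < threshold:
            stable.append(t)
    return stable

# Toy trajectory mimicking the U-shaped pattern: large changes in the
# first and last steps, a nearly flat middle segment.
random.seed(0)
vec = [random.gauss(0, 1) for _ in range(16)]
steps = []
for t in range(10):
    scale = 1.0 if t < 2 or t >= 8 else 0.01  # middle steps barely change
    vec = [x + scale * random.gauss(0, 1) for x in vec]
    steps.append(list(vec))

print(stable_segment(steps))  # only the middle steps are flagged stable
```

On this toy data, only the flat middle steps fall under the threshold, which is exactly the segment where broadcasting attention outputs would be safe.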

The Pyramid Attention Broadcast method identifies the stable middle segment of the diffusion process, where attention outputs show minimal differences between steps. It then broadcasts attention outputs from certain steps to subsequent steps within this segment, eliminating redundant computation. PAB applies different broadcast ranges to different types of attention based on their stability. Spatial attention, which varies the most due to high-frequency visual elements, receives the smallest broadcast range. Temporal attention, showing mid-frequency variations related to movement, gets a medium range. Cross-attention, the most stable since it links text with video content, is given the largest broadcast range.

Additionally, the researchers introduce a broadcast sequence parallel technique for more efficient distributed inference. This approach significantly decreases generation time and incurs lower communication costs than existing parallelization methods, enabling efficient, scalable distributed inference for real-time video generation.

PAB demonstrates superior results across three state-of-the-art DiT-based video generation models: Open-Sora, Open-Sora-Plan, and Latte. The method achieves real-time generation for videos up to 720p resolution, with speedups of up to 10.5x over baseline methods, while maintaining output quality. Across these popular open-source video DiTs, PAB delivers consistent, stable acceleration by identifying and exploiting redundancies in the attention mechanism, reaching generation speeds of up to 20.6 FPS for high-resolution videos and opening up new possibilities for practical AI video generation. What sets PAB apart is its training-free nature, making it immediately applicable to existing models without resource-intensive fine-tuning.

The development of PAB addresses a critical bottleneck in DiT-based video generation, potentially accelerating the adoption of these models in real-world scenarios where speed is crucial. As the demand for high-quality, AI-generated video content continues to grow across industries, techniques like PAB will play a vital role in making these technologies more accessible and practical for everyday use. The researchers anticipate that their simple yet effective method will serve as a robust baseline and facilitate future research and application for video generation, paving the way for more efficient and versatile AI-driven video creation tools.


