MarkTechPost@AI · January 20
Google AI Proposes a Fundamental Framework for Inference-Time Scaling in Diffusion Models

This article introduces a new framework for inference-time scaling of diffusion models proposed by researchers from NYU, MIT, and Google. Instead of simply adding more denoising steps, the framework introduces a search-based approach that uses verifier feedback and search algorithms to discover better noise candidates and thereby improve generation quality. It keeps the number of denoising steps fixed while devoting additional compute to search operations, selecting the best noise candidate with a random search algorithm and a Best-of-N strategy. The study shows that the framework consistently improves sample quality across different benchmarks, while also revealing the inherent biases of different verifiers and underscoring the importance of developing task-specific verification methods.

🔍 Search mechanism: the framework's core innovation is a search mechanism that uses verifier feedback and search algorithms to discover better noise candidates and improve generation quality, in contrast to the conventional approach of adding more denoising steps.

⚙️ Flexible component combinations: the framework allows its components to be combined and tailored to specific application scenarios, for example using a pre-trained SiT-XL model with a second-order Heun sampler for ImageNet image generation.

📊 Diverse verifier evaluation: the study uses two verifiers, Inception Score (IS) and Fréchet Inception Distance (FID), and benchmarks such as DrawBench and T2I-CompBench to show how different verifiers differ and diverge in their preferences when assessing sample quality.

Generative models have revolutionized fields like language, vision, and biology through their ability to learn and sample from complex data distributions. While these models benefit from scaling up during training through increased data, computational resources, and model sizes, their inference-time scaling capabilities face significant challenges. Specifically, diffusion models, which excel at generating continuous data such as images, audio, and video through a denoising process, see limited gains when the number of function evaluations (NFE) is simply increased during inference. Beyond a point, adding more denoising steps yields little further improvement, so additional computational investment at inference time goes largely unrewarded.

Various approaches have been explored to enhance the performance of generative models during inference. Test-time compute scaling has proven effective for LLMs through improved search algorithms, verification methods, and compute-allocation strategies. For diffusion models, researchers have pursued multiple directions, including fine-tuning, reinforcement learning, and direct preference optimization. Sample selection and optimization methods have also been developed using random search algorithms, VQA models, and human preference models. However, these methods focus either on training-time improvements or on limited test-time optimizations, leaving room for more comprehensive inference-time scaling solutions.

Researchers from NYU, MIT, and Google have proposed a fundamental framework for scaling diffusion models during inference time. Their approach moves beyond simply increasing denoising steps and introduces a novel search-based methodology for improving generation performance through better noise identification. The framework operates along two key dimensions: utilizing verifiers for feedback and implementing algorithms to discover superior noise candidates. This approach addresses the limitations of conventional scaling methods by introducing a structured way to use additional computational resources during inference. The framework’s flexibility allows component combinations to be tailored to specific application scenarios.
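To make the two dimensions concrete, here is a minimal Python sketch, not the authors' code, of the kind of loop the framework describes: a search algorithm proposes initial-noise candidates, a fixed-step sampler denoises each one, and a verifier scores the results. All names below are illustrative placeholders.

```python
# Minimal sketch (not the authors' implementation) of the framework's two
# dimensions: a verifier that scores generated samples, and a search
# algorithm that proposes initial-noise candidates.
from typing import Callable
import numpy as np

Sampler = Callable[[np.ndarray], np.ndarray]   # initial noise -> generated sample
Verifier = Callable[[np.ndarray], float]       # generated sample -> quality score

def search_over_noise(sampler: Sampler,
                      verifier: Verifier,
                      propose_noise: Callable[[], np.ndarray],
                      search_budget: int) -> np.ndarray:
    """Spend extra inference-time compute searching for a good initial noise,
    while the number of denoising steps inside `sampler` stays fixed."""
    best_sample, best_score = None, float("-inf")
    for _ in range(search_budget):
        noise = propose_noise()          # search algorithm proposes a candidate
        sample = sampler(noise)          # fixed-step denoising of that candidate
        score = verifier(sample)         # verifier feedback on the result
        if score > best_score:
            best_sample, best_score = sample, score
    return best_sample
```

The key point is that the denoising budget inside `sampler` stays fixed; the additional inference compute goes entirely into evaluating more noise candidates under the verifier's guidance.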

The framework’s implementation centers on class-conditional ImageNet generation using a pre-trained SiT-XL model at 256 × 256 resolution with a second-order Heun sampler. The setup keeps the number of denoising steps fixed at 250 while allocating additional NFEs to search operations. The core search mechanism is a random search algorithm that implements a Best-of-N strategy to select the optimal noise candidate. Two oracle verifiers are used: Inception Score (IS) and Fréchet Inception Distance (FID). IS selection is based on the highest classification probability from a pre-trained InceptionV3 model, while FID selection minimizes divergence against pre-computed ImageNet Inception feature statistics.
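As a rough illustration of the Best-of-N setup described above, the toy run below draws N Gaussian noise candidates, pushes each through a stand-in for the fixed-step sampler, and keeps the sample with the highest verifier score. The `heun_sampler_stub` and `is_style_verifier_stub` functions are hypothetical placeholders for the SiT-XL + Heun sampler and the InceptionV3-based IS verifier used in the paper.

```python
# Toy Best-of-N run in the spirit of the setup above (not the actual system).
import numpy as np

rng = np.random.default_rng(0)

def heun_sampler_stub(noise: np.ndarray, steps: int = 250) -> np.ndarray:
    # Placeholder for the fixed-budget denoiser; the real one runs SiT-XL
    # with a second-order Heun solver for a fixed 250 steps.
    x = noise.copy()
    for _ in range(steps):
        x *= 0.99                        # trivial stand-in dynamics
    return x

def is_style_verifier_stub(sample: np.ndarray) -> float:
    # Placeholder for the IS verifier: the paper scores a sample by the
    # highest class probability from a pre-trained InceptionV3 classifier.
    logits = sample.reshape(-1)[:1000]   # pretend these are class logits
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    return float(probs.max())

N = 8                                    # number of noise candidates to search over
noises = [rng.standard_normal((3, 256, 256)) for _ in range(N)]
samples = [heun_sampler_stub(z) for z in noises]
scores = [is_style_verifier_stub(x) for x in samples]
best_sample = samples[int(np.argmax(scores))]   # Best-of-N selection
print(f"best verifier score: {max(scores):.4f}")
```

The per-candidate denoising cost is unchanged; the extra inference compute grows with N, which is the kind of search budget the study varies to examine scaling behavior.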

The framework’s effectiveness has been shown through comprehensive testing on different benchmarks. On DrawBench, which features diverse text prompts, LLM Grader evaluation shows that searching with various verifiers consistently improves sample quality, though with different patterns across setups; ImageReward and the Verifier Ensemble perform especially well, improving all metrics thanks to their nuanced evaluation capabilities and alignment with human preferences. T2I-CompBench, which emphasizes text-prompt accuracy rather than visual quality, reveals different optimal configurations: ImageReward emerges as the top performer, Aesthetic Score shows minimal or even negative impact, and CLIP provides modest improvements.

In conclusion, the researchers establish a significant advance in diffusion models by introducing a framework for inference-time scaling through strategic search mechanisms. The study shows that scaling compute via search can achieve substantial performance improvements across different model sizes and generation tasks, with varying computational budgets yielding distinct scaling behaviors. While the approach proves successful, it also reveals the inherent biases of different verifiers and emphasizes the importance of developing task-specific verification methods. This insight opens new avenues for future research on more targeted and efficient verification systems for various vision generation tasks.


Check out the Paper and Project Page. All credit for this research goes to the researchers of this project.



