TMTPost: New Insights into the Future of Business and Life, January 28
DeepSeek Releases Open-Source Multimodal AI Model Janus-Pro, Surpassing DALL-E 3 and Stable Diffusion

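DeepSeek has released its latest open-source multimodal AI model, Janus-Pro, in two versions with 1 billion and 7 billion parameters. Janus-Pro performs strongly on benchmarks such as GenEval and DPG-Bench, surpassing OpenAI's DALL-E 3 and Stable Diffusion, and shows advantages in both image generation and understanding. The model uses the SigLIP-L architecture to process images and draws on LlamaGen for image generation. The release comes while image generation from OpenAI's GPT-4o has yet to be made public, which has drawn further attention to Janus-Pro's open-source release. DeepSeek is at the forefront of multimodal generative AI research, and its earlier model, Janus, laid the groundwork for understanding and generating multimodal content. By decoupling visual encoding, Janus-Pro optimizes both understanding and generation tasks and improves overall performance.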

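🚀 Janus-Pro comes in two versions, with 1 billion and 7 billion parameters, to meet different computational needs, and it surpasses OpenAI's DALL-E 3 and Stable Diffusion in both image generation and understanding.

🖼️ Janus-Pro uses the SigLIP-L architecture for image processing and draws on LlamaGen for image generation. By decoupling visual encoding, it optimizes understanding and generation tasks and improves the model's flexibility and performance.

⚙️ Janus-Pro is built on DeepSeek-LLM-1.5b-base and DeepSeek-LLM-7b-base and was trained in a distributed setup with the HAI-LLM framework on PyTorch, using 16 to 32 nodes, each equipped with 8 Nvidia A100 GPUs.

🌐 DeepSeek's rapid progress may intensify competition with industry giants such as OpenAI, Meta, and Nvidia, but the company also faces cybersecurity challenges and has currently restricted new user registrations from outside China.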

TMTPOST -- In the early hours of Tuesday, the AI community was abuzz as Hugging Face announced the release of DeepSeek's latest open-source multimodal AI model, Janus-Pro. Available in two configurations with 1 billion and 7 billion parameters, the model marks a significant leap in AI capabilities.

The Janus-Pro-7B model has outperformed OpenAI's DALL-E 3 and Stable Diffusion in benchmark tests such as GenEval and DPG-Bench, establishing its superiority in both image generation and understanding.

Janus-Pro integrates recent advances in multimodal AI. Its image understanding relies on the SigLIP-L vision encoder, while its image generation draws inspiration from LlamaGen. The two sizes, at 1 billion and 7 billion parameters, cater to a range of computational needs.

This launch comes at a time when the highly anticipated image-generation capability of OpenAI's multimodal model GPT-4o remains unavailable to the public, adding to the excitement surrounding Janus-Pro's open-source debut.

DeepSeek has been at the forefront of multimodal generative AI research. The company launched its original Janus model in late 2024 as a unified framework for understanding and generating multimodal content. Built on DeepSeek-LLM-1.3b-base, Janus utilized a massive dataset of 500 billion text tokens for training. Its design decoupled visual encoding to optimize both understanding and generation tasks, employing advanced techniques like SigLIP-L for visual input and an innovative rectified flow for image generation.

This progress culminated in Janus-Pro, an enhanced autoregressive framework with significant architectural refinements. By decoupling visual encoding into independent pathways, Janus-Pro eliminates previous conflicts between understanding and generation tasks while maintaining a unified Transformer architecture. This modularity improves flexibility and task-specific performance.
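
The decoupling idea can be illustrated with a toy sketch: two separate visual pathways feed one shared backbone, and the task decides which pathway is used. This is not DeepSeek's implementation. Every function name, the three-entry codebook, and the mean-pooling "backbone" are hypothetical stand-ins chosen so the example runs with the Python standard library alone.

```python
from typing import Callable, Dict, List

Embedding = List[float]

# Hypothetical 3-entry codebook standing in for a discrete image tokenizer.
CODEBOOK: List[Embedding] = [[0.0, 0.0], [0.5, 0.5], [1.0, 1.0]]


def understanding_encoder(pixels: List[float]) -> List[Embedding]:
    """Toy stand-in for a semantic encoder: emits continuous features."""
    return [[p, 1.0 - p] for p in pixels]


def generation_encoder(pixels: List[float]) -> List[Embedding]:
    """Toy stand-in for a discrete tokenizer: quantize, then look up embeddings."""
    ids = [min(int(p * len(CODEBOOK)), len(CODEBOOK) - 1) for p in pixels]
    return [CODEBOOK[i] for i in ids]


def shared_backbone(sequence: List[Embedding]) -> Embedding:
    """Toy stand-in for the single shared Transformer: mean-pools the sequence."""
    dim = len(sequence[0])
    return [sum(tok[d] for tok in sequence) / len(sequence) for d in range(dim)]


# Each task gets its own visual pathway; the backbone is shared by both.
ENCODERS: Dict[str, Callable[[List[float]], List[Embedding]]] = {
    "understand": understanding_encoder,
    "generate": generation_encoder,
}


def run(task: str, pixels: List[float]) -> Embedding:
    """Route the image through the task-specific encoder, then the shared backbone."""
    return shared_backbone(ENCODERS[task](pixels))


if __name__ == "__main__":
    image = [0.2, 0.9]  # two made-up pixel intensities in [0, 1]
    print("understand:", run("understand", image))
    print("generate:  ", run("generate", image))
```

The point of the sketch is only the routing: the same input yields different intermediate representations depending on the task, yet both pass through one shared model.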

Janus-Pro is built on DeepSeek-LLM-1.5b-base and DeepSeek-LLM-7b-base, trained using HAI-LLM, a high-performance distributed training framework on PyTorch. The training involved clusters of 16 to 32 nodes, each equipped with 8 Nvidia A100 GPUs, and required 7–14 days depending on the model size.
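
To put those cluster figures in perspective, the short calculation below converts node counts into GPU counts and GPU-days. The article gives only ranges, so pairing 16 nodes with 7 days and 32 nodes with 14 days is an assumption made for illustration, and the function names are hypothetical.

```python
GPUS_PER_NODE = 8  # stated in the article: 8 Nvidia A100 GPUs per node


def total_gpus(nodes: int) -> int:
    """Number of GPUs in a cluster with the given node count."""
    return nodes * GPUS_PER_NODE


def gpu_days(nodes: int, days: int) -> int:
    """GPU-days consumed if every GPU runs for the full duration."""
    return total_gpus(nodes) * days


if __name__ == "__main__":
    # ASSUMPTION: the low ends (16 nodes, 7 days) and the high ends
    # (32 nodes, 14 days) of the reported ranges belong together.
    for label, nodes, days in [("low end", 16, 7), ("high end", 32, 14)]:
        print(f"{label}: {total_gpus(nodes)} GPUs, {gpu_days(nodes, days)} GPU-days")
```

Under that assumed pairing, the reported ranges correspond to 128 to 256 GPUs and roughly 900 to 3,600 GPU-days.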

The complete Janus-Pro codebase is now available on GitHub in the Janus repository.

DeepSeek’s rapid advancements in multimodal AI may heighten competition with industry giants such as OpenAI, Meta, and Nvidia. However, the company has faced challenges, including recent large-scale cyberattacks on its online services. To mitigate these issues, DeepSeek has temporarily restricted new user registrations outside China, requiring international users to register using virtual numbers.

With Janus-Pro setting new standards for multimodal AI, the industry eagerly anticipates further developments, including potential advancements in text-to-image and text-to-video capabilities. 
