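Original SiliconCloud 2025-03-17 18:40 Beijing
“The open-sourcing of Wan 2.1 will change the rules of the game in video generation.”
Open-source video models are steadily closing the gap with the best closed-source ones. The Wanxiang (Wan) team at Alibaba Cloud recently open-sourced its visual foundation model Wan 2.1. In video generation, Wan 2.1 is on par with leading closed-source models such as Sora, Luma, and Pika, and outperforms them in some respects.
SiliconCloud, SiliconFlow's large-model cloud service platform, launched inference-accelerated versions right away: Wan2.1-T2V-14B (standard, first-week promotional price 1 yuan per video) and Wan2.1-T2V-14B-Turbo (fast, first-week promotional price 0.75 yuan per video), both supporting text-to-video generation at 720P. The image-to-video model Wan2.1-I2V-14B will follow soon.
Notably, Wan2.1-T2V-14B-Turbo uses TeaCache (https://github.com/ali-vilab/TeaCache), a training-free caching method that estimates and exploits the fluctuating differences between model outputs across timesteps to speed up inference.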
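To make the caching idea concrete, here is a minimal sketch of timestep-level output caching in plain Python. It is a simplified illustration of the general technique, not the TeaCache implementation: the class name StepCache, the helper relative_l1, the threshold value, and the toy model are all hypothetical, and the real method works on the diffusion transformer's internal tensors rather than on Python lists.

```python
from typing import Callable, List, Optional

Vector = List[float]


def relative_l1(current: Vector, previous: Vector) -> float:
    """Relative L1 change of `current` with respect to `previous`."""
    num = sum(abs(c - p) for c, p in zip(current, previous))
    den = sum(abs(p) for p in previous) or 1e-8  # avoid division by zero
    return num / den


class StepCache:
    """Hypothetical helper: skips the expensive model call on denoising steps
    whose input has drifted little since the last fully computed step."""

    def __init__(self, threshold: float) -> None:
        self.threshold = threshold            # assumed tuning knob
        self.accumulated = 0.0                # drift since the last real call
        self.prev_input: Optional[Vector] = None
        self.cached_residual: Optional[Vector] = None
        self.model_calls = 0                  # counts real (uncached) calls

    def step(self, x: Vector, model: Callable[[Vector], Vector]) -> Vector:
        if self.prev_input is not None and self.cached_residual is not None:
            self.accumulated += relative_l1(x, self.prev_input)
            if self.accumulated < self.threshold:
                # Small drift: reuse the cached residual (output minus input).
                self.prev_input = x
                return [xi + ri for xi, ri in zip(x, self.cached_residual)]
        # Large drift, or first step: run the model and refresh the cache.
        out = model(x)
        self.model_calls += 1
        self.cached_residual = [oi - xi for oi, xi in zip(out, x)]
        self.prev_input = x
        self.accumulated = 0.0
        return out
```

The threshold trades speed for fidelity: a larger value skips more steps but lets the reused residual drift further from what the model would have produced.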
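With Wan2.1 available on the platform, developers no longer face the hurdle of deploying the model themselves; calling the API from their application is enough to give users a faster experience. The platform also lets developers freely compare and combine more than a hundred large models to find the best fit for their generative AI applications.
Try it online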
https://cloud.siliconflow.cn/models
API documentation
https://docs.siliconflow.cn/cn/userguide/capabilities/video
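As a rough illustration of what calling the API might look like, the sketch below builds and sends a text-to-video request using only the Python standard library. The base URL, the /video/submit path, the model identifier "Wan-AI/Wan2.1-T2V-14B", and the field names model, prompt, and image_size are assumptions made for this example; check the API documentation above for the actual request format before relying on any of them.

```python
import json
import urllib.request

# Assumed base URL; confirm against the API documentation.
API_BASE = "https://api.siliconflow.cn/v1"


def build_submit_request(
    api_key: str,
    prompt: str,
    model: str = "Wan-AI/Wan2.1-T2V-14B",  # assumed model identifier
    image_size: str = "1280x720",          # assumed field name and 720P value
) -> urllib.request.Request:
    """Build (but do not send) a POST request that submits a video job."""
    payload = {"model": model, "prompt": prompt, "image_size": image_size}
    return urllib.request.Request(
        url=f"{API_BASE}/video/submit",    # assumed endpoint path
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def submit_video_job(api_key: str, prompt: str) -> dict:
    """Send the request and return the decoded JSON response body."""
    request = build_submit_request(api_key, prompt)
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.loads(response.read().decode("utf-8"))
```

Video generation is typically asynchronous, so the response to a submit call is likely to be a job identifier that is polled for the finished video rather than the video itself; the documentation describes the actual flow.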
Here is a look at what the accelerated Wan2.1-T2V-14B on SiliconCloud can generate.
Prompt: A woman with light skin, wearing a blue jacket and a black hat with a veil, looks down and to her right, then back up as she speaks; she has brown hair styled in an updo, light brown eyebrows, and is wearing a white collared shirt under her jacket; the camera remains stationary on her face as she speaks; the background is out of focus, but shows trees and people in period clothing; the scene is captured in real-life footage.
Prompt: Microscopic photography, coral tube worms and clownfish swim in the colorful underwater world. The coral tube worms are brightly colored, and their tentacles sway gently, as if dancing in the water; the clownfish's body shines with fluorescence, and it quickly shuttles between the corals. The picture is full of fantasy visual effects, real and natural, 4k high-definition picture quality, showing the wonder and beauty of the underwater world. Close-up, rich details of the underwater environment.
Prompt: In a dimly lit street at night, Pikachu stands motionless wearing a blindfold. The camera starts with a medium shot showing his entire body, then performs a zoom in to his face, with Pikachu remaining completely still throughout the camera zoom. Once the camera is tightly framed on Pikachu’s face, he raises his paws and removes the blindfold, revealing his beautiful, bright blue eyes.
Prompt: With a marble wall as the background, a can of spray paint appears, and the paint slowly sprays out on the wall. As the paint spreads, the word "Creativity" gradually appears in the picture. The edges of the letters are full of graffiti-style lines and colors, transitioning from dark to light, showing a unique street art charm. The background is a graffiti wall on a city street corner, with soft light. The overall picture is full of modern art, as if it is a street art work in progress. Close-up shots capture the details of the spray paint and the delicate texture of the letters, creating a free and creative atmosphere.
Prompt: A clear, turquoise river flows through a rocky canyon, cascading over a small waterfall and forming a pool of water at the bottom. The river is the main focus of the scene, with its clear water reflecting the surrounding trees and rocks. The canyon walls are steep and rocky, with some vegetation growing on them. The trees are mostly pine trees, with their green needles contrasting with the brown and gray rocks. The overall tone of the scene is one of peace and tranquility.
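Model features and performance
According to data from the Wan team, the Wan 2.1 14B professional model stands out in instruction following, complex motion generation, physical modeling, and text rendering in video. On the authoritative VBench benchmark it scores 86.22% overall, well ahead of domestic and international models such as Sora, Luma, and Pika, and ranks first.
Across 14 main dimensions and 26 sub-dimensions, including motion quality, visual quality, style, and multiple objects, Wan reaches industry-leading performance and ranks first in five. In particular, Wan greatly improves on complex motion and adherence to physical laws: it stably renders complex human body movements such as spinning, jumping, turning, and rolling, and accurately reproduces complex real-world physical scenes such as collision, rebound, and cutting.
These results rest on technical innovation. Built on the mainstream DiT architecture and the Flow Matching paradigm with a linear noise trajectory, Wan achieves major gains in generation capability through a series of innovations: an in-house efficient causal 3D VAE, a scalable pre-training strategy, a large-scale data pipeline, and automated evaluation metrics. Together these raise the model's final performance.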
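For readers unfamiliar with the term, Flow Matching with a linear noise trajectory is usually written as follows. This is the standard textbook formulation, not one quoted from the Wan team; the symbols (x_0 for Gaussian noise, x_1 for the clean video latent, c for the text condition, u_θ for the network) are notation chosen for this sketch.

```latex
% Linear interpolation between noise x_0 and data x_1
x_t = t\,x_1 + (1 - t)\,x_0, \qquad x_0 \sim \mathcal{N}(0, I), \quad t \in [0, 1]

% Along this straight path the target velocity is constant in t
v_t = \frac{\mathrm{d}x_t}{\mathrm{d}t} = x_1 - x_0

% The network u_\theta is trained to regress that velocity
\mathcal{L}(\theta) = \mathbb{E}_{x_0,\,x_1,\,c,\,t}
  \left\lVert u_\theta(x_t, c, t) - (x_1 - x_0) \right\rVert^2
```

Because the path from noise to data is a straight line, the regression target does not depend on t, which is one reason this paradigm is considered simple to train and sample from.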
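AI community reactions
After the release of Wan 2.1, some developers were quick to try it. One said its generation quality is "killing it" and takes video generation to the next stage. RIP Sora.
Another developer said that open-sourcing Wan 2.1 will change the rules of the game in video generation and spur more innovative work.
Other users noted that Wan 2.1 does not yet run in ComfyUI and that the model is large, so they are waiting for a quantized version.
Now you can skip the deployment hassle and try the model online directly on SiliconCloud.
Token Factory SiliconCloud
DeepSeek-R1 distilled models and more, free to use
As a one-stop large-model cloud service platform, SiliconCloud aims to give developers large-model APIs that are fast, affordable, comprehensive, and stable.
Besides Wan2.1-T2V-14B and Wan2.1-T2V-14B-Turbo, SiliconCloud already hosts more than a hundred language models, image/video models, audio models, code/math models, and embedding and reranking models, including QwQ-32B, DeepSeek-R1 & V3, DeepSeek-R1-Distill, Janus-Pro-7B, CosyVoice2, QVQ-72B-Preview, DeepSeek-VL2, HunyuanVideo, Qwen2.5-7B/14B/32B/72B, InternLM2.5-20B-Chat, BCE, BGE, and SenseVoice-Small.
Among them, the APIs for several models, including the DeepSeek-R1 distilled models (8B, 7B, 1.5B) and Qwen2.5 (7B), are free to use, so developers and product managers can focus on product innovation without worrying about compute costs during development or large-scale rollout, and achieve "Token freedom".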