2025-03-30 23:44, Zhejiang
619 new models, 93 new datasets, 151 new apps, and 7 new articles this issue
🙋 ModelScope community progress this issue:
📟 619 models: DeepSeek-V3-0324, Qwen2.5-Omni-7B (omni-modal), Qwen2.5-VL-32B-Instruct, InfiniteYou, and more;
📁 93 datasets: HQ-Edit, EconomicIndex, AM-DeepSeek-R1-Distilled-1.4M, and more;
🎨 151 innovative apps: Bel Canto Singing Classifier, IndexTTS-Demo, Powerful Document-to-Text Converter, and more;
📄 7 articles:
WritingBench: Alibaba's latest multi-dimensional benchmark for LLM writing ability, with an open-sourced 32B deep-thinking writing model
Four-way breakthrough in seeing, hearing, speaking, and writing: the Qwen2.5-Omni end-to-end multimodal model is open-sourced!
Qwen2.5-VL-32B: smarter and lighter!
DeepSeek-V3 gets a minor version upgrade: the king of non-reasoning models returns
Today's paper picks: MAPS, RoboFactory, OpenVLThinker, and more
Deploy Flux with 4 GB of VRAM and generate Wan2.1-14B videos in 2 minutes: the DiffSynth-Engine inference engine is open-sourced!
Last week's multimodal paper picks: MAPS, MapGlue, OmniGeo, OThink-MR1
01 Model Recommendations
DeepSeek-V3-0324
Model link:
https://modelscope.cn/models/deepseek-ai/DeepSeek-V3-0324
Sample code:
Inference with SGLang (officially recommended)
# Installation
pip install "sglang[all]>=0.4.3" --find-links https://flashinfer.ai/whl/cu124/torch2.5/flashinfer-python
# Launch
python3 -m sglang.launch_server --model /Your_Model_Path/DeepSeek-V3-0324 --tp 8 --trust-remote-code
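Once the server is up, it speaks the OpenAI-compatible API. A minimal client sketch, assuming SGLang's default port 30000 and that the model field matches the path passed to --model:
from openai import OpenAI

# Point the standard OpenAI client at the local SGLang server; no real key is needed.
client = OpenAI(base_url="http://localhost:30000/v1", api_key="EMPTY")
response = client.chat.completions.create(
    model="/Your_Model_Path/DeepSeek-V3-0324",  # same path as --model above
    messages=[{"role": "user", "content": "Hello! Who are you?"}],
)
print(response.choices[0].message.content)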
Qwen2.5-Omni-7B
Qwen2.5-Omni is the Qwen team's new-generation end-to-end multimodal flagship model. It supports cross-modal understanding of text, images, audio, and video, and generates text and natural speech responses in a streaming fashion, with the TMRoPE technique keeping audio and video precisely synchronized. It offers real-time interaction, natural and fluent speech generation, strong performance across all modalities, and robust end-to-end speech instruction following. Benchmarked against single-modality models of comparable size, it performs excellently: its audio capabilities surpass the similarly sized Qwen2-Audio, and it stays on par with Qwen2.5-VL-7B.
Model link:
https://modelscope.cn/models/Qwen/Qwen2.5-Omni-7B
Sample code:
Inference with transformers. Environment setup:
pip uninstall -y transformers
pip install git+https://github.com/huggingface/transformers@3a1ead0aabed473eafe527915eea8c197d424356
pip install accelerate
pip install "qwen-omni-utils[decord]"
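The loading code below enables flash_attention_2, which additionally requires the flash-attn package; this install step is inferred from the attn_implementation argument used below, not part of the original setup list:
pip install flash-attn --no-build-isolation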
Inference code (25 GB of VRAM):
import soundfile as sf
from modelscope import Qwen2_5OmniModel, Qwen2_5OmniProcessor
from qwen_omni_utils import process_mm_info
# default: Load the model on the available device(s)
# model = Qwen2_5OmniModel.from_pretrained("Qwen/Qwen2.5-Omni-7B", torch_dtype="auto", device_map="auto")
# We recommend enabling flash_attention_2 for better acceleration and memory saving.
model = Qwen2_5OmniModel.from_pretrained(
"Qwen/Qwen2.5-Omni-7B",
torch_dtype="auto",
device_map="auto",
attn_implementation="flash_attention_2",
)
processor = Qwen2_5OmniProcessor.from_pretrained("Qwen/Qwen2.5-Omni-7B")
conversation = [
{
"role": "system",
"content": "You are Qwen, a virtual human developed by the Qwen Team, Alibaba Group, capable of perceiving auditory and visual inputs, as well as generating text and speech.",
},
{
"role": "user",
"content": [
{"type": "video", "video": "https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen2.5-Omni/draw.mp4"},
],
},
]
# Preparation for inference
text = processor.apply_chat_template(conversation, add_generation_prompt=True, tokenize=False)
audios, images, videos = process_mm_info(conversation, use_audio_in_video=True)
inputs = processor(text=text, audios=audios, images=images, videos=videos, return_tensors="pt", padding=True)
inputs = inputs.to(model.device).to(model.dtype)
# Inference: Generation of the output text and audio
text_ids, audio = model.generate(**inputs, use_audio_in_video=True)
text = processor.batch_decode(text_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)
print(text)
sf.write(
"output.wav",
audio.reshape(-1).detach().cpu().numpy(),
samplerate=24000,
)
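The same pipeline accepts other modality mixes. Below is a minimal sketch of an audio-plus-text turn reusing the model and processor built above; the audio path and the question are placeholder assumptions, not from the original post:
conversation = [
    {
        "role": "system",
        "content": "You are Qwen, a virtual human developed by the Qwen Team, Alibaba Group, capable of perceiving auditory and visual inputs, as well as generating text and speech.",
    },
    {
        "role": "user",
        "content": [
            {"type": "audio", "audio": "/path/to/clip.wav"},  # placeholder local file
            {"type": "text", "text": "What is said in this clip?"},
        ],
    },
]
# No video in this turn, so audio-in-video extraction is disabled.
text = processor.apply_chat_template(conversation, add_generation_prompt=True, tokenize=False)
audios, images, videos = process_mm_info(conversation, use_audio_in_video=False)
inputs = processor(text=text, audios=audios, images=images, videos=videos, return_tensors="pt", padding=True)
inputs = inputs.to(model.device).to(model.dtype)
text_ids, audio = model.generate(**inputs, use_audio_in_video=False)
print(processor.batch_decode(text_ids, skip_special_tokens=True))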
For more hands-on deployment guides, see:
Four-way breakthrough in seeing, hearing, speaking, and writing: the Qwen2.5-Omni end-to-end multimodal model is open-sourced!
Qwen2.5-VL-32B-Instruct
Building on the Qwen2.5-VL series, the Qwen team has continued to optimize the model with reinforcement learning, and has open-sourced the much-loved 32B-parameter VL model Qwen2.5-VL-32B-Instruct under the Apache 2.0 license. This version produces responses better aligned with human preferences and is stronger at mathematical reasoning and fine-grained image understanding and reasoning. An official AWQ-quantized version is also available.
Model links:
Qwen2.5-VL-32B-Instruct: https://modelscope.cn/models/Qwen/Qwen2.5-VL-32B-Instruct
Qwen2.5-VL-32B-Instruct-AWQ: https://modelscope.cn/models/Qwen/Qwen2.5-VL-32B-Instruct-AWQ
Sample code:
Using Transformers:
from modelscope import Qwen2_5_VLForConditionalGeneration, AutoTokenizer, AutoProcessor
from qwen_vl_utils import process_vision_info
# default: Load the model on the available device(s)
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
"Qwen/Qwen2.5-VL-32B-Instruct", torch_dtype="auto", device_map="auto"
)
# We recommend enabling flash_attention_2 for better acceleration and memory saving, especially in multi-image and video scenarios.
# model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
# "Qwen/Qwen2.5-VL-32B-Instruct",
# torch_dtype=torch.bfloat16,
# attn_implementation="flash_attention_2",
# device_map="auto",
# )
# default processor
processor = AutoProcessor.from_pretrained("Qwen/Qwen2.5-VL-32B-Instruct")
# The default range for the number of visual tokens per image in the model is 4-16384.
# You can set min_pixels and max_pixels according to your needs, such as a token range of 256-1280, to balance performance and cost.
# min_pixels = 256*28*28
# max_pixels = 1280*28*28
# processor = AutoProcessor.from_pretrained("Qwen/Qwen2.5-VL-32B-Instruct", min_pixels=min_pixels, max_pixels=max_pixels)
messages = [
{
"role": "user",
"content": [
{
"type": "image",
"image": "https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-VL/assets/demo.jpeg",
},
{"type": "text", "text": "Describe this image."},
],
}
]
# Preparation for inference
text = processor.apply_chat_template(
messages, tokenize=False, add_generation_prompt=True
)
image_inputs, video_inputs = process_vision_info(messages)
inputs = processor(
text=[text],
images=image_inputs,
videos=video_inputs,
padding=True,
return_tensors="pt",
)
inputs = inputs.to("cuda")
# Inference: Generation of the output
generated_ids = model.generate(**inputs, max_new_tokens=128)
generated_ids_trimmed = [
out_ids[len(in_ids) :] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)
]
output_text = processor.batch_decode(
generated_ids_trimmed, skip_special_tokens=True, clean_up_tokenization_spaces=False
)
print(output_text)
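The processor also supports batched requests: pass a list of chat-templated prompts and let process_vision_info collect the visuals across conversations. A minimal sketch reusing the model and processor above; the second, text-only conversation is an assumption for illustration:
messages1 = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": "https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-VL/assets/demo.jpeg"},
            {"type": "text", "text": "Describe this image."},
        ],
    }
]
messages2 = [{"role": "user", "content": [{"type": "text", "text": "Who are you?"}]}]
all_messages = [messages1, messages2]
# Render each conversation to a prompt string, then batch-encode with padding.
texts = [
    processor.apply_chat_template(msg, tokenize=False, add_generation_prompt=True)
    for msg in all_messages
]
image_inputs, video_inputs = process_vision_info(all_messages)
inputs = processor(
    text=texts,
    images=image_inputs,
    videos=video_inputs,
    padding=True,
    return_tensors="pt",
).to("cuda")
generated_ids = model.generate(**inputs, max_new_tokens=128)
generated_ids_trimmed = [
    out_ids[len(in_ids):] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)
]
print(processor.batch_decode(generated_ids_trimmed, skip_special_tokens=True))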
For more hands-on deployment and fine-tuning tutorials, see:
Qwen2.5-VL-32B: smarter and lighter!
02 Dataset Recommendations
HQ-Edit
HQ-Edit is a high-quality image-editing dataset built around before-and-after image pairs for all kinds of edits. It can teach models to retouch portraits, swap backgrounds, and apply effects, making image editing smarter and simpler.
Dataset link:
https://modelscope.cn/datasets/AI-ModelScope/HQ-Edit
EconomicIndex
EconomicIndex is an economics research dataset built from real Claude.ai conversation data. By analyzing how AI is used across occupational tasks (e.g., augmentation vs. automation), it sheds light on AI's impact on the labor market, and it is open-sourced for policy-making and research.
Dataset link:
https://modelscope.cn/datasets/Anthropic/EconomicIndex
AM-DeepSeek-R1-Distilled-1.4M
AM-DeepSeek-R1-Distilled-1.4M is a dataset provided by the AI-ModelScope team, containing roughly 1.4 million distilled samples that can be used to train and validate related models.
Dataset link:
https://modelscope.cn/datasets/AI-ModelScope/AM-DeepSeek-R1-Distilled-1.4M
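All three datasets can be pulled programmatically with the ModelScope SDK. A minimal sketch, assuming a recent modelscope version that accepts the owner/name path and that a 'train' split exists:
from modelscope.msdatasets import MsDataset

# Load a dataset from the ModelScope hub (HQ-Edit shown; swap in
# 'Anthropic/EconomicIndex' or 'AI-ModelScope/AM-DeepSeek-R1-Distilled-1.4M').
ds = MsDataset.load('AI-ModelScope/HQ-Edit', split='train')
print(next(iter(ds)))  # inspect a single record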
03 App Recommendations
Bel Canto Singing Classifier
Try it now:
https://modelscope.cn/studios/ccmusic-database/bel_canto
IndexTTS-Demo
Try it now:
https://modelscope.cn/studios/IndexTeam/IndexTTS-Demo
Powerful Document-to-Text Converter
Try it now:
https://modelscope.cn/studios/ds4sd/SmolDocling-256M-Demo
04 Featured Articles
WritingBench: Alibaba's latest multi-dimensional benchmark for LLM writing ability, with an open-sourced 32B deep-thinking writing model
Four-way breakthrough in seeing, hearing, speaking, and writing: the Qwen2.5-Omni end-to-end multimodal model is open-sourced!
Today's paper picks: MAPS, RoboFactory, OpenVLThinker, and more