2025-06-22 15:58 Zhejiang
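New this issue: 1,154 models, 185 datasets, 63 applications, and 9 articles
🙋 ModelScope community update for this issue:
📟 1,154 models: Kimi-Dev-72B, MiniMax-M1, Lingshu-7B, and more;
📁 185 datasets: EQ-bench_ca, thaimos-tts-annotation, ReasonMed, and more;
🎨 63 innovative applications: MiniMax-M1, Nanonets-OCR-s, AscendMira: Your Personal Beauty Mirror, and more;
📄 9 articles:
- Efficient Inference of the MiniCPM4 Model Series with OpenVINO™
- Nanonets-OCR-s Open-Sourced! SoTA Complex-Document-to-Markdown Conversion That Upends Complex Document Workflows
- The 2025 ModelScope Developer Conference Is Here!
- MiniMax-M1 Open-Sourced: A Hybrid MoE Reasoning Model with a Million-Token Context Window!
- ModelScope June 2025 Release Monthly Report
- Traveling "Journey to the West" Together, Meeting the "Wanxiang" Champion | Shadow-Puppet Journey to the West LoRA Creation Notes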
01
Recommended Models
MiniMax-M1
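MiniMax-M1, recently open-sourced by MiniMax, is billed as the world's first open-source large-scale hybrid-architecture reasoning model. It supports up to one million tokens of input context and up to 80K tokens of reasoning output, with 456 billion total parameters, of which 45.9 billion are activated per token. It performs strongly on complex tasks such as long-context understanding, software engineering, and tool use, is highly cost-effective, and is trained efficiently with CISPO, a novel reinforcement learning algorithm.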
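As a rough, back-of-the-envelope illustration of the parameter figures above (not an official sizing guide), the sketch below computes the share of parameters active per token and the weight-only memory footprint at a few precisions. The bytes-per-parameter values and the helper name `weight_gib` are assumptions made for illustration; KV cache, activations, and framework overhead are ignored.

```python
# Back-of-the-envelope sizing for a 456B-total / 45.9B-active MoE model.
# Assumption: weights dominate memory; KV cache and activations are ignored.
TOTAL_PARAMS = 456e9    # total parameters, as stated in the text
ACTIVE_PARAMS = 45.9e9  # parameters activated per token, as stated in the text
GIB = 1024 ** 3


def weight_gib(num_params: float, bytes_per_param: float) -> float:
    """Return the weight-only footprint in GiB (hypothetical helper)."""
    return num_params * bytes_per_param / GIB


# Fraction of the model that participates in each forward step.
active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS
print(f"active fraction per token: {active_fraction:.1%}")

# Assumed storage cost per parameter for common precisions.
for name, nbytes in [("bf16", 2), ("int8", 1)]:
    print(f"{name}: ~{weight_gib(TOTAL_PARAMS, nbytes):.0f} GiB of weights")
```

Under these assumptions, roughly a tenth of the parameters are used per token, and int8 weights fit within 8 × 80 GiB of GPU memory while bf16 weights alone would not.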
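Model link:
https://modelscope.cn/models/MiniMax/MiniMax-M1-80k
Example code:
The following shows how to run inference on MiniMax-M1-40k with ms-swift. Before running inference, make sure the environment is set up: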
git clone https://github.com/modelscope/ms-swift.git
cd ms-swift
pip install -e .
Using transformers as the inference backend:
GPU memory usage: 8 * 80GiB
import os

# Expose all eight GPUs; set this before the model is loaded.
os.environ['CUDA_VISIBLE_DEVICES'] = '0,1,2,3,4,5,6,7'

from transformers import QuantoConfig
from swift.llm import PtEngine, RequestConfig, InferRequest

# Load the weights with int8 quantization.
quantization_config = QuantoConfig(weights='int8')
messages = [{
    'role': 'system',
    'content': 'You are a helpful assistant.'
}, {
    'role': 'user',
    'content': 'who are you?'
}]
engine = PtEngine('MiniMax/MiniMax-M1-40k', quantization_config=quantization_config)
infer_request = InferRequest(messages=messages)
# temperature=0 gives greedy decoding; max_tokens caps the reply length.
request_config = RequestConfig(max_tokens=128, temperature=0)
resp = engine.infer([infer_request], request_config=request_config)
response = resp[0].choices[0].message.content
print(f'response: {response}')
"""
<think>
Okay, the user asked "who are you?" I need to respond in a way that's helpful and clear. Let me start by introducing myself as an AI assistant. I should mention that I'm here to help with information, answer questions, and assist with tasks. Maybe keep it friendly and open-ended so they know they can ask for more details if needed. Let me make sure the response is concise but informative.
</think>
I'm an AI assistant designed to help with information, answer questions, and assist with various tasks. Feel free to ask me anything, and I'll do my best to help! 😊
"""
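The sample output above wraps the model's reasoning in `<think>...</think>` before the final answer. A minimal sketch of separating the two, assuming the output follows that format; the helper name `split_reasoning` and the sample string are hypothetical and not part of ms-swift:

```python
import re

# Matches the first <think>...</think> block, including newlines inside it.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(text: str) -> tuple[str, str]:
    """Split model output into (reasoning, answer).

    If no complete <think> block is found (for example, when generation was
    cut off by max_tokens), the whole text is returned as the answer.
    """
    match = THINK_RE.search(text)
    if match is None:
        return "", text.strip()
    reasoning = match.group(1).strip()
    # Everything outside the matched block is treated as the answer.
    answer = (text[:match.start()] + text[match.end():]).strip()
    return reasoning, answer


# Hypothetical, shortened output used only for illustration.
sample = "<think>\nThe user asked who I am.\n</think>\nI'm an AI assistant."
reasoning, answer = split_reasoning(sample)
print("reasoning:", reasoning)
print("answer:", answer)
```

With a small `max_tokens` budget the closing `</think>` tag may never be generated, which is why the fallback branch returns the raw text rather than raising an error.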
For more hands-on inference tutorials, see:
MiniMax-M1 Open-Sourced: A Hybrid MoE Reasoning Model with a Million-Token Context Window!
Kimi-Dev-72B
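Kimi-Dev-72B is a large coding language model newly open-sourced by Kimi and designed for software engineering tasks. Optimized with large-scale reinforcement learning, it can automatically fix bugs in real code repositories and verify the fixes by running tests. It scores 60.4% on SWE-bench Verified, setting a new record among open-source models.
Model link: https://modelscope.cn/models/moonshotai/Kimi-Dev-72B
Example code: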
from modelscope import AutoModelForCausalLM, AutoTokenizer

model_name = "moonshotai/Kimi-Dev-72B"
# Load the weights in their stored dtype and shard them across available devices.
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

prompt = "Give me a short introduction to large language model."
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": prompt}
]
# Render the chat messages into the prompt string the model expects.
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=512
)
# Drop the prompt tokens so that only the newly generated tokens are decoded.
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]
response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
Nanonets-OCR-s
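Nanonets-OCR-s is a powerful OCR model fine-tuned from Qwen2.5-VL-3B that runs with about 9 GB of GPU memory. Using intelligent content recognition and semantic tagging, it converts messy documents into the clean, structured, context-rich Markdown that modern AI applications need. It goes well beyond traditional text extraction and is currently the SoTA model for image-to-Markdown conversion.
Model link:
https://www.modelscope.cn/studios/nanonets/Nanonets-ocr-s
Example code: inference with transformers
Download the model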
modelscope download --model nanonets/Nanonets-OCR-s --local_dir nanonets/Nanonets-OCR-s
Inference script
from PIL import Image
from transformers import AutoTokenizer, AutoProcessor, AutoModelForImageTextToText

# Matches the --local_dir used in the download command.
model_path = "nanonets/Nanonets-OCR-s"
model = AutoModelForImageTextToText.from_pretrained(
    model_path,
    torch_dtype="auto",
    device_map="auto",
    attn_implementation="flash_attention_2"  # requires the flash-attn package
)
model.eval()
tokenizer = AutoTokenizer.from_pretrained(model_path)
processor = AutoProcessor.from_pretrained(model_path)


def ocr_page_with_nanonets_s(image_path, model, processor, max_new_tokens=4096):
    """Run OCR on a single page image and return the generated Markdown."""
    prompt = """Extract the text from the above document as if you were reading it naturally. Return the tables in html format. Return the equations in LaTeX representation. If there is an image in the document and image caption is not present, add a small description of the image inside the <img></img> tag; otherwise, add the image caption inside <img></img>. Watermarks should be wrapped in brackets. Ex: <watermark>OFFICIAL COPY</watermark>. Page numbers should be wrapped in brackets. Ex: <page_number>14</page_number> or <page_number>9/22</page_number>. Prefer using ☐ and ☑ for check boxes."""
    image = Image.open(image_path)
    messages = [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": [
            {"type": "image", "image": f"file://{image_path}"},
            {"type": "text", "text": prompt},
        ]},
    ]
    text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
    inputs = processor(text=[text], images=[image], padding=True, return_tensors="pt")
    inputs = inputs.to(model.device)

    # Greedy decoding keeps the transcription deterministic.
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False)
    # Drop the prompt tokens so that only the generated text is decoded.
    generated_ids = [output_ids[len(input_ids):] for input_ids, output_ids in zip(inputs.input_ids, output_ids)]

    output_text = processor.batch_decode(generated_ids, skip_special_tokens=True, clean_up_tokenization_spaces=True)
    return output_text[0]


image_path = "/path/to/your/document.jpg"
result = ocr_page_with_nanonets_s(image_path, model, processor, max_new_tokens=15000)
print(result)
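The prompt in the script asks the model to wrap watermarks and page numbers in `<watermark>` and `<page_number>` tags. A minimal post-processing sketch, assuming the output follows that format; the helper names `extract_tag` and `strip_tags` and the sample string are hypothetical and not part of Nanonets-OCR-s or transformers:

```python
import re


def extract_tag(markdown: str, tag: str) -> list[str]:
    """Return the text inside every <tag>...</tag> pair, in document order."""
    name = re.escape(tag)
    pattern = re.compile(rf"<{name}>(.*?)</{name}>", re.DOTALL)
    return [m.strip() for m in pattern.findall(markdown)]


def strip_tags(markdown: str, tags: list[str]) -> str:
    """Remove the given tagged spans (tags and their content) from the text."""
    for tag in tags:
        name = re.escape(tag)
        markdown = re.sub(rf"<{name}>.*?</{name}>", "", markdown, flags=re.DOTALL)
    return markdown.strip()


# Hypothetical OCR output used only for illustration.
sample = (
    "<watermark>OFFICIAL COPY</watermark>\n"
    "# Invoice\n"
    "Total: 42\n"
    "<page_number>9/22</page_number>"
)
page_numbers = extract_tag(sample, "page_number")
watermarks = extract_tag(sample, "watermark")
body = strip_tags(sample, ["watermark", "page_number"])
print(page_numbers, watermarks)
print(body)
```

Keeping extraction and removal as separate helpers lets a pipeline store page numbers as metadata while passing only the cleaned Markdown body downstream.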