2025-06-22 15:58, Zhejiang

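1,154 new models, 185 new datasets, 63 new applications, and 9 new articles

🙋 This issue's ModelScope community update:

📟 1,154 models: Kimi-Dev-72B, MiniMax-M1, Lingshu-7B, and more;

📁 185 datasets: EQ-bench_ca, thaimos-tts-annotation, ReasonMed, and more;

🎨 63 innovative applications: MiniMax-M1, Nanonets-OCR-s, AscendMira: Your Personal Beauty Mirror, and more;

📄 9 articles:

Efficient Inference of the MiniCPM4 Series Models with OpenVINO™

Nanonets-OCR-s Open-Sourced! SoTA Complex-Document-to-Markdown Conversion That Upends Complex Document Workflows

The 2025 ModelScope Developer Conference Is Here!

MiniMax-M1 Open-Sourced: A Hybrid MoE Reasoning Model with a Million-Token Context Window!

ModelScope Monthly Release Report, June 2025

"Journey to the West" Meets "Wanxiang", Champion | Shadow-Puppet Journey to the West LoRA: Creation Notes

"Journey to the West" Meets "Wanxiang", Runner-up | Wukong Zhuan Aesthetic-Enhancement LoRA: Creation Notes

"Journey to the West" Meets "Wanxiang", Third Place | Ink-Wash Smoke Journey to the West LoRA: Creation Notes

"Journey to the West" Meets "Wanxiang", Third Place | Cyber Wukong Journey to the West LoRA: Creation Notes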

Recommended Models

MiniMax-M1

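MiniMax-M1, recently open-sourced by MiniMax, is billed as the world's first open-source large-scale hybrid-architecture reasoning model. It supports context inputs of up to one million tokens and reasoning outputs of up to 80,000 tokens. The model has 456 billion parameters in total, of which 45.9 billion are activated per token. It performs strongly on complex tasks such as long-context understanding, software engineering, and tool use, is cost-effective, and is trained efficiently with CISPO, a novel reinforcement learning algorithm.

Model link: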

https://modelscope.cn/models/MiniMax/MiniMax-M1-80k
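To put the parameter counts above in perspective, the back-of-the-envelope sketch below estimates the share of parameters active per token and the memory needed for the weights alone at two precisions. It assumes 1 byte per parameter for int8 and 2 bytes for bf16, and it ignores activations, KV cache, and framework overhead, so real usage will be higher. The helper name `weight_gib` is illustrative only.

```python
TOTAL_PARAMS = 456e9    # total parameters, from the description above
ACTIVE_PARAMS = 45.9e9  # parameters activated per token

GIB = 1024 ** 3


def weight_gib(num_params: float, bytes_per_param: float) -> float:
    """Memory for the weights alone, in GiB (no activations or KV cache)."""
    return num_params * bytes_per_param / GIB


active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS
print(f"active per token: {active_fraction:.1%}")
print(f"int8 weights: {weight_gib(TOTAL_PARAMS, 1):.0f} GiB")
print(f"bf16 weights: {weight_gib(TOTAL_PARAMS, 2):.0f} GiB")
```

Under these assumptions, roughly a tenth of the parameters are active per token, and int8 weights (about 425 GiB) fit within an 8 * 80GiB budget, whereas bf16 weights alone (about 849 GiB) would not.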

Sample code:

The following shows how to run inference on MiniMax-M1-40k with ms-swift. Before running inference, make sure the environment is set up:

git clone https://github.com/modelscope/ms-swift.git

cd ms-swift

pip install -e .

Using transformers as the inference backend:

GPU memory usage: 8 * 80GiB

import os
os.environ['CUDA_VISIBLE_DEVICES'] = '0,1,2,3,4,5,6,7'
from transformers import QuantoConfig
from swift.llm import PtEngine, RequestConfig, InferRequest
quantization_config = QuantoConfig(weights='int8')
messages = [{
    'role': 'system',
    'content': 'You are a helpful assistant.'
}, {
    'role': 'user',
    'content': 'who are you?'
}]
engine = PtEngine('MiniMax/MiniMax-M1-40k', quantization_config=quantization_config)
infer_request = InferRequest(messages=messages)
request_config = RequestConfig(max_tokens=128, temperature=0)
resp = engine.infer([infer_request], request_config=request_config)
response = resp[0].choices[0].message.content
print(f'response: {response}')
"""
<think>
Okay, the user asked "who are you?" I need to respond in a way that's helpful and clear. Let me start by introducing myself as an AI assistant. I should mention that I'm here to help with information, answer questions, and assist with tasks. Maybe keep it friendly and open-ended so they know they can ask for more details if needed. Let me make sure the response is concise but informative.
</think>
I'm an AI assistant designed to help with information, answer questions, and assist with various tasks. Feel free to ask me anything, and I'll do my best to help! 😊
"""

For more hands-on inference tutorials, see:

MiniMax-M1 Open-Sourced: A Hybrid MoE Reasoning Model with a Million-Token Context Window!
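The MiniMax-M1 sample output above shows the model emitting its reasoning inside `<think>...</think>` before the final answer. When only the answer is needed, the two parts can be separated with a small helper. The sketch below is an assumption about post-processing, not part of ms-swift, and the function name `split_reasoning` is hypothetical.

```python
import re

# Non-greedy match so only the first <think> block is captured.
_THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(response: str) -> tuple[str, str]:
    """Return (reasoning, answer) from a response containing a <think> block."""
    match = _THINK_RE.search(response)
    if match is None:
        # No complete <think> block: treat the whole text as the answer.
        return "", response.strip()
    reasoning = match.group(1).strip()
    answer = response[match.end():].strip()
    return reasoning, answer


sample = "<think>\nThe user asked who I am.\n</think>\nI'm an AI assistant."
reasoning, answer = split_reasoning(sample)
print(answer)
```

If generation is cut off before `</think>` (for example by a small `max_tokens`), this helper finds no complete block and returns the whole text as the answer, so callers may want to handle that case separately.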

Kimi-Dev-72B

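Kimi-Dev-72B is a large language model for coding, newly open-sourced by Kimi and designed for software engineering tasks. Optimized with large-scale reinforcement learning, it can automatically fix bugs in real code repositories, with the fixes validated by passing tests. It scores 60.4% on SWE-bench Verified, setting a new record among open-source models.

Model link: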

https://modelscope.cn/models/moonshotai/Kimi-Dev-72B

Sample code:

from modelscope import AutoModelForCausalLM, AutoTokenizer
model_name = "moonshotai/Kimi-Dev-72B"
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)
prompt = "Give me a short introduction to large language model."
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=512
)
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]
response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)

Nanonets-OCR-s

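Nanonets-OCR-s is a powerful OCR model fine-tuned from Qwen2.5-VL-3B that runs with 9 GB of GPU memory. Using intelligent content recognition and semantic tagging, it converts messy documents into the clean, structured, context-rich Markdown that modern AI applications need. It goes well beyond traditional text extraction and is presented as the current SoTA model for image-to-Markdown conversion.

Model link: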

https://www.modelscope.cn/studios/nanonets/Nanonets-ocr-s

Sample code:

Inference with transformers

Download the model

modelscope download --model nanonets/Nanonets-OCR-s --local_dir nanonets/Nanonets-OCR-s

Inference script

from PIL import Image
from transformers import AutoTokenizer, AutoProcessor, AutoModelForImageTextToText
model_path = "nanonets/Nanonets-OCR-s"
model = AutoModelForImageTextToText.from_pretrained(
    model_path, 
    torch_dtype="auto", 
    device_map="auto", 
    attn_implementation="flash_attention_2"
)
model.eval()
tokenizer = AutoTokenizer.from_pretrained(model_path)
processor = AutoProcessor.from_pretrained(model_path)
def ocr_page_with_nanonets_s(image_path, model, processor, max_new_tokens=4096):
    prompt = """Extract the text from the above document as if you were reading it naturally. Return the tables in html format. Return the equations in LaTeX representation. If there is an image in the document and image caption is not present, add a small description of the image inside the <img></img> tag; otherwise, add the image caption inside <img></img>. Watermarks should be wrapped in brackets. Ex: <watermark>OFFICIAL COPY</watermark>. Page numbers should be wrapped in brackets. Ex: <page_number>14</page_number> or <page_number>9/22</page_number>. Prefer using ☐ and ☑ for check boxes."""
    image = Image.open(image_path)
    messages = [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": [
            {"type": "image", "image": f"file://{image_path}"},
            {"type": "text", "text": prompt},
        ]},
    ]
    text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
    inputs = processor(text=[text], images=[image], padding=True, return_tensors="pt")
    inputs = inputs.to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False)
    generated_ids = [output_ids[len(input_ids):] for input_ids, output_ids in zip(inputs.input_ids, output_ids)]
    output_text = processor.batch_decode(generated_ids, skip_special_tokens=True, clean_up_tokenization_spaces=True)
    return output_text[0]
image_path = "/path/to/your/document.jpg"
result = ocr_page_with_nanonets_s(image_path, model, processor, max_new_tokens=15000)
print(result)
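The prompt in the script asks the model to wrap watermarks in `<watermark>` tags and page numbers in `<page_number>` tags, so these can be pulled out of the returned text afterwards. The following is a minimal sketch of such post-processing with regular expressions. The function names and the sample string are illustrative assumptions, and real model output may not always follow the requested tags.

```python
import re


def extract_tagged(text: str, tag: str) -> list[str]:
    """Return the stripped contents of every <tag>...</tag> span in text."""
    pattern = re.compile(rf"<{re.escape(tag)}>(.*?)</{re.escape(tag)}>", re.DOTALL)
    return [m.strip() for m in pattern.findall(text)]


def strip_tags(text: str, tags: tuple[str, ...] = ("watermark", "page_number")) -> str:
    """Remove the given tagged spans, leaving the main body text."""
    for tag in tags:
        text = re.sub(
            rf"<{re.escape(tag)}>.*?</{re.escape(tag)}>", "", text, flags=re.DOTALL
        )
    return text.strip()


# Hypothetical OCR output used only to exercise the helpers.
sample = (
    "<watermark>OFFICIAL COPY</watermark>\n"
    "# Invoice\nTotal: 42\n"
    "<page_number>9/22</page_number>"
)
print(extract_tagged(sample, "watermark"))
print(extract_tagged(sample, "page_number"))
print(strip_tags(sample))
```

In practice, `sample` would be replaced by the `result` string returned from `ocr_page_with_nanonets_s`.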

Lingshu-7B

Sample code:

Inference with transformers:

import torch
from modelscope import Qwen2_5_VLForConditionalGeneration, AutoProcessor
from qwen_vl_utils import process_vision_info

# We recommend enabling flash_attention_2 for better acceleration and memory saving, especially in multi-image and video scenarios.
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    "lingshu-medical-mllm/Lingshu-7B",
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",
    device_map="auto",
)
processor = AutoProcessor.from_pretrained("lingshu-medical-mllm/Lingshu-7B")

# One user turn containing an image followed by a text instruction
messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "image",
                "image": "example.png",
            },
            {"type": "text", "text": "Describe this image."},
        ],
    }
]

# Preparation for inference
text = processor.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
image_inputs, video_inputs = process_vision_info(messages)
inputs = processor(
    text=[text],
    images=image_inputs,
    videos=video_inputs,
    padding=True,
    return_tensors="pt",
)
inputs = inputs.to(model.device)

# Inference: Generation of the output
generated_ids = model.generate(**inputs, max_new_tokens=128)
# Drop the prompt tokens so only the newly generated tokens are decoded
generated_ids_trimmed = [
    out_ids[len(in_ids) :] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)
]
output_text = processor.batch_decode(
    generated_ids_trimmed, skip_special_tokens=True, clean_up_tokenization_spaces=False
)
print(output_text)

Inference with vLLM:

from PIL import Image
from vllm import LLM, SamplingParams
from qwen_vl_utils import process_vision_info
from modelscope import AutoProcessor

processor = AutoProcessor.from_pretrained("lingshu-medical-mllm/Lingshu-7B")
# tensor_parallel_size=2 shards the model across two GPUs
llm = LLM(
    model="lingshu-medical-mllm/Lingshu-7B",
    limit_mm_per_prompt={"image": 4},
    tensor_parallel_size=2,
    enforce_eager=True,
    trust_remote_code=True,
)
sampling_params = SamplingParams(
    temperature=0.7,
    top_p=1,
    repetition_penalty=1,
    max_tokens=1024,
    stop_token_ids=[],
)
text = "What does the image show?"
image_path = "example.png"
image = Image.open(image_path)
message = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": image},
            {"type": "text", "text": text},
        ],
    }
]
# Render the chat template to a prompt string; vLLM receives the images separately
prompt = processor.apply_chat_template(
    message,
    tokenize=False,
    add_generation_prompt=True,
)
image_inputs, video_inputs = process_vision_info(message)
mm_data = {}
mm_data["image"] = image_inputs
processed_input = {
    "prompt": prompt,
    "multi_modal_data": mm_data,
}
outputs = llm.generate([processed_input], sampling_params=sampling_params)
print(outputs[0].outputs[0].text)
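The `limit_mm_per_prompt` setting in the script above caps each prompt at four images. A small helper can build the user message in the same format as the script and fail early when that cap is exceeded. This is a sketch under that assumption: `build_message` and `MAX_IMAGES` are hypothetical names, and the images are represented by file paths purely for illustration.

```python
from typing import Any

MAX_IMAGES = 4  # mirrors the per-prompt image limit used in the script above


def build_message(images: list[Any], question: str) -> list[dict]:
    """Build a single-turn user message with several images and one question."""
    if len(images) > MAX_IMAGES:
        raise ValueError(f"got {len(images)} images, limit is {MAX_IMAGES}")
    # Images come first, followed by the text instruction.
    content = [{"type": "image", "image": img} for img in images]
    content.append({"type": "text", "text": question})
    return [{"role": "user", "content": content}]


message = build_message(["scan_1.png", "scan_2.png"], "What do the images show?")
print(len(message[0]["content"]))
```

The returned list has the same shape as `message` in the script, so it could be passed to `processor.apply_chat_template` and `process_vision_info` in the same way.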

02
Recommended Datasets
EQ-bench_ca

EQ-bench_ca is the Catalan version of EQ-Bench, a benchmark for evaluating the emotional intelligence of language models. It was created by the BSC-LT team.

Dataset link:

https://modelscope.cn/datasets/BSC-LT/EQ-bench_ca

thaimos-tts-annotation

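thaimos-tts-annotation is a dataset for Thai text-to-speech (TTS). It contains annotations of Thai speech and is intended to support the development and optimization of Thai TTS models.

Dataset link: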

https://modelscope.cn/datasets/scb10x/thaimos-tts-annotation

ReasonMed

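ReasonMed is described as the largest open-source medical reasoning dataset to date. It contains 370,000 high-quality question-answer examples, each with a multi-step chain-of-thought (CoT) rationale and a concise summary. The examples were distilled from 1.75 million initial reasoning paths generated by three competitive large language models (Qwen-2.5-72B, DeepSeek-R1-Distill-Llama-70B, and HuatuoGPT-o1-70B), using a rigorous multi-agent verification and refinement pipeline.

Dataset link: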

https://modelscope.cn/datasets/AI-ModelScope/ReasonMed

Studios

MiniMax-M1

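The MiniMax-M1 studio is an online demo of the hybrid MoE reasoning model with a million-token context window. Users can test its long-text processing and complex-task reasoning here.

Demo link: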

https://modelscope.cn/studios/MiniMax/MiniMax-M1

Nanonets-OCR-s

The Nanonets-OCR-s studio is an online optical character recognition (OCR) demo. Users can upload images for text recognition and extraction.

Demo link:
https://modelscope.cn/studios/nanonets/Nanonets-ocr-s

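04
Featured Community Articles

Efficient Inference of the MiniCPM4 Series Models with OpenVINO™

Nanonets-OCR-s Open-Sourced! SoTA Complex-Document-to-Markdown Conversion That Upends Complex Document Workflows

The 2025 ModelScope Developer Conference Is Here!

MiniMax-M1 Open-Sourced: A Hybrid MoE Reasoning Model with a Million-Token Context Window!

ModelScope Monthly Release Report, June 2025

"Journey to the West" Meets "Wanxiang", Champion | Shadow-Puppet Journey to the West LoRA: Creation Notes

"Journey to the West" Meets "Wanxiang", Runner-up | Wukong Zhuan Aesthetic-Enhancement LoRA: Creation Notes

"Journey to the West" Meets "Wanxiang", Third Place | Cyber Wukong Journey to the West LoRA: Creation Notes

"Journey to the West" Meets "Wanxiang", Third Place | Ink-Wash Smoke Journey to the West LoRA: Creation Notes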
