TechCrunch News, January 23
Hugging Face claims its new AI models are the smallest of their kind

The Hugging Face team has released SmolVLM-256M and SmolVLM-500M, small AI models that can analyze images, short videos, and text. Designed for memory-constrained devices, they can perform a range of tasks and were trained on purpose-built datasets, outperforming much larger models on some benchmarks. Both are available on the web and for download from Hugging Face, though small models like these can come with flaws of their own.

🎯 Hugging Face has released SmolVLM-256M and SmolVLM-500M, with 256 million and 500 million parameters respectively.

💻 The models are designed for constrained devices with less than about 1GB of RAM, such as laptops.

📄 They can describe images, answer questions about PDFs, and perform other tasks, and were trained on purpose-built datasets.

📈 They outperform Idefics 80B on some benchmarks, though small models can have flaws of their own.

A team at AI dev platform Hugging Face has released what they’re claiming are the smallest AI models that can analyze images, short videos, and text.

The models, SmolVLM-256M and SmolVLM-500M, are designed to work well on “constrained devices” like laptops with less than about 1GB of RAM. The team says they’re also ideal for developers trying to process large amounts of data very cheaply.

SmolVLM-256M and SmolVLM-500M are just 256 million parameters and 500 million parameters in size, respectively. (Parameters roughly correspond to a model’s problem-solving abilities, such as its performance on math tests.) Both models can perform tasks like describing images or video clips and answering questions about PDFs and the elements within them, including scanned text and charts.

To train SmolVLM-256M and SmolVLM-500M, the Hugging Face team used The Cauldron, a collection of 50 “high-quality” image and text datasets, and Docmatix, a set of file scans paired with detailed captions. Both were created by Hugging Face’s M4 team, which develops multimodal AI technologies.

[Figure: Benchmarks comparing the new SmolVLM models to other multimodal models. Image Credits: SmolVLM]

The team claims that both SmolVLM-256M and SmolVLM-500M outperform a much larger model, Idefics 80B, on benchmarks including AI2D, which tests the ability of models to analyze grade-school-level science diagrams. SmolVLM-256M and SmolVLM-500M are available on the web as well as for download from Hugging Face under an Apache 2.0 license, a permissive license that allows commercial use and modification.
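Since the models are published on the Hugging Face Hub, a minimal sketch of how one might query the 256M variant with the `transformers` library follows. The model ID, the `AutoModelForVision2Seq` entry point, and the chat-template message format are assumptions based on common Hub conventions, not details confirmed by the article; consult the model card for the exact recipe.

```python
# Hedged sketch: querying SmolVLM-256M locally via Hugging Face transformers.
# MODEL_ID and the API entry points below are assumptions; check the model card.

MODEL_ID = "HuggingFaceTB/SmolVLM-256M-Instruct"  # assumed Hub ID


def build_messages(question: str) -> list:
    """Build a SmolVLM-style chat payload: one image slot plus a text question."""
    return [{"role": "user",
             "content": [{"type": "image"},
                         {"type": "text", "text": question}]}]


def describe_image(image_path: str, question: str) -> str:
    """Load the model and answer a question about a single image.

    Requires `transformers`, `torch`, and `pillow`; downloads the model
    weights (a few hundred MB) on first use.
    """
    from PIL import Image
    from transformers import AutoModelForVision2Seq, AutoProcessor

    processor = AutoProcessor.from_pretrained(MODEL_ID)
    model = AutoModelForVision2Seq.from_pretrained(MODEL_ID)

    # Render the chat payload into the model's prompt format.
    prompt = processor.apply_chat_template(
        build_messages(question), add_generation_prompt=True)
    inputs = processor(text=prompt, images=[Image.open(image_path)],
                       return_tensors="pt")
    out = model.generate(**inputs, max_new_tokens=100)
    return processor.decode(out[0], skip_special_tokens=True)
```

Calling something like `describe_image("chart.png", "What does this chart show?")` would return the model's answer; given the article's point about constrained devices, the 256M variant is the natural choice for machines with around 1GB of free RAM.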

Small models like SmolVLM-256M and SmolVLM-500M may be inexpensive and versatile, but they can also contain flaws that aren’t as pronounced in larger models. A recent study from Google DeepMind, Microsoft Research, and the Mila research institute in Quebec found that many small models perform worse than expected on complex reasoning tasks. The researchers speculated that this could be because smaller models recognize surface-level patterns in data, but struggle to apply that knowledge in new contexts.
