The Verge - Artificial Intelligence, September 26, 2024
Meta releases its first open AI model that can process images

Meta has released Llama 3.2, its first open-source multimodal AI model. It can process both images and text, giving developers a more powerful tool for building AI applications, and its release is expected to drive innovation in areas such as augmented reality, visual search engines, and document analysis.

😄 Llama 3.2 can process both images and text, letting developers build more advanced AI applications such as augmented reality apps, visual search engines, and document analysis tools.

😊 Meta says developers can get Llama 3.2 running easily: they only need to add the new multimodal capability to show Llama images and have it communicate.

🤔 Llama 3.2 includes two vision models (with 11 billion and 90 billion parameters) and two lightweight text-only models (with 1 billion and 3 billion parameters). The smaller models are designed for Qualcomm, MediaTek, and other Arm hardware, and Meta hopes to see them used on mobile devices.

😮 The release of Llama 3.2 aligns with Meta's plans to build AI capabilities into hardware such as its Ray-Ban Meta glasses.

😉 Although Llama 3.2 is out, the earlier Llama 3.1 still has its place: it includes a 405-billion-parameter version that is, in theory, more capable at text generation.

Illustration by Alex Castro / The Verge

Just two months after releasing its last big AI model, Meta is back with a major update: its first open-source model capable of processing both images and text.

The new model, Llama 3.2, could allow developers to create more advanced AI applications, like augmented reality apps that provide real-time understanding of video, visual search engines that sort images based on content, or document analysis that summarizes long chunks of text for you.

Meta says it’s going to be easy for developers to get the new model up and running. Developers will have to do little except add this “new multimodality and be able to show Llama images and have it communicate,” Ahmad Al-Dahle, vice president of generative AI at Meta, told The Verge.
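As a rough illustration of what "show Llama images and have it communicate" looks like in practice, here is a minimal sketch of the multimodal chat-message format used by instruction-tuned vision models in the Hugging Face `transformers` ecosystem. The helper function, prompt text, and the checkpoint name in the comments are assumptions for illustration, not Meta's official quickstart; running actual inference requires accepting the Llama 3.2 license and substantial hardware.

```python
# Sketch: pairing an image slot with a text prompt in the chat format
# commonly used by multimodal instruct models. The function name and
# prompt below are hypothetical examples.

def build_vision_prompt(question: str) -> list[dict]:
    """Build a single-turn user message containing one image and one
    text question, in the nested content-list chat format."""
    return [
        {
            "role": "user",
            "content": [
                {"type": "image"},  # the image itself is passed to the processor separately
                {"type": "text", "text": question},
            ],
        }
    ]

messages = build_vision_prompt("Describe this photo in one sentence.")

# Actual inference would then look roughly like this (gated checkpoint,
# names per the transformers Llama 3.2 vision integration):
#
#   from transformers import AutoProcessor, MllamaForConditionalGeneration
#   model_id = "meta-llama/Llama-3.2-11B-Vision-Instruct"
#   processor = AutoProcessor.from_pretrained(model_id)
#   model = MllamaForConditionalGeneration.from_pretrained(model_id)
#   text = processor.apply_chat_template(messages, add_generation_prompt=True)
#   inputs = processor(images=my_image, text=text, return_tensors="pt")
#   out = model.generate(**inputs, max_new_tokens=64)
#   print(processor.decode(out[0]))
```

The point of the format is that the developer supplies an ordinary chat turn plus an image; the processor handles interleaving them for the model.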

Other AI developers, including OpenAI and Google, already launched multimodal models last year, so Meta is playing catch-up here. The addition of vision support will also play a key role as Meta continues to build out AI capabilities on hardware like its Ray-Ban Meta glasses.

Llama 3.2 includes two vision models (with 11 billion parameters and 90 billion parameters) and two lightweight text-only models (with 1 billion parameters and 3 billion parameters). The smaller models are designed to work on Qualcomm, MediaTek, and other Arm hardware, with Meta clearly hoping to see them put to use on mobile.

There’s still a place for the (slightly) older Llama 3.1, though: that model, released in July, included a version with 405 billion parameters, which will theoretically be more capable when it comes to generating text.

Alex Heath contributed reporting
