TechCrunch News, March 5
Cohere claims its new Aya Vision AI model is best-in-class

Cohere for AI has released Aya Vision, a multimodal AI model intended to narrow the performance gap between languages, particularly on multimodal tasks involving both text and images. Aya Vision can perform tasks such as image captioning, question answering, text translation, and summarization, supports 23 major languages, and is available for free through WhatsApp. The model comes in two versions, Aya Vision 32B and Aya Vision 8B; the 32B version outperforms larger models such as Meta's Llama-3.2 90B Vision on certain visual understanding benchmarks. Cohere trained Aya Vision with synthetic annotations, reducing resource consumption while achieving competitive performance. Cohere also released a new benchmark suite, AyaVisionBench, for evaluating models' cross-lingual and multimodal understanding.

🌍 Aya Vision, released by the nonprofit research lab of AI startup Cohere, is a multimodal "open" AI model designed to perform tasks such as writing image captions, answering questions about photos, translating text, and generating summaries in 23 major languages.

🖼️ Aya Vision comes in two versions, Aya Vision 32B and Aya Vision 8B. The stronger of the two, Aya Vision 32B, outperforms models twice its size, such as Meta's Llama-3.2 90B Vision, on certain visual understanding benchmarks; Aya Vision 8B scores higher than models ten times its size on some evaluations.

🧪 Cohere also released a new benchmark suite, AyaVisionBench, designed to probe a model's skills in "vision-language" tasks such as identifying differences between two images and converting screenshots to code. The dataset provides a robust benchmark for evaluating vision-language models in multilingual and real-world settings.

Cohere for AI, AI startup Cohere’s nonprofit research lab, this week released Aya Vision, a multimodal “open” AI model that the lab claims is best-in-class.

Aya Vision can perform tasks like writing image captions, answering questions about photos, translating text, and generating summaries in 23 major languages. Cohere, which is also making Aya Vision available for free through WhatsApp, called it “a significant step towards making technical breakthroughs accessible to researchers worldwide.”

“While AI has made significant progress, there is still a big gap in how well models perform across different languages — one that becomes even more noticeable in multimodal tasks that involve both text and images,” Cohere wrote in a blog post. “Aya Vision aims to explicitly help close that gap.”

Aya Vision comes in a couple of flavors: Aya Vision 32B and Aya Vision 8B. The more sophisticated of the two, Aya Vision 32B, sets a “new frontier,” Cohere said, outperforming models 2x its size including Meta’s Llama-3.2 90B Vision on certain visual understanding benchmarks. Meanwhile, Aya Vision 8B scores better on some evaluations than models 10x its size, according to Cohere.

Both models are available from AI dev platform Hugging Face under a Creative Commons 4.0 license with Cohere’s acceptable use addendum. They can’t be used for commercial applications.
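
For readers who want to try the weights, here is a minimal sketch of loading the 8B checkpoint with Hugging Face's transformers library. The repository id, model/processor classes, and chat-message format are assumptions based on how recent open vision-language models are typically published on Hugging Face, not details confirmed by Cohere; check the official model card before relying on them.

```python
# Minimal sketch of loading Aya Vision from Hugging Face.
# ASSUMPTIONS: the repo id, model class, and chat template fields below are
# illustrative guesses, not confirmed by Cohere's model card.
import torch
from transformers import AutoProcessor, AutoModelForImageTextToText

MODEL_ID = "CohereForAI/aya-vision-8b"  # assumed repository id

processor = AutoProcessor.from_pretrained(MODEL_ID)
model = AutoModelForImageTextToText.from_pretrained(
    MODEL_ID, torch_dtype=torch.float16, device_map="auto"
)

# One image-captioning request in chat format (image URL is a placeholder).
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://example.com/photo.jpg"},
            {"type": "text", "text": "Describe this image in French."},
        ],
    }
]

inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

output = model.generate(**inputs, max_new_tokens=128)
print(processor.decode(output[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```

Note that the non-commercial license terms mentioned above still apply to anything built on these weights.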

Cohere said that Aya Vision was trained using a “diverse pool” of English datasets, which the lab translated and used to create synthetic annotations. Annotations, also known as tags or labels, help models understand and interpret data during the training process. For example, annotation to train an image recognition model might take the form of markings around objects or captions referring to each person, place, or object depicted in an image.
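
As a concrete illustration of what such an annotation might look like, below is a hypothetical caption-style record for one training image. Every field name and value here is invented for illustration; it does not reflect Cohere's actual data format.

```python
# Hypothetical example of a caption-style annotation for a single training image.
# Field names and values are illustrative only, not Cohere's real schema.
annotation = {
    "image_path": "images/street_market_001.jpg",
    "language": "ar",  # one of the 23 supported languages
    "caption": "بائع يعرض الفواكه الطازجة في سوق مزدحم",  # "A vendor displays fresh fruit in a busy market"
    "question_answer_pairs": [
        {"question": "كم عدد الأشخاص في الصورة؟", "answer": "ثلاثة"},  # "How many people are in the image?" / "Three"
    ],
    "objects": [
        {"label": "vendor", "bbox": [120, 45, 310, 400]},       # [x_min, y_min, x_max, y_max] in pixels
        {"label": "fruit stand", "bbox": [0, 200, 640, 480]},
    ],
    "source": "translated_from_english",  # synthetic annotation derived from an English dataset
}
```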

Cohere’s Aya Vision model can perform a range of visual understanding tasks. (Image Credits: Cohere)

Cohere’s use of synthetic annotations — that is, annotations generated by AI — is on trend. Despite its potential downsides, rivals including OpenAI are increasingly leveraging synthetic data to train models as the well of real-world data dries up. Research firm Gartner estimates that 60% of the data used for AI and analytics projects last year was synthetically created.

According to Cohere, training Aya Vision on synthetic annotations enabled the lab to use fewer resources while achieving competitive performance.

“This showcases our critical focus on efficiency and [doing] more using less compute,” Cohere wrote in its blog. “This also enables greater support for the research community, who often have more limited access to compute resources.”

Together with Aya Vision, Cohere also released a new benchmark suite, AyaVisionBench, designed to probe a model’s skills in “vision-language” tasks like identifying differences between two images and converting screenshots to code.
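
For a rough idea of how a researcher might pull the benchmark into their own evaluation harness, the sketch below loads a Hugging Face dataset and iterates over its examples. The dataset id and column names are assumptions, and `model_generate` is a placeholder for whichever vision-language model is under test; consult the official dataset card for the real schema.

```python
# Rough sketch of iterating over AyaVisionBench for evaluation.
# ASSUMPTIONS: the dataset id and column names are guesses; model_generate()
# is a placeholder for the model being evaluated.
from datasets import load_dataset

DATASET_ID = "CohereForAI/AyaVisionBench"  # assumed dataset id

bench = load_dataset(DATASET_ID, split="test")

results = []
for example in bench:
    image = example["image"]            # assumed: a PIL image
    prompt = example["prompt"]          # assumed: task instruction, e.g. "convert this screenshot to code"
    language = example.get("language")  # assumed: language tag for cross-lingual scoring

    prediction = model_generate(image, prompt)  # placeholder call
    results.append({"language": language, "prediction": prediction})
```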

The AI industry is in the midst of what some have called an “evaluation crisis,” a consequence of the popularization of benchmarks that give aggregate scores that correlate poorly to proficiency on tasks most AI users care about. Cohere asserts that AyaVisionBench is a step toward rectifying this, providing a “broad and challenging” framework for assessing a model’s cross-lingual and multimodal understanding.

With any luck, that’s indeed the case.

“[T]he dataset serves as a robust benchmark for evaluating vision-language models in multilingual and real-world settings,” Cohere researchers wrote in a post on Hugging Face. “We make this evaluation set available to the research community to push forward multilingual multimodal evaluations.”
