MarkTechPost@AI, October 23, 2024
CMU Researchers Release Pangea-7B: A Fully Open Multimodal Large Language Model (MLLM) for 39 Languages

CMU researchers have introduced PANGEA to address gaps in linguistic and cultural coverage. Existing models underrepresent many languages and cultures; PANGEA is trained on a new dataset, PANGEAINS, containing 6 million instruction samples across 39 languages, and is accompanied by an evaluation suite, PANGEABENCH. PANGEA performs strongly in multilingual settings and shows clear advantages in multicultural understanding. Its open-source release is expected to advance the field, though room for improvement remains.

🎯 PANGEA is a multilingual multimodal LLM designed to bridge linguistic and cultural gaps in visual understanding tasks. It is trained on the new PANGEAINS dataset, which contains rich instruction samples in many languages.

📚 To evaluate PANGEA's capabilities, the researchers introduce the PANGEABENCH evaluation suite, spanning 14 datasets and 47 languages. Results show PANGEA outperforming many existing models in multilingual settings, with strong performance on multicultural understanding as well.

💪 The PANGEAINS dataset was built using multiple strategies that tackle the main challenges of multilingual multimodal learning: data scarcity, cultural nuance, catastrophic forgetting, and evaluation complexity, ensuring the model understands and responds appropriately across linguistic and cultural contexts.

🌟 PANGEA is a strong competitor in the multilingual MLLM space: its 7-billion-parameter model delivers significant gains on both English and multilingual tasks and rivals proprietary models in some areas, though multimodal chat and complex reasoning still need improvement.

Despite recent advances in multimodal large language models (MLLMs), the development of these models has largely centered around English and Western-centric datasets. This emphasis has resulted in a significant gap in linguistic and cultural representation, with many languages and cultural contexts around the world remaining underrepresented. Consequently, existing models often perform poorly in multilingual environments and fail to align with the socio-cultural norms of underrepresented languages. This presents a substantial limitation, particularly given the increasing adoption of these models globally, where equitable representation is crucial for effective real-world applications.

A team of researchers from Carnegie Mellon University introduced PANGEA, a multilingual multimodal LLM designed to bridge linguistic and cultural gaps in visual understanding tasks. PANGEA is trained on a newly curated dataset, PANGEAINS, which contains 6 million instruction samples across 39 languages. The dataset is specifically crafted to improve cross-cultural coverage by combining high-quality English instructions, machine-translated instructions, and culturally relevant multimodal tasks. In addition, to evaluate PANGEA’s capabilities, the researchers introduced PANGEABENCH, an evaluation suite spanning 14 datasets covering 47 languages. This comprehensive evaluation provides insight into the model’s performance on both multimodal and multilingual tasks, showing that PANGEA outperforms many existing models in multilingual scenarios.
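Since the model weights are released openly on Hugging Face, a natural first step is loading them through the transformers library. The following is a minimal sketch only: the repo id neulab/Pangea-7B-hf, the LLaVA-Next-style interface, and the prompt format are assumptions made for illustration, not details confirmed by the article; check the official model card for the real identifiers.

# Minimal sketch: prompting an open multilingual MLLM via transformers.
# ASSUMPTIONS: the repo id "neulab/Pangea-7B-hf" and a LLaVA-Next-compatible
# checkpoint with a USER/ASSISTANT prompt format.
import requests
from PIL import Image
from transformers import AutoProcessor, LlavaNextForConditionalGeneration

model_id = "neulab/Pangea-7B-hf"  # assumed repo id
processor = AutoProcessor.from_pretrained(model_id)
model = LlavaNextForConditionalGeneration.from_pretrained(model_id, device_map="auto")

image = Image.open(requests.get("https://example.com/street_sign.jpg", stream=True).raw)
# Querying in Japanese to exercise the multilingual side of the model.
prompt = "USER: <image>\nこの標識には何と書いてありますか?\nASSISTANT:"

inputs = processor(images=image, text=prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=128)
print(processor.decode(output[0], skip_special_tokens=True))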

PANGEA was developed using PANGEAINS, a rich and diverse dataset that includes instructions for general visual understanding, document and chart question answering, image captioning, and more. The dataset was designed to address the major challenges of multilingual multimodal learning: data scarcity, cultural nuances, catastrophic forgetting, and evaluation complexity. To build PANGEAINS, the researchers employed several strategies: translating high-quality English instructions, generating culturally aware tasks, and incorporating existing open-source multimodal datasets. The researchers also developed a sophisticated pipeline to filter culturally diverse images and generate detailed multilingual and cross-cultural captions, ensuring that the model understands and responds appropriately in different linguistic and cultural contexts.
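To make the "translate high-quality English instructions" strategy concrete, here is a hedged sketch of the fan-out step such a pipeline might use. The InstructionSample type and the translate() helper are hypothetical names; the actual PANGEAINS pipeline also includes the quality filtering and culturally aware generation this sketch omits.

# Illustrative sketch: fanning one English instruction sample out to many
# languages. All names here are hypothetical, not the paper's actual code.
from dataclasses import dataclass

@dataclass
class InstructionSample:
    image_path: str   # the visual content is reused across languages
    instruction: str
    response: str
    language: str

def translate(text: str, target_lang: str) -> str:
    """Hypothetical machine-translation call (an MT model or API would go here)."""
    raise NotImplementedError

def expand_to_languages(sample: InstructionSample,
                        languages: list[str]) -> list[InstructionSample]:
    """Fan one English sample out into per-language copies of the instruction pair."""
    expanded = [sample]  # keep the English original
    for lang in languages:
        expanded.append(InstructionSample(
            image_path=sample.image_path,
            instruction=translate(sample.instruction, lang),
            response=translate(sample.response, lang),
            language=lang,
        ))
    return expanded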

The results of PANGEA’s evaluation on PANGEABENCH demonstrate its strengths. PANGEA-7B, the 7-billion parameter model, showed significant improvements over existing open-source models, achieving an average improvement of 7.3 points on English tasks and 10.8 points on multilingual tasks. PANGEA also excels in multicultural understanding, as evidenced by its performance on the CVQA and xChat benchmarks. Interestingly, the model’s performance in multilingual settings did not drop as significantly as that of other models, demonstrating its balanced cross-language capabilities. Moreover, PANGEA matches or even outperforms proprietary models like Gemini-1.5-Pro and GPT-4o in several areas, indicating that it is a strong competitor in the multilingual MLLM space.
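For readers parsing figures like "7.3 points on English tasks," a small illustration of the aggregation involved may help: benchmarks like PANGEABENCH report per-dataset scores, and an average improvement is the mean score delta over datasets. The dataset names and numbers below are placeholders purely to show the calculation shape, not actual PANGEABENCH results.

# Hedged sketch of computing an "average improvement" over benchmark datasets.
def average_improvement(model_scores: dict[str, float],
                        baseline_scores: dict[str, float]) -> float:
    """Mean of (model - baseline) over datasets present in both score tables."""
    common = model_scores.keys() & baseline_scores.keys()
    return sum(model_scores[d] - baseline_scores[d] for d in common) / len(common)

# Placeholder numbers, not real results.
pangea = {"xChat": 60.2, "CVQA": 55.1, "xMMMU": 43.0}
baseline = {"xChat": 48.9, "CVQA": 46.3, "xMMMU": 40.7}
print(f"avg multilingual delta: {average_improvement(pangea, baseline):.1f} points")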

PANGEA represents a significant step forward in creating inclusive and robust multilingual multimodal LLMs. The researchers successfully addressed data scarcity and cultural representation challenges by leveraging machine translation and culturally aware data generation strategies, creating a comprehensive dataset that spans 39 languages. The open-sourcing of PANGEAINS, PANGEABENCH, and PANGEA models is expected to facilitate further development and innovation in this field, promoting equity and accessibility across linguistic and cultural boundaries. Despite its promising performance, there are still areas for improvement, such as enhancing performance in multimodal chat and complex reasoning tasks, which the researchers hope to address in future iterations.


Check out the Paper, Project Page, and Model Card on Hugging Face. All credit for this research goes to the researchers of this project.
