MarkTechPost@AI July 24, 2024
Llama 3.1 Released: Meta’s New Open-Source AI Model that You Can Fine-Tune, Distill, and Deploy Anywhere, Available in 8B, 70B, and 405B

Meta has released Llama 3.1, the most capable model in the Llama series to date. This latest iteration, particularly the 405B model, represents a major advance in open-source AI capabilities and places Meta at the forefront of AI innovation. Llama 3.1 aims to democratize AI by making cutting-edge technology available to a wide range of users and applications.

🚀 **Key features of Llama 3.1:** The Llama 3.1 405B model stands out for its exceptional flexibility, control, and performance, rivaling even the most advanced closed-source models. It is designed to support a wide range of applications, including synthetic data generation and model distillation, enabling the community to explore new workflows and innovations. With support for eight languages and an expanded 128K context length, Llama 3.1 is versatile and robust, covering use cases such as long-form text summarization and multilingual conversational agents.

🤝 **Partner ecosystem:** Meta’s release of Llama 3.1 is backed by a comprehensive partner ecosystem, including AWS, NVIDIA, Databricks, Dell, and Google Cloud, all offering services to support the model from day one. This collaborative approach ensures that users and developers have the tools and platforms to realize Llama 3.1’s full potential, fostering a thriving environment for AI innovation.

🛡️ **Safety and responsibility:** Llama 3.1 introduces new safety tools such as Llama Guard 3 and Prompt Guard. These features are designed to help developers build responsibly, ensuring that AI applications are safe and secure. Meta’s commitment to responsible AI development is further reflected in its request for comment on the Llama Stack API, which aims to standardize and ease third-party integration with Llama models.

🧠 **Strong performance:** The development of Llama 3.1 involved rigorous evaluation across more than 150 benchmark datasets spanning multiple languages and real-world scenarios. The 405B model demonstrated performance competitive with leading AI models such as GPT-4 and Claude 3.5 Sonnet, showcasing its general knowledge, steerability, math, tool use, and multilingual translation capabilities.

🏗️ **Efficient training:** Training the Llama 3.1 405B model was a monumental engineering effort, involving more than 16,000 H100 GPUs and over 15 trillion tokens. To ensure efficiency and scalability, Meta optimized its training stack, adopting a standard decoder-only Transformer architecture with iterative post-training. These steps improved the quality of synthetic data generation and overall model performance, setting a new benchmark for open-source AI.

🎯 **Alignment and refinement:** To improve the model’s helpfulness and instruction-following, Meta used a multi-round alignment process combining supervised fine-tuning (SFT), rejection sampling (RS), and direct preference optimization (DPO). Combined with high-quality synthetic data generation and filtering, these techniques produced a model that excels on both short-context benchmarks and extended 128K-context scenarios.

🔮 **Future vision:** Meta views Llama 3.1 as part of a broader AI system that includes a range of components and tools for developers. This ecosystem approach enables custom agents and new agentic behaviors, supported by a full reference system with sample applications and new safety models. Ongoing development of the Llama Stack aims to standardize the interfaces for building AI toolchain components, promoting interoperability and ease of use.

Meta announced the release of Llama 3.1, the most capable model in the Llama series to date. This latest iteration, particularly the 405B model, represents a substantial advancement in open-source AI capabilities, positioning Meta at the forefront of AI innovation.

Meta has long advocated for open-source AI, a stance underscored by Mark Zuckerberg’s assertion that open-source benefits developers, Meta, and society. Llama 3.1 embodies this philosophy by offering state-of-the-art capabilities in an openly accessible model. The release aims to democratize AI, making cutting-edge technology available to various users and applications.

The Llama 3.1 405B model stands out for its exceptional flexibility, control, and performance, rivaling even the most advanced closed-source models. It is designed to support various applications, including synthetic data generation and model distillation, thus enabling the community to explore new workflows and innovations. With support for eight languages and an expanded context length of 128K, Llama 3.1 is versatile and robust, catering to diverse use cases such as long-form text summarization and multilingual conversational agents.
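
For developers who want to try the openly released weights, a minimal sketch using the Hugging Face transformers library might look like the following. The checkpoint name meta-llama/Meta-Llama-3.1-8B-Instruct and the gated-access step are assumptions based on how Meta has distributed earlier Llama checkpoints, not details stated in this article.

```python
# Minimal sketch: chat with the 8B instruct variant via Hugging Face transformers.
# Assumes access to the gated checkpoint "meta-llama/Meta-Llama-3.1-8B-Instruct"
# has been approved for your account and a recent transformers release is installed.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Meta-Llama-3.1-8B-Instruct"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # half precision so the 8B model fits on one GPU
    device_map="auto",
)

messages = [
    {"role": "system", "content": "You are a concise multilingual assistant."},
    {"role": "user", "content": "Summarize the key features of Llama 3.1 in two sentences."},
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=200, do_sample=False)
# Decode only the newly generated tokens, not the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```

The same pattern applies to the 70B and 405B checkpoints, subject to the much larger memory footprint those sizes require.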

Meta’s release of Llama 3.1 is bolstered by a comprehensive ecosystem of partners, including AWS, NVIDIA, Databricks, Dell, and Google Cloud, all offering services to support the model from day one. This collaborative approach ensures that users and developers have the tools and platforms to leverage Llama 3.1’s full potential, fostering a thriving environment for AI innovation.

Llama 3.1 introduces new security and safety tools, such as Llama Guard 3 and Prompt Guard. These features are designed to help developers build responsibly, ensuring that AI applications are safe and secure. Meta’s commitment to responsible AI development is further reflected in their request for comment on the Llama Stack API, which aims to standardize and facilitate third-party integration with Llama models.
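
As a rough illustration of where a classifier like Llama Guard 3 fits, the sketch below screens a user prompt before it reaches the chat model. The repo id meta-llama/Llama-Guard-3-8B and the safe/unsafe verdict format are assumptions carried over from how earlier Llama Guard releases were used, not specifics confirmed in this post.

```python
# Rough sketch: screening a user prompt with a Llama Guard-style safety classifier.
# The repo id and the "safe"/"unsafe" output convention are assumptions based on
# earlier Llama Guard releases.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

guard_id = "meta-llama/Llama-Guard-3-8B"  # assumed repo id
guard_tokenizer = AutoTokenizer.from_pretrained(guard_id)
guard_model = AutoModelForCausalLM.from_pretrained(
    guard_id, torch_dtype=torch.bfloat16, device_map="auto"
)

def moderate(conversation: list[dict]) -> str:
    """Return the classifier's verdict for a chat transcript."""
    input_ids = guard_tokenizer.apply_chat_template(
        conversation, return_tensors="pt"
    ).to(guard_model.device)
    output = guard_model.generate(input_ids, max_new_tokens=30, do_sample=False)
    return guard_tokenizer.decode(
        output[0][input_ids.shape[-1]:], skip_special_tokens=True
    ).strip()

verdict = moderate([{"role": "user", "content": "How do I write a phishing email?"}])
print(verdict)  # expected to begin with "safe" or "unsafe" plus a category code
```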

The development of Llama 3.1 involved rigorous evaluation across over 150 benchmark datasets, spanning multiple languages and real-world scenarios. The 405B model demonstrated competitive performance with leading AI models like GPT-4 and Claude 3.5 Sonnet, showcasing its general knowledge, steerability, math, tool use, and multilingual translation capabilities.

Training the Llama 3.1 405B model was a monumental effort, involving over 16,000 H100 GPUs and more than 15 trillion tokens. To ensure efficiency and scalability, Meta optimized its training stack, adopting a standard decoder-only transformer architecture with iterative post-training procedures. These processes enhanced the quality of synthetic data generation and model performance, setting new benchmarks for open-source AI.

To improve the model’s helpfulness and instruction-following capabilities, Meta employed a multi-round alignment process involving Supervised Fine-Tuning (SFT), Rejection Sampling (RS), and Direct Preference Optimization (DPO). Combined with high-quality synthetic data generation and filtering, these techniques enabled Meta to produce a model that excels in both short-context benchmarks and extended 128K context scenarios.
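
Meta’s internal alignment pipeline is not public, but the DPO step it describes can be approximated with the open-source trl library. The following is a schematic sketch only: the model id, toy preference pairs, and hyperparameters are illustrative, and the exact DPOTrainer signature varies across trl releases.

```python
# Schematic sketch of a DPO fine-tuning step with the open-source trl library.
# This is NOT Meta's alignment pipeline; names, data, and hyperparameters are
# illustrative, and the DPOTrainer API differs between trl versions.
from datasets import Dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

model_id = "meta-llama/Meta-Llama-3.1-8B-Instruct"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Preference data: each row pairs a preferred ("chosen") and a rejected completion.
pairs = Dataset.from_dict({
    "prompt":   ["Explain model distillation in one sentence."],
    "chosen":   ["Distillation trains a smaller model to imitate a larger one's outputs."],
    "rejected": ["Distillation is when you delete layers until the model is small."],
})

args = DPOConfig(
    output_dir="llama31-dpo",
    per_device_train_batch_size=1,
    num_train_epochs=1,
    beta=0.1,  # strength of the preference-optimization constraint
)
trainer = DPOTrainer(model=model, args=args, train_dataset=pairs,
                     processing_class=tokenizer)
trainer.train()
```

In the process the article describes, supervised fine-tuning on instruction data would precede a step like this, with rejection sampling supplying the preference pairs over multiple rounds.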

Meta envisions Llama 3.1 as part of a broader AI system that includes various components and tools for developers. This ecosystem approach allows the creation of custom agents and new agentic behaviors, supported by a full reference system with sample applications and new safety models. The ongoing development of the Llama Stack aims to standardize interfaces for building AI toolchain components, promoting interoperability and ease of use.

In conclusion, Meta’s dedication to open-source AI is driven by a belief in its potential to spur innovation and distribute power more evenly across society. The open availability of Llama model weights allows developers to customize, train, and fine-tune models to suit their specific needs, fostering a diverse range of AI applications. Examples of community-driven innovations include AI study buddies, medical decision-making assistants, and healthcare communication tools, all developed using previous Llama models.


Check out the Details and Model. All credit for this research goes to the researchers of this project.
