Nvidia’s Newest Foundation Model Can Actually Spell ‘Strawberry’

EnterpriseAI, October 22, 2024


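Nvidia has released a new AI model called Llama-3.1-Nemotron-70B-Instruct that easily solves the "strawberry problem," that is, correctly counting the R’s in the word "strawberry," something OpenAI’s GPT-4o model cannot do. Nemotron-70B outperforms GPT-4o and Anthropic’s Claude 3.5 Sonnet on several benchmarks, showing strong reasoning ability. The model is based on Meta’s open source Llama foundation model and was fine-tuned using reinforcement learning and Nvidia’s HelpSteer2-preference technique. Nvidia has also introduced NIM (Nvidia inference microservices), which gives customers an interface for interacting with AI and allows fine-tuning for multiple LLMs. Meanwhile, OpenAI has released a new model called o1 (codenamed "Strawberry"), which the company claims has PhD-level ability in STEM subjects.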

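🍓 **Nemotron-70B solves the "strawberry problem"**: The model correctly counts the R’s in the word "strawberry," which OpenAI’s GPT-4o model cannot do, showing strong reasoning ability. Nemotron-70B is based on Meta’s open source Llama foundation model and was fine-tuned using reinforcement learning and Nvidia’s HelpSteer2-preference technique, which helps it follow instructions more accurately and improves its safety.

🤖 **Nemotron-70B outperforms GPT-4o and Claude 3.5 Sonnet on several benchmarks**: It scores 85.0 on the Chatbot Arena Hard benchmark, 57.6 on AlpacaEval 2 LC, and 8.98 on the GPT-4-Turbo MT-Bench. These results indicate that Nemotron-70B performs better than other leading AI models in several areas.

🚀 **Nvidia has introduced NIM (Nvidia inference microservices)**: NIM is a downloadable container that gives customers an interface for interacting with AI and allows fine-tuning for multiple LLMs, giving enterprises customized AI solutions. NIM is easy to install, provides full control of underlying model data, and delivers predictable throughput and latency performance.

🍓 **OpenAI has also released a new model called o1 (codenamed "Strawberry")**: OpenAI claims the model has PhD-level ability in STEM subjects, and it can correctly count the R’s in the word "strawberry."

💡 **Model performance and reliability need to be tested for specific applications**: Although Nemotron-70B posts excellent scores on several benchmarks, benchmarking for large language models is still a developing field, and a model’s real-world usefulness has to be tested for each individual application.

🏆 **Nvidia dominates the AI hardware market and is pushing into AI models**: The Nemotron models’ strong benchmark results show that Nvidia is becoming a "one-stop shop" for AI solutions.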

A new AI model from Nvidia knows just how many R’s are in the word strawberry, a feat that OpenAI’s GPT-4o model has yet to achieve. In what is known as the "strawberry problem," GPT-4o and a few other established models often give the false answer that ‘strawberry’ has only two R’s, when it actually has three.
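For reference, the correct answer is trivial to verify with ordinary code, which is part of why the question makes a useful sanity check. A minimal Python sketch (the helper name `count_letter` is illustrative and does not come from Nvidia or OpenAI):

```python
def count_letter(word: str, letter: str) -> int:
    """Count case-insensitive occurrences of a single letter in a word."""
    return word.lower().count(letter.lower())


# The "strawberry problem": s-t-r-a-w-b-e-r-r-y contains three R's.
r_count = count_letter("strawberry", "r")
print(r_count)  # 3
```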

Launched on Hugging Face on Oct. 15, the new Nvidia model is called Llama-3.1-Nemotron-70B-Instruct and is based on Meta’s open source Llama foundation models, specifically Llama-3.1-70B-Instruct Base. The Llama series of AI models was designed as an open source foundation for developers to build upon.

The Hugging Face model page asserts that Nemotron-70B surpasses GPT-4o and Anthropic’s Claude 3.5 Sonnet on a few different benchmarks. Nemotron-70B scores 85.0 on the Chatbot Arena Hard benchmark, 57.6 on AlpacaEval 2 LC, and 8.98 on the GPT-4-Turbo MT-Bench. The page also notes that Nemotron-70B was fine-tuned using reinforcement learning from human feedback, as well as a new alignment technique from Nvidia called HelpSteer2-preference, which the company says trains the model to follow instructions more closely.

The benchmark results are promising for the AI research concept of alignment, which describes how well a model’s outputs correspond to user requirements and to expectations for reliability and safety. Alignment can be improved through greater customization, enabling enterprises to tailor AI models for specific use cases. The ultimate goal is to provide accurate, helpful responses and eliminate hallucinations.

The Nemotron-70B model solves the "strawberry problem" with ease, demonstrating its advanced reasoning capabilities.

However, it is important to note that benchmarking for large language models is still a developing area of research, and the usefulness of specific models should be tested for individual applications.
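As a rough illustration of application-specific testing, the sketch below scores any model on a handful of letter-counting questions. It is a hedged example rather than an established benchmark: `ask_model` is a hypothetical callable standing in for whatever client a team actually uses, and the prompt wording, test cases, and scoring rule are all assumptions made for illustration.

```python
import re
from typing import Callable

# Illustrative test cases: (word, letter). Expected counts are computed in code.
CASES = [("strawberry", "r"), ("banana", "a"), ("letter", "t"), ("apple", "p")]


def letter_count_accuracy(
    ask_model: Callable[[str], str],
    cases: list[tuple[str, str]],
) -> float:
    """Return the fraction of cases where the first number in the reply is correct."""
    if not cases:
        return 0.0
    correct = 0
    for word, letter in cases:
        expected = word.lower().count(letter.lower())
        prompt = (
            f"How many times does the letter '{letter}' appear in the word "
            f"'{word}'? Answer with a number."
        )
        reply = ask_model(prompt)
        numbers = re.findall(r"\d+", reply)  # digits only; spelled-out numbers are missed
        if numbers and int(numbers[0]) == expected:
            correct += 1
    return correct / len(cases)


# Stand-in "model" that always answers 2, mimicking the classic wrong answer.
def always_two(prompt: str) -> str:
    return "2"


print(letter_count_accuracy(always_two, CASES))
```

Swapping `always_two` for a function that calls a real model would give a small, repeatable check tailored to one task, which is the kind of per-application testing the caveat above points to.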

Nvidia is currently dominating the AI hardware market, and if its Nemotron models continue to score well in benchmarks, it could mean even more competition in the already booming LLM space. The Nemotron models also show how the company seems intent on becoming a one-stop shop for AI solutions.

One important aspect of Nvidia’s foray into AI models is NIM (Nvidia inference microservices), a downloadable container providing the interface for customers to interact with AI. NIM allows fine-tuning for multiple LLMs using guardrails and optimizations. Nvidia says NIM is easy to install, provides full control of underlying model data, and delivers predictable throughput and latency performance.

OpenAI also released a new model in September called o1, which, interestingly enough, was codenamed Strawberry. The model, the first of a planned series of models with advanced reasoning capabilities, is offered in preview to paid ChatGPT users in two versions: o1-preview and o1-mini. OpenAI claims the new Strawberry model, trained with a bespoke dataset, has demonstrated a PhD-level capacity in many STEM subjects.

And don’t worry: it can also accurately tell you how many R’s are in the word strawberry.
