EnterpriseAI, September 5, 2024
Google DeepMind’s GenRM Revolutionizes AI Accuracy with Self-Validating Models

Large language models can generate human-like text and handle complex reasoning tasks, yet they remain prone to factual and logical errors. Google DeepMind's GenRM method combines generation with verification, improving accuracy and reliability on reasoning tasks.

🧐 Large language models are advancing rapidly, but they are prone to factual and logical errors, which limits their use in accuracy-critical fields such as healthcare and finance.

📚 Researchers have tried various approaches to the accuracy challenge, such as verifiers and discriminative reward models, but these traditional methods have their limitations.

🌟 Google DeepMind, in collaboration with several universities, has introduced the GenRM method, which trains verifiers via next-token prediction, improving both the model's generation and verification abilities, supporting chain-of-thought reasoning, and performing well across multiple benchmarks.

🎯 GenRM proves effective at catching errors that standard verifiers may miss, scales well to larger datasets and increased model capacity, and the researchers plan to extend it to more application areas.

Large language models (LLMs) can create human-like text and tackle complex reasoning tasks. The technology has evolved rapidly in recent years, benefiting from advancements in machine learning (ML) algorithms, increased computational power, and the availability of vast datasets for training.

However, even with advanced capabilities, LLM models are prone to factual and logical errors, especially for complex reasoning tasks. This has limited the use of LLMs in applications where accuracy and reliability are paramount, such as healthcare and finance. 

Several studies, including research published by Oxford University, have highlighted a critical vulnerability in LLMs: AI hallucinations. This issue causes LLMs to deviate from contextual logic and external facts, resulting in incorrect or irrelevant output.

Researchers have attempted various solutions to address accuracy challenges, including techniques like verifiers and discriminative reward models. 

Verifiers work by assessing the correctness of the LLM outputs and filtering out errors to ensure factual consistency and logical coherence. Reward models assist in training LLMs by offering feedback on the quality of their outputs.
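In practice, such a verifier is typically used to rerank candidate solutions: sample several answers from the LLM, score each one, and keep the highest-scoring answer (best-of-N). A minimal sketch, with a hypothetical stand-in scoring function in place of a trained reward model:

```python
def best_of_n(candidates, verifier_score):
    """Rerank sampled solutions by verifier score and return the best one."""
    return max(candidates, key=verifier_score)

# Stand-in verifier: a real discriminative reward model would output a
# learned correctness score; this toy version just prefers answers near 42.
def toy_score(solution):
    return -abs(solution["answer"] - 42)

candidates = [{"answer": 40}, {"answer": 42}, {"answer": 13}]
best = best_of_n(candidates, toy_score)  # picks {"answer": 42}
```

The quality of the final answer then hinges entirely on how well the scoring function separates correct from incorrect solutions.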

One of the key limitations of these traditional methods is that they are trained to distinguish between correct and incorrect responses based on predefined criteria, without creating new text or refining output. This means that these methods do not leverage the text generation capabilities that LLMs are fundamentally designed for. 

Another widely-used approach is the LLM-as-a-Judge method, in which pre-trained language models assess the accuracy of solutions. While this method offers flexibility, it often falls short compared to more specialized verifiers, especially in reasoning tasks that demand detailed and nuanced judgment.

A research team from Google DeepMind, in collaboration with the University of Toronto, Mila, and the University of California, Los Angeles, has introduced a new approach that enhances the accuracy and reliability of LLMs in reasoning tasks.

The new method, called the Generative Reward Model (GenRM), trains verifiers using next-token prediction to harness the text-generation capabilities of LLMs. The researchers have outlined the new method in a paper available on arXiv.

GenRM enables the model to predict the next word or token in a sequence based on the provided context. By simultaneously generating and evaluating potential solutions, GenRM offers a unified training strategy that enhances both the model’s generative and verification abilities.
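Casting verification as next-token prediction means the verifier can be queried with a prompt along the lines of "Is the answer correct (Yes/No)?", with the probability the model assigns to the "Yes" token serving as the correctness score. A minimal sketch of that scoring step, with mocked next-token logits standing in for a real model's output:

```python
import math

def yes_probability(token_logits):
    """Softmax over next-token logits; return the probability mass on 'Yes'."""
    z = max(token_logits.values())  # subtract max for numerical stability
    exp = {tok: math.exp(v - z) for tok, v in token_logits.items()}
    return exp["Yes"] / sum(exp.values())

# Hypothetical logits a model might emit after a prompt ending in
# "...Is the answer correct (Yes/No)?" -- illustrative numbers only.
logits = {"Yes": 2.0, "No": 0.5}
score = yes_probability(logits)  # a correctness score in (0, 1)
```

Because the score comes from the same next-token distribution the model uses for generation, one model can be trained jointly on solving problems and verifying solutions.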

This approach also supports Chain-of-Thought (CoT) reasoning, where the model is prompted to generate a thought process before the answer. This makes the verification process more comprehensive and systematic.
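CoT verification also combines naturally with majority voting: sample several independent verification rationales, score each, and average the verdicts. A sketch, again using hypothetical per-rationale scores rather than real model output:

```python
def majority_vote_score(rationale_scores):
    """Average per-rationale 'Yes' probabilities into a single verdict."""
    return sum(rationale_scores) / len(rationale_scores)

# Hypothetical scores from four independently sampled CoT verifications.
scores = [0.9, 0.7, 0.8, 0.6]
verdict = majority_vote_score(scores)  # treat the answer as correct if > 0.5
```

Averaging over multiple sampled rationales smooths out any single flawed chain of reasoning, at the cost of extra inference compute.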

The new model was tested in various settings, including algorithmic problem-solving tasks and grade-school mathematics. The researchers claim that the new model improved the problem-solving success rate by 16% to 64% compared to discriminative reward models and the LLM-as-a-Judge method. The model also outperformed GPT-4 and Gemini 1.5 Pro.

The performance boost of the GenRM model demonstrates its effectiveness in addressing errors that standard verifiers may miss, especially in complex reasoning tasks. The researchers also observed that GenRM scales well with larger datasets and increased model capacity, broadening its applicability for various reasoning scenarios.

“GenRM is a more performant alternative to discriminative reward models, and unlocks the use of powerful tools, such as chain-of-thought reasoning and majority voting for better verification,” wrote the researchers in their paper. “GenRM also unifies generation and verification into a single LLM, and demonstrates that such a unification benefits both generation and verification.”

Google DeepMind’s GenRM method advances GenAI by combining generation and verification, improving accuracy and reliability in reasoning tasks. This approach provides a strong foundation for future AI research and applications where precision is critical.

The researchers plan on extending the generative verification framework to a broader range of applications, including answering open-ended questions and coding. They also plan on studying how generative verifiers can be integrated into existing LLM self-improvement algorithms. 


 
