EnterpriseAI, September 5, 2024
Google DeepMind’s GenRM Revolutionizes AI Accuracy with Self-Validating Models

Large language models can generate human-like text and handle complex reasoning tasks, yet they remain prone to factual and logical errors. Google DeepMind's GenRM method combines generation with verification, improving accuracy and reliability on reasoning tasks.

🧐 Large language models are advancing rapidly, but they are prone to factual and logical errors, which limits their use in accuracy-critical fields such as healthcare and finance.

📚 Researchers have tried various approaches to the accuracy challenge, such as verifiers and discriminative reward models, but these traditional methods have their limitations.

🌟 Google DeepMind, in collaboration with several universities, has introduced the GenRM method, which trains verifiers via next-token prediction, improving both the model's generation and verification abilities, supporting chain-of-thought reasoning, and performing well across multiple benchmarks.

🎯 GenRM proves effective at catching errors that standard verifiers may miss, scales well to larger datasets and increased model capacity, and the researchers plan to extend it to more application areas.

Large language models (LLMs) can create human-like text and tackle complex reasoning tasks. The technology has evolved rapidly in recent years, benefiting from advancements in machine learning (ML) algorithms, increased computational power, and the availability of vast datasets for training.

However, even with advanced capabilities, LLM models are prone to factual and logical errors, especially for complex reasoning tasks. This has limited the use of LLMs in applications where accuracy and reliability are paramount, such as healthcare and finance. 

Several studies, including research published by Oxford University, have highlighted a critical vulnerability in LLMs: AI hallucinations. This issue causes LLMs to deviate from contextual logic and external facts, resulting in incorrect or irrelevant output.

Researchers have attempted various solutions to address accuracy challenges, including techniques like verifiers and discriminative reward models. 

Verifiers work by assessing the correctness of the LLM outputs and filtering out errors to ensure factual consistency and logical coherence. Reward models assist in training LLMs by offering feedback on the quality of their outputs.
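In practice, such a verifier is typically used to rerank candidate solutions: sample several answers from the LLM, score each one, and keep the highest-scoring answer (best-of-N). A minimal sketch, with a hypothetical stand-in scoring function in place of a trained reward model:

```python
def best_of_n(candidates, verifier_score):
    """Rerank sampled solutions by verifier score and return the best one."""
    return max(candidates, key=verifier_score)

# Stand-in verifier: a real discriminative reward model would output a
# learned correctness score; this toy version just prefers answers near 42.
def toy_score(solution):
    return -abs(solution["answer"] - 42)

candidates = [{"answer": 40}, {"answer": 42}, {"answer": 13}]
best = best_of_n(candidates, toy_score)  # picks {"answer": 42}
```

The quality of the final answer then hinges entirely on how well the scoring function separates correct from incorrect solutions.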

One of the key limitations of these traditional methods is that they are trained to distinguish between correct and incorrect responses based on predefined criteria, without creating new text or refining output. This means that these methods do not leverage the text generation capabilities that LLMs are fundamentally designed for. 

Another widely-used approach is the LLM-as-a-Judge method, in which pre-trained language models assess the accuracy of solutions. While this method offers flexibility, it often falls short compared to more specialized verifiers, especially in reasoning tasks that demand detailed and nuanced judgment.

A research team from Google DeepMind, in collaboration with the University of Toronto, Mila, and the University of California, Los Angeles, has introduced a new approach that enhances the accuracy and reliability of LLMs in reasoning tasks.

The new method, called the Generative Reward Model (GenRM), trains verifiers using next-token prediction to harness the text-generation capabilities of LLMs. The researchers have outlined the new method in a paper available on arXiv.

GenRM enables the model to predict the next word or token in a sequence based on the provided context. By simultaneously generating and evaluating potential solutions, GenRM offers a unified training strategy that enhances both the model’s generative and verification abilities.
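Casting verification as next-token prediction means the verifier can be queried with a prompt along the lines of "Is the answer correct (Yes/No)?", with the probability the model assigns to the "Yes" token serving as the correctness score. A minimal sketch of that scoring step, with mocked next-token logits standing in for a real model's output:

```python
import math

def yes_probability(token_logits):
    """Softmax over next-token logits; return the probability mass on 'Yes'."""
    z = max(token_logits.values())  # subtract max for numerical stability
    exp = {tok: math.exp(v - z) for tok, v in token_logits.items()}
    return exp["Yes"] / sum(exp.values())

# Hypothetical logits a model might emit after a prompt ending in
# "...Is the answer correct (Yes/No)?" -- illustrative numbers only.
logits = {"Yes": 2.0, "No": 0.5}
score = yes_probability(logits)  # a correctness score in (0, 1)
```

Because the score comes from the same next-token distribution the model uses for generation, one model can be trained jointly on solving problems and verifying solutions.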

This approach also supports Chain-of-Thought (CoT) reasoning, where the model is prompted to generate a thought process before the answer. This makes the verification process more comprehensive and systematic.
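CoT verification also combines naturally with majority voting: sample several independent verification rationales, score each, and average the verdicts. A sketch, again using hypothetical per-rationale scores rather than real model output:

```python
def majority_vote_score(rationale_scores):
    """Average per-rationale 'Yes' probabilities into a single verdict."""
    return sum(rationale_scores) / len(rationale_scores)

# Hypothetical scores from four independently sampled CoT verifications.
scores = [0.9, 0.7, 0.8, 0.6]
verdict = majority_vote_score(scores)  # treat the answer as correct if > 0.5
```

Averaging over multiple sampled rationales smooths out any single flawed chain of reasoning, at the cost of extra inference compute.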

The new model was tested in various settings, including algorithmic problem-solving tasks and grade-school mathematics. The researchers claim that the new model improved the problem-solving success rate by 16% to 64% compared to discriminative reward models and the LLM-as-a-Judge method. The model also outperformed GPT-4 and Gemini 1.5 Pro.

The performance boost of the GenRM model demonstrates its effectiveness in addressing errors that standard verifiers may miss, especially in complex reasoning tasks. The researchers also observed that GenRM scales well with larger datasets and increased model capacity, broadening its applicability for various reasoning scenarios.

“GenRM is a more performant alternative to discriminative reward models, and unlocks the use of powerful tools, such as chain-of-thought reasoning and majority voting for better verification,” wrote the researchers in their paper. “GenRM also unifies generation and verification into a single LLM, and demonstrates that such a unification benefits both generation and verification.”

Google DeepMind’s GenRM method advances GenAI by combining generation and verification, improving accuracy and reliability in reasoning tasks. This approach provides a strong foundation for future AI research and applications where precision is critical.

The researchers plan on extending the generative verification framework to a broader range of applications, including answering open-ended questions and coding. They also plan on studying how generative verifiers can be integrated into existing LLM self-improvement algorithms. 


 
