MarkTechPost@AI Feb 22
Meta AI Releases ‘NATURAL REASONING’: A Multi-Domain Dataset with 2.8 Million Questions To Enhance LLMs’ Reasoning Capabilities

Meta AI has released the NATURAL REASONING dataset to strengthen LLM reasoning. The dataset spans multiple domains and addresses the limitations of existing reasoning datasets; its method proves effective at enhancing reasoning, and evaluations show it outperforms other datasets.

🎯 NATURAL REASONING is a dataset of 2.8 million reasoning questions extracted from pretraining corpora, spanning multiple domains.

💡 The dataset represents real-world reasoning problems via backtranslation, combines verifiable and open-ended questions, and supports the development of algorithms that enhance LLM reasoning.

✅ The NATURAL REASONING approach achieves superior performance through knowledge distillation and supervised finetuning, and also serves as a source for domain-specific seed data extraction.

Large language models (LLMs) have shown remarkable advances in reasoning over complex tasks. While models like OpenAI’s o1 and DeepSeek’s R1 have significantly improved performance on challenging reasoning benchmarks such as competition math, competitive coding, and GPQA, critical limitations remain in evaluating their true reasoning potential. Current reasoning datasets focus on problem-solving tasks but fail to cover domains that require open-ended reasoning. Moreover, these datasets offer limited diversity in both scale and difficulty, making it hard to evaluate and enhance the reasoning capabilities of LLMs across different domains and complexity levels.

Previous attempts to enhance LLM reasoning capabilities mostly fall into two approaches: synthetic data generation and unsupervised self-training. On the synthetic-data side, methods such as STaR and MetaMath augment existing datasets with new chain-of-thought rationales and question variations, but they depend heavily on pre-existing high-quality datasets. While approaches like OpenMathInstruct-2, NuminaMath, and Xwin-Math generate new data from seed examples, they struggle to scale to novel domains. In unsupervised self-training, most methods rely on human-annotated final answers or external reward models, making them resource-intensive and costly, particularly for complex multi-step problems that require human evaluation of LLM outputs.

Researchers from Meta and New York University have proposed NATURALREASONING, a comprehensive dataset of 2.8 million reasoning questions extracted from pretraining corpora. The dataset spans diverse fields including Mathematics, Physics, Computer Science, and Economics & Business. Unlike synthetic datasets such as MetaMathQA and OpenMathInstruct-2, NATURALREASONING represents authentic real-world reasoning problems through backtranslation from pretraining corpora. It uniquely combines verifiable and open-ended questions, including theorem proving, making it valuable for developing algorithms that enhance LLMs’ reasoning abilities beyond simple verification tasks and enabling knowledge distillation from stronger to weaker models.
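To make the backtranslation idea concrete, here is a minimal sketch of turning a pretraining-corpus passage into a self-contained reasoning question by prompting an LLM. The prompt wording, model name, and use of the OpenAI client are illustrative stand-ins, not the paper's actual annotation pipeline:

```python
# Hypothetical sketch of question backtranslation: prompt an LLM to pose a
# self-contained reasoning question grounded in a corpus passage.
from openai import OpenAI

client = OpenAI()

PROMPT = (
    "Read the passage below and write one self-contained reasoning question "
    "that the passage could answer. Do not refer to the passage itself.\n\n"
    "Passage:\n{passage}"
)

def backtranslate_question(passage: str) -> str:
    """Return a reasoning question synthesized from a corpus passage."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder; the paper's annotator LLM differs
        messages=[{"role": "user", "content": PROMPT.format(passage=passage)}],
        temperature=0.7,
    )
    return response.choices[0].message.content.strip()
```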

The efficacy of NATURALREASONING in enhancing reasoning capabilities is shown in two ways. First, knowledge distillation and supervised finetuning on the dataset achieve steeper scaling trends than existing datasets. Second, it functions as a source for domain-specific seed data extraction. To target science reasoning benchmarks like GPQA, the method samples 250 benchmark questions and, for each, retrieves 1K similar decontaminated questions from NATURALREASONING using cosine similarity between question embeddings. These questions are then deduplicated and clustered into 15K groups. The evaluation protocol uses zero-shot testing across benchmarks including MATH, GPQA, GPQA-Diamond, and MMLU-Pro, with greedy decoding for consistent performance measurement.
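The seed-data extraction step can be sketched as a small embedding pipeline: encode questions, score corpus candidates by cosine similarity to the benchmark seeds, and cluster the retrieved pool for diversity. The embedding model and cluster count below are placeholders, not the paper's actual configuration:

```python
# Minimal sketch of embedding-based seed extraction, assuming a generic
# sentence encoder; all model names and sizes here are illustrative.
import numpy as np
from sentence_transformers import SentenceTransformer
from sklearn.cluster import KMeans

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # placeholder encoder

def retrieve_similar(benchmark_qs, corpus_qs, top_k=1000):
    """Retrieve the corpus questions closest to the benchmark seed questions."""
    # With normalized embeddings, the dot product equals cosine similarity.
    seed_emb = encoder.encode(benchmark_qs, normalize_embeddings=True)
    corpus_emb = encoder.encode(corpus_qs, normalize_embeddings=True)
    # Score each corpus question by its best similarity to any seed question.
    scores = (corpus_emb @ seed_emb.T).max(axis=1)
    top_idx = np.argsort(-scores)[:top_k]
    return [corpus_qs[i] for i in top_idx]

def cluster_for_diversity(questions, n_clusters=15):
    """Group retrieved questions so sampling can cover distinct topics."""
    emb = encoder.encode(questions, normalize_embeddings=True)
    return KMeans(n_clusters=n_clusters, n_init="auto").fit_predict(emb)
```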

The evaluation results show that with just 1.5 million training examples, models trained on NATURALREASONING outperform Llama3.1-8B-Instruct, while other datasets such as OpenMathInstruct-2 and WebInstruct fail to reach comparable performance even with 2.8 million data points. While math-specific datasets like OpenMathInstruct-2 perform strongly on math benchmarks (improving from 50.83 to 59.25 on MATH), they struggle to generalize, with GPQA accuracy plateauing around 26-27% and inconsistent MMLU-Pro performance. Moreover, datasets like WebInstruct show diminishing returns, with GPQA performance peaking at 29.02% at 500K samples but declining to 26.12% at 2.8M samples.

In conclusion, the researchers introduced NATURALREASONING, a dataset that marks a significant advance toward comprehensive reasoning corpora for LLMs. Its 2.8 million questions span multiple domains, including mathematics, physics, computer science, economics, and social sciences. The results show that knowledge distillation with NATURALREASONING yields consistent improvements on reasoning benchmarks as data size increases. Its usefulness also extends to unsupervised self-training of LLMs via external reward models and self-rewarding techniques, a step forward in enhancing LLMs’ reasoning capabilities across diverse domains.


Check out the Paper and Dataset. All credit for this research goes to the researchers of this project. Also, feel free to follow us on Twitter and don’t forget to join our 75k+ ML SubReddit.

