Evaluating Retrieval-Augmented Generation Agents for Autonomous Scientific Discovery in Astrophysics

cs.AI updates on arXiv.org, July 11, 12:04


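This study evaluates 9 RAG configurations on 105 astronomy question-answer pairs and finds that the configuration using OpenAI models performs best, reaching 91.4% accuracy. The study also develops an LLMaaJ system as a substitute for human evaluation and publicly releases the associated data and tools.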

arXiv:2507.07155v1 Announce Type: cross

Abstract: We evaluate 9 Retrieval-Augmented Generation (RAG) agent configurations on 105 Cosmology Question-Answer (QA) pairs that we built specifically for this purpose. The RAG configurations are manually evaluated by a human expert, so a total of 945 generated answers were assessed. We find that the best RAG agent configuration currently uses an OpenAI embedding and generative model, yielding 91.4% accuracy. Using our human evaluation results, we calibrate an LLM-as-a-Judge (LLMaaJ) system that can be used as a robust proxy for human evaluation. These results allow us to systematically select the best RAG agent configuration for a multi-agent system for autonomous scientific discovery in astrophysics (e.g., cmbagent, presented in a companion paper) and provide us with an LLMaaJ system that can be scaled to thousands of cosmology QA pairs. We make our QA dataset, human evaluation results, RAG pipelines, and LLMaaJ system publicly available for further use by the astrophysics community.
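The evaluation arithmetic in the abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names `accuracy` and `agreement` are hypothetical, the binary correct/incorrect grading is an assumption, and the toy labels are invented purely for illustration. Only the counts of 9 configurations and 105 QA pairs come from the abstract.

```python
from typing import Sequence

# Counts stated in the abstract.
N_CONFIGS = 9
N_QA_PAIRS = 105
TOTAL_ANSWERS = N_CONFIGS * N_QA_PAIRS  # 945 answers graded by the human expert


def accuracy(labels: Sequence[bool]) -> float:
    """Fraction of answers marked correct (assumes binary grading)."""
    if not labels:
        raise ValueError("labels must not be empty")
    return sum(labels) / len(labels)


def agreement(human: Sequence[bool], judge: Sequence[bool]) -> float:
    """Fraction of answers on which the LLM judge matches the human grade."""
    if len(human) != len(judge):
        raise ValueError("human and judge labels must have the same length")
    if not human:
        raise ValueError("labels must not be empty")
    return sum(h == j for h, j in zip(human, judge)) / len(human)


# Toy labels, invented for illustration only (not data from the paper).
human_labels = [True, True, False, True]
judge_labels = [True, False, False, True]

print(TOTAL_ANSWERS)                          # 945
print(accuracy(human_labels))                 # 0.75
print(agreement(human_labels, judge_labels))  # 0.75
```

In a setup like this, a judge whose agreement with the human grades is high on the graded answers could then be applied to QA pairs that have no human grade, which is the scaling use the abstract describes.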


