A Geodyssey – Enterprise Search Discovery, Text Mining, Machine Learning, March 13, 20:28
Large Language Models: Due to the risks, NASA decides against fine tuning a generative earth science LLM.

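Based on an initial assessment, NASA concluded that the costs and risks of developing an exclusive NASA Science Mission Directorate (SMD) decoder (generative) model outweigh the benefits. NASA has therefore opted for an encoder model combined with an off-the-shelf LLM such as Meta's Llama or OpenAI's GPT in a Retrieval Augmented Generation (RAG) architecture. This means AI-generated answers draw on content external to the large language model, which minimises hallucinations (false information). The study explores the appropriate use of LLMs in NASA's Science Mission Directorate (SMD), considering whether to develop a custom model or fine-tune an existing open-source model.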

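🚀 The rapid adoption of large language models (LLMs) is accelerating scientific research while also challenging core scientific norms such as accountability, transparency, and replicability. LLMs have revolutionary potential for scientific communication and problem-solving, but they also introduce complexities regarding authorship and the integrity of scientific work.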

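🤝 NASA collaborated with IBM Research to develop the INDUS models, which are tailored to specific science tasks such as document retrieval and classification and are intended to improve research efficiency.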

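🛡️ NASA is exploring a Retrieval-Augmented Generation (RAG) strategy that combines encoder models such as INDUS with generative models such as GPT, minimising risk by grounding responses in authoritative sources and removing the need to develop a dedicated generative language model for science.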

“Based on our initial assessment, the costs and risks associated with developing an exclusive NASA Science Mission Directorate (SMD) decoder (generative) model currently outweigh the benefits.”

In a paper published yesterday in the American Geophysical Union (AGU) journal Perspectives of Earth and Space Scientists, the authors instead opt for an encoder model paired with an off-the-shelf LLM such as Meta’s Llama or OpenAI’s GPT in a Retrieval Augmented Generation (RAG) arrangement.

In other words, the AI-generated answers are grounded in content that is external to the Large Language Model, which minimises hallucinations (false information).
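
To make the RAG arrangement concrete, here is a minimal Python sketch of the retrieve-then-generate flow. It is an illustration under stated assumptions, not the paper's implementation: the bag-of-words `encode` function is only a toy stand-in for an encoder model such as INDUS, the three `DOCUMENTS` are invented examples, and the final call to a generative model is left out, so the sketch stops at the grounded prompt that would be sent to an off-the-shelf LLM.

```python
import math
import re
from collections import Counter


def encode(text):
    """Toy stand-in for an encoder model: a bag-of-words term-count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query, documents, k=2):
    """Return the k documents most similar to the query."""
    q = encode(query)
    ranked = sorted(documents, key=lambda d: cosine(q, encode(d)), reverse=True)
    return ranked[:k]


def build_prompt(query, passages):
    """Build a prompt that asks the generative model to answer from the sources only."""
    context = "\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    return (
        "Answer using only the numbered sources below. "
        "If they do not contain the answer, say so.\n\n"
        f"Sources:\n{context}\n\nQuestion: {query}\nAnswer:"
    )


# Hypothetical mini-corpus, invented for illustration only.
DOCUMENTS = [
    "Sea surface temperature is measured by satellite radiometers.",
    "Landsat imagery supports land cover classification.",
    "Ocean colour sensors estimate chlorophyll concentration.",
]

if __name__ == "__main__":
    question = "How is sea surface temperature measured?"
    passages = retrieve(question, DOCUMENTS, k=1)
    # In a real system this prompt would be passed to a generative LLM.
    print(build_prompt(question, passages))
```

The design point is that the generative model never has to know the science itself: retrieval picks the authoritative passages, and the prompt instructs the model to answer from those passages or admit that they do not contain the answer.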

Abstract
The rapid adoption of artificial intelligence (AI) in scientific research is accelerating progress but also challenging core scientific norms such as accountability, transparency, and replicability. Large language models (LLMs) like ChatGPT are revolutionizing scientific communication and problem-solving, but they introduce complexities regarding authorship and the integrity of scientific work. LLMs have the potential to transform various research practices, including literature surveys, meta-analyses, and data management tasks like entity resolution and query synthesis. Despite their advantages, LLMs present challenges such as content verification, transparency, and accurate attribution. This study explores the appropriate use of LLMs for NASA’s Science Mission Directorate (SMD), considering whether to develop a custom bespoke model or fine-tune an existing open-source model. This article reviews the outcomes and lessons learned from this effort, providing insights for other research groups navigating similar decisions.

Key Points
Generative AI and Language Models have accelerated scientific discovery but also introduced new challenges

NASA collaborated with IBM Research to develop INDUS models tailored to specific science tasks such as document retrieval and classification

NASA is exploring a Retrieval-Augmented Generation strategy that combines encoder models like INDUS with generative models like GPT to minimize risks by grounding responses in authoritative sources, eliminating the need to develop a dedicated generative language model for science

Congrats to the authors on an excellent paper. This has significant implications for the geological sciences. It fits with a paper I published in December on Ethical Recommendations for Large Language Models in the Geological Sciences.

https://agupubs.onlinelibrary.wiley.com/doi/10.1029/2024CN000258
