A Geodyssey – Enterprise Search Discovery, Text Mining, Machine Learning 28 November 2024
Presentation of uncertainty in geoscience Large Language Models (LLMs)

How geoscience Large Language Models (LLMs) express uncertainty when answering questions is an important part of their ethical design. The recently released SimpleQA dataset contains over 4,300 factual questions and answers, and researchers have used it to test the relationship between an LLM's stated confidence and its accuracy. The results show a positive correlation: the more confident the LLM, the more accurate it tends to be. However, the alignment between confidence and accuracy still has room for improvement, which points to directions for further research. The article suggests that, beyond providing an answer, geoscience digital assistants could be designed to include a confidence indicator, analogous to the degree of trust we place in a particular person, helping users better understand the uncertainty of an answer. The article also discusses other ways of expressing uncertainty, such as Retrieval Augmented Generation (RAG) and exploratory search, and notes that presenting agreement or contradiction among viewpoints in the literature is another way of conveying uncertainty.

🤔 **Applying the SimpleQA dataset:** The dataset contains over 4,300 factual questions and answers and was used to test the relationship between an LLM's stated confidence and its actual accuracy, finding a positive correlation: the more confident the LLM, the more accurate it tends to be.

💡 **Expressing LLM confidence:** The article suggests that, beyond providing an answer, geoscience digital assistants could be designed to include a confidence indicator, analogous to the degree of trust we place in a particular person. For example, if past experience suggests a person is accurate about 80% of the time and they claim to be 90% confident in an answer, then our confidence in that answer is 72%.

🔍 **Other ways to express uncertainty:** Beyond confidence indicators, uncertainty can be conveyed in other ways, such as Retrieval Augmented Generation (RAG) and exploratory search. Exploratory search goals typically have no single correct answer, e.g. “What do we know about geological natural hydrogen in the US?”

📚 **Agreement and contradiction in the literature:** Presenting agreement or contradiction among viewpoints in the literature is also a way of expressing uncertainty, for example by showing different researchers' differing views on a question, making users aware of its complexity and uncertainty.

Presentation of uncertainty in geoscience Large Language Models (LLMs) is likely to be an important part of ethical design. A few weeks ago, Wei et al. (2024) open-sourced “SimpleQA”, a dataset containing over 4,300 generic factual questions and answers.

One use of these data was to test the stated confidence of an LLM against its actual accuracy. The positive correlation in the graph suggests LLMs have some ‘understanding’ of ‘confidence’: when LLMs state they are more confident, they are more accurate. That said, the curve still sits below the y = x line of perfect calibration, which presents areas for further research.
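As a minimal sketch of this kind of calibration check (the function and the toy usage below are illustrative assumptions, not the SimpleQA evaluation code): bin the model's stated confidences and compare each bin's mean stated confidence with the observed accuracy; a perfectly calibrated model lies on the y = x line.

```python
import numpy as np

def calibration_curve(stated_conf, correct, n_bins=10):
    """Bin stated confidences and compare each bin's mean stated
    confidence with the fraction of answers that were correct.
    A perfectly calibrated model lies on the y = x line."""
    stated_conf = np.asarray(stated_conf, dtype=float)
    correct = np.asarray(correct, dtype=bool)
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    idx = np.clip(np.digitize(stated_conf, bins) - 1, 0, n_bins - 1)
    points = []
    for b in range(n_bins):
        mask = idx == b
        if mask.any():
            points.append((stated_conf[mask].mean(), correct[mask].mean()))
    return points  # list of (mean stated confidence, observed accuracy)

# Toy example: answers stated at ~0.9 confidence that are right only
# ~70% of the time would plot well below the y = x line.
```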

This could be one way (in addition to many others) to convey elements of uncertainty in responses in the user interface. Rather than designing LLM-based digital assistants that just provide a response, we might also include a level of ‘confidence’ in that answer, judged against typical baseline accuracy heuristics.

For example, take a person who, from past experience, you think is roughly accurate 80% of the time, and they tell you they are 90% confident in an answer. This is useful information, giving us a 0.8 × 0.9 = 72% confidence rating, with all sorts of nuances and caveats of course. We could apply similar concepts to responses from LLMs as best-practice design.
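A minimal sketch of that arithmetic (a hypothetical helper; multiplying the two numbers treats baseline accuracy and stated confidence as independent, which is a strong simplifying assumption):

```python
def combined_confidence(baseline_accuracy: float, stated_confidence: float) -> float:
    """Naive combined confidence: the answerer's historical accuracy
    multiplied by their stated confidence in this particular answer.
    Assumes (simplistically) the two signals are independent."""
    return baseline_accuracy * stated_confidence

# The worked example from the text: roughly 80% accurate, 90% confident
print(round(combined_confidence(0.8, 0.9), 2))  # 0.72, shown to the user as 72%
```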

A key concept is ensuring the user sees some element of uncertainty associated with the answer, highlighting that a convincing, “anthropomorphic” AI-generated answer may still need verification.

There are other ways to convey uncertainty in domain digital assistants; this may be one useful aid. The impact of Retrieval Augmented Generation (RAG) and of exploratory search goals are areas for further research. Exploratory search goals do not have a single right answer, e.g. “What do we know about geological natural hydrogen in the US?”. Conveying uncertainty may also include presenting agreement or contradiction in the literature. Many research areas to investigate.
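As a rough sketch of what presenting agreement or contradiction might look like (the stance labels, citations, and data structure are hypothetical; in a real RAG pipeline each source's stance would itself come from a classifier or an LLM):

```python
from collections import Counter

def summarise_stances(sources):
    """Given (citation, stance) pairs, where stance is 'supports' or
    'contradicts', render a short uncertainty note for the user."""
    counts = Counter(stance for _, stance in sources)
    total = len(sources)
    return (f"{counts['supports']} of {total} retrieved sources support "
            f"this answer; {counts['contradicts']} contradict it.")

# Hypothetical retrieval results for an exploratory geoscience question
sources = [("Smith 2021", "supports"),
           ("Jones 2022", "supports"),
           ("Lee 2023", "contradicts")]
print(summarise_stances(sources))
# -> 2 of 3 retrieved sources support this answer; 1 contradict it.
```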

#geology #geological #earthsciences

https://openai.com/index/introducing-simpleqa/

