
Presenting uncertainty in geoscience Large Language Models (LLMs) is likely to be an important part of ethical design. Wei et al. (2024) open-sourced a factual dataset, “SimpleQA”, a few weeks ago, containing over 4,300 generic factual questions and answers.
One use of these data was to compare the stated confidence of an LLM with its actual accuracy. The positive correlation in the graph suggests LLMs have some ‘understanding’ of ‘confidence’: when LLMs state they are more confident, they are more accurate. That said, the results still sit below the y = x (perfect calibration) line, which presents areas for further research.
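As a rough illustration (not the authors’ code), a calibration check of this kind could be sketched as follows: bin the model’s stated confidence and compare it with the measured accuracy in each bin. The `records` input here is a hypothetical list of (stated confidence, was the answer correct) pairs collected from an LLM on a SimpleQA-style dataset.

```python
from collections import defaultdict

def calibration_bins(records, n_bins=10):
    """Group answers by stated confidence and return accuracy per bin."""
    bins = defaultdict(list)
    for confidence, correct in records:
        # Assign each answer to a confidence bin, e.g. 0.0-0.1, 0.1-0.2, ...
        idx = min(int(confidence * n_bins), n_bins - 1)
        bins[idx].append(correct)
    # For a perfectly calibrated model, accuracy per bin sits on the y = x line.
    return {
        (i / n_bins, (i + 1) / n_bins): sum(v) / len(v)
        for i, v in sorted(bins.items())
    }

# Example: stated confidence of 0.9 paired with ~67% accuracy indicates overconfidence.
print(calibration_bins([(0.9, True), (0.9, False), (0.9, True), (0.6, True), (0.6, False)]))
```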
This could be one way (among many others) to convey elements of uncertainty in responses in the user interface. Rather than designing LLM-based digital assistants that just provide a response, we could include a level of ‘confidence’ in that answer, judged against typical baseline accuracy heuristics.
For example, take a person who, from past experience, you think is accurate roughly 80% of the time, and they tell you they are 90% confident in an answer. That is useful information, giving us a combined confidence rating of about 72% (0.8 × 0.9), with all sorts of nuances and caveats of course. We could perhaps apply similar concepts to responses from LLMs as best-practice design, as in the sketch below.
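A minimal sketch of that idea, assuming a simple multiplicative heuristic; the function name and the weighting are illustrative assumptions, not an established method:

```python
def combined_confidence(baseline_accuracy: float, stated_confidence: float) -> float:
    """Naive combined rating: baseline track record times stated confidence,
    ignoring the many caveats noted above."""
    return baseline_accuracy * stated_confidence

# A source that is right ~80% of the time, stating 90% confidence in an answer:
print(combined_confidence(0.80, 0.90))  # 0.72
```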
A key concept is ensuring the user sees some indication of uncertainty associated with the answer, highlighting that a convincing, “anthropomorphic” AI-generated answer may still need verification.
There are other ways to convey uncertainty in domain digital assistants, but this may be a useful aid. The impact of Retrieval-Augmented Generation (RAG) and of exploratory search goals are areas for further research. Exploratory search goals do not have a single right answer, e.g. “What do we know about geological natural hydrogen in the US?”. Conveying uncertainty may also include presenting agreement or contradiction in the literature. Many research areas to investigate.
#geology #geological #earthsciences