MarkTechPost@AI December 3, 2024
Contextual SDG Research Identification: An AI Evaluation Agent Methodology

Universities today face intense pressure in global academic competition, and the United Nations Sustainable Development Goals (SDGs) have become a key benchmark for measuring a university's social impact. Traditional keyword-based methods for identifying SDG research have clear limitations: they struggle to distinguish genuine contributions from superficial mentions. Researchers at the Virginia Tech University Libraries propose an innovative AI evaluation agent methodology that uses small LLMs to assess research abstracts and separate genuine contributions from surface-level mentions. The method builds its dataset from the Scopus database and runs the evaluation with three small LLMs (Phi-3.5-mini-instruct, Mistral-7B-Instruct-v0.3, and Llama-3.2-3B-Instruct). The results show that the models differ in their relevance judgments, but all demonstrate potential for precisely identifying SDG research, offering a new way around the limitations of traditional methods.

🤔 **Limitations of traditional SDG research identification:** Keyword-based Boolean search tends to misclassify articles that merely mention SDG-related terms as SDG research, making it hard to distinguish genuine contributions from superficial mentions.

💡 **AI evaluation agent methodology:** Researchers at the Virginia Tech University Libraries propose an approach in which an AI evaluation agent, guided by structured guidelines, assesses research abstracts and identifies concrete actions or findings directly aligned with SDG targets, thereby separating genuine contributions from surface-level mentions.

📚 **Dataset construction and model selection:** The researchers built a dataset of 20,000 journal-article and conference-paper abstracts from the Scopus database and chose three small LLMs (Phi-3.5-mini-instruct, Mistral-7B-Instruct-v0.3, and Llama-3.2-3B-Instruct) as evaluation agents, valued for their small memory footprint, local-hosting capability, and large context windows.

📊 **Divergent results across LLMs:** The results show that the models judge relevance differently: Phi-3.5-mini is relatively balanced, Mistral-7B classifies more expansively, and Llama-3.2 is far stricter. These differences show that model choice matters for identifying SDG-related research and that further refinement is needed.

🚀 **Future directions:** The study demonstrates the potential of small LLMs as evaluation agents for improving the precision of SDG research identification. It also notes limitations, such as the sensitivity of results to prompt design and the use of abstracts rather than full text, which call for further research and refinement.

Universities face intense global competition in the contemporary academic landscape, with institutional rankings increasingly tied to the United Nations' Sustainable Development Goals (SDGs) as a critical benchmark of social impact. These rankings significantly influence crucial institutional parameters such as funding opportunities, international reputation, and student recruitment. The current approach to tracking SDG-related research output relies on traditional keyword-based Boolean search queries applied across academic databases. However, this approach has substantial limitations: it frequently allows superficially relevant papers to be categorized as SDG-aligned despite lacking substantive contributions to actual SDG targets.

Existing research has explored various approaches to address the limitations of traditional Boolean search methodologies for identifying SDG-related research. Query expansion techniques utilizing Large Language Models (LLMs) have emerged as a potential solution, attempting to generate semantically relevant terms and broaden search capabilities. Multi-label SDG classification studies have compared different LLMs to improve tagging accuracy and minimize false positives. Retrieval-augmented generation (RAG) frameworks using models like Llama2 and GPT-3.5 have been explored to identify textual passages aligned with specific SDG targets. Despite these advancements, existing methods struggle to distinguish meaningful research contributions from superficial mentions.

Researchers from the University Libraries at Virginia Tech, Blacksburg have proposed an innovative approach to SDG research identification using an AI evaluation agent. The method uses LLMs specifically instructed to distinguish between abstracts that demonstrate genuine contributions to SDG targets and those with merely surface-level mentions of SDG-related terms. The approach applies structured guidelines to evaluate research abstracts, focusing on identifying concrete, measurable actions or findings directly aligned with SDG objectives. Using data science and big-data text analytics, the researchers aim to process scholarly bibliographic data with a nuanced understanding of language and context.
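
The paper's exact evaluation prompt is not reproduced in this article, but the described setup maps naturally onto an instruction-style template. The sketch below is a hypothetical reconstruction in Python: the guideline wording and the `build_prompt` helper are illustrative assumptions, not the authors' published prompt, though the 'Relevant'/'Non-Relevant' labels come from the reported results.

```python
# Hypothetical instruction template for the SDG evaluation agent.
# The guideline text is illustrative; the paper's actual prompt
# is not published in this article.
EVALUATION_PROMPT = """You are an expert reviewer assessing research abstracts \
for genuine contributions to UN Sustainable Development Goal (SDG) targets.

Guidelines:
1. Answer 'Relevant' only if the abstract reports concrete, measurable
   actions or findings that directly advance an SDG target.
2. Answer 'Non-Relevant' if the abstract merely mentions SDG-related
   terms without a substantive contribution.
3. Reply with exactly one word: Relevant or Non-Relevant.

Abstract:
{abstract}

Answer:"""


def build_prompt(abstract: str) -> str:
    """Fill the instruction template with one abstract to classify."""
    return EVALUATION_PROMPT.format(abstract=abstract.strip())
```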

The research methodology involves a detailed data retrieval and preprocessing pipeline using Scopus as the primary source. The researchers collected a dataset of 20,000 journal-article and conference-proceedings abstracts for each of the 17 SDGs, using search queries developed by Elsevier's SDG Research Mapping Initiative. The approach acknowledges the interconnected nature of the SDGs, allowing documents with shared keywords to be labeled across multiple goal categories. The evaluation agent is implemented using three compact LLMs: Microsoft's Phi-3.5-mini-instruct, Mistral-7B-Instruct-v0.3, and Meta's Llama-3.2-3B-Instruct. These models were selected for their small memory footprint, local hosting capabilities, and extensive context windows, enabling precise abstract classification through instruction-based prompts.
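
All three models are published on the Hugging Face Hub, so a local classification loop can be sketched with the `transformers` library. This is a minimal sketch, not the authors' code: the Hub IDs are the models' public names, but the generation settings are assumptions, and it reuses the hypothetical `build_prompt` helper from the sketch above. Note that the Mistral and Llama checkpoints are gated and require accepting their licenses on the Hub.

```python
# Minimal local-classification sketch (not the authors' implementation).
# Requires: pip install transformers torch accelerate
from transformers import pipeline

# Public Hugging Face Hub IDs for the three compact models named above.
MODEL_IDS = [
    "microsoft/Phi-3.5-mini-instruct",
    "mistralai/Mistral-7B-Instruct-v0.3",
    "meta-llama/Llama-3.2-3B-Instruct",
]


def classify_abstract(model_id: str, abstract: str) -> str:
    """Ask one locally hosted model to label an abstract."""
    generator = pipeline("text-generation", model=model_id, device_map="auto")
    messages = [{"role": "user", "content": build_prompt(abstract)}]
    result = generator(messages, max_new_tokens=8, do_sample=False)
    # With chat-style input, generated_text holds the full conversation;
    # the last message is the model's reply.
    reply = result[0]["generated_text"][-1]["content"]
    return "Non-Relevant" if "Non-Relevant" in reply else "Relevant"
```

In practice each pipeline would be loaded once and the 20,000 abstracts batched through it, rather than re-instantiating the model per call as this simplified sketch does.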

The research results reveal significant variations in relevance interpretation across different LLMs. For example, Phi-3.5-mini shows a balanced approach, labeling 52% of abstracts as ‘Relevant’ and 48% as ‘Non-Relevant’. In contrast, Mistral-7B shows a more expansive classification, assigning 70% of abstracts to the ‘Relevant’ category, while Llama-3.2 exhibits a highly selective approach, marking only 15% as ‘Relevant’. Moreover, Llama-3.2 demonstrates minimal intersection with other models, indicating stricter filtering criteria. The ‘Non-Relevant’ classifications show higher model alignment, with a substantial proportion of abstracts consistently categorized as non-relevant across all three LLMs.
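
One way to quantify this divergence is to compare the sets of abstracts each model labels 'Relevant'. The snippet below is a toy illustration with made-up abstract IDs; only the comparison logic is the point.

```python
from itertools import combinations

# Toy per-model results: IDs of abstracts each model labeled 'Relevant'.
# The IDs are invented for illustration.
relevant = {
    "Phi-3.5-mini": {"a01", "a02", "a05", "a07", "a09"},
    "Mistral-7B":   {"a01", "a02", "a03", "a05", "a06", "a07", "a09"},
    "Llama-3.2":    {"a02", "a07"},
}

# Pairwise Jaccard overlap of the 'Relevant' sets.
for (m1, s1), (m2, s2) in combinations(relevant.items(), 2):
    jaccard = len(s1 & s2) / len(s1 | s2)
    print(f"{m1} vs {m2}: Jaccard = {jaccard:.2f}")

# Abstracts that all three models agree are relevant.
consensus = set.intersection(*relevant.values())
print("Unanimous 'Relevant':", sorted(consensus))
```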

In conclusion, the researchers demonstrate the potential of small, locally hosted LLMs as evaluation agents for improving the precision with which research contributions are classified against Sustainable Development Goal (SDG) targets. By addressing the contextual and semantic limitations inherent in traditional keyword-based methodologies, these models show a nuanced ability to differentiate genuine research contributions from superficial mentions within extensive bibliographic datasets. Despite the promising results, the researchers acknowledge several important limitations, including sensitivity to prompt design that could affect generalizability, the use of abstracts rather than full-text articles, and the current focus on SDG 1.


Check out the Paper. All credit for this research goes to the researchers of this project.

