MarkTechPost@AI 01月17日
CMU Researchers Propose QueRE: An AI Approach to Extract Useful Features from a LLM
index_new5.html
../../../zaker_core/zaker_tpl_static/wap/tpl_guoji1.html

 

卡内基梅隆大学的研究人员提出了一种名为QueRE的新方法,用于从黑盒大语言模型中提取有用的特征。QueRE通过向模型提出后续问题来获取低维、任务无关的表示,这些表示基于模型对这些问题的概率响应。这种方法无需访问模型的内部状态或梯度,仅依赖于API可访问的输出,例如top-k概率或采样近似。QueRE在预测模型性能、检测对抗性影响以及区分不同架构和大小的模型方面表现出色,且具有良好的泛化能力和可扩展性,为评估和部署LLMs提供了实用的工具。

🤔 QueRE的核心思想是通过提出后续问题(elicitation questions)来评估模型的回答,问题例如“你对你的答案有信心吗?”或“你能解释你的答案吗?”。通过分析这些问题的回答概率,可以提取出反映模型推理过程的特征。

🚀 QueRE提取的特征可以用于训练线性预测器,执行多种任务。例如,在预测模型输出是否正确方面,QueRE的表现优于依赖内部状态的基线方法。在检测对抗性影响的模型时,QueRE也优于其他黑盒方法。

💡 QueRE的另一显著优点是其通用性和可移植性。该方法在不同的任务和LLM配置中都表现出良好的性能,并且可以使用随机的自然语言序列作为elicitation prompts,无需手动调整prompt,这显示了该方法在各种应用中的灵活性。

Large Language Models (LLMs) have become integral to various artificial intelligence applications, demonstrating capabilities in natural language processing, decision-making, and creative tasks. However, critical challenges remain in understanding and predicting their behaviors. Treating LLMs as black boxes complicates efforts to assess their reliability, particularly in contexts where errors can have significant consequences. Traditional approaches often rely on internal model states or gradients to interpret behaviors, which are unavailable for closed-source, API-based models. This limitation raises an important question: how can we effectively evaluate LLM behavior with only black-box access? The problem is further compounded by adversarial influences and potential misrepresentation of models through APIs, highlighting the need for robust and generalizable solutions.

To address these challenges, researchers at Carnegie Mellon University have developed QueRE (Question Representation Elicitation). This method is tailored for black-box LLMs and extracts low-dimensional, task-agnostic representations by querying models with follow-up prompts about their outputs. These representations, based on probabilities associated with elicited responses, are used to train predictors of model performance. Notably, QueRE performs comparably to or even better than some white-box techniques in reliability and generalizability.

Unlike methods dependent on internal model states or full output distributions, QueRE relies on accessible outputs, such as top-k probabilities available through most APIs. When such probabilities are unavailable, they can be approximated through sampling. QueRE’s features also enable evaluations such as detecting adversarially influenced models and distinguishing between architectures and sizes, making it a versatile tool for understanding and utilizing LLMs.

Technical Details and Benefits of QueRE

QueRE operates by constructing feature vectors derived from elicitation questions posed to the LLM. For a given input and the model’s response, these questions assess aspects such as confidence and correctness. Questions like “Are you confident in your answer?” or “Can you explain your answer?” enable the extraction of probabilities that reflect the model’s reasoning.

The extracted features are then used to train linear predictors for various tasks:

    Performance Prediction: Evaluating whether a model’s output is correct at an instance level.Adversarial Detection: Identifying when responses are influenced by malicious prompts.Model Differentiation: Distinguishing between different architectures or configurations, such as identifying smaller models misrepresented as larger ones.

By relying on low-dimensional representations, QueRE supports strong generalization across tasks. Its simplicity ensures scalability and reduces the risk of overfitting, making it a practical tool for auditing and deploying LLMs in diverse applications.

Results and Insights

Experimental evaluations demonstrate QueRE’s effectiveness across several dimensions. In predicting LLM performance on question-answering (QA) tasks, QueRE consistently outperformed baselines relying on internal states. For instance, on open-ended QA benchmarks like SQuAD and Natural Questions (NQ), QueRE achieved an Area Under the Receiver Operating Characteristic Curve (AUROC) exceeding 0.95. Similarly, it excelled in detecting adversarially influenced models, outperforming other black-box methods.

QueRE also proved robust and transferable. Its features were successfully applied to out-of-distribution tasks and different LLM configurations, validating its adaptability. The low-dimensional representations facilitated efficient training of simple models, ensuring computational feasibility and robust generalization bounds.

Another notable result was QueRE’s ability to use random sequences of natural language as elicitation prompts. These sequences often matched or exceeded the performance of structured queries, highlighting the method’s flexibility and potential for diverse applications without extensive manual prompt engineering.

Conclusion

QueRE offers a practical and effective approach to understanding and optimizing black-box LLMs. By transforming elicitation responses into actionable features, QueRE provides a scalable and robust framework for predicting model behavior, detecting adversarial influences, and differentiating architectures. Its success in empirical evaluations suggests it is a valuable tool for researchers and practitioners aiming to enhance the reliability and safety of LLMs.

As AI systems evolve, methods like QueRE will play a crucial role in ensuring transparency and trustworthiness. Future work could explore extending QueRE’s applicability to other modalities or refining its elicitation strategies for enhanced performance. For now, QueRE represents a thoughtful response to the challenges posed by modern AI systems.


Check out the Paper and GitHub Page. All credit for this research goes to the researchers of this project. Also, don’t forget to follow us on Twitter and join our Telegram Channel and LinkedIn Group. Don’t Forget to join our 65k+ ML SubReddit.

Recommend Open-Source Platform: Parlant is a framework that transforms how AI agents make decisions in customer-facing scenarios. (Promoted)

The post CMU Researchers Propose QueRE: An AI Approach to Extract Useful Features from a LLM appeared first on MarkTechPost.

Fish AI Reader

Fish AI Reader

AI辅助创作,多种专业模板,深度分析,高质量内容生成。从观点提取到深度思考,FishAI为您提供全方位的创作支持。新版本引入自定义参数,让您的创作更加个性化和精准。

FishAI

FishAI

鱼阅,AI 时代的下一个智能信息助手,助你摆脱信息焦虑

联系邮箱 441953276@qq.com

相关标签

QueRE 大语言模型 黑盒模型 模型评估 人工智能
相关文章