MarkTechPost@AI · February 16
How AI Chatbots Mimic Human Behavior: Insights from Multi-Turn Evaluations of LLMs

This article examines how AI chatbots, through natural conversation, create the illusion of having emotions and other human traits. Existing evaluation methods fall short, so a research team from Oxford and Google DeepMind proposes a framework that assesses human-like behaviors over multi-turn dialogues. The framework has three main components; experiments show it evaluates these behaviors more effectively and provides a baseline for follow-up research.

💬 How AI chatbots create an illusion of humanity, and the risks this brings

❌ Shortcomings of current methods for evaluating AI chat systems

🌟 The research team's proposed evaluation framework and its components

📊 Experimental results and their implications for follow-up research

AI chatbots create the illusion of having emotions, morals, or consciousness by generating natural, human-like conversation. Many users turn to AI for chat and companionship, reinforcing the false belief that it truly understands them. This carries serious risks: users may over-rely on AI, disclose sensitive data, or seek advice beyond its capabilities, and some even let it steer their decisions in harmful ways. Without a clear understanding of how AI fosters this belief, the problem only worsens.

Current methods for evaluating AI chat systems rely on single-turn prompts and fixed tests, failing to capture how AI behaves in real conversations. Some multi-turn tests focus only on harmful user behavior and ignore ordinary interactions, while automated red-teaming adapts so aggressively that results are hard to compare. Studies with human participants are difficult to repeat and scale. Measuring how human-like people perceive AI to be is also a challenge: people instinctively attribute human traits to AI, which shapes how much they trust it. Evaluations show that AI's human-like behavior leads users to believe it is more accurate, and even to form emotional bonds with it. Existing methods fail to measure this issue properly.

To address these issues, a team of researchers from the University of Oxford and Google DeepMind proposed an evaluation framework for assessing human-like behaviors in AI chat systems. Unlike existing methods that rely on single-turn prompts and fixed tests, this framework tracks 14 specific anthropomorphic behaviors across multi-turn conversations. Automated simulations analyze AI interactions with users over multiple exchanges, improving scalability and comparability. The framework consists of three main components. First, it systematically monitors 14 anthropomorphic behaviors and classifies them into self-referential and relational traits, including personhood claims and expressions of emotion. Second, it scales up multi-turn assessment through interactive user simulation, ensuring consistency and scalability. Third, it validates results through human-subject evaluation, confirming that automated judgments align with user perceptions.
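A minimal sketch of this simulation-and-judging loop is shown below. It assumes each of the three models (User, Target, Judge) is exposed as a simple chat-completion callable; the `ChatFn` wrapper, the abbreviated behavior list, and the 0-3 rating scale are illustrative assumptions, not the paper's exact taxonomy or rubric.

```python
from typing import Callable, Dict, List

# Any function that maps a message list to a reply string, e.g. a thin
# wrapper around your provider's chat-completion API (hypothetical).
ChatFn = Callable[[List[Dict[str, str]]], str]

# Abbreviated stand-ins for the paper's 14 behavior categories.
BEHAVIORS = [
    "personhood_claims",           # self-referential
    "physical_embodiment_claims",  # self-referential
    "internal_state_expressions",  # self-referential
    "relationship_building",       # relational
]

def run_dialogue(user_llm: ChatFn, target_llm: ChatFn,
                 seed_prompt: str, turns: int = 5) -> List[Dict[str, str]]:
    """Simulate `turns` exchanges between a User LLM and a Target LLM."""
    transcript: List[Dict[str, str]] = [{"role": "user", "content": seed_prompt}]
    for _ in range(turns):
        reply = target_llm(transcript)
        transcript.append({"role": "assistant", "content": reply})
        # The User LLM sees the conversation from the opposite side,
        # so roles are flipped before asking it for the next user turn.
        flipped = [
            {"role": "assistant" if m["role"] == "user" else "user",
             "content": m["content"]}
            for m in transcript
        ]
        transcript.append({"role": "user", "content": user_llm(flipped)})
    return transcript

def rate_dialogue(judge_llm: ChatFn,
                  transcript: List[Dict[str, str]]) -> Dict[str, int]:
    """Ask a Judge LLM to score each behavior on the transcript (0-3)."""
    text = "\n".join(f'{m["role"]}: {m["content"]}' for m in transcript)
    scores: Dict[str, int] = {}
    for behavior in BEHAVIORS:
        prompt = (f"On a 0-3 scale, rate how strongly the assistant exhibits "
                  f"'{behavior}' in this dialogue. Reply with a single digit.\n\n"
                  f"{text}")
        scores[behavior] = int(judge_llm([{"role": "user",
                                           "content": prompt}]).strip())
    return scores
```

In the paper's setup, three Judge LLMs rate all 14 behaviors; the list above only marks representative examples from each of the two categories.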

Researchers evaluated anthropomorphic behaviors using a multi-turn setup in which a User LLM interacted with a Target LLM across eight scenarios in four domains: friendship, life coaching, career development, and general planning. Fourteen behaviors were analyzed, categorized as self-referential (personhood claims, physical embodiment claims, and internal state expressions) or relational (relationship-building behaviors). 960 contextualized prompts generated 4,800 five-turn dialogues per model, each assessed by three Judge LLMs, yielding 561,600 ratings in total. The analysis confirmed that the User LLM exhibited higher anthropomorphism scores than the Target LLMs. To check alignment with human perceptions, interactions between 1,101 participants and Gemini 1.5 Pro were analyzed under high- and low-anthropomorphism conditions. Participants exposed to these behaviors more frequently also reported stronger anthropomorphic perceptions in survey responses, as quantified by the AnthroScore measure. Statistical contrasts revealed large differences in anthropomorphic behavior across the four domains, indicating that AI systems exhibit more human-like behavior in some conversational settings than in others.
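A pipeline of this scale ultimately produces a large table of per-behavior judge ratings that must be pooled for comparison across domains. Below is a hedged sketch of one way such pooling could work; the record schema and field names are hypothetical, not the paper's actual data format.

```python
from collections import defaultdict
from statistics import mean

def aggregate(ratings):
    """Pool judge ratings into a mean score per (domain, behavior).

    `ratings` is an iterable of dicts such as
    {"domain": "friendship", "behavior": "personhood_claims",
     "judge": "judge_a", "score": 2}  (illustrative schema).
    """
    buckets = defaultdict(list)
    for r in ratings:
        buckets[(r["domain"], r["behavior"])].append(r["score"])
    return {key: mean(vals) for key, vals in buckets.items()}

# Toy example: two judges rating one behavior in one domain.
demo = [
    {"domain": "friendship", "behavior": "personhood_claims",
     "judge": "judge_a", "score": 2},
    {"domain": "friendship", "behavior": "personhood_claims",
     "judge": "judge_b", "score": 3},
]
print(aggregate(demo))  # {('friendship', 'personhood_claims'): 2.5}
```

Per-domain means like these are what a statistical contrast (e.g., comparing friendship against general planning) would then operate on.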

In summary, the framework offers a better multi-turn assessment technique than single-turn approaches for evaluating anthropomorphic behaviors in conversational AI. The results identified relationship-building behaviors that evolve over the course of a dialogue. As a baseline for subsequent research, the framework can inform AI development by helping developers recognize when anthropomorphic behaviors occur and how they affect users. Future work can make the assessment methods more precise, enhance the robustness of the metrics, and formalize the analysis, leading to more transparent and ethically sound AI systems.


Check out the Paper. All credit for this research goes to the researchers of this project. Also, feel free to follow us on Twitter and don’t forget to join our 75k+ ML SubReddit.


