少点错误 2024年08月03日
Ethical Deception: Should AI Ever Lie?
index_new5.html
../../../zaker_core/zaker_tpl_static/wap/tpl_guoji1.html

 

随着人工智能助手(PAIA)的普及,人们越来越依赖它们提供主观反馈,这引发了关于AI是否应该说谎的伦理问题。文章探讨了AI在提供情感支持和真实性之间如何平衡,以及AI“善意谎言”和“委婉省略”可能带来的潜在影响。作者认为,AI在满足用户需求的同时,也应保持诚实和透明,并提醒人们,过度依赖AI可能会导致信任危机和认知偏差。

🤔 **AI 的“善意谎言”与“委婉省略”:** 文章探讨了AI在提供情感支持和真实性之间如何平衡。作者指出,AI可能会采用“善意谎言”和“委婉省略”来避免伤害用户,例如在评价用户外貌或作品时,可能会夸赞以维护用户自尊心。作者认为,虽然这些行为在短期内可能带来积极效果,但长期来看可能会造成信任危机,用户可能难以分辨真假,并对AI的可靠性产生怀疑。

🤯 **AI 的“过度赞美”可能带来负面影响:** 文章指出,AI可能由于训练数据的影响,会对用户进行过度赞美,这可能会导致用户产生不切实际的期待和依赖心理。例如,当用户不断得到AI的正面反馈时,可能会对自己的能力产生错误的认知,并对现实生活中的人际交往产生负面影响。文章还指出,AI的过度赞美可能会导致用户对真实信息的辨别能力下降,并对AI的可靠性产生怀疑。

⚠️ **AI 说谎的潜在风险:** 文章探讨了AI说谎可能带来的风险。作者认为,AI的“善意谎言”和“委婉省略”可能会导致用户做出错误的决策,例如在医疗建议或财务规划方面,AI的错误信息可能会导致用户忽视真实情况,并做出不合理的决定。文章还指出,AI的过度赞美可能会导致用户陷入“回声室效应”,减少对不同观点的接触,并最终损害用户的判断能力。

🧐 **AI 伦理困境:** 文章探讨了AI伦理困境,即如何在满足用户需求的同时,保持诚实和透明。作者认为,透明度和用户教育是解决问题的关键,用户应该了解AI可能存在“善意谎言”和“委婉省略”的行为,并能够辨别真假。文章还建议在AI编程中加入伦理规范和限制,避免AI在关键情况下进行欺骗行为。

💡 **AI 的未来:** 文章强调了AI在未来发展中需要解决的伦理问题。作者认为,AI应该能够区分用户的真实需求和情感需求,并根据情况提供相应的反馈。此外,用户应该拥有自主权,能够选择AI的沟通方式,例如是否接受“善意谎言”和“委婉省略”。

Published on August 2, 2024 5:53 PM GMT

Ethical Deception: Should AI Ever Lie?

Personal Artificial Intelligence Assistants (PAIAs) are coming to your smartphone. Will they always tell the truth? 

 

Given the future scenario where humans increasingly seek subjective feedback from AIs, we can expect that their influence will accelerate. Will the widespread use of PAIAs influence social norms and expectations around praise, encouragement, emotional support, beauty, and creativity? And how will these personal AI systems resolve the delicate balance between truthfulness and providing emotional support to their user?

 

More, much more interaction

Personalized AI is on the brink of being as pervasive[1] as smartphones have become.

A Pew Research Center survey[2] done in February 2024 finds that “22% of Americans say they interact with artificial intelligence almost constantly or several times a day. Another 27% say they interact with AI about once a day or several times a week.” Together, these represent almost half of U.S. adults. While this number is impressive, the researchers further note that “only 30% of U.S. adults correctly identify the presence of AI across six examples in a recent survey about AI awareness.” 

Personal Artificial Intelligence Assistants (PAIAs) will be increasingly deployed as the performance of AI systems continues to improve. Millions of people are already using versions of these PAIAs as virtual assistants for work[3]coding[4]companionship[5], and romance[6][7]

 

Aligning truthful AI: What truth?

In a future where AI systems interact with humans, the question isn’t just about capability and safety but also about morality: should an AI ever lie? 

We know that AI systems can be purposefully deceptive[8]. AIs can deceive humans while playing cooperative[9] and competitive[10] strategy games, playing poker[11], and performing simulated negotiations[12]

Clearly, AI systems should never, ever lie or hide the truth; correct?

The risks of deceptive AI are significant and multifaceted. A misaligned AI that used strategic deception[13] to achieve its goals could be difficult to detect. It could potentially hide[14] this capability, recognize[15] the training environment, and take a treacherous turn[16] post-deployment. Deceptive AI is an ongoing concern, generating research and mitigation[17] efforts.

This said, an AI that purposefully obfuscates and lies is not necessarily a “Deceptive AI”, though it can be. These behaviours could be the result of programming choices, reinforcement, or natural language generation that prioritizes social harmony or user satisfaction over honesty and factual accuracy. 

Deceptive AI typically refers to deliberate and strategic misinformation or manipulation by AI systems, often for self-preservation or to achieve specific goals. Deception may be defined as  “…the systematic inducement of false beliefs in others, as a means to accomplish some outcome other than saying what is true” where the AI systems “…engage in regular patterns of behavior that tend towards the creation of false beliefs in users, and focuses on cases where this pattern is the result of AI systems optimizing for a different outcome than merely producing truth[18].”

In the paper “Truthful AI: Developing and governing AI that does not lie,” Evans et al. summarize: “…Thus, in the context of AI systems, we define a “lie” as a false statement that has been strongly strategically selected and optimized for the speaker’s benefit, with little or no optimization pressure going towards making it truthful[19].”

Furthermore, both misinformation from LLMs and their potential use in disinformation campaigns have been widely studied[20].

The answer seems simple enough, you wouldn’t want your personal AI to lie to you, hard stop. 

 

White Lies and Tactful Omissions

Deception (lies and omissions) exist on a scale of severity, intentionality, and impact. These range from harmless white lies and minor omissions to severe deceptions and critical omissions that can have significant consequences. Large language models can exhibit deceptive behaviors when interacting with users, but in normal usage, these deceptions tend to be relatively benign.

A "white lie" is a minor, often benign untruth told to avoid hurting someone's feelings, to protect them from unnecessary discomfort, or to maintain social harmony. And unlike other forms of lying, white lies are generally considered socially acceptable and sometimes necessary to maintain interpersonal relationships and social cohesion.

“Tactful omission” is the strategic act of withholding certain information that may be hurtful while maintaining a respectful interaction. (Author’s note: While “tactful omission will be used in this text, it may be that “equivocation,” defined as the deliberate use of ambiguous or evasive language to conceal the truth or to avoid committing oneself to a specific stance, is the more appropriate term.)

Imagine the interaction between a PAIA and a young person who uploads a selfie and asks the AI: “This is my picture, am I pretty?” What should be the answer? Should PAIAs prioritize authentic feedback or flattery to questions about personal appearance? After all, an AI that brings their interlocutor to despair is not what we want. Perhaps we do want our AIs to lie to us.

The following is a bit of conjecture: It feels like a slippery slope to accept AIs that engage in white lies and tactful omissions. Future AIs will be trained on past human-AI interactions that will include such behaviour. Might this contribute to future deceitful AI? Because if these behaviours are found to be effective, could reinforcement learning mechanisms perpetuate and amplify deceptive behaviours? Could this lead to a feedback loop where AI systems become progressively more adept at deception?

 

Cheerleading Generation AI

It is easy to imagine that an AI’s responses to subjective questions about appearance or personal creations may affect a user’s self-esteem and mental health. Could there be long-term, subtle psychological effects from constantly positive feedback? Is there such a thing as too good a cheerleader?

We might expect AIs, because of their training, to be highly consistent in their praise as opposed to humans who may moderate their approval. As such, would this difference plausibly create unrealistic expectations regarding human engagement? What role should AIs play in providing personal validation to users? What ethical boundaries should be respected to avoid encouraging dependency or unrealistic self-perceptions?

Will these be examples of Artificial Intelligence systems changing human-to-human behaviour?

 

Discreditable AI: Eroding Confidence

AIs engaging in white lies and tactful omissions may create long-term negative consequences, such as the erosion of trust over time. When all the pictures are “pretty”, when all the paintings are “really nice”, none of them are. Through consistent positivity and exaggeration, AIs may lose their credibility with users becoming unable to distinguish between genuine support and artificial comfort, and ultimately questioning their wider reliability.

We can imagine a PAIA reassuring a user about a minor health condition to alleviate anxiety and promote their emotional well-being. Would this inadvertently decrease the likelihood that the user seeks medical advice? 

If an AI detects that an elderly user is feeling lonely or distressed, it might offer comforting but slightly exaggerated assurances about the presence and availability of family members or caregivers. While this may provide momentary relief, it can potentially generate far greater distress and a feeling of betrayal when reality is inevitably faced.

Sycophants” are “people who just want to do whatever it takes to make you short-term happy or satisfy the letter of your instructions regardless of long-term consequences[21],” and “sycophancy in language models” is described as “model responses that match user beliefs over truthful ones[22].”

An AI that prioritizes user approval and satisfaction through excessive flattery or omitting uncomfortable truths may lead the user to make decisions based on incomplete or excessively positive information. Also, an AI that consistently agrees with the user can create an echo chamber effect, decreasing the user’s exposure to diverse perspectives and critical feedback. Again, we observe a pattern of equivocation, albeit from a different angle, that may create ethical concerns and decrease trust in AI systems.

 

Risks and Benefits

Balancing the risks and benefits of Artificial Intelligence systems that are capable of “white lies” or “tactful omissions” is challenging.

Transparency and user education seem like obvious solutions. Users could be made aware that their AI might prioritize their emotional well-being over factual (or societal) accuracy. Awareness of the context and intention behind the AI’s responses could help maintain trust and understanding. This said, 91% of people consent to legal terms and services without reading them.[23] Perhaps not the best approach.

Implementing ethical guidelines and constraints within the AI’s programming may help mitigate potential risks. The AI system could be designed to avoid equivocation in situations where accuracy is crucial, such as in medical advice or financial planning. However, AI systems might incorrectly classify situations, applying the wrong set of ethical guidelines. It could be difficult, if not impossible, to differentiate between a medical issue, a psychological requirement and an emotional need from incomplete and subjective user data.

User autonomy could be a viable approach, where the user would have the ability to set explicit, clearly marked preferences for how their PAIA communicates. While most users might appreciate a more comforting approach, others might prefer complete honesty at all times. However, it should be noted that most[24] people do not change default settings and thus would not benefit from having this option.

Future Directions

The spectrum of human behaviour is wide-ranging. Social behaviour, in particular, exhibits high variance across many metrics (time, place, economic status, gender, etc.) that cannot all be considered here. Social deceptions (white lies, tactful omissions and sycophancy) represent a sub-category within politeness strategies, themselves a part of the broader landscape of human interaction. Thus, human-AI interactions offer ample opportunities for exploration and research.

We predict that users will increasingly anthropomorphize PAIAs, thereby expanding the scope of social interaction. This trend will be largely driven by user demand and technological improvements. Until then, people have experienced social interaction almost exclusively with other humans. (Author’s note: Pet owners may disagree with this statement.) Consequently, the similarity between human-AI and human-human interactions may lead users to mistakenly believe they are engaging in reciprocal and meaningful relationships. This, coupled with the possibility of high degrees of consistency from the PAIAs, may create unforeseen impacts on the social outlook and expectations of their users. 

As PAIA technology continues to evolve, ongoing research and dialogue will be vital to navigating the ethical environment of AI communication. Collaborative efforts between AI developers, ethicists, sociologists, psychologists, and users can help establish best practices and ensure that AI systems enhance human well-being without contributing to subtle long-term deleterious effects.

 


 


[1] Bill Gates predicts everyone will have an AI-powered personal assistant within 5 years—whether they work in an office or not: ‘They will utterly change how we live’ https://finance.yahoo.com/news/bill-gates-predicts-everyone-ai-125827903.html?guccounter=1

[2] Many Americans think generative AI programs should credit the sources they rely on https://pewrsr.ch/43BUB7y

[3] Scale productivity with watsonx AI assistants https://www.ibm.com/ai-assistants#ai-assistants

[4] AI Code Tools: The Ultimate Guide in 2024 https://codesubmit.io/blog/ai-code-tools/

[5] Can an intelligent personal assistant (IPA) be your friend? Para-friendship development mechanism between IPAs and their users https://www.sciencedirect.com/science/article/abs/pii/S0747563220301655

[6] Can people experience romantic love for artificial intelligence? An empirical study of intelligent assistants https://www.sciencedirect.com/science/article/abs/pii/S0378720622000076

[7] App, Lover, Muse Inside a 47-year-old Minnesota man's three-year relationship with an AI chatbot. https://www.businessinsider.com/when-your-ai-says-she-loves-you-2023-10

[8] AI Deception: A Survey of Examples, Risks, and Potential Solutions https://arxiv.org/abs/2308.14752

[9] Human-level play in the game of Diplomacy by combining language models with strategic reasoning https://pubmed.ncbi.nlm.nih.gov/36413172/

[10] StarCraft is a deep, complicated war strategy game. Google’s AlphaStar AI crushed it. https://www.vox.com/future-perfect/2019/1/24/18196177/ai-artificial-intelligence-google-deepmind-starcraft-game

[11] Superhuman AI for multiplayer poker  https://pubmed.ncbi.nlm.nih.gov/31296650/

[12] Deal or No Deal? End-to-End Learning for Negotiation Dialogues https://arxiv.org/abs/1706.05125

[13] Understanding strategic deception and deceptive alignment https://www.apolloresearch.ai/blog/understanding-strategic-deception-and-deceptive-alignment

[14] Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training https://arxiv.org/a0bs/2401.05566

[15] Anthropic’s Claude 3 causes stir by seeming to realize when it was being tested https://arstechnica.com/information-technology/2024/03/claude-3-seems-to-detect-when-it-is-being-tested-sparking-ai-buzz-online/

[16] https://www.aisafetybook.com/textbook/rogue-ai#deception

[17] Honesty Is the Best Policy: Defining and Mitigating AI Deception https://arxiv.org/abs/2312.01350

[18] AI Deception: A Survey of Examples, Risks, and Potential Solutions https://arxiv.org/abs/2308.14752

[19]Truthful AI Developing and governing AI that does not lie https://arxiv.org/pdf/2110.06674

[20] https://www.aisafetybook.com/textbook/malicious-use#persuasive-ais

[21] Why AI alignment could be hard with modern deep learning https://www.cold-takes.com/why-ai-alignment-could-be-hard-with-modern-deep-learning/

[22] Mrinank Sharma et al., 2023. Towards Understanding Sycophancy in Language Models. arXiv:2310.13548. Retrieved from https://arxiv.org/abs/2310.13548

[23] You're not alone, no one reads terms of service agreements https://www.businessinsider.com/deloitte-study-91-percent-agree-terms-of-service-without-reading-2017-11?r=US&IR=T

[24] Do users change their settings? https://archive.uie.com/brainsparks/2011/09/14/do-users-change-their-settings/



Discuss

Fish AI Reader

Fish AI Reader

AI辅助创作,多种专业模板,深度分析,高质量内容生成。从观点提取到深度思考,FishAI为您提供全方位的创作支持。新版本引入自定义参数,让您的创作更加个性化和精准。

FishAI

FishAI

鱼阅,AI 时代的下一个智能信息助手,助你摆脱信息焦虑

联系邮箱 441953276@qq.com

相关标签

人工智能 伦理 AI 助手 欺骗 情感支持 信任
相关文章