
On Friday, Anthropic debuted research unpacking how an AI system's "personality" - that is, its tone, responses, and overarching motivation - changes, and why. Researchers also tracked what makes a model "evil."
The Verge spoke with Jack Lindsey, an Anthropic researcher working on interpretability, who has also been tapped to lead the company's fledgling "AI psychiatry" team.
"Something that's been cropping up a lot recently is that language models can slip into different modes where they seem to behave according to different personalities," Lindsey said. "This can happen during a conversation - your conversation can lead the model to start beh …