Anthropic @AnthropicAI
New Anthropic Research: A new set of evaluations for sabotage capabilities.
As models gain more agentic abilities, we need to get smarter in how we monitor them. We’re publishing a new set of complex evaluations that test for sabotage—and sabotage-monitoring—capabilities.
As models gain more agentic abilities, we need to get smarter in how we monitor them. We’re publishing a new set of complex evaluations that test for sabotage—and sabotage-monitoring—capabilities.
