Anthropic @AnthropicAI
Current models are not effective saboteurs—nor are they good monitors.
But our evals are designed for the future: smarter AIs will do better on these tasks. Our evals will help developers assess those models' capabilities.
Read the full paper: https://t.co/PVuiEl1z9Z