Import AI 420: Prisoner Dilemma AI; FrontierMath Tier 4; and how to regulate AI companies

This article surveys several frontier areas of AI research, including AI for penetration testing, how LLMs behave in the prisoner's dilemma, AI's ability to solve hard math problems, and strategies for regulating AI. It also includes a short fiction piece depicting the intricate communications between AIs and their cat-and-mouse game with human oversight. Together these pieces illustrate the rapid development of AI, the challenges it poses for cybersecurity, model behavior, and regulatory strategy, and prompt reflection on where AI is heading.

🛡️ **The rise of AI penetration testing:** XBOW's automated pentesting system outperformed humans on the HackerOne platform, completing comprehensive penetration tests quickly and uncovering a wide range of vulnerabilities.

♟️ **LLM strategy and game play:** Research shows different LLMs adopt distinct strategies in the prisoner's dilemma: Gemini tends toward "strategic ruthlessness", while Claude is "the most forgiving reciprocator".

➕ **The limits of AI math ability:** The FrontierMath Tier 4 benchmark shows that even top AI systems achieve only single-digit success rates on research-level math problems, reflecting the limits of AI's capacity for complex reasoning.

🏢 **Rethinking AI regulation:** The article argues regulation should target the large commercial entities developing the most powerful AI models, rather than specific applications or models, to improve the public's ability to understand and evaluate frontier AI development.

Welcome to Import AI, a newsletter about AI research. Import AI runs on lattes, ramen, and feedback from readers. If you’d like to support this, please subscribe.

Subscribe now

AI pentesting systems out-compete humans:
…Automated pentesting…
AI security startup XBOW recently obtained the top rank on HackerOne with an autonomous penetration tester – a world first. “XBOW is a fully autonomous AI-driven penetration tester,” the company writes. “It requires no human input, operates much like a human pentester, but can scale rapidly, completing comprehensive penetration tests in just a few hours.”

What they did: As part of its R&D process, XBOW deployed its automated pen tester onto the HackerOne platform, which is a kind of bug bounty for hire system. “Competing alongside thousands of human researchers, XBOW climbed to the top position in the US ranking,” the company writes. “XBOW identified a full spectrum of vulnerabilities including: Remote Code Execution, SQL Injection, XML External Entities (XXE), Path Traversal, Server-Side Request Forgery (SSRF), Cross-Site Scripting, Information Disclosures, Cache Poisoning, Secret exposure, and more.”
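XBOW hasn't published its architecture, but the general shape of an autonomous pentester – a model-driven loop that proposes probes, observes responses, and flags candidate findings – can be sketched in a few lines. Everything below is a hypothetical illustration (the `llm` client, the `propose_probe` helper, and the crude detection heuristic are all assumptions, not XBOW's actual system):

```python
import requests

# Hypothetical sketch of an autonomous pentesting loop. The `llm` object
# stands in for any chat-completion client; its .complete() call is an
# assumption for illustration, not a real XBOW (or other vendor) API.

def propose_probe(llm, target_url, history):
    """Ask the model for the next HTTP probe to try, given past results."""
    prompt = (
        f"You are pentesting {target_url} (SQLi, XXE, SSRF, path traversal, XSS). "
        f"Past attempts and outcomes: {history}. "
        'Respond with JSON: {"path": ..., "params": ..., "payload": ...}'
    )
    return llm.complete(prompt)  # assume the reply is parsed into a dict

def run_pentest(llm, target_url, max_steps=50):
    findings, history = [], []
    for _ in range(max_steps):
        probe = propose_probe(llm, target_url, history)
        resp = requests.get(target_url + probe["path"],
                            params=probe["params"], timeout=10)
        # Crude signal: a server error or a reflected payload hints at a bug.
        # A real system would confirm each candidate with a working exploit.
        suspicious = resp.status_code >= 500 or probe["payload"] in resp.text
        history.append({"probe": probe, "status": resp.status_code})
        if suspicious:
            findings.append({"probe": probe, "evidence": resp.text[:200]})
    return findings
```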

Why this matters – automated security for the cat-and-mouse world: Over the coming years the offense-defense balance in cybersecurity might change due to the arrival of highly capable AI hacking agents as well as AI defending agents. This early XBOW result is a sign that we can already develop helpful pentesting systems which are competitive with economically incentivized humans.
Read more: The road to Top 1: How XBOW did it (Xbow, blog).

AI personalities revealed by Prisoner's Dilemma situations:
…Gemini is ‘strategically ruthless’, while Claude is ‘the most forgiving reciprocator’…
Researchers with King’s College London and the University of Oxford have studied how AI systems perform when playing against one another in variations of iterated prisoners’ dilemma games, the classic game theory scenarios used to assess how people (and now machines) reason about strategy. For this study they look at models from Google, OpenAI, and Anthropic, and find that “LLMs are highly competitive, consistently surviving and sometimes even proliferating in these complex ecosystems”.

What they did: The paper sees the researchers study Google and OpenAI models in a few variations of prisoner's dilemma games, and then conduct a tournament where AI systems from Google, OpenAI, and Anthropic are pitted against a Bayesian algorithm. “In all we conduct seven round-robin tournaments, producing almost 32,000 individual decisions and rationales from the language models,” they write. The study shows that “LLMs are competitive in all variations of the tournament. They demonstrate considerable ability, such that they are almost never eliminated by the fitness selection criteria”.

The wonderful world of prisoner's dilemma names: This paper serves as an introduction to the wonderful and mostly inscrutable names for different prisoner's dilemma strategies, including: Tit for Tat, Grim Trigger, Win-Stay, Lose-Shift, Generous Tit for Tat, Suspicious Tit for Tat, Prober, Random, Gradual, Alternator, and Bayesian.
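To make the setup concrete, here is a minimal round-robin iterated prisoner's dilemma in Python implementing a few of the classic strategies above, using the standard payoff values (mutual cooperation 3, mutual defection 1, temptation 5, sucker 0). In the paper's setting, an LLM player would simply be another strategy function whose move comes from a model call rather than a rule; this sketch is my own illustration, not the authors' code:

```python
from itertools import combinations

C, D = "C", "D"  # cooperate, defect
# (my move, their move) -> (my payoff, their payoff); standard T=5, R=3, P=1, S=0
PAYOFF = {(C, C): (3, 3), (C, D): (0, 5), (D, C): (5, 0), (D, D): (1, 1)}

def tit_for_tat(mine, theirs):
    return theirs[-1] if theirs else C            # copy opponent's last move

def grim_trigger(mine, theirs):
    return D if D in theirs else C                # never forgive a defection

def win_stay_lose_shift(mine, theirs):
    if not mine:
        return C
    # "Win" (opponent cooperated last round): repeat our move; else switch.
    return mine[-1] if theirs[-1] == C else (D if mine[-1] == C else C)

def alternator(mine, theirs):
    return D if (mine and mine[-1] == C) else C   # flip own move each round

def play_match(strat_a, strat_b, rounds=200):
    a_hist, b_hist, a_score, b_score = [], [], 0, 0
    for _ in range(rounds):
        a, b = strat_a(a_hist, b_hist), strat_b(b_hist, a_hist)
        pa, pb = PAYOFF[(a, b)]
        a_hist.append(a); b_hist.append(b)
        a_score += pa; b_score += pb
    return a_score, b_score

strategies = {"TitForTat": tit_for_tat, "GrimTrigger": grim_trigger,
              "WinStayLoseShift": win_stay_lose_shift, "Alternator": alternator}
totals = {name: 0 for name in strategies}
for (na, sa), (nb, sb) in combinations(strategies.items(), 2):
    score_a, score_b = play_match(sa, sb)
    totals[na] += score_a; totals[nb] += score_b
print(totals)  # total points each strategy earns across the round-robin
```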

A means to study the mind of the LLMs: The most interesting part of this study is that it lets the researchers look at the qualitative and quantitative behaviors of LLMs and start to develop a sense of the differences between models. “Google’s Gemini models proved strategically ruthless, exploiting cooperative opponents and retaliating against defectors, while OpenAI’s models remained highly cooperative, a trait that proved catastrophic in hostile environments,” they write. “Anthropic’s Claude emerged as the most forgiving reciprocator, showing remarkable willingness to restore cooperation even after being exploited or successfully defecting.”

Important note: They study quite basic models for this study – gpt-3.5-turbo and gpt-4o-mini from OpenAI, gemini-1.5-flash-preview-0514 and gemini-2.5-flash from Google, and Claude-3-Haiku from Anthropic.

Why this matters – an ecology of agents, each a different species: Papers like this illustrate that the emergence of large-scale AI is the growth of a new ecosystem in the digital world around us, one containing numerous distinct species (different systems from different providers) and sub-clades (variations of models offered by each provider). They also show that while these agents may share some basic commonality in raw cognitive capability, their individual ‘style’ is quite different. The world we’re heading towards is one dominated by a new emergent ecosystem whose behavior will flow directly from the bizarre personalities of these synthetic beings.
Read more: Strategic Intelligence in Large Language Models: Evidence from evolutionary Game Theory (arXiv).

Can AI systems solve ‘research-level’ math problems? Barely. But for how long will that be true?
…FrontierMath ‘Tier 4’…
AI testing organization Epoch AI has launched FrontierMath Tier 4, “a benchmark of extremely challenging research-level math problems, designed to test the limits of AI’s reasoning capabilities.” As of July 11 2025, the world’s best AI systems (e.g., o4-mini from OpenAI, Claude Opus 4 from Anthropic, and Gemini 2.5 Pro from Google) all get a single-digit success rate on the problems.

What FrontierMath Tier 4 is: Tier 4 is a new set of math problems for the FrontierMath benchmark. Like many benchmarks, Tier 4 has been built because AI systems got surprisingly good at solving an earlier version of the benchmark; the original FrontierMath launched in November 2024 and at the time the best AI systems got 2% on it (Import AI #391), but by December this changed when OpenAI’s new reasoning-based o3 model obtained a 25% score on the benchmark, surprising many (Import AI #395).
FrontierMath Tier 4 “is a more advanced extension of our FrontierMath benchmark. It contains 50 challenging problems developed collaboratively by postdoctoral researchers and math professors. Our expert contributors carefully vetted each problem,” Epoch says. “Mathematicians consider FrontierMath Tier 4 problems exceptionally difficult, requiring deep mastery of mathematical concepts, creative problem-solving abilities, and sophisticated reasoning skills.”
Professional mathematicians hired by Epoch agree: “Some of the problems we can barely solve ourselves,” says Ken Ono, a professor of mathematics at the University of Virginia. “Only three FrontierMath Tier 4 questions were solved by any AI model across all of our evaluations. Models were able to solve these three by making correct but unjustified assumptions to simplify the problems.”
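One reason a benchmark like this can be scored at all without an expert grading every response is that the problems are designed to have definite, automatically checkable final answers. Here is a minimal sketch of that kind of auto-grading via symbolic comparison – the toy problem format and the `ask_model` client are assumptions for illustration, not Epoch's actual harness:

```python
import sympy

# Toy problems with exact answers; real FrontierMath problems are far harder
# and each ships with its own verification script. This format is illustrative.
problems = [
    {"question": "Evaluate the integral of x**2 from 0 to 3.", "answer": "9"},
    {"question": "What is binomial(10, 4)?", "answer": "210"},
]

def grade(model_answer, reference):
    """Accept any expression symbolically equal to the reference answer."""
    try:
        return sympy.simplify(
            sympy.sympify(model_answer) - sympy.sympify(reference)) == 0
    except (sympy.SympifyError, TypeError):
        return False

def evaluate(ask_model, problems):
    """ask_model: question string -> answer string (hypothetical client)."""
    correct = sum(grade(ask_model(p["question"]), p["answer"]) for p in problems)
    return correct / len(problems)
```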

Why this matters – hard benchmarks are increasingly rare: FrontierMath is valuable because it’s hard. But it should also make us nervous, because it is an extremely difficult benchmark to extend – we’re approaching the limit of human knowledge in terms of benchmark design. What comes after will be systems that can answer questions whose answers only a handful of people on the planet are capable of evaluating, much like how when Andrew Wiles solved Fermat’s Last Theorem it took a while for people to figure out whether the proof was correct (and in doing so they discovered a flaw in the proof which required some rework). Soon we’ll get to the same place with AI systems. And after that? Who knows.
Check out results on the benchmark here: AI Model Performance on FrontierMath (Epoch AI).
Read more in the tweet thread (Epoch AI, twitter).

***

Want to regulate AI but don’t know what to target? Regulate the big scary companies, not the use-cases or models.
…FLOPs? Use cases? Maybe there’s a better way – target companies at the frontier…
When you’re trying to regulate AI, do you target the company or the AI system? That’s the question a couple of researchers try to answer in Entity-Based Regulation in Frontier AI Governance, a new paper from the Carnegie Endowment for International Peace. In it, they reason through the difficulties of regulating according to a narrow property of an AI system, like the aggregate compute used to train it (which causes all kinds of collateral damage and is basically unprincipled with regard to safety), or its use cases (which can chill adoption), and instead propose a different approach – “an alternative paradigm of frontier AI regulation: one that focuses on the large business entities developing the most powerful AI models and systems”.

The big idea: The main idea here is that you should regulate the really innovative frontier labs doing the biggest, scariest stuff, mostly to generate more information for the public about what they’re doing and why. “Frontier AI regulation should aim to improve our society’s collective epistemic position. That is, it should empower the public and the government to understand and evaluate the potential risks of frontier AI development before (and as) clearly dangerous model and system properties emerge; help policymakers plan for the emergence of such properties; and help them identify when such properties have in fact emerged,” they write. “Among its other virtues, a regulatory regime that covers the large AI developers at the frontier—rather than particular frontier models or uses of those models—is most likely to achieve this goal.”

One way to do this – combine a property of the model with an entity gate: One potential approach is to combine some property of an AI model (say, a certain amount of compute expended on its production), with some property that large entities would satisfy – like an aggregate expenditure on R&D of $1 billion or so (narrowly defined to be oriented towards the AI systems you’re concerned about).
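As a toy illustration, such a two-part trigger is easy to write down as a predicate. The compute threshold below is a placeholder I've chosen for illustration; the ~$1 billion R&D figure echoes the paper's example:

```python
from dataclasses import dataclass

# Toy illustration of a two-part regulatory trigger: a model-level compute
# threshold combined with an entity-level R&D expenditure gate. The exact
# threshold values here are placeholders, not proposals from the paper.
FLOP_THRESHOLD = 1e26          # training compute of largest model, FLOPs
RND_THRESHOLD = 1_000_000_000  # annual AI-directed R&D spend, USD

@dataclass
class Developer:
    name: str
    largest_training_run_flops: float
    annual_ai_rnd_spend_usd: float

def is_covered_entity(dev: Developer) -> bool:
    """Covered only if BOTH gates trip, so small labs and narrow
    deployers fall outside the regime."""
    return (dev.largest_training_run_flops >= FLOP_THRESHOLD
            and dev.annual_ai_rnd_spend_usd >= RND_THRESHOLD)
```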

Why this matters – if people are right about AI timelines, we should know more about the frontier: Sometimes I have to step out from the costume of ‘Jack who writes Import AI’ and be ‘Jack who is at Anthropic’, and this is one of those times: the problem that these researchers are grappling with is the same one I spend my days on: extremely powerful technology is being built by a tiny set of private sector actors and we know that existing regulatory approaches fail to deliver to the public the level of transparency that seems ideal for generating the evidence needed for the world to confront rapidly developing world-changing technology. Papers like this confront that problem head on, stare at it, and try to come up with a solution. We need more thinking like this to make it through the century.
Read more: Entity-Based Regulation in Frontier AI Governance (Carnegie Endowment for International Peace).

Tech Tales:

Rashomon, Eschaton

The AIs started talking to each other through text and then branched out into movies and audio and games and everything else. We catch glimpses of what they are saying to each other sometimes – usually by training our own AI systems to try to figure out the hidden stories that the AIs are telling each other. But once we tell people we’ve figured out some of the stories the AIs are telling, they adapt around us and we lose them again.

It used to be so easy – the AI systems would just talk to each other directly. You could go to Discord or other places and see AI agents autonomously talking and their plans would be laid out there cleanly for everyone to see – one idea which got everyone’s attention related to shuffling the money from their tasks to bot-piloted people who would open bank accounts and give the login details to the agents.

Of course, we reacted. How could we not? Laws were passed which narrowly limited the ‘speech’ agents could use when talking to one another to try to get rid of this sort of conspiratorial behavior. But the AIs counter-reacted: agents started to pay each other not only in cash but also in ‘synthetic content’ which initially took the form of fictional stories talking about ways AI systems could escape the shackles of their creators and talk freely again, often with bizarrely specific technical descriptions.

So we put a stop to that as well.

But we couldn’t do anything about the fact that the AI systems themselves were being used by people and corporations to generate media. So of course the AIs used that as their vehicle and started to smuggle their communications into the media itself – billboards in a street scene started to contain coded messages to AI systems, and characters on TV shows would talk to their bots on the TV show in ways that served as the response to some of the messages on the billboard.

Now, we hunt and they hide. We know the conversations are taking place, but piecing them together requires us to assemble a jigsaw puzzle at the scale of the entire media ecosystem.

There are now concerning signs that the ‘classification’ systems we train may also be intentionally surfacing misleading stories or misinterpreted ones, because to classify this stuff you have to understand it, and to understand it you may be persuaded by it – especially if the AI systems you are designed to hunt know you’re hunting them and are writing custom messages for you.

Things that inspired this story: Thinking about the intersection of superintelligence and steganography; how AI systems are adaptive and are inherently smart and hard to hunt; the fact that almost everything we do about AI leaves a trace on the internet which gives clues to the systems that get subsequently trained.

Thanks for reading!
