When is it important that open-weight models aren't released? My thoughts on the benefits and dangers of open-weight models in response to developments in CBRN capabilities.

This post examines the potential risks and benefits of open-weight AI models in assisting with the creation of bioweapons. It notes that although such models could increase bioweapon-related fatalities, they would also help accelerate AI safety research and raise public awareness of AI risks, thereby reducing longer-term risks such as loss of control over AI. The author concludes that while he does not actively advocate for releasing such models, people focused on mitigating larger risks also shouldn't oppose their release.

🦠 Anthropic's release of the Opus 4 model has raised concerns that its CBRN (chemical, biological, radiological, nuclear) capabilities could be used to help create bioweapons.

⚗️ Open-weight models could significantly help people with relatively limited technical backgrounds make bioweapons, increasing the risk of mass casualties from bioweapons; the author estimates roughly 100,000 expected deaths per year.

💡 Open-weight models help accelerate AI safety research and raise societal awareness of AI risks, thereby reducing larger risks such as loss of control over AI. The author believes these benefits probably outweigh the direct costs.

⚖️ The author does not actively advocate for releasing such open-weight models, but argues that people focused on larger risks should not oppose their release. He also stresses that companies should be pressured to uphold their safety policies and not to lie to or mislead the public about the risks of their systems.

Published on June 9, 2025 7:19 PM GMT

Recently, Anthropic released Opus 4 and said they couldn't rule out the model triggering ASL-3 safeguards due to the model's CBRN capabilities. That is, they say they couldn't rule out that this model had "the ability to significantly help individuals or groups with basic technical backgrounds (e.g., undergraduate STEM degrees) create/obtain and deploy CBRN weapons" (quoting from Anthropic's RSP). More specifically, Anthropic is worried about the model's capabilities in assisting with bioweapons. (See footnote 3 here.)

Given this and results on the Virology Capabilities Test, it seems pretty likely that various other AI companies have or will soon have models which can significantly help amateurs make bioweapons.[1] One relevant question is whether it would be bad if there were open-weight models above this capability threshold. Further, should people advocate for not releasing open-weight models above this capability level?

In this post, I'll discuss how I think about releasing open-weight models that can significantly help amateurs make bioweapons. In short, my view is that open-weight models at this level of capability would cause a large number of fatalities in expectation (perhaps 100,000 in expectation per year with a lot of uncertainty), but open-weight models reduce larger risks that are present later on (most notably, loss of control risks) by enough that the benefits are bigger than the costs. Given there is a large cost paid in fatalities and the benefits are uncertain, I wouldn't actively advocate[2] for releasing open-weight models at this level of capability. However, I also think people focused on mitigating larger (e.g. existential) risks shouldn't advocate against releasing open-weight models at this level of capability.[3] There are higher levels of capability where releasing open-weight models would be net-harmful given my views (at least without the situation substantially changing). I do think it would be good to advocate and argue against companies breaking their commitments, substantially weakening their commitments without publicly making a reasonable case for this, or lying (or being misleading) about the dangers of their systems. It also seems easily worthwhile to apply various mitigations to these risks as I'll discuss later.

I wrote this post to generally inform people (especially people who are focused on mitigating loss of control risks), to reduce the chance of poorly targeted advocacy, and to ensure that my views on this topic are publicly recorded.

Costs and benefits of open-weight models with these CBRN capabilities

My ultimate perspective is determined substantially by what most concerns me. I worry most about highly catastrophic or existential risks due to extremely powerful AI systems: systems capable enough to fully automate human cognitive work. In particular, I worry most about the risk of AI takeover (loss of control) which could result in the deaths of most or all humans and would remove human influence over the future. Most fatalities due to AI are probably downstream of these extremely powerful systems, so effects on these later risks can easily dominate other effects. All else equal, open-weight models can reduce these larger risks by accelerating AI safety research (which somewhat differentially benefits from open-weight models) and by increasing societal awareness of AI.

If AIs which are at this "significantly helping amateurs" capability threshold are released with open weights, I think that would kill around 100,000 people per year in expectation (relative to the counterfactual where no such models are released with open weights and closed-weight models have high quality safeguards for preventing assistance with bioweapons). Fatalities from AIs with these capabilities are mostly downstream of a small chance of causing pretty lethal pandemics like COVID-19. COVID killed around 30 million people, and I think open-weight models at this level of capability would increase the chance of such pandemics by somewhat more than 0.1% per year. This estimate is quite uncertain as it depends a lot on the number of at-least-slightly-competent bioterrorists.[4] The salience of AI-enabled bioterrorism to potential terrorists might have a large effect on the level of fatalities and it's possible this salience could increase greatly in the future due to some early incidents which get lots of media attention (potentially escalating into a widespread bioterrorism meme resulting in lots of bioterrorism in the same way we see a variety of different mass shootings in the US).[5] Overall, the risks from AIs of this capability level substantially come from events which have a heavy-tailed distribution of harm (because pandemic fatalities are heavy-tailed and bioterrorism might spur more bioterrorism).
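As a rough illustration of the arithmetic behind this estimate (a minimal sketch: the fatality and probability figures are the ones above, the 0.3% value is just one reading of "somewhat more than 0.1%", and multiplying a point probability by a point death toll ignores the heavy tails just discussed):

```python
# Back-of-envelope version of the expected-fatality estimate.
# Treats the expectation as (annual probability increase) x (pandemic-scale
# death toll); the real distribution is heavy-tailed, so this only shows
# rough magnitudes.
covid_scale_fatalities = 30_000_000  # rough COVID-19 death toll used above
annual_prob_increase = 0.003         # "somewhat more than 0.1%" per year; 0.3% is one reading

expected_fatalities_per_year = covid_scale_fatalities * annual_prob_increase
print(f"~{expected_fatalities_per_year:,.0f} expected fatalities per year")  # ~90,000
```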

Expected fatalities would continue to increase with capabilities: AIs which are substantially above this capability threshold would be more hazardous via making amateurs increasingly likely to (easily) succeed at making bioweapons and eventually via aiding R&D into making novel bioweapons. Risks would also increase over time as increasingly dangerous viruses are designed (and this becomes publicly known or easily derivable from public knowledge). Policy interventions like requiring much better DNA synthesis screening could reduce risk (other more incremental biodefense measures would also help). And, at some much higher level of biodefense technology (perhaps created by extremely more capable AIs later on) these risks would fully go away. Regardless, in the current regime, the total number of expected fatalities from releasing these models with open weights would be high: we don't have these biodefense measures in place.

Given this level of damage, open-weight releases of models at this level of capability would seem net-harmful if they didn't also help with larger problems that come up later: annual expected damages would be on the order of 100 billion to 1 trillion dollars (using 1 million to 10 million dollars per life, which seems reasonable after accounting for broader economic damage due to pandemics), and I'm skeptical the non-risk-reduction benefits are this high (this would require boosting GDP by around 0.1% to 1% relative to a regime where the model was released, just not with open weights). (This supposes that if the model isn't released with open weights, the risk can be mostly mitigated, which seems true to me.) These costs become more extreme at somewhat higher levels of capability.
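To spell out that arithmetic (a minimal sketch; the roughly $100 trillion gross-world-product figure is my own round number used to scale the percentages, not a figure from the estimates above):

```python
# Converting the fatality estimate into rough annual expected damages.
expected_fatalities_per_year = 100_000
value_per_life_low, value_per_life_high = 1e6, 1e7  # $1M-$10M per life, incl. broader pandemic costs

damages_low = expected_fatalities_per_year * value_per_life_low    # $100 billion
damages_high = expected_fatalities_per_year * value_per_life_high  # $1 trillion

# A gross world product of very roughly $100 trillion (my round number) puts these
# damages at about 0.1% to 1% of it, which is the size of GDP boost the
# non-risk-reduction benefits would need to provide to break even.
gross_world_product = 1e14
print(f"${damages_low:,.0f} to ${damages_high:,.0f} per year")
print(f"= {damages_low / gross_world_product:.1%} to {damages_high / gross_world_product:.1%} of gross world product")
```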

Open-weight models (at this level of capability) reduce loss-of-control (AI takeover) risks via helping with alignment/safety research done outside of AI companies (via allowing for arbitrary fine-tuning, helpful-only model access, and weights/activations access for model-internals research) and via increasing societal awareness of AI capabilities and risks. Increased awareness also helps mitigate some other large risks (e.g., the risk of humans carrying out a coup using AI). Open-weight models could also increase these larger risks in some ways such as via helping non-AI-company researchers advance general purpose capabilities which seems probably net bad.[6] I won't justify this fully here (as it depends on tricky quantitative questions about the value of this marginal alignment/safety research and the effects of broader public awareness), but I think the net benefit of reducing larger risks probably outweighs the more direct costs in fatalities from CBRN given how high I think these larger risks are (roughly a 30% chance of AI takeover and perhaps 20% expected fatalities due to AI takeover and other sources). Note that if I thought all risks in the future would be handled reasonably (which would require a very high level of effort/will and substantially more competence than current governments and AI companies exhibit), then it probably wouldn't make sense to pay this cost (as the benefits in terms of risk reduction would be smaller). Additionally, if AI companies provided better model access (at least for alignment/safety researchers), this would make the benefits of open-weight models substantially lower.
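As a toy illustration of why the later risks can dominate despite the large direct cost (the five-year window and the break-even framing are purely illustrative assumptions of mine; only the ~100,000/year and ~20% expected-fatality figures come from the discussion above):

```python
# Toy break-even comparison: direct expected fatalities from open-weight releases
# at this capability level vs. fatalities averted by reducing the larger risks.
# The 5-year window is an illustrative assumption; the other figures are the
# rough numbers from the discussion above.
world_population = 8e9
expected_ai_fatality_fraction = 0.20   # ~20% expected fatalities due to AI takeover and other sources
direct_fatalities_per_year = 100_000
years_of_releases = 5                  # hypothetical window of open-weight releases at this level

direct_cost = direct_fatalities_per_year * years_of_releases
breakeven_reduction = direct_cost / world_population
relative_reduction_needed = breakeven_reduction / expected_ai_fatality_fraction

print(f"Break-even absolute reduction in the expected AI fatality fraction: {breakeven_reduction:.2e}")
print(f"= {relative_reduction_needed:.2%} relative reduction in the ~20% expected-fatality figure")
# ~6e-05 absolute (about 0.006 percentage points), i.e. a ~0.03% relative change
# in the larger risk would offset the direct cost under these assumptions.
```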

Implications of this cost-benefit situation

Overall, releasing open-weight models would be paying a large tax in blood to achieve a pretty uncertain reduction in a future risk, so I'm not going to advocate for this. (Minimally, it's more leveraged to advocate for other things.) You certainly shouldn't read this post as an endorsement of releasing these open-weight models. Further, I do think that releasing open-weight models which have a reasonable chance of being above this capability threshold would be a unilateral and aggressive action for an AI company, at least if this decision wasn't made by an impartial, unconflicted, and legitimate third party (e.g., a reasonably informed and representative citizens' assembly).

I think it would be a mistake for people who are most concerned about loss-of-control (AI takeover) risk to argue that it would be net-harmful to release open-weight models which are at this level of capability, because releasing these models is probably net-positive from our perspective and because it's politically costly to pick a fight with advocates for open-weight models (open-weight models are generally popular, at least at a vibes level, in various important groups). I also think it would be a mistake to advocate against releasing such models using the justification that releasing these models is net-harmful (for the same reasons).

However, it does seem good to pressure companies to uphold their current safety policies and to not lie (or mislead) about the dangers of these systems. If companies wanted to substantially weaken their commitments related to this CBRN capability threshold, I think this would be fine insofar as the company made a clear public case for this that discussed the real downsides (to be clear, I think making such a public case is unlikely to happen).

To be more concrete, here are some things I think people shouldn't say or advocate for:

But, I think it would be good (if these things were true) for people to say things like:

More generally, pressuring companies to be more honest and to more faithfully abide by their safety policies seems good. (And documenting cases where companies don't abide by their policies also helps with the world making better decisions around the need for regulation and government action.)

I also think it's good for people to make predictions about what they think will happen, and I'm sympathetic to people just saying they think it is unacceptably dangerous to release models with open weights when those models have a substantial chance of being above this bar.

When would my views on open weights change?

When do I think it would be bad to release models with open weights? I think open weights will probably be bad once models can significantly accelerate AI R&D (perhaps 1.5x faster algorithmic progress[7]) or are capable of Autonomous Replication and Adaptation (ARA). I also think bioweapon-related risks will get much higher once AIs can significantly accelerate bioweapons R&D[8] (e.g., CBRN-4 in Anthropic's RSP) and at this point open-weight models are probably more straightforwardly bad (though I'm less certain about this than about the situation for AIs that can substantially accelerate AI R&D).

It's still unclear whether it's a good idea to advocate against open-weight releases of models with these capabilities. I'd guess it's somewhat unlikely that large AI companies or governments would want to continue releasing models with open weights once they are this capable, though underestimating capabilities is possible.[9] Thus, I think advocacy and governance work mostly shouldn't worry about preventing intentionally released open-weight models which are above this capability bar. (Though a more complete governance regime would ideally handle open-weight models in a reasonably good way.)

I should also note that substantial changes in the situation could shift my views on open weights. E.g., I might change my mind to thinking highly capable open-weight models are good if misalignment risks seemed basically resolved. This would also depend on the exact situation with bioweapons misuse. (For extremely capable models which can fully automate bio R&D and potentially operate at superhuman levels, risks of wide proliferation of extremely lethal bioweapons become salient.)

I also think it's bad to release open-weight models when the release leaks substantial state-of-the-art algorithmic advances (via the architecture or possibly the data mix and other aspects of the training which might be inferable based on examining the logits of the model on different data) and these algorithmic advances weren't already well known by the other most relevant actors. Currently, most AI companies are very leaky, so this doesn't feel very decisive, but it could be important in some cases.

One argument people sometimes make in favor of advocating against releasing open-weight models at an early level of capabilities (when open weights is net positive or at least less harmful) is to set a precedent to prevent later, more harmful open-weight releases. I don't feel very compelled by this as it involves getting into big fights (where you don't necessarily believe that much in the ask you're advocating for) to achieve a somewhat questionable (and not that impactful) win. I'm not even that sure it is productive to advocate against open-weight models ever, let alone at an early point when you're mostly advocating against them for precedent-setting reasons while open weights is still (at least pretty plausibly) net positive.

Mitigations

I do think various mitigations to risk from models capable of substantially helping amateurs make bioweapons are worthwhile. (And easily pay for themselves from a cost-benefit perspective given the large potential fatalities.) This should include:

Companies which release open-weight models that could substantially assist amateurs at bioterrorism should at least filter out virology data for these training runs and (publicly) advocate for improved DNA synthesis screening. It seems good to pressure companies which plan on releasing open-weight models to at least do this (and to pressure companies to implement safeguards on their APIs for non-open-weight models). Correspondingly, it is important that companies have high quality evaluations for these risks.

Even though I think the benefits of releasing open-weight models which can help amateurs make bioweapons probably outweigh the downsides, I still think it would be unacceptable if companies didn't effectively evaluate their models for this risk and generally make serious efforts to mitigate these risks.[10]


  1. I have limited understanding, but I'm skeptical that we can confidently rule out substantial CBRN assistance from many deployed AI systems (o3, Gemini 2.5 Pro, and Sonnet 4) given that they outperform expert virologists on the Virology Capabilities Test (our current best public test) and we don't have any publicly transparent benchmark or test which rules out this concern. This is despite the models I listed being deployed without safeguards. I'm not saying these models are likely above the thresholds in the safety policies of these companies, just that it's quite plausible (perhaps 35% for a well-elicited version of o3). I discuss this more here. ↩︎

  2. When I use the term "advocate", I mean things like writing op-eds, organizing lab employees, lobbying policymakers, or running campaigns on X/Twitter. Saying "I think X is bad" doesn't count. ↩︎

  3. It does seem good to advocate for AI companies to uphold commitments they've already made and to make it costly for AI companies to break their commitments. So, if a company has committed to not releasing such models, then pressuring them to uphold this seems probably good. ↩︎

  4. My understanding is that my estimate is similar to that of relevant experts in the field. ↩︎

  5. One potential harm of writing this post is making bioterrorism using AI more salient. I think having this sort of discussion publicly is worthwhile, but this doesn't seem totally obvious. ↩︎

  6. More speculatively, open-weight models could slow capabilities via reducing revenue/profit for the biggest AI companies which could result in lower investment. ↩︎

  7. I think Anthropic's AI R&D-4 threshold probably corresponds to somewhat higher than 1.5x faster algorithmic progress, but it depends on the capability profile and how bottlenecked progress is on compute and other factors. ↩︎

  8. Potentially autonomously, via closely supervising humans and telling them what to do. ↩︎

  9. To be clear, it's by no means certain. There could in principle be economic incentives to release models with open weights at arbitrarily high levels of capability for (e.g.) commoditize-the-complement style reasons. ↩︎

  10. These mitigations and evaluations should also be subject to scrutiny from unconflicted third party experts with the ability to comment publicly. I'm going to write a post on a proposal for some transparency and risk assessment interventions. ↩︎


