Communications of the ACM - Artificial Intelligence
Is AI Security Work Best Done In Academia or Industry? Part 2

This article explores the advantages academia holds over industry in the field of AI security. Although industry has the edge in compute resources, data, and compensation, academia is more attractive in terms of freedom of expression, the flow of talent, and sustained innovation. The rise of open-source models and datasets has also eased academia's resource constraints. In addition, academia is free to discover and publicly report vulnerabilities in AI products, which is essential for building safer AI. The article stresses the importance of collaboration between academia and industry in AI security research and is optimistic about academia's continued key role in this field.

💻 Academia enjoys greater freedom in AI security research: researchers can find and report vulnerabilities in models, training protocols, or compute infrastructure without worrying about commercial constraints.

🧑‍🎓 Academia benefits from a continuous stream of fresh talent: energetic students and researchers eager to learn and grow, whose lower salary expectations help academia sustain its innovative energy.

📚 Open-source models (such as Meta's Llama, Google's Gemma, and Mistral AI's Mistral) and open datasets (such as OpenSubtitles, DailyDialog, The Pile, and FRAMES) have greatly advanced academic research on LLMs.

💰 Although industry pays more, academic salaries in "hot areas" such as AI are rising, and many academics also hold part-time industry appointments, allowing academia to attract and retain top AI talent.

We left off in Part 1 with arguments for why industry has become the place for major leaps in AI, though with several notable exceptions, both historical and ongoing. In Part 2, we will consider the counterpoints that make academic lanes and bylanes attractive places for AI security work.

Doubtless, a best-of-both-worlds approach is also a winner here: an academic researcher with an affiliation at an industrial organization, who thus has the three advantages discussed in Part 1 (vast compute resources, vast amounts of data, and a higher compensation structure). That lane, though, is a narrow one, and has become progressively narrower for organizational reasons; someone with an industrial affiliation is often not free to speak her mind about weak spots in that organization's commercial offerings. Take the famous case of Geoffrey Hinton, who stepped back from Google in 2023 in order to speak freely about the potential dangers of artificial intelligence, or, more recently, that of the prolific and respected AI security researcher Nicholas Carlini, who left Google DeepMind in March 2025.

So here are the counterpoints favoring AI security work in academia.

Vast Compute Resources

Regarding the training of foundational models, this is indeed well-nigh impossible in academic circles. We have to be thankful to the several industrial organizations that have released fully trained, open-source models: Llama from Meta, Gemma from Google, and Mistral from Mistral AI, to name a few. Bless their souls, because without them, academic work on LLMs would indeed have crawled. These are not toy models, either; Llama 3's 70B parameters, while decidedly fewer than the leaders', are nothing to sneeze at. We can tinker with these open-source models to our heart's content. Further, we can fine-tune a large model to suit our needs, and that takes but a fraction of the cost of training a model from scratch. (Pay attention to the licenses of these open-source models, though, if you have plans to step outside the realm of academic research and into the realm of commercial use.)
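To make the economics concrete, here is a minimal sketch of what low-cost fine-tuning of an open-weights model might look like. It assumes the Hugging Face transformers, peft, and datasets libraries, a placeholder corpus file (my_corpus.txt), and a Llama checkpoint name used purely as an example; none of these specifics come from the article, and any small open causal LM can be substituted.

```python
# A minimal LoRA fine-tuning sketch for an open-weights model (hypothetical setup).
# Assumes: pip install transformers peft datasets; access to the model weights;
# and a small local text file, my_corpus.txt, standing in for your task data.
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)
from peft import LoraConfig, get_peft_model
from datasets import load_dataset

model_name = "meta-llama/Meta-Llama-3-8B"   # placeholder; any open causal LM works
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token    # Llama-style tokenizers ship without a pad token
model = AutoModelForCausalLM.from_pretrained(model_name)

# LoRA trains only small adapter matrices, a tiny fraction of the full parameter count,
# which is what keeps fine-tuning within an academic compute budget.
lora_cfg = LoraConfig(r=8, lora_alpha=16, target_modules=["q_proj", "v_proj"],
                      task_type="CAUSAL_LM")
model = get_peft_model(model, lora_cfg)
model.print_trainable_parameters()

# Tokenize a modest local corpus; fine-tuning needs far less data than pretraining.
corpus = load_dataset("text", data_files={"train": "my_corpus.txt"})["train"]
corpus = corpus.map(lambda b: tokenizer(b["text"], truncation=True, max_length=512),
                    batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="lora-out", per_device_train_batch_size=1,
                           num_train_epochs=1, logging_steps=10),
    train_dataset=corpus,
    data_collator=DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False),
)
trainer.train()
```

The resulting adapters can be merged back into the base weights or shared on their own, which is one reason open checkpoints circulate so quickly through academic groups.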

Vast Amounts of Data

Regarding the need for data, we have, in part, the same trend noted above to thank. The initial training of these open source models took seemingly limitless troves of data. Once that de novo training is done, though, the fine-tuning takes much more moderate amounts of data.

Another positive trend in our technology world has been the release of open-source datasets. We are hooked on collecting data to document every nook of our lives (pictures and videos, wearable data with a continuous record of our physiology, and sensor data for a precise, up-to-the-minute record of our physical environments). A good many of us are likewise hooked on releasing such data; it is a data analog of the dopamine hit of contributing to a Wikipedia article. Thus, even the problem of data scarcity is being mitigated to some extent.

Arguably the world-changing dataset in the field of image processing/computer vision, now part of technical folklore, is the ImageNet dataset. There is a favorable evolutionary pattern toward the creation of open-source datasets in the field of LLMs as well: OpenSubtitles, DailyDialog (for chatbots), The Pile (a diverse catch-all dataset for various LLM tasks), and FRAMES (for reasoning and knowledge tasks) are hugely popular.
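For a sense of how frictionless access to these corpora has become, here is a small sketch of pulling two of them through the Hugging Face datasets library. The library choice and the hub identifiers are my assumptions, not the article's, and identifiers on the hub do drift over time, so treat the strings below as illustrative.

```python
# Peeking at two of the open datasets named above (illustrative hub identifiers).
from datasets import load_dataset

# DailyDialog: multi-turn everyday conversations, popular for chatbot fine-tuning.
daily = load_dataset("daily_dialog", split="train")
print(len(daily), "dialogues; first turn:", daily[0]["dialog"][0])

# The Pile: a huge catch-all corpus (~800 GB); stream it instead of downloading it all.
pile = load_dataset("EleutherAI/pile", split="train", streaming=True)
print(next(iter(pile))["text"][:200])
```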

The Compensation Structure

Indeed, a large fraction of AI talent, once trained, gravitates toward the opulence of well-resourced industrial organizations. However, the tale of academic penury vs. industrial opulence is overblown. Academic salaries have risen for those in the "hot areas" (and AI is glowing, furiously red-hot), driven by market trends, and many of us are regularly approached by our friends in industry as a result. Plus, many academics in these hot areas hold part-time industrial appointments. So plenty of my colleagues in academia are leading thinkers in AI, and doers too (lest you harbor the antiquated notion of academics just preaching and never doing).

As importantly, we have the benefit of a continuous flow of fresh talent, a pristine stream that seemingly magically gets continually replenished. In the U.S., we have had a long period of being the beacon for talented students and researchers from across the world. Such talented youngsters are fast learners and have the necessary naïveté of the untrained to propose breakthrough ideas. They have the bug to learn and to grow their educational qualifications, so a far lower salary than industry offers is not an impediment. Admittedly, this holds only during the early phase of their careers, but that is enough to keep our academic pipeline humming along.

No Muzzles Allowed

Towering above all else, and bridging the three points above, is one factor that makes academia the place where security research in AI makes those big leaps. That factor is the lack of a muzzle. We are free to speak our minds, find vulnerabilities in the models, their training protocols, or the compute infrastructure on which they run, and communicate them to our industrial colleagues, so that our race to more intelligent AI models is also a race toward more secure AI models.

Often explicitly and sometimes implicitly, industrial practitioners are forbidden from poking holes in their AI products, the cash cows that dare not be slowed down. In the "wild, wild west" climate that has persisted in AI since its earliest days, there will be vulnerabilities in the software, especially given the frenetic pace of development. Yet one is not incentivized to look too carefully into these vulnerabilities. The tech blogosphere and tech social media channels get regular doses of news items in which some influential AI person has been eased out of their company because they sounded an early alarm that somehow became public.

For the most part, in academia we are free from such constraints. We make a name for ourselves by finding vulnerabilities and, going farther, by suggesting mitigations. Real impact arises from instantiating the mitigations in the actual software or model, which almost invariably involves harmonious cooperation between academic and industry personnel.
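To give a flavor of the kind of vulnerability-finding academics routinely publish, here is a minimal sketch of the classic fast gradient sign method (FGSM) probe against an image classifier. The article does not prescribe any particular attack; the PyTorch/torchvision setup and the example image file are my placeholders, and real evaluations go far beyond this one-step check.

```python
# Minimal FGSM probe: perturb an input slightly and see whether the prediction flips.
# Assumes PyTorch and torchvision; "example.jpg" is a placeholder image file.
import torch
import torch.nn.functional as F
from torchvision import models
from PIL import Image

model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT).eval()
preprocess = models.ResNet18_Weights.DEFAULT.transforms()

img = preprocess(Image.open("example.jpg")).unsqueeze(0)  # batch of one
img.requires_grad_(True)

# Prediction on the clean input.
logits = model(img)
label = logits.argmax(dim=1)

# One-step FGSM: nudge the input in the direction that increases the loss.
loss = F.cross_entropy(logits, label)
loss.backward()
epsilon = 0.01                                   # perturbation budget
adv = (img + epsilon * img.grad.sign()).detach()

adv_label = model(adv).argmax(dim=1)
print("clean prediction:", label.item(), "adversarial prediction:", adv_label.item())
```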

To Sum

Academia in the U.S. has been the fertile soil where new ideas take root and flourish, including in the fast-moving, society-upturning field of AI. Specifically for security in AI, there are fundamental forces that favor academia as the place where many significant advancements will sprout. These forces have shaped this trend for several years now, and I expect it to last. So here's an energetic hurray for us AI security researchers in academia, and an even heartier toast to synchronized efforts between academic and industry researchers and practitioners.

This post was originally published on Distant Whispers.

Saurabh would like to thank Rama Govindaraju of Nvidia for providing insightful comments on a draft of this article. The views in the article, however, are Saurabh's own.

Saurabh Bagchi is a professor of Electrical and Computer Engineering and Computer Science at Purdue University, where he leads a university-wide center on resilience called CRISP. His research interests are in distributed systems and dependable computing, while he and his group have the most fun making and breaking large-scale usable software systems for the greater good.
