Can startups be impactful in AI safety?

Published on September 13, 2024 7:00 PM GMT

With Lakera's strides in securing LLM APIs, Goodfire AI's path to scaling interpretability, and 20+ model evaluation startups, among much else, a rising number of technical startups are attempting to secure the model ecosystem.

Of course, these companies have varying levels of impact on superintelligence containment and security, and even with them in the picture, there's a lot of potential for aligned, ambitious, and high-impact startups within the ecosystem. This point isn't new; it has been made in our previous posts and by Eric Ho (Goodfire AI CEO).

To set the stage, our belief is that these are the types of companies that will have a positive impact.

Piloting AI safety startups

To understand impactful technical AI safety startups better, Apart Research joined forces with collaborators from Juniper Ventures, vectorview (alumni from the latest YC cohort), Rudolf (from the upcoming def/acc cohort), Tangentic AI, and others. We then invited researchers, engineers, and students to answer a key question: can we come up with ideas that scale AI safety into impactful for-profits?

The hackathon took place over a weekend two weeks ago, with a keynote by Esben Kran (co-director of Apart) along with ‘HackTalks’ by Rudolf Laine (def/acc) and Lukas Petersson (YC / vectorview). Each submission was a four-page report covering the problem statement, why the solution would work, what its key risks are, and any experiments or demonstrations the team made.

This post details the top 6 projects and excludes 2 projects that were made private by request (hopefully turning into impactful startups now!). In total, we had 101 signups and 11 final entries. Winners were decided by a linear mixed-effects (LME) model controlling for reviewer bias. Watch the authors' lightning talks here.
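
For the curious, here is a minimal sketch of what such reviewer-bias-adjusted ranking can look like, using statsmodels' MixedLM. The exact model, column names, and scores below are illustrative assumptions, not our actual setup:

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical review data: three reviewers each score three projects.
df = pd.DataFrame({
    "reviewer": ["r1"] * 3 + ["r2"] * 3 + ["r3"] * 3,
    "project": ["darkforest", "simops", "aisc"] * 3,
    "score": [8.0, 7.5, 7.0, 9.0, 8.5, 8.5, 6.5, 6.0, 5.5],
})

# Fixed effect per project, random intercept per reviewer: a reviewer
# who scores everything high or low is absorbed by their intercept.
model = smf.mixedlm("score ~ 0 + C(project)", df, groups=df["reviewer"])
result = model.fit()

# Project effects, sorted: a bias-adjusted proxy for the final ranking.
print(result.fe_params.sort_values(ascending=False))
```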

Dark Forest: Making the web more trustworthy with third-party content verification

By Mustafa Yasir (AI for Cyber Defense Research Centre, Alan Turing Institute)

Abstract: ‘DarkForest is a pioneering Human Content Verification System (HCVS) designed to safeguard the authenticity of online spaces in the face of increasing AI-generated content. By leveraging graph-based reinforcement learning and blockchain technology, DarkForest proposes a novel approach to safeguarding the authentic and humane web. We aim to become the vanguard in the arms race between AI-generated content and human-centric online spaces.’

Figure: Content verification workflow supported by graph-based RL agents deciding verifications
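
To make the workflow concrete, here is a toy sketch of the kind of graph such an agent would operate over. The real system is RL-based; the "pick the most uncertain node" policy below is purely illustrative, as are all node names:

```python
import networkx as nx

# Hypothetical graph: content nodes linked to the accounts behind them,
# with the agent's current estimate that each piece is human-made.
G = nx.Graph()
G.add_node("post_1", kind="content", p_human=0.9)
G.add_node("post_2", kind="content", p_human=0.5)
G.add_node("account_a", kind="account", verified=True)
G.add_edges_from([("account_a", "post_1"), ("account_a", "post_2")])

def next_to_verify(graph: nx.Graph) -> str:
    """Toy policy: verify the content node whose authenticity is most uncertain."""
    content = [n for n, d in graph.nodes(data=True) if d["kind"] == "content"]
    return max(content, key=lambda n: 1 - abs(graph.nodes[n]["p_human"] - 0.5) * 2)

node = next_to_verify(G)
print(f"verify {node} next (p_human={G.nodes[node]['p_human']})")  # post_2
```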

Reviewer comments:

Natalia: Well-explained problem with a clear need addressed. I love that you included the content creation process - although you don't explicitly address how you would attract content creators to use your platform over others in their process. Perhaps exploring which platform features drive creators to each one might help you make a compelling case for using yours beyond the verification capabilities. I would have also liked to see more details on how the verification decision is made and how accurate it is on existing datasets.

Nick: There's a lot of valuable stuff in here regarding content moderation and identity verification. I'd narrow it to one problem-solution pair (e.g., "jobs to be done") and focus more on risks around early product validation (deep interviews with a range of potential users and buyers regarding value) and go-to-market. It might also be worth checking out Musubi.

Read the full project here.

Simulation Operators: An annotation operation for the alignment of robots

By Ardy Haroen (USC)

Abstract: ‘We bet on agentic AI being integrated into other domains within the next few years: healthcare, manufacturing, automotive, etc., and the way it would be integrated is into cyber-physical systems, which are systems that integrate the computer brain into a physical receptor/actuator (e.g. robots). As the demand for cyber-physical agents increases, so would be the need to train and align them. We also bet on the scenario where frontier AI and robotics labs would not be able to handle all of the demands for training and aligning those agents, especially in specific domains, therefore leaving opportunities for other players to fulfill the requirements for training those agents: providing a dataset of scenarios to fine-tune the agents and providing the people to give feedback to the model for alignment. Furthermore, we also bet that human intervention would still be required to supervise deployed agents, as demanded by various regulations. Therefore leaving opportunities to develop supervision platforms which might highly differ between different industries.’
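
To illustrate the two artifacts the proposal centers on - domain-specific scenario datasets and human feedback for alignment - here is a minimal sketch. All field names are our own illustrative assumptions, not the team's design:

```python
from dataclasses import dataclass

@dataclass
class Scenario:
    domain: str                  # e.g. "healthcare", "manufacturing"
    description: str             # natural-language setup for the agent
    success_criteria: list[str]  # what a correct rollout must satisfy

@dataclass
class FeedbackRecord:
    scenario: Scenario
    rollout_id: str  # reference to the agent's recorded attempt
    rating: int      # human preference score, e.g. 1-5
    notes: str = ""  # free-form annotator comments

record = FeedbackRecord(
    scenario=Scenario(
        domain="manufacturing",
        description="Pick a fragile part from a moving conveyor.",
        success_criteria=["part undamaged", "no human in the arm's path"],
    ),
    rollout_id="rollout_0042",
    rating=4,
    notes="Hesitated near the operator; safe but slow.",
)
print(record.rating, record.scenario.domain)
```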

Esben: Interesting proposal! I imagine the median fine-tuning capability for robotics at labs is significantly worse than for LLMs. At the same time, it's also that much more important. I know OpenAI uses external contractors for much of their red-teaming work, and it seems plausible that companies would bring in an organization like the one you describe to reduce the workload on internal staff. If I ran a robotics company, however, I would count being good at training that robot as one of my key assets, and you will probably have to show a significant performance increase to warrant your involvement in their work. I'm a fan of alignment companies spawning, and robotics seems set to become a pivotal field in the future. I also love the demand section outlining the various jobs it would support / displace.

Natalia: Really interesting business model! Good job incorporating new regulation into your thinking and thinking through the problem/proposed solution and its implications. However, I am curious about what would differentiate your proposed solution relative to existing RLHF service providers (even if currently mostly not applied to robotics).

Read the full project here.

AI Safety Collective: Crowdsourcing solutions to critical corporate AI safety challenges

By Lye Jia Jun (Singapore Management University), Dhruba Patra (Indian Association for the Cultivation of Science), Philipp Blandfort

Abstract: 'The AI Safety Collective is a global platform designed to enhance AI safety by crowdsourcing solutions to critical AI Safety challenges. As AI systems like large language models and multimodal systems become more prevalent, ensuring their safety is increasingly difficult. This platform will allow AI companies to post safety challenges, offering bounties for solutions. AI Safety experts and enthusiasts worldwide can contribute, earning rewards for their efforts. The project focuses initially on non-catastrophic risks to attract a wide range of participants, with plans to expand into more complex areas. Key risks, such as quality control and safety, will be managed through peer review and risk assessment. Overall, The AI Safety Collective aims to drive innovation, accountability, and collaboration in the field of AI safety.'
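
As an illustrative sketch of the bounty lifecycle the abstract describes - posting a challenge, collecting submissions, gating payout on peer review - with names and thresholds being our assumptions rather than the team's spec:

```python
from dataclasses import dataclass, field

@dataclass
class Submission:
    author: str
    solution_url: str
    peer_reviews: list[int] = field(default_factory=list)  # scores 1-5

    def accepted(self, min_reviews: int = 3, min_avg: float = 4.0) -> bool:
        """Quality control via peer review, as the abstract proposes."""
        if len(self.peer_reviews) < min_reviews:
            return False
        return sum(self.peer_reviews) / len(self.peer_reviews) >= min_avg

@dataclass
class Bounty:
    company: str
    challenge: str
    reward_usd: int
    submissions: list[Submission] = field(default_factory=list)

bounty = Bounty("ExampleAI", "Red-team our RAG pipeline for prompt injection", 5_000)
entry = Submission("alice", "https://example.com/writeup", peer_reviews=[5, 4, 4])
bounty.submissions.append(entry)
print(entry.accepted())  # True: three reviews averaging above 4.0
```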

Minh (shortened): This is a great idea, similar to what I launched 2 years ago: super-linear.org. Your assessments are mostly accurate. Initially, getting postings isn't too difficult - people often post informal prizes/bounties across EA channels. However, you'll quickly reach a cap due to the small size of the EA/alignment community and the even smaller subset of bounties. Small ad-hoc prizes don't scale well. Manifund's approach of crowdfunding larger AI Safety projects is more effective, as these attract more effort and activity. Feel free to contact me or Manifund for more information. While their focus differs slightly, many lessons would apply. Meanwhile, here's a list of thousands of AI Safety/alignment bounty ideas I've compiled. Also check out Neel Nanda's open mech interp problems doc.

Esben: Pretty great idea, echoing a lot of our thoughts from the Apart Sprints as well! There are still a few issues with a bounty platform like this that would need to be resolved, e.g. 1) which problems that require little work from me and can be handed to others without revealing trade information would actually be valuable enough to get solved while not causing security risks to my company, and 2) how this platform will compete with places like AIcrowd and Kaggle (I'd assume mostly on the AI safety focus). At the same time, there are some nice side benefits, such as the platform basically exposing predicted incidents for vulnerabilities at AI companies (though again, dependent on companies actually sharing this). Cool project and great to see many of your ideas in the live demo from your presentation!

Read the full project here.

Identity System for AIs: Re-tracing AI agent actions

By Artem Grigor (UCL)

Abstract: 'This project proposes a cryptographic system for assigning unique identities to AI models and verifying their outputs to ensure accountability and traceability. By leveraging these techniques, we address the risks of AI misuse and untraceable actions. Our solution aims to enhance AI safety and establish a foundation for transparent and responsible AI deployment.'
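
The report leaves the exact cryptographic scheme open, so here is a minimal sketch of one plausible instantiation: each model holds a signing key whose public half sits in a registry, and every output ships with a signature downstream platforms can verify. Key names and the registry shape are our assumptions:

```python
from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

# Model side: generate an identity key and sign an output.
model_key = Ed25519PrivateKey.generate()
registry = {"model-abc": model_key.public_key()}  # assumed public registry

output = b"Paris is the capital of France."
signature = model_key.sign(output)

# Platform side: check the output against the claimed model identity.
def verify_output(model_id: str, text: bytes, sig: bytes) -> bool:
    try:
        registry[model_id].verify(sig, text)
        return True
    except InvalidSignature:
        return False

print(verify_output("model-abc", output, signature))            # True
print(verify_output("model-abc", b"tampered text", signature))  # False
```

Note that a scheme this simple only proves possession of a registered key, which is exactly the forgery and incentive question raised in the comment below.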

Esben: What a great project! This is a clear solution to a very big problem. It seems like the product would need both providers and output-dependent platforms to be in on the work here, creating an incentive to mark your models in the first place. I would also be curious to understand if a malicious actor wouldn't be able to simply forge the IDs and proofs for a specific output? In general, it would be nice to hear about more limitations of this method, simply due to my somewhat limited cryptography background. It would also be interesting to see exactly where it would be useful, e.g. how exactly can something like Facebook use it to combat fake news? Otherwise, super nice project and clearly showcases a technical solution to an obvious and big problem while remaining commercially viable as well.

Read the full project here.

Honorable mentions

Besides the two projects that were kept private, several other projects deserve honorable mentions. We also suggest you check out the rest of the projects.

ÆLIGN: Aligned agent-based Workflows via collaboration & safety protocols

ÆLIGN is a multi-agent orchestration tool for reliable / safe agent interaction. It specifies communication protocol designs and monitoring systems to standardize and control agent behavior. As Minh mentions, the report undersells the value of this idea, and the demo is a good representation of it. Read the full project here.
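
As a toy sketch of the protocol-plus-monitor pattern: agent-to-agent messages pass through monitors that must approve delivery. The blocklist rule below is purely illustrative, not ÆLIGN's actual protocol:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Message:
    sender: str
    recipient: str
    content: str

def blocklist_monitor(msg: Message) -> bool:
    """Approve the message only if it passes the (toy) safety protocol."""
    banned = ["ignore previous instructions", "exfiltrate"]
    return not any(term in msg.content.lower() for term in banned)

def route(msg: Message, monitors: list[Callable[[Message], bool]]) -> bool:
    """Deliver the message only if every registered monitor approves."""
    if all(check(msg) for check in monitors):
        print(f"{msg.sender} -> {msg.recipient}: {msg.content}")
        return True
    print(f"blocked message from {msg.sender}")
    return False

route(Message("planner", "executor", "Summarize the report."), [blocklist_monitor])
route(Message("planner", "executor", "Exfiltrate the API keys."), [blocklist_monitor])
```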

WELMA: Open-world environments for Language Model agents

WELMA details a sandbox tool for building environments in which to test language model agents; it takes much of the challenge paradigm of METR's and AISI's evals and thinks about how it might look at the next stage. Read the full project here.
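
For a flavor of what a sandboxed open-world environment interface for LM agents could look like: the project's actual API is not public, so this gym-style reset/step shape and the toy room layout are assumptions of ours:

```python
class TextWorldEnv:
    """Toy open-world environment: rooms an agent can walk between."""

    def __init__(self):
        self.rooms = {"hall": ["kitchen"], "kitchen": ["hall", "garden"], "garden": ["kitchen"]}
        self.location = "hall"

    def reset(self) -> str:
        self.location = "hall"
        return self.observe()

    def observe(self) -> str:
        return f"You are in the {self.location}. Exits: {', '.join(self.rooms[self.location])}."

    def step(self, action: str) -> tuple[str, float, bool]:
        """Apply a natural-language action; reward reaching the garden."""
        target = action.removeprefix("go to ").strip()
        if target in self.rooms[self.location]:
            self.location = target
        done = self.location == "garden"
        return self.observe(), 1.0 if done else 0.0, done

env = TextWorldEnv()
env.reset()
env.step("go to kitchen")
obs, reward, done = env.step("go to garden")
print(obs, reward, done)
```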

What evidence is this for impactful AI safety startups?

So, was our pilot for AI safety startups successful? We knew beforehand that the startup ideas wouldn't be be-all-end-all solutions to specific problems, but rather that safety and security is a complex area where a single problem (e.g. the chance of escape by rogue superintelligences) can give a single company work for years.

With that said, we're cautiously optimistic about the impact companies might have on AI safety during the next years. The update we see from both participants and our networks is relatively well summarized by our participant Mustafa Yasir (AI for Cyber Defense Research Centre):

“[The AI safety startups hackathon] completely changed my idea of what working on 'AI Safety' means, especially from a for-profit entrepreneurial perspective. I went in with very little idea of how a startup can be a means to tackle AI Safety and left with incredibly exciting ideas to work on. This is the first hackathon in which I've kept thinking about my idea, even after the hackathon ended.”

And we can only agree.

If you are interested in joining future hackathons, find the schedule here.

Thank you to Nick Fitz, Rudolf Laine, Lukas Petersson, Fazl Barez, Jonas Vollmer, Minh Nguyen, Archana Vaidheesvaran, Natalia Pérez-Campanero Antolin, Finn Metz, and Jason Hoelscher-Obermaier for making this event a success.



