Provably Safe AI: Worldview and Projects

As artificial intelligence advances rapidly, its potential risks are becoming ever more apparent. To help ensure that AI develops safely, Max Tegmark and Steve Omohundro proposed the "Provably Safe AI" strategy, which was later broadened into the concept of "Guaranteed Safe AI." The strategy aims to use mathematical methods and AI itself to build provably safe software, hardware, and social mechanisms that can address the enormous risks AI may pose.


Published on August 9, 2024 11:21 PM GMT

In September 2023, Max Tegmark and Steve Omohundro proposed "Provably Safe AI" as a strategy for AI Safety. In May 2024, a larger group delineated the broader concept of "Guaranteed Safe AI", which includes Provably Safe AI and other related strategies. In July 2024, Ben Goldhaber and Steve discussed Provably Safe AI and its future possibilities, as summarized in this document.

Background

In June 2024, ex-OpenAI AI Safety Researcher Leopold Aschenbrenner wrote a 165-page document entitled "Situational Awareness: The Decade Ahead" summarizing AI timeline evidence and beliefs which are shared by many frontier AI researchers. He argued that human-level AI is likely by 2027 and will likely lead to superhuman AI in 2028 or 2029. "Transformative AI" was coined by Open Philanthropy to describe AI which can "precipitate a transition comparable to the agricultural or industrial revolution". There appears to be a significant probability that Transformative AI may be created by 2030. If this probability is, say, greater than 10%, then humanity must immediately begin to prepare for it.

The social changes and upheaval caused by Transformative AI are likely to be enormous. There will likely be many benefits but also many risks and dangers, perhaps even existential risks for humanity. Today's technological infrastructure is riddled with flaws and security holes. Power grids, cell service, and internet services have all been very vulnerable to accidents and attacks. Terrorists have attacked critical infrastructure as a political statement. Today's cybersecurity and physical security barely keep human attackers at bay. When such attackers obtain access to powerful cyberattack AIs, they will likely be able to cause enormous social damage and upheaval.

Humanity has known how to write provably correct and secure software since Alan Turing's 1949 paper. Unfortunately, proving program correctness requires mathematical sophistication and remains rare in current software development practice. Fortunately, modern deep learning systems are becoming proficient at proving mathematical theorems and generating provably correct code. When combined with techniques like "autoformalization," this should enable powerful AI to rapidly replace today's flawed and insecure codebase with optimized, secure, and provably correct replacements. Many researchers working in these areas believe that AI theorem-proving at the level of human PhDs is likely about two years away.
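As a small illustration of the scale of proof obligation such systems already automate, here is a machine-checked theorem in Lean 4 (a toy example, assuming a recent Lean toolchain in which `List.length_reverse` is available as a simp lemma):

```lean
-- Toy machine-checked theorem: reversing a list never changes its length.
-- AI provers already automate proofs of this scale; the goal is to extend
-- the same guarantees to whole programs and protocols.
theorem reverse_preserves_length (xs : List Nat) :
    xs.reverse.length = xs.length := by
  simp   -- closed by the standard library lemma List.length_reverse
```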

Similar issues plague hardware correctness and security, and it will be a much larger project to replace today's flawed and insecure hardware. Max and Steve propose formal methods grounded in mathematical physics to produce provably safe physical designs. The same AI techniques which are revolutionizing theorem proving and provable software synthesis are also applicable to provable hardware design.

Finally, today's social mechanisms like money, contracts, voting, and the structures of governance will also need to be updated for the new realities of an AI-driven society. Here too, the underlying rules of social interaction can be formalized, provably effective social protocols can be designed, and secure hardware implementing the new rules can be synthesized using powerful theorem-proving AIs.

What's next?

Given the huge potential risk of uncontrolled powerful AI, many have argued for a pause in Frontier AI development. Unfortunately, that does not appear to be a stable solution. Even if the US paused its AI development, China or other countries could gain an advantage by accelerating their own work.

There have been similar calls to limit the power of open source AI models. But, again, any group anywhere in the world can release their powerful AI model weights on BitTorrent. Mark Zuckerberg has concluded that it is better to open source powerful AI models for the white hats than to try to keep them secret from the black hats. In July 2024, Meta released the Llama 3.1 405B large language model, which has performance comparable to the best closed-source models. We should expect access to the most powerful AI models to grow, and that this will enable both new levels of harmful cyber/social attacks and powerful new safety and security capabilities.

Society tends to respond to threats only after they have already caused harm. We have repeatedly seen that once harmful actions have occurred, citizens and politicians rapidly demand preventive measures against recurrences. An AI-powered cyberattack which brings down the power grid, financial system, or communication networks would likely cause a huge demand for AI-secure infrastructure. One powerful way to have an impact is to prepare a strategy in advance for responding to that demand. The Manifold prediction market currently estimates a 29% chance of an AI "warning shot" before October 20, 2026. If this estimate is accurate, it would be of great value to create detailed plans over the next two years.

Here is a potential positive scenario: In 2025, groups who see the potential for "Provably Safe Technology" fine-tune Llama 4 to create a powerful model for generating verified software and hardware. This model spreads widely and, by using "autoformalization," groups are able to feed it their current documentation and code and have it generate highly optimized, provably correct, provably secure replacements. The world's software vulnerabilities rapidly disappear. Chipmakers use the new AI to rapidly generate provably correct, efficient, and provably secure chips for every function. The new AI redesigns cryptographic protocols to eliminate quantum vulnerabilities and other cryptographic weaknesses. The AI also generates new social mechanisms, eliminating current inefficiencies and vulnerabilities to corruption. And the new mechanisms enable governments to create mutually beneficial treaties with hardware enforcement. "Provably-Constrained Self-Improvement" replaces "Unconstrained Self-Improvement" and GPU hardware is changed to only execute constrained models. AI improves to help humanity and we create infrastructure that enables all humans to thrive!

With more provably secure infrastructure in place, the surface area for disruption, whether from AI misuse or from exploitation by rogue autonomous AIs, will be reduced. Expanding the 'formally safe zone' will make takeover by AIs in general more difficult.

In the longer term, the "provably safe" approach has the potential to reduce the risk of AI takeover and to solve many of humanity's current problems. This requires building infrastructure which is invulnerable to AI attacks (as outlined in the Tegmark and Omohundro paper). It would enable a future in which powerful AIs can be tasked with solving humanity's biggest challenges (e.g., curing cancer, solving the climate crisis, eliminating social dilemmas, etc.) while provably preventing runaway self-improvement.

Some Useful Projects

Strategically, the above considerations suggest that we emphasize preparation for the use of powerful future AI tools rather than putting large efforts into building systems with today's tools. For example, today a software team might spend a year designing, building, and testing a system with a good user interface to use AI to safely control an industrial process. In two years, an AI synthesis tool may be able to generate a formally verified implementation in an afternoon from a specification of the requirements. As AI software synthesis and theorem proving systems advance, they should be able to rapidly impact human infrastructure.

The best developed of the needed tools at present are software formal-methods systems like the Dafny Programming and Verification Language, which can prove that code satisfies a formal specification. But they require accurate specifications of the semantics of the programming language and of the desired software behavior.
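Dafny discharges such contracts statically with a prover. Purely as an illustration of what a behavioral specification pins down, here is the same shape of contract written as runtime assertions in Python; the function and its pre/postconditions are hypothetical examples, not drawn from the post:

```python
def integer_sqrt(n: int) -> int:
    """Return the largest r with r*r <= n.

    In a verification language such as Dafny, the assertions below would be
    static `requires` / `ensures` clauses proved for all inputs; here they
    are only checked at runtime, to show what a formal specification of
    "desired software behavior" looks like.
    """
    assert n >= 0                            # requires: input is non-negative
    r = 0
    while (r + 1) * (r + 1) <= n:            # loop invariant: r*r <= n
        r += 1
    assert r * r <= n < (r + 1) * (r + 1)    # ensures: r is the integer square root
    return r

print(integer_sqrt(17))  # 4
```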

Below are ideas for projects which can advance the Provably Safe agenda, either by developing the principles and methods for building provably safe systems or by demonstrating the potential of provably safe systems using current or soon-to-be-available tools.

1) Improve methods for creating formal specifications: of systems, desired behaviors, and safety requirements. For example, there should be databases of existing specification elements and semantic search to find and verify desired behaviors.

2) Translate between formal languages: There are many theorem-proving languages (e.g., Lean, Coq, Isabelle, Metamath) and many formal methods languages (e.g., Dafny, Alloy, PRISM), but almost no interconvertibility between them. All the powerful mathematical theorems in Lean should be available to a programmer using Dafny. Modern AI tools should be able to translate between any of these and to represent the semantics of each; a toy sketch of the smallest version of such a bridge follows.
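The sketch below shows a shared abstract form for one kind of claim, with printers targeting two surface syntaxes. The syntax fragments are simplified approximations of Lean and Dafny, and a real translator would of course work over full ASTs and proofs rather than strings:

```python
from dataclasses import dataclass

@dataclass
class ForallNatEquation:
    """A universally quantified equation over naturals, e.g. 'for all n, n + 0 = n'."""
    var: str
    lhs: str
    rhs: str

def to_lean(c: ForallNatEquation) -> str:
    # Simplified Lean-style surface syntax.
    return f"theorem t : \u2200 {c.var} : Nat, {c.lhs} = {c.rhs} := by simp"

def to_dafny(c: ForallNatEquation) -> str:
    # Simplified Dafny-style lemma syntax.
    return f"lemma T({c.var}: nat) ensures {c.lhs} == {c.rhs} {{ }}"

claim = ForallNatEquation("n", "n + 0", "n")
print(to_lean(claim))
print(to_dafny(claim))
```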

3) A semantic library for probabilistic programming and machine learning: All of the popular statistical and machine learning methods should be formally specified with a range of provable bounds on behavior.
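As one concrete example of the kind of entry such a library would contain, Hoeffding's inequality is a provable bound on how far the empirical mean of bounded i.i.d. samples can stray from the true mean:

```latex
% Hoeffding's inequality: for i.i.d. samples X_1, ..., X_m taking values in [0, 1],
% the empirical mean concentrates around the true mean:
\Pr\left[\;\Bigl|\tfrac{1}{m}\textstyle\sum_{i=1}^{m} X_i - \mathbb{E}[X]\Bigr| \ge \varepsilon\;\right]
\;\le\; 2\exp\!\left(-2 m \varepsilon^{2}\right)
```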

4) Precise formal specifications for physical systems: Each of the engineering disciplines (e.g., mechanical engineering, electrical engineering, chemical engineering) needs to be formally codified and grounded in fundamental physics. Engineers should be able to provably verify properties of their designs. AI systems should be able to generate designs which provably meet desired specification requirements.
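As a minimal example of the physics-grounded obligations such a codification would let engineers state and verify (a standard textbook relation, used here only for illustration), a thin-walled pressure vessel design might carry the constraint:

```latex
% Hoop stress in a thin-walled cylindrical vessel must stay below the
% allowable stress with safety factor SF, where p is the internal pressure,
% r the inner radius, and t the wall thickness:
\sigma_{\mathrm{hoop}} \;=\; \frac{p\, r}{t} \;\le\; \frac{\sigma_{\mathrm{yield}}}{\mathrm{SF}}
```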

5) Formal models of security and cryptographic properties: Provable verification of human designs, automatic generation of verifiable designs, autoformalization of existing designs and resynthesis to eliminate flaws and security holes.
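A classical example of a cryptographic property stated formally is Shannon's perfect secrecy, which the one-time pad provably satisfies: observing the ciphertext gives an adversary no information about the message.

```latex
% Perfect secrecy: for every message m and every ciphertext c that occurs
% with nonzero probability,
\Pr[M = m \mid C = c] \;=\; \Pr[M = m]
```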

6) Demonstrate provable safety in a domain: by showing that an adversary with a specified set of capabilities is not able to push a designed system into a specified unsafe state.
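The claim at the core of such a demonstration can be written as a reachability property over the system's state space; informally, and with symbols chosen only for this sketch:

```latex
% With initial state s_0, transition function step, unsafe states U, and
% adversary capability set A: no finite sequence of adversary actions
% reaches an unsafe state.
\forall\, a_1, \dots, a_k \in A:\quad
\mathrm{step}(\cdots \mathrm{step}(s_0, a_1) \cdots, a_k) \;\notin\; U
```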

7) Build a formal model of mechanical engineering: capturing all the elements that today's engineers use in mechanical designs. While much of mathematics has been formalized, with great success, many mechanical-engineering practices are not yet represented.

8) Provably unpickable mechanical lock: A concrete project which combines security with mechanical engineering would be to design a "provably unpickable mechanical lock". This would involve simulating designs of mechanical locks and processes for mechanically picking them, verifying human designs, and automatically creating AI-generated designs which provably cannot be opened by mechanical picking; a toy version of the search over picking sequences is sketched below.
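Far short of a proof about real physical dynamics, the sketch below only illustrates the workflow: pick a deliberately crude lock model and an action set for the attacker, then exhaustively check that no sequence of modeled picking actions opens the lock. The lock, its anti-pick behavior, and the action set are all invented for this sketch:

```python
from itertools import product

# A toy 3-pin lock: it opens only when every pin sits at its secret height.
SECRET = (2, 0, 3)
PINS = 3
HEIGHTS = range(4)

def apply_action(state, pin, height):
    """A modeled picking action: set one pin, but disturb the others back to 0
    (a crude stand-in for a security-pin / anti-pick mechanism)."""
    new_state = [0] * PINS
    new_state[pin] = height
    return tuple(new_state)

def opens(state):
    return state == SECRET

def exists_picking_sequence(max_len=4):
    """Exhaustively search all picking sequences of up to max_len actions."""
    actions = [(p, h) for p in range(PINS) for h in HEIGHTS]
    for length in range(1, max_len + 1):
        for seq in product(actions, repeat=length):
            state = (0, 0, 0)
            for pin, height in seq:
                state = apply_action(state, pin, height)
                if opens(state):
                    return True
    return False

# Within this (very limited) model, no picking sequence opens the lock.
print(exists_picking_sequence())  # False
```

A real project would replace this with a faithful physical simulation and a proof covering all physically realizable picking strategies, not an enumeration over a hand-picked action set.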

9) Composable Specifications Library: Develop a comprehensive library of safety specifications that can be composed and reused across different GS AI projects. This initiative would include the following (a toy sketch of specification composition appears after the sub-items):

a) Cataloging and standardizing existing specifications from various domains.

b) Creating tools and methodologies for combining and adapting specifications for new use cases.

c) Implementing adversarial processes to continually generate and refine specifications, potentially leveraging automated systems (example)

d) Researching methods to provide assurance of specification completeness and coverage.
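The sketch below illustrates only the composition idea: specifications as named predicates over a system state, combined by conjunction. A real GS AI library would use formal logics and proof-carrying specifications rather than Python predicates, and the two catalog entries are hypothetical:

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict

@dataclass(frozen=True)
class Spec:
    """A reusable safety specification: a named predicate over system state."""
    name: str
    holds: Callable[[Dict[str, Any]], bool]

    def __and__(self, other: "Spec") -> "Spec":
        # Composition by conjunction: both specifications must hold.
        return Spec(
            name=f"({self.name} AND {other.name})",
            holds=lambda state: self.holds(state) and other.holds(state),
        )

# Two hypothetical catalog entries.
max_speed = Spec("speed <= 30 km/h", lambda s: s["speed"] <= 30)
geofenced = Spec("inside permitted zone", lambda s: s["zone"] == "permitted")

# Adapted and combined for a new use case.
delivery_robot_spec = max_speed & geofenced
print(delivery_robot_spec.name)
print(delivery_robot_spec.holds({"speed": 12, "zone": "permitted"}))  # True
```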

10) Guaranteed Safe by Design Development Process: At the heart of the agenda is using a world model, safety specifications, and a verifier (described in the recent paper Towards Guaranteed Safe AI) to verify the plans and actions of AIs. This is a framework and methodology for developing AI systems, and it needs to be operationalized for real-world systems. Examples of projects here include the following (a minimal guarded-execution sketch appears after the sub-items):

a) Defining the adversarial exploration process for generating specs.

b) Creating the process for defining the 'relevant' world model for a domain.

c) Creating processes for updating world models and safety specifications over time.

d) Creating a prize and incentive system for generating the pieces of a system. This could be meta: for example, can we create the safety specs needed to make prize generation itself resistant to adversarial exploitation?
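The sketch below shows a minimal guarded-execution reading of the world-model / safety-specification / verifier triad: before an AI-proposed action is executed, the verifier checks that the world model predicts the safety specification still holds afterwards. The toy tank dynamics, the action names, and all other details are invented for illustration:

```python
from typing import Any, Callable, Dict

State = Dict[str, Any]

def verifier(world_model: Callable[[State, str], State],
             safety_spec: Callable[[State], bool],
             state: State,
             proposed_action: str) -> bool:
    """Approve an action only if the world model predicts the safety spec still holds."""
    predicted = world_model(state, proposed_action)
    return safety_spec(predicted)

# Toy world model: a tank whose level changes with fill/drain/hold actions.
def world_model(state: State, action: str) -> State:
    delta = {"fill": 10, "drain": -10, "hold": 0}[action]
    return {"level": state["level"] + delta}

# Toy safety specification: the tank must never overflow or run dry.
def safety_spec(state: State) -> bool:
    return 0 <= state["level"] <= 100

state = {"level": 95}
for action in ["fill", "hold", "drain"]:       # actions proposed by some AI planner
    if verifier(world_model, safety_spec, state, action):
        state = world_model(state, action)     # execute only verified actions
        print(f"executed {action}; level is now {state['level']}")
    else:
        print(f"rejected {action}: predicted to violate the safety specification")
```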

11) Provable hardware projects: Create foundational building blocks for guaranteed safe AI systems, focusing on components like tamper-evident sensors, secure memory, and verifiable computation units. An initial goal is to demonstrate that it's possible to construct hardware with mathematically provable security properties, providing a robust foundation for safe AI development.
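One of these components, the verifiable computation unit, can be illustrated in software: an untrusted component returns a result together with evidence that a much simpler trusted checker can validate cheaply. The sorting example below is a standard teaching illustration, not taken from the post:

```python
from collections import Counter

def untrusted_sort(xs):
    """Stand-in for an opaque, possibly buggy or adversarial component."""
    return sorted(xs)

def check_sorted_permutation(original, result):
    """Trusted checker, far simpler than the computation it validates."""
    in_order = all(result[i] <= result[i + 1] for i in range(len(result) - 1))
    same_multiset = Counter(original) == Counter(result)
    return in_order and same_multiset

data = [5, 3, 9, 1]
out = untrusted_sort(data)
assert check_sorted_permutation(data, out)  # accept only results that pass the cheap check
print(out)
```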

12) Digital Twins for Guaranteed Safe AI: Create accurate digital representations of both cyber-cyber and cyber-physical systems following GS AI principles; these projects could be pushed forward by non-profit or for-profit companies. Ideally these would be interoperable simulations, perhaps building on existing initiatives like NVIDIA's Omniverse. Examples include building digital twins for existing AI evaluation workflows, using digital twins for cybersecurity, and building provably secure buildings whose specs represent a wide range of human preferences.

13) Demo Networks: Create small-scale, fully provably secure systems that serve as evidence of the feasibility of Guaranteed Safe AI approaches. These demo networks would showcase end-to-end security, from hardware to software, would provide tangible examples for stakeholders and policymakers, and could be adopted during "warning shot" scenarios to illustrate the practicality and effectiveness of GS AI principles. For example, a provably secure camera system or a provably secure communication network.

14) Coordination and Governance: Industry coordination projects (example) that develop the protocols and standards for formally verifying software and hardware, or policy ideas and initiatives for using strong hardware-enforced mechanisms to verify treaties governing the execution of constrained models.



