These protocols will help AI agents navigate our messy lives

A growing number of companies are launching AI agents that can do things on your behalf—actions like sending an email, making a document, or editing a database. Initial reviews for these agents have been mixed at best, though, because they struggle to interact with all the different components of our digital lives.

Part of the problem is that we are still building the necessary infrastructure to help agents navigate the world. If we want agents to complete tasks for us, we need to give them the necessary tools while also making sure they use that power responsibly.

Anthropic and Google are among the companies and groups working to do those. Over the past year, they have both introduced protocols that try to define how AI agents should interact with each other and the world around them. These protocols could make it easier for agents to control other programs like email clients and note-taking apps.

The reason has to do with application programming interfaces, the connections between computers or programs that govern much of our online world. APIs currently reply to “pings” with standardized information. But AI models aren’t made to work exactly the same every time. The very randomness that helps them come across as conversational and expressive also makes it difficult for them to both call an API and understand the response.

“Models speak a natural language,” says Theo Chu, a project manager at Anthropic. “For [a model] to get context and do something with that context, there is a translation layer that has to happen for it to make sense to the model.” Chu works on one such translation technique, the Model Context Protocol (MCP), which Anthropic introduced at the end of last year.

MCP attempts to standardize how AI agents interact with the world via various programs, and it’s already very popular. One web aggregator for MCP servers (essentially, the portals for different programs or tools that agents can access) lists over 15,000 servers already.

Working out how to govern how AI agents interact with each other is arguably an even steeper challenge, and it’s one the Agent2Agent protocol (A2A), introduced by Google in April, tries to take on. Whereas MCP translates requests between words and code, A2A tries to moderate exchanges between agents, which is an “essential next step for the industry to move beyond single-purpose agents,” Rao Surapaneni, who works with A2A at Google Cloud, wrote in an email to MIT Technology Review.

Google says 150 companies have already partnered with it to develop and adopt A2A, including Adobe and Salesforce. At a high level, both MCP and A2A tell an AI agent what it absolutely needs to do, what it should do, and what it should not do to ensure a safe interaction with other services. In a way, they are complementary—each agent in an A2A interaction could individually be using MCP to fetch information the other asks for.

However, Chu stresses that it is “definitely still early days” for MCP, and the A2A road map lists plenty of tasks still to be done. We’ve identified the three main areas of growth for MCP, A2A, and other agent protocols: security, openness, and efficiency.

What should these protocols say about security?

Researchers and developers still don’t really understand how AI models work, and new vulnerabilities are being discovered all the time. For chatbot-style AI applications, malicious attacks can cause models to do all sorts of bad things, including regurgitating training data and spouting slurs. But for AI agents, which interact with the world on someone’s behalf, the possibilities are far riskier.

For example, one AI agent, made to read and send emails for someone, has already been shown to be vulnerable to what’s known as an indirect prompt injection attack. Essentially, an email could be written in a way that hijacks the AI model and causes it to malfunction. Then, if that agent has access to the user’s files, it could be instructed to send private documents to the attacker.

Some researchers believe that protocols like MCP should prevent agents from carrying out harmful actions like this. However, it does not at the moment. “Basically, it does not have any security design,” says Zhaorun Chen, a University of Chicago PhD student who works on AI agent security and uses MCP servers.

Bruce Schneier, a security researcher and activist, is skeptical that protocols like MCP will be able to do much to reduce the inherent risks that come with AI and is concerned that giving such technology more power will just give it more ability to cause harm in the real, physical world. “We just don’t have good answers on how to secure this stuff,” says Schneier. “It’s going to be a security cesspool really fast.”

Others are more hopeful. Security design could be added to MCP and A2A similar to the way it is for internet protocols like HTTPS (though the nature of attacks on AI systems is very different). And Chen and Anthropic believe that standardizing protocols like MCP and A2A can help make it easier to catch and resolve security issues even as is. Chen uses MCP in his research to test the roles different programs can play in attacks to better understand vulnerabilities. Chu at Anthropic believes that these tools could let cybersecurity companies more easily deal with attacks against agents, because it will be easier to unpack who sent what.

How open should these protocols be?

Although MCP and A2A are two of the most popular agent protocols available today, there are plenty of others in the works. Large companies like Cisco and IBM are working on their own protocols, and other groups have put forth different designs like Agora, designed by researchers at the University of Oxford, which upgrades an agent-service communication from human language to structured data in real time.

Many developers hope there could eventually be a registry of safe, trusted systems to navigate the proliferation of agents and tools. Others, including Chen, want users to be able to rate different services in something like a Yelp for AI agent tools. Some more niche protocols have even built blockchains on top of MCP and A2A so that servers can show they are not just spam.

Both MCP and A2A are open-source, which is common for would-be standards as it lets others work on building them. This can help protocols develop faster and more transparently.

“If we go build something together, we spend less time overall, because we’re not having to each reinvent the wheel,” says David Nalley, who leads developer experience at Amazon Web Services and works with a lot of open-source systems, including A2A and MCP.

Nalley oversaw Google’s donation of A2A to the Linux Foundation, a nonprofit organization that guides open-source projects, back in June. With the foundation’s stewardship, the developers who work on A2A (including employees at Google and many others) all get a say in how it should evolve. MCP, on the other hand, is owned by Anthropic and licensed for free. That is a sticking point for some open-source advocates, who want others to have a say in how the code base itself is developed.

“There’s admittedly some increased risk around a single person or a single entity being in absolute control,” says Nalley. He says most people would prefer multiple groups to have a “seat at the table” to make sure that these protocols are serving everyone’s best interests.

However, Nalley does believe Anthropic is acting in good faith—its license, he says, is incredibly permissive, allowing other groups to create their own modified versions of the code (a process known as “forking”).

“Someone could fork it if they needed to, if something went completely off the rails,” says Nalley. IBM’s Agent Communication Protocol was actually spun off of MCP.

Anthropic is still deciding exactly how to develop MCP. For now, it works with a steering committee of outside companies that help make decisions on MCP’s development, but Anthropic seems open to changing this approach. “We are looking to evolve how we think about both ownership and governance in the future,” says Chu.

Is natural language fast enough?

MCP and A2A work on the agents’ terms—they use words and phrases (termed natural language in AI), just as AI models do when they are responding to a person. This is part of the selling point for these protocols, because it means the model doesn’t have to be trained to talk in a way that is unnatural to it. “Allowing a natural-language interface to be used between agents and not just with humans unlocks sharing the intelligence that is built into these agents,” says Surapaneni.

But this choice does come with drawbacks. Natural-language interfaces lack the precision of APIs, and that could result in incorrect responses. And it creates inefficiencies.

Usually, an AI model reads and responds to text by splitting words into tokens. The AI model will read a prompt, split it into input tokens, generate a response in the form of output tokens, and then put these tokens into words to send back. These tokens define in some sense how much work the AI model has to do—that’s why most AI platforms charge users according to the number of tokens used.

But the whole point of working in tokens is so that people can understand the output—it’s usually faster and more efficient for machine-to-machine communication to just work over code. MCP and A2A both work in natural language, so they require the model to spend tokens as the agent talks to other machines, like tools and other agents. The user never even sees these exchanges—all the effort of making everything human-readable doesn’t ever get read by a human. “You waste a lot of tokens if you want to use MCP,” says Chen.

Chen describes this process as potentially very costly. For example, suppose the user wants the agent to read a document and summarize it. If the agent uses another program to summarize here, it needs to read the document, write the document to the program, read back the summary, and write it back to the user. Since the agent needed to read and write everything, both the document and the summary get doubled up. According to Chen, “It’s actually a lot of tokens.”

As with so many aspects of MCP and A2A’s designs, their benefits also create new challenges. “There’s a long way to go if we want to scale up and actually make them useful,” says Chen.

What should these protocols say about security?

How open should these protocols be?

Is natural language fast enough?

Fish AI Reader

FishAI

联系邮箱 441953276@qq.com

相关标签