‘Open’ model licenses often carry concerning restrictions

This week, Google released a family of open AI models, Gemma 3, that quickly garnered praise for their impressive efficiency. But as a number of developers lamented on X, Gemma 3’s license makes commercial use of the models a risky proposition.

It’s not a problem unique to Gemma 3. Companies like Meta also apply custom, non-standard licensing terms to their openly available models, and the terms present legal challenges for companies. Some firms, especially smaller operations, worry that Google and others could “pull the rug” on their business by asserting the more onerous clauses.

“The restrictive and inconsistent licensing of so-called ‘open’ AI models is creating significant uncertainty, particularly for commercial adoption,” Nick Vidal, head of community at the Open Source Initiative, a long-running institution aiming to define and “steward” all things open source, told TechCrunch. “While these models are marketed as open, the actual terms impose various legal and practical hurdles that deter businesses from integrating them into their products or services.”

Open model developers have their reasons for releasing models under proprietary licenses as opposed to industry-standard options like Apache and MIT. AI startup Cohere, for example, has been clear about its intent to support scientific — but not commercial — work on top of its models.

But Gemma and Meta’s Llama licenses in particular have restrictions that limit the ways companies can use the models without fear of legal reprisal.

Meta, for instance, prohibits developers from using the “output or results” of Llama 3 models to improve any model besides Llama 3 or “derivative works.” It also prevents companies with over 700 million monthly active users from deploying Llama models without first obtaining a special, additional license.

Gemma’s license is generally less burdensome. But it does grant Google the right to “restrict (remotely or otherwise) usage” of Gemma that Google believes is in violation of the company’s prohibited use policy or “applicable laws and regulations.”

These terms don’t just apply to the original Llama and Gemma models. Models based on Llama or Gemma must also adhere to the Llama and Gemma licenses, respectively. In Gemma’s case, that includes models trained on synthetic data generated by Gemma.

Florian Brand, a research assistant at the German Research Center for Artificial Intelligence, believes that, despite what tech giant execs would have you believe, licenses like Gemma’s and Llama’s “cannot reasonably be called ‘open source.’”

“Most companies have a set of approved licenses, such as Apache 2.0, so any custom license is a lot of trouble and money,” Brand told TechCrunch. “Small companies without legal teams or money for lawyers will stick to models with standard licenses.”

Brand noted that AI model developers with custom licenses, like Google, haven’t aggressively enforced their terms yet. However, the threat is often enough to deter adoption, he added.

“These restrictions have an impact on the AI ecosystem — even on AI researchers like me,” said Brand.

Han-Chung Lee, director of machine learning at Moody’s, agrees that custom licenses such as those attached to Gemma and Llama make the models “not usable” in many commercial scenarios. So does Eric Tramel, a staff applied scientist at AI startup Gretel.

“Model-specific licenses make specific carve-outs for model derivatives and distillation, which causes concern about clawbacks,” Tramel said. “Imagine a business that is specifically producing model fine-tunes for their customers. What license should a Gemma-data fine-tune of Llama have? What would the impact be for all of their downstream customers?”

The scenario that deployers most fear, Tramel said, is that the models are a trojan horse of sorts.

“A model foundry can put out [open] models, wait to see what business cases develop using those models, and then strong-arm their way into successful verticals by either extortion or lawfare,” he said. “For example, Gemma 3, by all appearances, seems like a solid release — and one that could have a broad impact. But the market can’t adopt it because of its license structure. So, businesses will likely stick with perhaps weaker and less reliable Apache 2.0 models.”

To be clear, certain models have achieved widespread distribution in spite of their restrictive licenses. Llama, for example, has been downloaded hundreds of millions of times and built into products from major corporations, including Spotify.

But they could be even more successful if they were permissively licensed, according to Yacine Jernite, head of machine learning and society at AI startup Hugging Face. Jernite called on providers like Google to move to open license frameworks and “collaborate more directly” with users on broadly accepted terms.

“Given the lack of consensus on these terms and the fact that many of the underlying assumptions haven’t yet been tested in courts, it all serves primarily as a declaration of intent from those actors,” Jernite said. “[But if certain clauses] are interpreted too broadly, a lot of good work will find itself on uncertain legal ground, which is particularly scary for organizations building successful commercial products.”

Vidal said that there’s an urgent need for AI models that companies can freely integrate, modify, and share without fearing sudden license changes or legal ambiguity.

“The current landscape of AI model licensing is riddled with confusion, restrictive terms, and misleading claims of openness,” Vidal said. “Instead of redefining ‘open’ to suit corporate interests, the AI industry should align with established open source principles to create a truly open ecosystem.”
