TechCrunch News, October 22, 2024
Stability claims its newest Stable Diffusion models generate more ‘diverse’ images

AI startup Stability AI has announced a new family of image generation models, the more customizable and versatile Stable Diffusion 3.5 series, comprising three models with distinct strengths. The article also covers release timing, performance, use cases, copyright questions, and safety measures.

🎨 The Stable Diffusion 3.5 series includes three models: Stable Diffusion 3.5 Large, with 8 billion parameters, is the most powerful and can generate images at resolutions up to 1 megapixel; Stable Diffusion 3.5 Large Turbo is a distilled version of Large that generates images faster at some cost in quality; Stable Diffusion 3.5 Medium is optimized for edge devices, generates images from 0.25 to 2 megapixels, and releases October 29.

💡 Stability says the Stable Diffusion 3.5 models should produce more diverse outputs: during training, each image is captioned with multiple prompts, with shorter prompts prioritized, to ensure a broader and more varied distribution of image concepts for any given text description.

📄 Under the Stable Diffusion 3.5 license, non-commercial use, including research, is free; businesses with under $1 million in annual revenue can also commercialize the models at no cost; organizations above $1 million must sign an enterprise license with Stability.

🛡️ Stability's models are trained on public web data, which may raise copyright issues; the company leaves customers to handle copyright claims themselves, but allows data owners to request removal of their data from the training datasets.

Following a string of controversies stemming from technical hiccups and licensing changes, AI startup Stability AI has announced its latest family of image generation models.

The new Stable Diffusion 3.5 series is more customizable and versatile than Stability’s previous-generation tech, the company claims — as well as more performant. There are three models in total:

  • Stable Diffusion 3.5 Large: With 8 billion parameters, it’s the most powerful model, capable of generating images at resolutions up to 1 megapixel. (Parameters roughly correspond to a model’s problem-solving skills, and models with more parameters generally perform better than those with fewer.)
  • Stable Diffusion 3.5 Large Turbo: A distilled version of Stable Diffusion 3.5 Large that generates images more quickly, at the cost of some quality.
  • Stable Diffusion 3.5 Medium: A model optimized to run on edge devices like smartphones and laptops, capable of generating images ranging from 0.25 to 2 megapixel resolutions.
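
To make those resolution figures concrete, here is a minimal sketch (the helper names are ours, not Stability's) that checks whether a target resolution falls inside the 0.25–2 megapixel window quoted for Stable Diffusion 3.5 Medium:

```python
def megapixels(width: int, height: int) -> float:
    """Resolution in megapixels (1 MP = 1,000,000 pixels)."""
    return width * height / 1_000_000


def fits_medium(width: int, height: int) -> bool:
    """True if the resolution falls inside the 0.25-2 MP range
    Stability quotes for Stable Diffusion 3.5 Medium."""
    return 0.25 <= megapixels(width, height) <= 2.0


# 1024x1024 is about 1.05 MP (in range); 2048x2048 is about 4.2 MP (too large)
```

By this arithmetic, the familiar 1024×1024 output size sits comfortably inside Medium's range, while anything past roughly 1414×1414 exceeds the 2 MP ceiling.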

While Stable Diffusion 3.5 Large and 3.5 Large Turbo are available today, 3.5 Medium won’t be released until October 29.

Stability says that the Stable Diffusion 3.5 models should generate more “diverse” outputs — that is to say, images depicting people with different skin tones and features — without the need for “extensive” prompting.

“During training, each image is captioned with multiple versions of prompts, with shorter prompts prioritized,” Hanno Basse, Stability’s chief technology officer, told TechCrunch in an interview. “This ensures a broader and more diverse distribution of image concepts for any given text description. Like most generative AI companies, we train on a wide variety of data, including filtered publicly available datasets and synthetic data.”
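
Basse's description can be illustrated with a toy sketch. This is our own simplification, not Stability's actual training pipeline: each image carries several caption variants, shorter ones are ranked first, and a per-step sampler weights brief captions more heavily:

```python
import random


def prioritize_captions(captions):
    """Order one image's caption variants shortest-first,
    mirroring the idea of prioritizing brief prompts."""
    return sorted(captions, key=len)


def sample_caption(captions, rng=None):
    """Pick one caption per training step, weighting shorter
    captions more heavily (weight = 1 / caption length)."""
    rng = rng or random.Random()
    weights = [1.0 / len(c) for c in captions]
    return rng.choices(captions, weights=weights, k=1)[0]
```

Under this scheme a terse caption like "a doctor" is sampled more often than a long, highly specific one, so the model associates the short description with the full variety of images that carry it.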

Some companies have kludgily built these sorts of “diversifying” features into image generators in the past, prompting outcries on social media. An older version of Google’s Gemini chatbot, for example, would show an anachronistic group of figures for historical prompts such as “a Roman legion” or “U.S. senators.” Google was forced to pause image generation of people for nearly six months while it developed a fix.

With any luck, Stability’s approach will be more thoughtful than others. We can’t give impressions, unfortunately, as Stability didn’t provide early access.

Image Credits: Stability AI

Stability’s previous flagship image generator, Stable Diffusion 3 Medium, was roundly criticized for its peculiar artifacts and poor adherence to prompts. The company warns that Stable Diffusion 3.5 models might suffer from similar prompting errors; it blames engineering and architectural trade-offs. But Stability also asserts the models are more robust than their predecessors in generating images across a range of different styles, including 3D art.

“Greater variation in outputs from the same prompt with different seeds may occur, which is intentional as it helps preserve a broader knowledge-base and diverse styles in the base models,” Stability wrote in a blog post shared with TechCrunch. “However, as a result, prompts lacking specificity might lead to increased uncertainty in the output, and the aesthetic level may vary.”

One thing that hasn’t changed with the new models is Stability’s licenses.

As with previous Stability models, models in the Stable Diffusion 3.5 series are free to use for “non-commercial” purposes, including research. Businesses with less than $1 million in annual revenue can also commercialize them at no cost. Organizations with more than $1 million in revenue, however, have to contract with Stability for an enterprise license.
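
The tiering described above reduces to a simple decision rule. The sketch below is our own simplification for illustration; the actual community license contains further conditions:

```python
def license_tier(annual_revenue_usd: float, commercial: bool) -> str:
    """Which Stable Diffusion 3.5 license tier applies, per the
    terms described above (simplified illustration only)."""
    if not commercial:
        return "free (non-commercial / research use)"
    if annual_revenue_usd < 1_000_000:
        return "free (commercial use under community license)"
    return "enterprise license required"
```

For example, a research lab or a $500K-revenue startup pays nothing, while a $5M-revenue company must contract with Stability.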

Stability caused a stir this summer over its restrictive fine-tuning terms, which gave (or at least appeared to give) the company the right to extract fees for models trained on images from its image generators. In response to the blowback, the company adjusted its terms to allow for more liberal commercial use. Stability reaffirmed today that users own the media they generate with Stability models.

“We encourage creators to distribute and monetize their work across the entire pipeline,” Ana Guillèn, VP of marketing and communications at Stability, said in an emailed statement, “as long as they provide a copy of our community license to the users of those creations and prominently display ‘Powered by Stability AI’ on related websites, user interfaces, blog posts, About pages, or product documentation.”

Stable Diffusion 3.5 Large and Stable Diffusion 3.5 Large Turbo can be self-hosted or used via Stability’s API and third-party platforms including Hugging Face, Fireworks, Replicate, and ComfyUI. Stability says that it plans to release the ControlNets for the models, which allow for fine-tuning, in the next few days.

Stability’s models, like most AI models, are trained on public web data — some of which may be copyrighted or under a restrictive license. Stability and many other AI vendors argue that the fair-use doctrine shields them from copyright claims. But that hasn’t stopped data owners from filing a growing number of class-action lawsuits.

Stability leaves it to customers to defend themselves against copyright claims, and, unlike some other vendors, has no payout carve-out in the event that it’s found liable.

Stability does allow data owners to request that their data be removed from its training datasets, however. As of March 2023, artists had removed 80 million images from Stable Diffusion’s training data, according to the company.

Asked about safety measures around misinformation in light of the upcoming U.S. general elections, Stability said that it “has taken — and continues to take — reasonable steps to prevent the misuse of Stable Diffusion by bad actors.” The startup declined to give specific technical details about those steps, however.

As of March, Stability only prohibited explicitly “misleading” content created using its generative AI tools — not content that could influence elections, hurt election integrity, or that features politicians and public figures.
