MIT Technology Review » Artificial Intelligence, March 26
OpenAI’s new image generator aims to be practical enough for designers and advertisers

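OpenAI has released a new image generator that emphasizes controllability and practicality, aiming to meet the needs of fields such as advertising and graphic design. The new model addresses technical problems that have plagued AI image generators for years, such as object binding and text generation. It can generate multiple graphics in a single image and arrange them in the correct order, produce images with clear, legible text, and modify uploaded images. OpenAI is positioning the tool for creative professionals, and it faces competition from rivals such as Adobe Photoshop and Canva. Although the tool may also be used to create social media posts, OpenAI's heavy investment suggests more ambitious commercial goals. The launch also raises the technical bar for other AI companies and should accelerate innovation across the industry.

✍️ OpenAI's new image generator focuses on practicality, aims to meet the needs of fields such as advertising and graphic design, and has been integrated into the GPT-4o model.

🧩 The model addresses long-standing technical problems in AI image generators, such as object binding and text generation, and can generate multiple graphics in a single image and arrange them in the correct order.

🖼️ OpenAI's new image generator can produce images with clear, legible text, such as recipe cards, comic text bubbles, and mock advertisements.

🎯 OpenAI is positioning the tool for creative professionals, such as graphic designers, ad agencies, social media managers, and illustrators.

💰 OpenAI plans large-scale investments, which suggests the image generator will play an important commercial role and push other AI companies to advance their technology.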

OpenAI has released a new image generator that’s designed less for typical surrealist AI art and more for highly controllable and practical creation of visuals—a sign that OpenAI thinks its tools are ready for use in fields like advertising and graphic design. 

The image generator, which is now part of the company’s GPT-4o model, was promised by OpenAI last May but wasn’t released at the time. In the meantime, requests for generated images on ChatGPT were filled by an older image generator called DALL-E. OpenAI has been tweaking the new model since then and, starting today, will roll it out to all tiers of users over the coming weeks, replacing the older one.

The new model makes progress on technical issues that have plagued AI image generators for years. While most have been great at creating fantastical images or realistic deepfakes, they’ve been terrible at something called binding, which refers to the ability to identify certain objects correctly and put them in their proper place (like a sign that says “hot dogs” properly placed above a food cart, not somewhere else in the image). 

It was only a few years ago that models started to succeed at things like “Put the red cube on top of the blue cube,” a feature that is essential for any creative professional use of AI. Generators also struggle with text generation, typically creating distorted jumbles of letter shapes that look more like captchas than readable text.

Example images from OpenAI show progress here. The model is able to generate 12 discrete graphics within a single image—like a cat emoji or a lightning bolt—and place them in proper order. Another shows four cocktails accompanied by recipe cards with accurate, legible text. More images show comic strips with text bubbles, mock advertisements, and instructional diagrams. The model also allows you to upload images to be modified, and it will be available in the video generator Sora as well as in GPT-4o. 

It’s “a new tool for communication,” says Gabe Goh, the lead designer on the generator at OpenAI. Kenji Hata, a researcher at OpenAI who also worked on the tool, puts it a different way: “I think the whole idea is that we’re going away from, like, beautiful art.” It can still do that, he clarifies, but it will do more useful things too. “You can actually make images work for you,” he says, “and not just look at them.”

It’s a clear sign that OpenAI is positioning the tool to be used more by creative professionals: think graphic designers, ad agencies, social media managers, or illustrators. But in entering this domain, OpenAI has two paths, both difficult. 

One, it can target the skilled professionals who have long used programs like Photoshop, whose maker, Adobe, is also investing heavily in tools that use generative AI to fill in parts of images.

“Adobe really has a stranglehold on this market, and they’re moving fast enough that I don’t know how compelling it is for people to switch,” says David Raskino, the cofounder and chief technical officer of Irreverent Labs, which works on AI video generation. 

The second option is to target casual designers who have flocked to tools like Canva (which has also been investing in AI). This is an audience that may not have ever needed technically demanding software like Photoshop but would use more casual design tools to create visuals. To succeed here, OpenAI would have to lure people away from platforms built for design in hopes that the speed and quality of its own image generator would make the switch worth it (at least for part of the design process). 

It’s also possible the tool will simply be used as many image generators are now: to create quick visuals that are “good enough” to accompany social media posts. But with OpenAI planning massive investments, including participation in the $500 billion Stargate project to build new data centers at unprecedented scale, it’s hard to imagine that the image generator won’t play some ambitious moneymaking role. 

Regardless, the fact that OpenAI’s new image generator has pushed through notable technical hurdles has raised the bar for other AI companies. Clearing those hurdles likely required lots of very specific data, Raskino says, like millions of images in which text is properly displayed at lots of different angles and orientations. Now competing image generators will have to match those achievements to keep up.

“The pace of innovation should increase here,” Raskino says.
