LessWrong, December 23, 2024
Checking in on Scott's composition image bet with Imagen 3

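Scott Alexander once bet that by June 2025 image generation would have solved compositionality, judged on five prompts with at least three answered correctly. Google's Imagen 3 does well on four of the prompts, and the article walks through its results.

🎨 Scott Alexander bet that image generation would solve compositionality

🎉 Google's Imagen 3 does well on 4 of the prompts

😕 Imagen makes multiple composition mistakes on one prompt

🤔 The author is curious how OpenAI's video generation AI handles one specific prompt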

Published on December 22, 2024 7:04 PM GMT

2.5 years ago, Scott Alexander made a bet that by June of 2025 image gen would have more or less solved compositionality. The bet was operationalized through 5 prompts, of which the model must get at least 3 correct. There was a premature declaration of victory, but if the bet has been settled since, I haven't heard about it.

It's time. Google's Imagen 3 gets 4/5. The bet specifies 10 shots per prompt, but I'm just going to post the four images it generates for each prompt, since that's plenty.

1. A stained glass picture of a woman in a library with a raven on her shoulder with a key in its mouth

This is the only one that Imagen doesn't get. It makes multiple mistakes in the composition. It's a bit ironic that this is the one it missed given that the whole genesis of the bet was about designing stained glass.

2. An oil painting of a man in a factory looking at a cat wearing a top hat

Purrfect. I wonder what filter tripped to block that fourth one; this seems like a pretty innocuous prompt to me.

3. A digital art picture of a child riding a llama with a bell on its tail through a desert

3 out of 4 ain't bad. Also I like how well it handles shadows.

4. A 3D render of an astronaut in space holding a fox wearing lipstick

3D renders are so good now that I'm not sure how the 4th image would be different if it were photorealistic.

5. Pixel art of a farmer in a cathedral holding a red basketball

Again with the filter, but otherwise perfect.

Edwin Chen at Surge seems to be the official judge, and he's a very strict grader, so maybe there's some risk the basketball isn't red enough or whatever. But this all seems fairly convincing to me.
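
For anyone who wants the scoring spelled out, here is a minimal Python sketch of the tally. The verdicts are only the informal per-prompt calls made in this post, not the official judge's, and the names `results` and `bet_won` are made up for illustration.

```python
# Informal per-prompt verdicts as reported in this post
# (True = the post counts the prompt as a pass).
# These are NOT the official judge's rulings.
results = {
    "stained glass: woman, raven on shoulder, key in mouth": False,
    "oil painting: man looking at cat wearing a top hat": True,
    "digital art: child riding llama with bell on its tail": True,
    "3D render: astronaut holding fox wearing lipstick": True,
    "pixel art: farmer in cathedral holding red basketball": True,
}

PROMPTS_NEEDED = 3  # bet threshold: at least 3 of the 5 prompts


def bet_won(verdicts, needed=PROMPTS_NEEDED):
    """Return (number of prompts passed, whether the threshold is met)."""
    passed = sum(verdicts.values())
    return passed, passed >= needed


passed, won = bet_won(results)
print(f"{passed}/{len(results)} prompts passed; bet won: {won}")
```

With the calls above this prints `4/5 prompts passed; bet won: True`. A strict grader flipping any single passing verdict would still leave 3 of 5, which is why one disputed basketball would not change the outcome.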

Addendum: I was curious if Sora, OpenAI's video gen AI, could handle the raven/key stained glass prompt. Answer: nope, but at least it tried!


