Letting neural networks be weird, November 27, 2024
An exercise in frustration

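This article examines the limitations of AI image generators when it comes to revisions. Through several examples, such as revising a dinosaur in a box, revising a baking-show scene, and restoring a fresco, the author shows that when asked for minor changes, AI often generates a completely different image, degrades the image quality, or introduces strange errors. The author argues that revision is a major challenge for AI image generators and that users have to adapt to these limitations and recognize the underlying problems, such as copyright and environmental impact. The author ultimately stresses the importance of hiring real artists rather than relying on AI-generated images.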

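🤔 **AI image revision is difficult:** Through several examples (a dinosaur in a box, a baking-show scene, a fresco restoration), the author shows that when asked for minor changes, AI image generators often produce a completely different image, degrade the image quality, or introduce strange errors. For example, asked to change a dinosaur's pose and position, the AI may generate an entirely different dinosaur; asked to revise a baking scene, it may make the pie worse.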

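🖼️ **Revisions degrade image quality:** The author notes that after several rounds of revision, the quality of an AI-generated image usually drops: details are lost and illogical elements appear. For example, revising the baking scene may produce more pies, and revising the fresco may change Jesus's facial features.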

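⚠️ **AI image generation raises ethical and environmental problems:** The author also mentions underlying problems with image-generating models: their training data may raise copyright issues, and training and running them has a large environmental footprint. AI-generated images may also be used in scams, for example in advertisements, search results, or research papers.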

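🎨 **Real artists are irreplaceable:** The author argues that although AI image generators can produce images quickly, their limitations in revision and their underlying problems make it hard for them to meet the need for high-quality images. For work that requires high-quality images, such as advertising, design, and art, hiring a real artist is still the best choice.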

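💸 **AI image generation may change the industry:** The author notes that some places may use AI-generated images to cut costs, but where high-quality images are required, the limitations of AI image generators make it hard for them to meet the need. This may eventually change the industry; for example, demand for artists and designers may increase.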

There's an anonymous Facebook post that's been making the rounds, in which a studio art director tried to hire AI prompters to make art, only to discover that they were completely unable to carry out minor revision requests. Asked to remove a person from a shot or fix perspective errors, the prompters would return completely different art instead, or art with other weird changes that didn't fit the brief. I completely believe that trying to revise AI-generated art really is that frustrating.

For example, I recently asked ChatGPT4:

Please generate an image of a tyrannosaurus inside a box. The box should be closed, and the tyrannosaurus should not be visible at all.

I got this completely reasonable response:

It sounds like you're asking for an image that implies the presence of a tyrannosaurus without actually showing it, since it's completely enclosed within a box. If the tyrannosaurus is not visible, what kind of details or themes would you like to emphasize about the box or the setting to suggest the dinosaur's presence?

I confirmed:

No tyrannosaurus visible, just some air holes for its comfort.

And ChatGPT (using DALL-E 3 for the image generation) gave me this:

It's clearly not quite there - it's a weird box, and the symbols on the side are particularly confusing. Are those squid? A perching flycatcher? Rolling pins?

ChatGPT has a feature where you can erase some parts of the image and submit more text describing how it should change those parts. So, I blocked out the line of symbols and asked:

Replace these symbols with a silhouette of a tyrannosaurus and the word "DANGER".

ChatGPT produced this:

Which is sort of what I asked for, but that doesn't really look like a tyrannosaurus. I erased that part of the box again and asked for more revisions:

The T-rex should be facing the word "DANGER". Its silhouette needs to have the powerful, heavy look of a T-rex, especially in the head and neck.

ChatGPT gave me:

The dinosaur is facing away from the word "DANGER" rather than toward it, and it's still not a very good tyrannosaurus silhouette. I asked for revisions again.

Keep the heavy dark font, but go back to the red warning text color. The t-rex still needs to face toward the words, not away from it, and its lower jaw is too heavy.

ChatGPT responded with:

Which... no. That does not count as facing toward the text, the lower jaw is still not right, and does the word still say "danger" or more like "dandier"?

The T-rex needs to be beside the text, not on top of it. And what's up with the extra spiky hump on its shoulders? Its lower jaw is still much too heavy.

ChatGPT responded with this:

My response:
... that's perfect. Now can you keep everything - textures, lighting, materials - the same but make the box appear much larger? Right now it looks like it's only knee height. It needs to be large enough to hold a T-rex.

ChatGPT's revision:

It seems like the more revisions I ask for, the worse the result becomes. If I allow it to revise the entire image, the whole process starts over.

Here's a similar revision process, this time attempting to revise my way toward prettier pies.

Me: Please generate an image of a giraffe entering the great british bakeoff.

Revision 1: I erased the counter and asked for the giraffe to be making a cherry pie with intricate latticework.

Below is revision 5, by which time it becomes evident that the more I ask for intricate latticework, the sloppier it gets, the more I ask for a deep dish pie, the thinner the pie gets, and the more I beg for a single pie, the more pies I get.

And when I don't restrict my changes to a specific part of the image, I get much better results but also a huge reset. For the image below I asked ChatGPT/DALL-E 3 to keep each detail exactly the same but to make it so that we can see it's raining outside the tent. Not only did it completely change the image, but it's not even raining.

My prompt: "Perfect. Please keep the image exactly as it is, except make it so that we can see that it is raining hard outside the tent."

In this image I asked for a snake making a painting of a bear, then erased the painting and asked for the bear to be standing on all fours. The revision is much worse.

Here's another example, in which I asked for a deer in a grocery store, and then asked for the deer to be a fawn instead. By revision 5 (trying to give the fawn spots, trying to fix the shadows that were making it appear to hover), both the quality of the deer and the grocery store background have deteriorated.

Here's a one-shot, in which I asked ChatGPT/DALL-E 3 to generate a damaged fresco of Jesus, and then restore the fresco to as good as new.

So, every time AI is asked to revise an image, it either starts over or makes it more and more of a disaster. People who work with AI-generated imagery have to adapt their creative vision to what comes out of the system - or go in with a mentality that anything that fits the brief is good enough.

There's also the fact that the image-generating models directly compete with artists whose work was used to train these models without permission or compensation. And the fact that training and running the models has a large environmental footprint. AI-generated imagery has become a tip-off that an advertisement, a search result, or a research paper is a scam.

I'm not surprised that there are some places looking for cheap filler images that don't mind the problems with AI-generated imagery. But for everyone else I think it's quickly becoming clear that you need a real artist, not a knockoff.

Bonus content: a unicorn goes hilariously bad
