LessWrong · March 30
Memory Persistence within Conversation Threads with Multimodal LLMs

This article examines the visual recognition abilities of large language models (LLMs) on blurred images and the extent to which they rely on prior information. Through experiments, the author found that GPT-4o performed well at identifying cars in blurred images, but its accuracy dropped in a fresh conversation thread, a difference tied to memory persistence. By contrast, Claude 3.7 responded more consistently and cautiously across threads. The results suggest that when processing images, LLMs are influenced by earlier context even when instructed to ignore it, which makes the models harder to control.

👁️‍🗨️ The author tested 30 blurred images of cars, with the car in a different region of each image, to probe LLMs' visual recognition.

🤔 GPT-4o accurately identified cars in blurred images within an older conversation thread, but its recognition dropped markedly in a fresh thread, at one point mistaking a staircase for a dessert.

💡 The experiments indicate that GPT-4o's recognition is shaped by prior context; even when explicitly instructed to ignore it, the model is still influenced by earlier information.

🧐 Claude 3.7 responded more consistently and cautiously across threads, with answers less affected by prior information.

🧠 The author argues that for LLMs, instructions such as "ignore previous context" only shift probabilities rather than acting as hard rules, and so cannot fully override the effect of prior activations on the model's knowledge.

Published on March 30, 2025 7:16 AM GMT

In neuroscience, we learned about foveated vision — our eyes focus sharply at the center and blur the edges to conserve energy.

I was curious how LLMs handle that kind of input, so I wrote a quick script to apply a foveated blur to 30 CAPTCHA images of cars, with the car in different regions each time.
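
For reference, here is a minimal sketch of the kind of script described above, assuming Pillow is available and interpreting "foveated blur" as a sharp circular region around a chosen focus point with Gaussian blur everywhere else. The focus point, radius, and blur strength are illustrative choices, not the author's actual parameters.

```python
# Minimal foveated-blur sketch (hypothetical parameters, Pillow assumed).
from PIL import Image, ImageDraw, ImageFilter

def foveated_blur(path, focus_xy, radius=80, blur_px=12):
    img = Image.open(path).convert("RGB")
    blurred = img.filter(ImageFilter.GaussianBlur(blur_px))

    # Soft mask: white (sharp) near the focus point, black (blurred) elsewhere.
    mask = Image.new("L", img.size, 0)
    draw = ImageDraw.Draw(mask)
    x, y = focus_xy
    draw.ellipse((x - radius, y - radius, x + radius, y + radius), fill=255)
    mask = mask.filter(ImageFilter.GaussianBlur(radius // 2))  # feather the edge

    # Keep the original pixels inside the "fovea", blurred pixels outside it.
    return Image.composite(img, blurred, mask)

if __name__ == "__main__":
    out = foveated_blur("captcha_car.png", focus_xy=(160, 120))
    out.save("captcha_car_foveated.png")
```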

At first, I asked GPT-4o:

“Do you see a car in this?”

But that felt too leading. So I switched to:

“What do you see in this image?”

No matter how intense the blur, GPT-4o consistently identified the car, even when the blur made the image nearly indescribable.

But in a fresh thread? It struggled. At one point it even called a staircase a dessert, whereas in the older conversation thread it had correctly identified the same image as a staircase with a car.
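
For anyone who wants to reproduce the thread comparison, here is a rough sketch using the OpenAI Python SDK (Chat Completions with gpt-4o). The `ask_about_image` helper and the two-message "old thread" history are illustrative stand-ins, not the author's actual setup.

```python
# Sketch: ask the same question about the same blurred image with and
# without prior context that mentions cars (hypothetical helper and history).
import base64
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

def ask_about_image(path, history):
    b64 = base64.b64encode(open(path, "rb").read()).decode()
    messages = history + [{
        "role": "user",
        "content": [
            {"type": "text", "text": "What do you see in this image?"},
            {"type": "image_url",
             "image_url": {"url": f"data:image/png;base64,{b64}"}},
        ],
    }]
    resp = client.chat.completions.create(model="gpt-4o", messages=messages)
    return resp.choices[0].message.content

# Fresh thread: no prior context at all.
fresh = ask_about_image("captcha_car_foveated.png", history=[])

# "Old" thread: the word "car" has already appeared earlier in the conversation.
primed = ask_about_image("captcha_car_foveated.png", history=[
    {"role": "user", "content": "I'm blurring CAPTCHA images of cars."},
    {"role": "assistant", "content": "Understood, send one over."},
])

print("fresh: ", fresh)
print("primed:", primed)
```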

Turns out my original hypothesis — that the blur would impair recognition — wasn’t quite right. Instead, the limiting factor was something more subtle: memory persistence.

I suspected the model might be relying on a previously mentioned keyword (“car”), so I asked if it was using prior context. It said no. But after pointing out the inconsistency in performance across threads, it admitted that earlier context had helped. Not maliciously — it just didn’t seem fully aware of how much prior memory shaped its output.

Claude 3.7 was different. It gave more cautious and consistent responses across threads, even when I primed it with the word “car.” Its answers weren’t influenced the same way.

It reminded me of a LessWrong post from exactly two years ago that questioned why LLMs don't have access to long-term memory beyond their immediate context windows. Now they seem to do exactly that, but at a cost: because the model leans heavily on this long-term memory even when instructed not to, it becomes harder to control. https://www.lesswrong.com/posts/zoiiYreQZSs4mppfY/why-no-major-llms-with-memory

I think it's because, for these models, instructions like “ignore previous context” are just more tokens. The instructions affect probabilities but aren't hard rules that override the prior activations that have already shaped their knowledge.

Curious if others have noticed something similar, especially with multimodal models. Are you seeing this kind of implicit context carryover too?




