MarkTechPost@AI · September 12, 2024
Understanding the Hidden Layers in Large Language Models (LLMs)

Researchers at the Hebrew University examined how information flows through large language models and what role each layer plays. Their experiments show that the higher layers depend only weakly on the hidden states of previous tokens, which points to possible optimizations in model design.

🧐 The researchers focus on how information flows across the layers of large language models, particularly decoder-based LLMs. They propose that not all layers rely equally on the hidden states of previous tokens, especially the higher layers.

💡 The team hypothesized that lower layers concentrate on aggregating information from previous tokens, while higher layers depend on it far less. They applied several manipulations to the previous tokens' hidden states at different layers, including replacing them with random vectors, freezing them at a specific layer, and swapping in hidden states from a different prompt, and ran experiments on four open-source LLMs and four tasks.

🎉 The experiments show that manipulating the top 30-50% of the model causes little to no drop in performance across multiple tasks, indicating that the higher layers depend only weakly on the hidden representations of previous tokens. For example, when up to 50% of the layers were frozen, the models still performed on par with the baseline. Swapping hidden states between different prompts further confirmed this observation.

📄 The study reveals a two-phase process in transformer-based LLMs: the early layers gather information, while the higher layers mostly process it internally. This implies that the higher layers need less detailed representations of previous tokens, opening the door to optimizations such as skipping attention in those layers to reduce computational cost.

Researchers at the Hebrew University addressed the challenge of understanding how information flows through the layers of decoder-based large language models (LLMs). Specifically, the study investigates whether the hidden states of previous tokens in higher layers are as crucial as commonly assumed. Current transformer-based LLMs use the attention mechanism to process tokens by attending to all previous tokens in every layer. While each transformer layer applies this attention uniformly, prior research indicates that different layers capture different types of information. The study builds on the idea that not all layers rely equally on the hidden states of previous tokens, especially the higher layers.
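For reference, here is a minimal PyTorch sketch (toy sizes, not the authors' code) of the causal attention pattern described above, in which every layer lets each token attend to itself and to all previous tokens:

```python
import torch

def causal_attention(q, k, v):
    """Single-head causal attention: each position attends to itself and to
    all previous positions, which every decoder layer applies uniformly."""
    seq_len, d = q.shape[-2], q.shape[-1]
    scores = q @ k.transpose(-2, -1) / d ** 0.5
    future = torch.triu(torch.ones(seq_len, seq_len, dtype=torch.bool), diagonal=1)
    scores = scores.masked_fill(future, float("-inf"))  # block attention to future tokens
    return torch.softmax(scores, dim=-1) @ v

x = torch.randn(5, 16)           # 5 tokens, 16-dim states (toy sizes)
out = causal_attention(x, x, x)  # position i only sees positions 0..i
```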

The research team hypothesized that while the lower layers focus on aggregating information from previous tokens, the higher layers may rely far less on it. To test this, they applied various manipulations to the hidden states of previous tokens at different layers of the model: replacing the hidden states with random vectors, freezing them at a specific layer, and swapping the hidden states of one token with those of another token from a different prompt. They ran experiments on four open-source LLMs (Llama2-7B, Mistral-7B, Yi-6B, and Llemma-7B) and four tasks, including question answering and summarization, to evaluate the impact of these manipulations on model performance.
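A conceptual sketch of these three manipulations, assuming a simplified per-layer loop in PyTorch; the `layers` callables, the tensor shapes, and the exact point at which states are overwritten are illustrative assumptions, not the paper's code:

```python
import torch

def run_with_manipulation(layers, hidden, k, mode, other_hidden=None):
    """Run a stack of decoder layers while manipulating the hidden states of
    the *previous* tokens (all positions except the last) from layer k upward.

    hidden:       [seq_len, d_model] activations entering layer 0 (toy shape).
    mode:         "noise"  - replace previous-token states with random vectors
                  "freeze" - reuse the states captured at layer k in every higher layer
                  "swap"   - substitute previous-token states taken from another prompt
    other_hidden: per-layer states of the other prompt, used only for "swap".
    """
    frozen_prev = None
    for i, layer in enumerate(layers):
        if i >= k:
            if mode == "noise":
                hidden[:-1] = torch.randn_like(hidden[:-1])
            elif mode == "freeze":
                if frozen_prev is None:              # capture once, at layer k
                    frozen_prev = hidden[:-1].clone()
                hidden[:-1] = frozen_prev
            elif mode == "swap":
                hidden[:-1] = other_hidden[i][:-1]
        hidden = layer(hidden)                       # hypothetical [seq, d] -> [seq, d] block
    return hidden
```

Only the last position is left untouched, so the model must produce its prediction while attending to corrupted, frozen, or foreign representations of the preceding context.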

One technique involves introducing noise by replacing hidden states with random vectors, which allows researchers to evaluate whether the content of these hidden states still matters at certain layers. The second method, freezing, locks the hidden states at a particular layer and reuses them for the subsequent layers, reducing the computational load.
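Below is a more concrete, hedged sketch of the noise manipulation on a Hugging Face Llama-style checkpoint using forward hooks. The `model.model.layers` attribute path and the assumption that each decoder block returns a tuple whose first element is the hidden states hold for recent Llama implementations in `transformers`, but may differ across versions; this is an illustrative approximation, not the authors' released code:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "meta-llama/Llama-2-7b-hf"   # one of the four models used in the paper
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name, torch_dtype=torch.float16)
model.eval()

start = model.config.num_hidden_layers // 2   # manipulate the top 50% of layers

def noise_hook(module, inputs, output):
    # Assumption: Llama-style decoder blocks return a tuple; element 0 is the hidden states.
    hidden = output[0].clone()
    # Replace the states of all previous tokens with random vectors; keep the
    # last position (the token being predicted from) intact.
    hidden[:, :-1, :] = torch.randn_like(hidden[:, :-1, :])
    return (hidden,) + tuple(output[1:])

handles = [layer.register_forward_hook(noise_hook)
           for layer in model.model.layers[start:]]   # assumed attribute path

prompt = tok("The capital of France is", return_tensors="pt")
with torch.no_grad():
    logits = model(**prompt).logits
print(tok.decode([logits[0, -1].argmax().item()]))    # next-token prediction under noise

for h in handles:
    h.remove()
```

The freezing variant would instead cache the states produced at the chosen layer and feed them to every higher layer; it is more invasive to wire up with hooks, so only the noise case is sketched here.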

The researchers found that when these manipulations were applied to the top 30-50% of the model, performance across multiple tasks dropped little or not at all, suggesting that the top layers rely less on the hidden representations of previous tokens. For example, when up to 50% of the layers were frozen, the models retained performance close to the baseline. Swapping in hidden states from different prompts further confirmed this observation: the model ignored changes made in the top layers, while the same changes in the lower layers significantly altered the output. A further experiment tested whether attention is needed at all in the higher layers by skipping the attention block in those layers. Skipping attention in the upper layers had minimal impact on tasks such as summarization and question answering, while doing so in the lower layers led to severe performance degradation.
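As an illustration of what skipping the attention block in the upper layers could look like, here is a hedged sketch of a generic pre-norm transformer block with an optional attention bypass; the class and its arguments are hypothetical, not the paper's implementation:

```python
import torch.nn as nn

class SkippableBlock(nn.Module):
    """Generic pre-norm transformer block whose attention sub-layer can be
    bypassed, leaving only the feed-forward (MLP) path (hypothetical sketch)."""

    def __init__(self, attn: nn.Module, mlp: nn.Module, d_model: int,
                 skip_attention: bool = False):
        super().__init__()
        self.attn, self.mlp = attn, mlp
        self.ln1, self.ln2 = nn.LayerNorm(d_model), nn.LayerNorm(d_model)
        self.skip_attention = skip_attention

    def forward(self, x):
        if not self.skip_attention:
            x = x + self.attn(self.ln1(x))  # normal path: mix in previous tokens
        # with skip_attention=True the block only runs its MLP, i.e. it processes
        # each token's own representation without looking at previous tokens
        return x + self.mlp(self.ln2(x))

# e.g. skip attention in the top half of a 32-layer stack:
# blocks = [SkippableBlock(attn_i, mlp_i, 4096, skip_attention=(i >= 16))
#           for i, (attn_i, mlp_i) in enumerate(sub_layers)]
```

Skipping attention this way saves the quadratic attention cost in those layers, which is the optimization the findings point toward.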

In conclusion, the study reveals a two-phase process in transformer-based LLMs: the early layers gather information from previous tokens, while the higher layers primarily process that information internally. The findings suggest that higher layers are less dependent on detailed representations of previous tokens, offering potential optimizations such as skipping attention in those layers to reduce computational costs. Overall, the paper digs into the hierarchical nature of information processing in LLMs and points the way toward more informed and efficient model designs.


Check out the Paper. All credit for this research goes to the researchers of this project.

The post Understanding the Hidden Layers in Large Language Models (LLMs) appeared first on MarkTechPost.
