MarkTechPost@AI · March 8
Microsoft AI Introduces Belief State Transformer (BST): Enhancing Goal-Conditioned Sequence Modeling with Bidirectional Context

Microsoft Research and collaborating institutions have introduced the Belief State Transformer (BST), which strengthens next-token prediction by taking both prefix and suffix context into account. Unlike a standard Transformer, BST encodes information bidirectionally, predicting the next token after the prefix and the previous token before the suffix. This approach improves performance on goal-conditioned text generation and structured prediction problems such as star graphs. By learning a compact belief state, BST outperforms conventional approaches to sequence modeling, offering more efficient inference and stronger text representations, with promising prospects for large-scale applications. Experiments show that BST outperforms the Fill-in-the-Middle approach on story writing and also shows advantages in unconditional text generation.

🧠**Bidirectional encoding, beyond the traditional one-way limit**: With a forward and a backward encoder, BST conditions on the prefix and the suffix simultaneously, predicting both the next and the previous token. This prevents the model from adopting shortcut strategies and improves long-range dependency learning.

⭐**Star-graph navigation, superior performance**: On the star-graph navigation task, BST significantly outperforms models that use only a forward Transformer. Ablation experiments confirm that the belief-state objective and the backward encoder are essential to this performance.

✍️**Compact belief states, more coherent stories**: By jointly modeling prefixes and suffixes, BST learns a compact belief-state representation that enables goal-conditioned text generation. Experiments on TinyStories show that BST produces more coherent and better-structured narratives than the Fill-in-the-Middle model, and GPT-4 evaluations also favor BST's storytelling, with clearer connections between the prefix, the generated text, and the suffix.

Transformer models have transformed language modeling by enabling large-scale text generation with emergent properties. However, they struggle with tasks that require extensive planning. Researchers have explored modifications to architectures, objectives, and algorithms to improve their ability to achieve goals. Some approaches move beyond traditional left-to-right sequence modeling by incorporating bidirectional context, as seen in models trained on both past and future information. Others attempt to optimize the generation order itself, for example through latent-variable modeling or binary tree-based decoding, though left-to-right autoregressive methods often remain superior. A more recent approach jointly trains a transformer for forward and backward decoding, enhancing the model's ability to maintain compact belief states.

Further research has explored predicting multiple tokens simultaneously to improve efficiency. Some models have been designed to generate more than one token at a time, leading to faster and more robust text generation. Pretraining on multi-token prediction has been shown to enhance large-scale performance. Another key insight is that transformers encode belief states non-compactly within their residual stream. In contrast, state-space models offer more compact representations but come with trade-offs. For instance, certain training frameworks struggle with specific graph structures, revealing limitations in existing methodologies. These findings highlight ongoing efforts to refine transformer architectures for better structured and efficient sequence modeling.

Researchers from Microsoft Research, the University of Pennsylvania, UT Austin, and the University of Alberta introduced the Belief State Transformer (BST). This model enhances next-token prediction by considering both prefix and suffix contexts. Unlike standard transformers, BST encodes information bidirectionally, predicting the next token after the prefix and the previous token before the suffix. This approach improves performance on challenging tasks, such as goal-conditioned text generation and structured prediction problems like star graphs. By learning a compact belief state, BST outperforms conventional methods in sequence modeling, offering more efficient inference and stronger text representations, with promising implications for large-scale applications.

Unlike traditional next-token prediction models, the BST is designed to enhance sequence modeling by integrating both forward and backward encoders. It utilizes a forward encoder for prefixes and a backward encoder for suffixes, predicting the next and previous tokens. This approach prevents models from adopting shortcut strategies and improves long-term dependency learning. BST outperforms baselines in star graph navigation, where forward-only Transformers struggle. Ablations confirm that the belief state objective and backward encoder are essential for performance. During inference, BST omits the backward encoder, maintaining efficiency while ensuring goal-conditioned behavior.
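
To make the setup concrete, here is a minimal PyTorch-style sketch of the idea described above: a forward encoder reads the prefix, a backward encoder reads the suffix in reverse, and a head over the concatenated states predicts both the next token after the prefix and the previous token before the suffix. This is an illustrative reconstruction, not the authors' implementation; the class name, layer sizes, and head design are assumptions, and causal masking and positional encodings are omitted for brevity.

```python
# Illustrative sketch of the BST objective (not the paper's implementation).
import torch
import torch.nn as nn

class BeliefStateTransformerSketch(nn.Module):
    def __init__(self, vocab_size, d_model=256, n_heads=4, n_layers=4):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        # Stand-in encoders; the paper uses GPT-style causal encoders, while
        # causal masks and positional encodings are omitted here for brevity.
        self.forward_encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True), n_layers)
        self.backward_encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True), n_layers)
        # One head predicts the next token after the prefix,
        # the other predicts the previous token before the suffix.
        self.next_head = nn.Linear(2 * d_model, vocab_size)
        self.prev_head = nn.Linear(2 * d_model, vocab_size)

    def forward(self, prefix_ids, suffix_ids):
        f = self.forward_encoder(self.embed(prefix_ids))[:, -1]           # prefix summary
        b = self.backward_encoder(self.embed(suffix_ids.flip(1)))[:, -1]  # suffix summary, read right-to-left
        belief_state = torch.cat([f, b], dim=-1)                          # compact belief state
        return self.next_head(belief_state), self.prev_head(belief_state)
```

In training, cross-entropy losses on both heads over sampled prefix/suffix splits of the same sequence would realize the joint next-and-previous-token objective described above.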

Unlike forward-only and multi-token models, the BST effectively constructs a compact belief state. A belief state encodes all necessary information for future predictions. The BST learns such representations by jointly modeling prefixes and suffixes, enabling goal-conditioned text generation. Experiments using TinyStories show BST outperforms the Fill-in-the-Middle (FIM) model, producing more coherent and structured narratives. Evaluation with GPT-4 reveals BST’s superior storytelling ability, with clearer connections between prefix, generated text, and suffix. Additionally, BST excels in unconditional text generation by selecting sequences with high-likelihood endings, demonstrating its advantages over traditional next-token predictors.
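
For intuition on how goal conditioning might look at generation time, the hypothetical loop below fixes a goal suffix and repeatedly extends the prefix with the next-token head of the sketch above. The function name and greedy decoding are illustrative assumptions, not details from the paper; in the paper's inference scheme the fixed suffix does not need to be re-encoded at every step, whereas this simple loop recomputes it for clarity.

```python
# Hypothetical goal-conditioned decoding loop for the sketch model above.
import torch

@torch.no_grad()
def generate_towards_goal(model, prefix_ids, goal_ids, max_new_tokens=64):
    """Greedily extend `prefix_ids` toward the goal suffix `goal_ids`."""
    for _ in range(max_new_tokens):
        next_logits, _ = model(prefix_ids, goal_ids)            # previous-token head is unused here
        next_token = next_logits.argmax(dim=-1, keepdim=True)   # greedy choice for simplicity
        prefix_ids = torch.cat([prefix_ids, next_token], dim=1)
    return prefix_ids
```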

In conclusion, the BST improves goal-conditioned next-token prediction by addressing the limitations of traditional forward-only models. It constructs a compact belief state that encodes all the information needed for future predictions. Unlike conventional transformers, BST predicts the next token for a prefix and the previous token for a suffix, making it more effective on complex tasks. Empirical results demonstrate its advantages in story writing, where it outperforms the Fill-in-the-Middle approach. While the experiments validate its performance on small-scale tasks, further research is needed to explore its scalability and applicability to broader goal-conditioned problems, improving efficiency and inference quality.


Check out the paper. All credit for this research goes to the researchers of this project.


