cs.AI updates on arXiv.org, July 22, 12:34
Supernova: Achieving More with Less in Transformer Architectures

This article introduces Supernova, a 650M-parameter decoder-only transformer that reaches the performance of larger models while staying computationally efficient, through careful architectural design and an innovative tokenization scheme. Supernova uses fewer parameters and far less training data than comparable models, challenging the conventional model-scaling paradigm.

arXiv:2507.15773v1 Announce Type: cross

Abstract: We present Supernova, a 650M-parameter decoder-only transformer that demonstrates how careful architectural design and tokenization innovation can achieve the performance of larger models while maintaining computational efficiency. Our architecture combines Rotary Positional Embeddings (RoPE), Grouped Query Attention (GQA) with a 3:1 compression ratio, RMSNorm for computational efficiency, and SwiGLU activation functions. A critical innovation is our custom 128,000-vocabulary byte-level BPE tokenizer, which achieves state-of-the-art compression performance. Through detailed analysis, we show that Supernova achieves 90% of the performance of 1B-parameter models while using 53% fewer parameters and requiring only 100B training tokens, an order of magnitude less than competing models. Our findings challenge the prevailing scaling paradigm, demonstrating that architectural efficiency and tokenization quality can compensate for reduced parameter counts.
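The abstract names the main architectural ingredients but gives no implementation details. Below is a minimal PyTorch sketch of how those four components (RMSNorm, SwiGLU, RoPE, and GQA with a 3:1 query-to-KV-head ratio) typically fit together; all dimensions, head counts, and class names are illustrative assumptions, not the paper's actual configuration.

```python
# Illustrative sketch of the components named in the abstract. Hidden size,
# head counts, and names are assumptions; only the 3:1 Q:KV ratio is from the paper.
import torch
import torch.nn as nn
import torch.nn.functional as F

class RMSNorm(nn.Module):
    """Root-mean-square norm: cheaper than LayerNorm (no mean subtraction, no bias)."""
    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))

    def forward(self, x):
        return self.weight * (x * x.pow(2).mean(-1, keepdim=True).add(self.eps).rsqrt())

class SwiGLU(nn.Module):
    """SwiGLU feed-forward block: W2(SiLU(W1 x) * W3 x)."""
    def __init__(self, dim: int, hidden: int):
        super().__init__()
        self.w1 = nn.Linear(dim, hidden, bias=False)  # gate projection
        self.w3 = nn.Linear(dim, hidden, bias=False)  # value projection
        self.w2 = nn.Linear(hidden, dim, bias=False)  # output projection

    def forward(self, x):
        return self.w2(F.silu(self.w1(x)) * self.w3(x))

def rope(x, base: float = 10000.0):
    """Apply rotary positional embeddings to a (batch, heads, seq, head_dim) tensor."""
    b, h, t, d = x.shape
    inv_freq = 1.0 / (base ** (torch.arange(0, d, 2, device=x.device, dtype=x.dtype) / d))
    angles = torch.einsum("t,f->tf", torch.arange(t, device=x.device, dtype=x.dtype), inv_freq)
    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x[..., 0::2], x[..., 1::2]
    out = torch.empty_like(x)
    out[..., 0::2] = x1 * cos - x2 * sin
    out[..., 1::2] = x1 * sin + x2 * cos
    return out

class GQAttention(nn.Module):
    """Grouped Query Attention: several query heads share each KV head."""
    def __init__(self, dim: int = 1152, n_q_heads: int = 12, n_kv_heads: int = 4):
        super().__init__()
        assert n_q_heads % n_kv_heads == 0  # 12:4 realizes the paper's 3:1 ratio
        self.hd = dim // n_q_heads
        self.nq, self.nkv = n_q_heads, n_kv_heads
        self.q = nn.Linear(dim, n_q_heads * self.hd, bias=False)
        self.kv = nn.Linear(dim, 2 * n_kv_heads * self.hd, bias=False)
        self.o = nn.Linear(n_q_heads * self.hd, dim, bias=False)

    def forward(self, x):
        b, t, _ = x.shape
        q = self.q(x).view(b, t, self.nq, self.hd).transpose(1, 2)
        k, v = self.kv(x).view(b, t, 2, self.nkv, self.hd).permute(2, 0, 3, 1, 4)
        q, k = rope(q), rope(k)
        # Each KV head serves nq/nkv query heads: the 3:1 "compression".
        k = k.repeat_interleave(self.nq // self.nkv, dim=1)
        v = v.repeat_interleave(self.nq // self.nkv, dim=1)
        y = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        return self.o(y.transpose(1, 2).reshape(b, t, -1))

x = torch.randn(2, 16, 1152)
print(GQAttention()(x).shape)  # torch.Size([2, 16, 1152])
```

Likewise, a 128,000-entry byte-level BPE tokenizer of the kind the abstract describes can be approximated with the Hugging Face tokenizers library. The corpus path below is a placeholder, and the paper's custom tokenizer may differ in preprocessing and merge rules.

```python
# Hedged sketch: training a 128k-vocabulary byte-level BPE tokenizer.
# "corpus.txt" is a placeholder; the paper's training corpus is not given here.
from tokenizers import ByteLevelBPETokenizer

tokenizer = ByteLevelBPETokenizer()
tokenizer.train(
    files=["corpus.txt"],
    vocab_size=128_000,  # the 128,000-entry vocabulary from the abstract
    min_frequency=2,
)
print(tokenizer.encode("Supernova achieves more with less.").tokens)
```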


Related tags

Supernova model · decoder · performance-efficiency trade-off · tokenization · model scaling