MarkTechPost@AI | September 29, 2024
AMD Releases AMD-135M: AMD’s First Small Language Model Series Trained from Scratch on AMD Instinct™ MI250 Accelerators Utilizing 670B Tokens 

AMD has released a new language model, AMD-135M. Based on the LLaMA2 architecture and optimized for AMD's latest GPUs, it is a notable addition to the AI model landscape and offers several advantages.

🌐 AMD-135M is based on the LLaMA2 model architecture with 135 million parameters. Its architecture is robust and has been performance-optimized for AMD's latest GPUs, making it suitable for a range of applications, particularly text generation and language understanding.

📚 The model integrates seamlessly with the Hugging Face Transformers library and can handle complex tasks, employing advanced techniques such as the SwiGLU activation function, RMSNorm layer normalization, and RoPE positional embeddings.

✨ AMD-135M has several notable characteristics, including its parameter count, number of layers and attention heads, hidden size, attention type, and context window size; it was also pretrained and fine-tuned on multiple datasets.

🛠️ The model is easy to deploy and use via the Hugging Face Transformers library, is compatible with multiple decoding methods, and performs competitively across a variety of NLP benchmarks.

AMD has recently introduced its new language model, AMD-135M (also called AMD-Llama-135M), a significant addition to the landscape of AI models. Built on the LLaMA2 architecture, the model has 135 million parameters, was trained from scratch on 670 billion tokens, and is optimized for performance on AMD's latest GPUs, specifically the Instinct MI250. This release marks a crucial milestone in AMD's endeavor to establish a strong foothold in the competitive AI industry.

Background and Technical Specifications

The AMD-135M is built on the LLaMA2 model architecture and integrates advanced features to support a range of applications, particularly text generation and language comprehension. It is designed to work seamlessly with the Hugging Face Transformers library, making it readily accessible to developers and researchers. With a hidden size of 768, 12 layers (blocks), and 12 attention heads, the model handles complex tasks while maintaining high efficiency. The activation function is SwiGLU, layer normalization is based on RMSNorm, and positional information is encoded with rotary position embeddings (RoPE), enhancing the model's ability to understand and generate contextual information accurately.
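As a rough illustration, these dimensions map onto a LLaMA-style configuration in Hugging Face Transformers. The sketch below uses only the figures quoted in this article; the MLP width and vocabulary size are not stated here, so those fields are assumptions:

```python
# Sketch: a LLaMA-style config matching the dimensions quoted above.
# Only hidden_size, num_hidden_layers, and num_attention_heads come from
# the article; intermediate_size and the default vocab size are assumptions.
from transformers import LlamaConfig, LlamaForCausalLM

config = LlamaConfig(
    hidden_size=768,          # hidden size stated in the article
    num_hidden_layers=12,     # 12 layers (blocks)
    num_attention_heads=12,   # 12 attention heads
    intermediate_size=2048,   # assumed; the MLP width is not given here
    hidden_act="silu",        # SiLU gate, i.e. the SwiGLU MLP used by LLaMA
    # LlamaConfig uses RMSNorm and RoPE by default, matching the article.
)
model = LlamaForCausalLM(config)  # randomly initialized, for illustration only
print(sum(p.numel() for p in model.parameters()))  # rough parameter count, ~135M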

The release of this model is not just about the hardware specifications but also about the software and datasets that power it. AMD-135M has been pretrained on two key datasets: the SlimPajama and Project Gutenberg datasets. SlimPajama is a deduplicated version of RedPajama, which includes sources such as Commoncrawl, C4, GitHub, Books, ArXiv, Wikipedia, and StackExchange. The Project Gutenberg dataset provides access to a vast repository of classical texts, enabling the model to grasp various language structures and vocabularies.

Key Features of AMD-135M

AMD-135M has remarkable features that set it apart from other models in its size class: a 135M-parameter LLaMA2-style architecture with 12 layers, 12 attention heads, and a hidden size of 768; SwiGLU activation, RMSNorm layer normalization, and RoPE positional embeddings; and pretraining on 670B tokens drawn from the SlimPajama and Project Gutenberg datasets.

Deployment and Usage

The AMD-135M can be easily deployed and used through the Hugging Face Transformers library: users load the model with the LlamaForCausalLM class and its tokenizer with AutoTokenizer, as sketched below. This ease of integration makes it a favorable option for developers looking to incorporate language modeling capabilities into their applications. Additionally, the model is compatible with speculative decoding for AMD's CodeLlama, further extending its usability for code generation tasks. This feature makes AMD-135M particularly useful for developers working on programming-related text generation and other NLP applications.
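A minimal deployment sketch might look like the following. The model identifiers are assumptions based on the names in this article, not confirmed IDs; check the Hugging Face model card for the exact values:

```python
# Minimal loading/inference sketch with Hugging Face Transformers.
from transformers import AutoTokenizer, LlamaForCausalLM

model_id = "amd/AMD-Llama-135m"  # assumed Hugging Face model ID
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = LlamaForCausalLM.from_pretrained(model_id)

inputs = tokenizer("The AMD Instinct MI250 accelerator", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

# Speculative decoding sketch: use the small model as a draft ("assistant")
# for a larger CodeLlama target via Transformers' assisted generation.
# The target ID is an assumption, and draft/target tokenizers must match.
target = LlamaForCausalLM.from_pretrained("codellama/CodeLlama-7b-hf")
outputs = target.generate(**inputs, assistant_model=model, max_new_tokens=50)
```

With assisted generation, the small draft model proposes several tokens cheaply and the large target model verifies them in a single forward pass, which is the mechanism that makes a 135M-parameter draft useful alongside a much larger code model.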

Performance Evaluation

The performance of AMD-135M has been evaluated using the lm-evaluation-harness on various NLP benchmarks, such as SciQ, WinoGrande, and PIQA. The results indicate the model is highly competitive, offering performance comparable to other models in its parameter range. For instance, it achieved a pass rate of approximately 32.31% on the HumanEval dataset using MI250 GPUs, a strong result for a model of this size. This suggests that AMD-135M can be a reliable model for research and commercial applications in natural language processing.
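For readers who want to run this kind of benchmarking themselves, here is a hedged sketch using the Python API of EleutherAI's lm-evaluation-harness (`pip install lm-eval`). The task names and model ID are assumptions; the article only names the benchmarks:

```python
# Evaluation sketch with lm-evaluation-harness; IDs/tasks are assumptions.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",                                # Hugging Face model backend
    model_args="pretrained=amd/AMD-Llama-135m",  # assumed model ID
    tasks=["sciq", "winogrande", "piqa"],      # benchmarks named in the article
)
print(results["results"])  # per-task accuracy and related metrics
```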

In conclusion, the release of AMD-135M underscores AMD’s commitment to advancing AI technologies and providing accessible, high-performance models for the research community. Its robust architecture and advanced training techniques position AMD-135M as a formidable competitor in the rapidly evolving landscape of AI models.


Check out the Model on Hugging Face and Details. All credit for this research goes to the researchers of this project.
