MarkTechPost@AI | September 29, 2024
AMD Releases AMD-135M: AMD’s First Small Language Model Series Trained from Scratch on AMD Instinct™ MI250 Accelerators Utilizing 670B Tokens 

AMD has released a new language model, AMD-135M. Based on the LLaMA2 architecture and optimized for AMD's latest GPUs, it is a notable addition to the AI model landscape and offers several advantages.

🌐 AMD-135M is based on the LLaMA2 model architecture with 135 million parameters. Its architecture is robust and has been performance-optimized for AMD's latest GPUs, making it suitable for a range of applications, particularly text generation and language understanding.

📚 The model integrates seamlessly with the Hugging Face Transformers library and can handle complex tasks, employing advanced techniques such as the SwiGLU activation function, RMSNorm layer normalization, and RoPE positional embeddings.

✨ AMD-135M has several notable characteristics, including its parameter count, number of layers and attention heads, hidden size, attention type, and context window size; it was also pretrained and fine-tuned on multiple datasets.

🛠️ The model is easy to deploy and use via the Hugging Face Transformers library, is compatible with multiple decoding methods, and performs competitively across a variety of NLP benchmarks.

AMD has recently introduced its new language model, AMD-135M (also called AMD-Llama-135M), a significant addition to the landscape of AI models. Built on the LLaMA2 architecture, the model has 135 million parameters, was trained from scratch on 670 billion tokens, and is optimized for performance on AMD's latest GPUs, specifically the Instinct MI250. This release marks a crucial milestone in AMD's endeavor to establish a strong foothold in the competitive AI industry.

Background and Technical Specifications

The AMD-135M is built on the LLaMA2 model architecture and integrates advanced features to support a range of applications, particularly text generation and language comprehension. It is designed to work seamlessly with the Hugging Face Transformers library, making it readily accessible to developers and researchers. With a hidden size of 768, 12 layers (blocks), and 12 attention heads, the model handles complex tasks while maintaining high efficiency. The activation function is SwiGLU, layer normalization is based on RMSNorm, and positional information is encoded with rotary position embeddings (RoPE), enhancing the model's ability to understand and generate contextual information accurately.
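As a rough illustration, these dimensions map onto a LLaMA-style configuration in Hugging Face Transformers. The sketch below uses only the figures quoted in this article; the MLP width and vocabulary size are not stated here, so those fields are assumptions:

```python
# Sketch: a LLaMA-style config matching the dimensions quoted above.
# Only hidden_size, num_hidden_layers, and num_attention_heads come from
# the article; intermediate_size and the default vocab size are assumptions.
from transformers import LlamaConfig, LlamaForCausalLM

config = LlamaConfig(
    hidden_size=768,          # hidden size stated in the article
    num_hidden_layers=12,     # 12 layers (blocks)
    num_attention_heads=12,   # 12 attention heads
    intermediate_size=2048,   # assumed; the MLP width is not given here
    hidden_act="silu",        # SiLU gate, i.e. the SwiGLU MLP used by LLaMA
    # LlamaConfig uses RMSNorm and RoPE by default, matching the article.
)
model = LlamaForCausalLM(config)  # randomly initialized, for illustration only
print(sum(p.numel() for p in model.parameters()))  # rough parameter count, ~135M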

The release of this model is not just about the hardware specifications but also about the software and datasets that power it. AMD-135M has been pretrained on two key datasets: the SlimPajama and Project Gutenberg datasets. SlimPajama is a deduplicated version of RedPajama, which includes sources such as Commoncrawl, C4, GitHub, Books, ArXiv, Wikipedia, and StackExchange. The Project Gutenberg dataset provides access to a vast repository of classical texts, enabling the model to grasp various language structures and vocabularies.

Key Features of AMD-135M

AMD-135M has remarkable features that set it apart from other models in its size class: a 135M-parameter LLaMA2-style architecture with 12 layers, 12 attention heads, and a hidden size of 768; SwiGLU activation, RMSNorm layer normalization, and RoPE positional embeddings; and pretraining on 670B tokens drawn from the SlimPajama and Project Gutenberg datasets.

Deployment and Usage

The AMD-135M can be easily deployed and used through the Hugging Face Transformers library: users load the model with the LlamaForCausalLM class and its tokenizer with AutoTokenizer, as sketched below. This ease of integration makes it a favorable option for developers looking to incorporate language modeling capabilities into their applications. Additionally, the model is compatible with speculative decoding for AMD's CodeLlama, further extending its usability for code generation tasks. This feature makes AMD-135M particularly useful for developers working on programming-related text generation and other NLP applications.
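A minimal deployment sketch might look like the following. The model identifiers are assumptions based on the names in this article, not confirmed IDs; check the Hugging Face model card for the exact values:

```python
# Minimal loading/inference sketch with Hugging Face Transformers.
from transformers import AutoTokenizer, LlamaForCausalLM

model_id = "amd/AMD-Llama-135m"  # assumed Hugging Face model ID
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = LlamaForCausalLM.from_pretrained(model_id)

inputs = tokenizer("The AMD Instinct MI250 accelerator", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

# Speculative decoding sketch: use the small model as a draft ("assistant")
# for a larger CodeLlama target via Transformers' assisted generation.
# The target ID is an assumption, and draft/target tokenizers must match.
target = LlamaForCausalLM.from_pretrained("codellama/CodeLlama-7b-hf")
outputs = target.generate(**inputs, assistant_model=model, max_new_tokens=50)
```

With assisted generation, the small draft model proposes several tokens cheaply and the large target model verifies them in a single forward pass, which is the mechanism that makes a 135M-parameter draft useful alongside a much larger code model.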

Performance Evaluation

The performance of AMD-135M has been evaluated using the lm-evaluation-harness on various NLP benchmarks, such as SciQ, WinoGrande, and PIQA. The results indicate the model is highly competitive, offering performance comparable to other models in its parameter range. For instance, it achieved a pass rate of approximately 32.31% on the HumanEval dataset using MI250 GPUs, a strong result for a model of this size. This suggests that AMD-135M can be a reliable model for research and commercial applications in natural language processing.
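For readers who want to run this kind of benchmarking themselves, here is a hedged sketch using the Python API of EleutherAI's lm-evaluation-harness (`pip install lm-eval`). The task names and model ID are assumptions; the article only names the benchmarks:

```python
# Evaluation sketch with lm-evaluation-harness; IDs/tasks are assumptions.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",                                # Hugging Face model backend
    model_args="pretrained=amd/AMD-Llama-135m",  # assumed model ID
    tasks=["sciq", "winogrande", "piqa"],      # benchmarks named in the article
)
print(results["results"])  # per-task accuracy and related metrics
```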

In conclusion, the release of AMD-135M underscores AMD’s commitment to advancing AI technologies and providing accessible, high-performance models for the research community. Its robust architecture and advanced training techniques position AMD-135M as a formidable competitor in the rapidly evolving landscape of AI models.


Check out the Model on Hugging Face and Details. All credit for this research goes to the researchers of this project.
