MarkTechPost@AI · May 2, 15:20
JetBrains Open Sources Mellum: A Developer-Centric Language Model for Code-Related Tasks

JetBrains has officially open-sourced Mellum, a 4-billion-parameter language model designed specifically for software development tasks. Built from scratch, Mellum reflects JetBrains' engineering-first approach, providing a domain-specialized model trained for practical use across codebases and programming environments. The model is released on Hugging Face under the Apache 2.0 license, and JetBrains invites the broader research and developer community to experiment with, adapt, and advance Mellum's capabilities. Mellum focuses on code understanding, supports many programming languages, and excels at tasks such as code infilling and completion. The open-source release aims to improve transparency and reusability and to foster community collaboration.

💡 Mellum is a 4-billion-parameter language model developed by JetBrains specifically for software development tasks. Unlike general-purpose LLMs, Mellum is a "focal model" optimized for programming-related tasks such as autocompletion, code infilling, and structural understanding of source code.

📚 Mellum supports many languages, including Java, Kotlin, Python, Go, PHP, C, C++, C#, JavaScript, TypeScript, CSS, HTML, Rust, and Ruby, reflecting the polyglot nature of modern development teams.

⚙️ Mellum follows a LLaMA-style architecture and was trained on more than 4.2 trillion tokens drawn from code-rich sources such as The Stack, StarCoder, CommitPack, and English Wikipedia. It has an 8K-token context window and was trained in bf16 mixed precision on a high-throughput cluster of 256 NVIDIA H200 GPUs connected via InfiniBand.

📊 JetBrains evaluated Mellum on a series of benchmarks reflecting its primary use cases: code infilling and completion. The results show that the model excels at structured code understanding, especially in scenarios involving partial or interrupted code, which are common in real-world development workflows.

JetBrains has officially open-sourced Mellum, a purpose-built 4-billion-parameter language model tailored for software development tasks. Developed from the ground up, Mellum reflects JetBrains’ engineering-first approach, offering a domain-specialized model trained for practical usage across codebases and programming environments. With its release on Hugging Face under the Apache 2.0 license, JetBrains extends an invitation to the broader research and developer community to experiment, adapt, and advance Mellum’s capabilities.

A Focal Model for Code Understanding

Unlike general-purpose LLMs, Mellum is classified by JetBrains as a “focal model”—a term they use to describe models with a narrow yet deep specialization. Mellum is optimized specifically for programming-related tasks such as autocompletion, infilling, and structural understanding of source code. This focused design avoids the overhead of broader linguistic modeling and enables the model to perform efficiently in IDE-like environments.
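Infilling, one of the tasks named above, asks the model to generate the code *between* an existing prefix and suffix rather than only continuing left-to-right. A minimal sketch of how such a prompt is typically assembled is below; the special tokens follow the StarCoder-style fill-in-the-middle convention and are an assumption here, as Mellum's tokenizer may use different markers.

```python
# Sketch: assembling a fill-in-the-middle (FIM) prompt.
# The marker tokens below follow the StarCoder convention and are an
# assumption -- check Mellum's tokenizer config for its actual markers.
FIM_PREFIX = "<fim_prefix>"
FIM_SUFFIX = "<fim_suffix>"
FIM_MIDDLE = "<fim_middle>"

def build_fim_prompt(prefix: str, suffix: str) -> str:
    """Ask the model to generate the code between prefix and suffix."""
    return f"{FIM_PREFIX}{prefix}{FIM_SUFFIX}{suffix}{FIM_MIDDLE}"

# The model would be asked to fill in the loop body between these two halves:
prompt = build_fim_prompt(
    prefix="def mean(xs):\n    total = ",
    suffix="\n    return total / len(xs)",
)
```

The generation that follows the final marker is the infilled middle, which an IDE can splice back between the user's cursor context.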

The model supports a wide array of languages including Java, Kotlin, Python, Go, PHP, C, C++, C#, JavaScript, TypeScript, CSS, HTML, Rust, and Ruby—reflecting the polyglot nature of modern development teams.

Model Architecture and Training Pipeline

Mellum follows a LLaMA-style architecture and was trained from scratch using over 4.2 trillion tokens drawn from code-rich sources such as The Stack, StarCoder, CommitPack, and English Wikipedia. It features an 8K token context window and was trained using bf16 mixed precision across a high-throughput cluster of 256 NVIDIA H200 GPUs connected via InfiniBand.
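The 4B parameter count and bf16 precision together give a rough sense of how compact the model is to serve. A back-of-the-envelope estimate (weights only, ignoring KV cache and activations):

```python
# Rough memory estimate for Mellum's weights, from figures in the article.
params = 4_000_000_000        # ~4B parameters
bytes_per_param_bf16 = 2      # bf16 = 16 bits per parameter
weights_gib = params * bytes_per_param_bf16 / 1024**3
print(f"~{weights_gib:.1f} GiB for bf16 weights")  # → ~7.5 GiB
```

At roughly 7.5 GiB of weights, the model fits on a single consumer-class GPU, which is consistent with JetBrains positioning it for IDE-adjacent deployment rather than datacenter-only use.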

The training process spanned approximately 20 days and leveraged modern infrastructure for scalable model development. The architecture and training procedure were designed with reproducibility and deployment flexibility in mind, making Mellum usable both in cloud inference setups (e.g., vLLM) and in local environments (e.g., llama.cpp, Ollama).
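The two deployment paths mentioned above might look like the following. These are illustrative command fragments only: the `JetBrains/Mellum-4b-base` repository id is taken from the release name in this article, and a GGUF conversion of the weights is assumed to exist for the llama.cpp path.

```shell
# Cloud-style serving with vLLM (repo id assumed from the release name):
vllm serve JetBrains/Mellum-4b-base

# Local inference with llama.cpp, assuming a GGUF conversion of the weights:
./llama-cli -m mellum-4b-base.gguf -p "def fib(n):"
```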

Benchmarking and Evaluation

JetBrains evaluated Mellum across a range of benchmarks that reflect its primary use cases—code infilling and completion. The model’s performance indicates strong alignment with the design goals.

These results reflect Mellum’s specialization for structured code understanding, especially in scenarios involving partial or interrupted code, which are common in real-world development workflows.

Rationale for Open Sourcing

JetBrains’ decision to release Mellum as open source is grounded in several practical motivations: transparency, reusability, and community collaboration.

The release includes both the base model (Mellum-4b-base) and a fine-tuned variant for Python (Mellum-4b-sft-python).

Implications for Developer Tooling

The availability of a compact, performant model optimized for source code opens new opportunities in the IDE space and beyond. JetBrains envisions Mellum as part of a broader strategy involving multiple focal models, each optimized for specific programming tasks such as diff generation or code review assistance. This approach aligns with the growing need for deployable, cost-effective, and context-aware AI tooling that can augment developer productivity without introducing opaque or oversized general-purpose models.

Conclusion

Mellum represents a deliberate shift toward smaller, specialized language models that prioritize utility, transparency, and efficiency. By making the model openly available, JetBrains offers a high-quality foundation for building the next generation of AI-assisted developer tools. Its architecture, training methodology, and benchmark performance signal a practical step forward in the evolving space of LLMs tailored for software engineering.



