How Discord Stores Trillions of Messages with High Performance

This article takes a deep look at how Discord handled the challenge of storing messages at massive scale: the move from Apache Cassandra to ScyllaDB, and the introduction of a data services layer written in Rust. Through these architectural improvements, Discord significantly improved the performance and stability of its systems, especially under peak traffic. The article walks through the technology choices, the migration strategy, and the final results, offering valuable lessons for platforms operating at a similar scale.

💡 Discord originally stored messages in Apache Cassandra, but as its user base grew, hot partitions, compaction lag, and JVM garbage collection created performance bottlenecks and operational strain.

🚀 To solve these problems, Discord chose ScyllaDB: written in C++, it eliminates garbage-collection pauses and uses a shard-per-core architecture that improves concurrency.

🛡️ To optimize further, Discord introduced a Rust-based data services layer responsible for data access and coordination. Techniques such as request coalescing and consistent routing reduced database load and improved stability and throughput.

🔄 For the migration itself, Discord used a dual-write strategy and built a custom Rust migration tool, achieving a fast, zero-downtime database migration that cut the timeline from months to 9 days.

✅ After the move to ScyllaDB, performance and stability improved markedly: message read latency dropped from 40-125 ms to 15 ms, and the node count shrank substantially, leaving far more headroom for traffic spikes.

MCP Authorization in 5 Easy OAuth Specs (Sponsored)

Securely authorizing access to an MCP server used to be an open question. Now there's a clear answer: OAuth. It provides a path with five key specs covering delegation, token exchange, and scoped access.

WorkOS packages the full stack into one API, so you can add MCP authorization without building your own OAuth infrastructure.

Implement MCP Auth with WorkOS


Disclaimer: The details in this post have been derived from the articles shared online by the Discord Engineering Team. All credit for the technical details goes to the Discord Engineering Team.  The links to the original articles and sources are present in the references section at the end of the post. We’ve attempted to analyze the details and provide our input about them. If you find any inaccuracies or omissions, please leave a comment, and we will do our best to fix them.

Many chat platforms never reach the scale where they have to deal with trillions of messages. However, Discord does. And when that happens, a somewhat manageable data problem can quickly turn into a major engineering challenge that involves millions of users sending messages across millions of channels. 

At this scale, even the smallest architectural choices can have a big impact. Things like hot partitions can turn into support nightmares. Garbage-collection pauses aren't just annoying; they can cause system-wide latency spikes. And the wrong database design wastes developer time and operational bandwidth.

Discord’s early database solution (moving from MongoDB to Apache Cassandra®) promised horizontal scalability and fault tolerance. It delivered both, but at a significant operational cost. Over time, keeping Apache Cassandra® stable required constant firefighting, careful compaction strategies, and JVM tuning. Eventually, the database meant to scale with Discord had become a bottleneck.

In this article, we will walk through how Discord rebuilt its message storage layer from the ground up. We will look at the issues Discord faced with Apache Cassandra® and the shift to ScyllaDB, as well as the introduction of Rust-based data services that shield the database from overload and improve concurrency handling.


Go from Engineering to AI Product Leadership (Sponsored)

As an engineer or tech lead, you know how to build complex systems. But how do you translate that technical expertise into shipping world-class AI products? The skills that define great AI product leaders—from ideation and data strategy to managing LLM-powered roadmaps—are a different discipline.

This certification is designed for technical professionals. Learn directly from Miqdad Jaffer, Product Leader at OpenAI, in the #1 rated AI certificate on Maven. You won't just learn theory; you will get hands-on experience developing a capstone project and mastering the frameworks used to build and scale products in the real world.

Exclusive for ByteByteGo Readers: Use code BBG500 to save $500 before the next cohort sells out.

View Course & Save $500


Initial Architecture

Discord's early message storage relied on Apache Cassandra®. The schema grouped messages by channel_id and a bucket, which represented a static time window. 

This schema allowed for efficient lookups of recent messages in a channel, and Snowflake IDs provided natural chronological ordering. A replication factor of 3 ensured each partition existed on three separate nodes for fault tolerance.

Within each partition, messages were sorted in descending order by message_id, a Snowflake-based 64-bit integer that encoded creation time.

The diagram below shows the overall partitioning strategy based on the channel ID and bucket.
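
To make the layout concrete, here is a rough sketch of the table and the bucket computation. The column set and the 10-day bucket width are assumptions based on Discord's earlier public write-up, and the Snowflake epoch (the first second of 2015) comes from the Discord API documentation.

```rust
// Sketch of the message table and bucket computation described above.
// The exact columns and the 10-day bucket width are illustrative assumptions.
const CREATE_MESSAGES: &str = r#"
CREATE TABLE IF NOT EXISTS messages (
    channel_id bigint,
    bucket     int,
    message_id bigint,
    author_id  bigint,
    content    text,
    PRIMARY KEY ((channel_id, bucket), message_id)
) WITH CLUSTERING ORDER BY (message_id DESC);
"#;

/// First second of 2015 in Unix milliseconds (the Discord epoch).
const DISCORD_EPOCH_MS: u64 = 1_420_070_400_000;
/// One static time window; 10 days, per Discord's earlier design.
const BUCKET_MS: u64 = 10 * 24 * 60 * 60 * 1000;

/// Snowflake IDs carry their creation time in the upper bits.
fn snowflake_timestamp_ms(message_id: u64) -> u64 {
    (message_id >> 22) + DISCORD_EPOCH_MS
}

/// The partition a message lands in is derived from its creation time.
fn bucket_for(message_id: u64) -> u64 {
    (message_id >> 22) / BUCKET_MS
}

fn main() {
    let id: u64 = 1_110_000_000_000_000_000; // an example Snowflake ID
    println!("schema:\n{CREATE_MESSAGES}");
    println!("created at {} ms, bucket {}", snowflake_timestamp_ms(id), bucket_for(id));
}
```

Because the partition key is (channel_id, bucket), fetching recent messages for a channel touches only one partition per time window, and the descending clustering order means the newest rows are read first.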

At a small scale, this design worked well. However, scale often introduces problems that don't show up in normal situations.

Apache Cassandra® favors write-heavy workloads, which aligns well with chat systems. However, high-traffic channels with massive user bases can generate orders of magnitude more messages than quiet ones.

A few things started to go wrong at this point:

- Hot partitions: all reads and writes for a busy channel land on the same partition (and its three replicas), overwhelming those nodes while the rest of the cluster sits idle and dragging up latency across the board.
- Compaction fell behind on the hottest nodes, making reads progressively more expensive.
- JVM garbage-collection pauses caused unpredictable latency spikes and demanded constant tuning.

The diagram below shows the concept of hot partitions:

Performance wasn't the only issue. Operational overhead also ballooned.

At this point, Apache Cassandra® was being scaled manually by throwing more hardware and more engineer hours at the problem. The system was running, but it was clearly under strain.

Switching to ScyllaDB

ScyllaDB entered the picture as a natural alternative. It preserved compatibility with the query language and data model of Apache Cassandra®, which meant the surrounding application logic could remain largely unchanged. However, under the hood, the execution model was very different.

Some key characteristics were as follows:

- Written in C++ rather than Java, so there is no JVM and no garbage-collection pauses to cause latency spikes.
- A shard-per-core, shared-nothing architecture: each CPU core independently owns a slice of the data, which improves concurrency and keeps performance predictable as machines scale up.

Overall, ScyllaDB offered the same interface with a far more predictable runtime. 

Rust-Based Data Services Layer

To reduce direct load on the database and prevent repeated query amplification, Discord also introduced a dedicated data services layer. These services act as intermediaries between the main API monolith and the ScyllaDB clusters. They are responsible solely for data access and coordination, and no business logic is embedded here.

The goal behind them was simple: isolate high-throughput operations, control concurrency, and protect the database from accidental overload.

Rust was chosen for the data services for both technical and operational reasons: it brings together low-level performance and modern safety guarantees.

Some key advantages of choosing Rust are as follows:

- Native performance with no garbage collector, so tail latencies stay predictable.
- Memory safety enforced at compile time, ruling out whole classes of crashes and data races.
- First-class async/await support, which makes it practical to handle very high concurrency safely.

Each data service exposes gRPC endpoints that map one-to-one with database queries. This keeps the architecture clean and transparent. The services do not embed any business logic. They are designed purely for data access and efficiency.
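
As a shape, such a service might look like the following sketch: one method per query, with thin stubs where the ScyllaDB calls would go. All names here are illustrative (Discord's actual gRPC surface is not public), and a real service would use tonic-generated gRPC types rather than a plain trait.

```rust
// Hypothetical data-service interface: one endpoint per database query,
// no business logic. Uses the `async-trait` crate for the trait methods.
#[derive(Debug, Clone)]
struct Message {
    channel_id: u64,
    message_id: u64,
    content: String,
}

#[async_trait::async_trait]
trait MessageDataService {
    // Each method corresponds directly to one ScyllaDB query.
    async fn get_message(&self, channel_id: u64, message_id: u64) -> Option<Message>;
    async fn get_recent_messages(&self, channel_id: u64, limit: u32) -> Vec<Message>;
    async fn insert_message(&self, message: Message);
}

struct ScyllaBackedService; // would hold a ScyllaDB session in practice

#[async_trait::async_trait]
impl MessageDataService for ScyllaBackedService {
    async fn get_message(&self, _channel_id: u64, _message_id: u64) -> Option<Message> {
        None // placeholder for a single-row SELECT
    }
    async fn get_recent_messages(&self, _channel_id: u64, _limit: u32) -> Vec<Message> {
        Vec::new() // placeholder for a partition scan of recent rows
    }
    async fn insert_message(&self, _message: Message) {
        // placeholder for an INSERT
    }
}

#[tokio::main]
async fn main() {
    let svc = ScyllaBackedService;
    println!("{:?}", svc.get_message(42, 7).await);
}
```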

Request Coalescing

One of the most important features in this layer is request coalescing.

When multiple users request the same piece of data, such as a popular message in a high-traffic channel, the system avoids hammering the database with duplicate queries. The first request triggers the actual database query; concurrent requests for the same data simply wait on that in-flight call and share its result.

See the diagram below:
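
In code, the idea is essentially the "single-flight" pattern. Below is a minimal Tokio-based sketch, assuming a hypothetical fetch_from_db stub in place of the real ScyllaDB query; Discord's production version would add error handling, timeouts, and metrics.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};
use tokio::sync::broadcast;

type Key = (u64, u64); // (channel_id, message_id)

#[derive(Clone, Default)]
struct Coalescer {
    // One broadcast channel per key currently being fetched.
    in_flight: Arc<Mutex<HashMap<Key, broadcast::Sender<String>>>>,
}

impl Coalescer {
    async fn get(&self, key: Key) -> String {
        // Under the lock, either join an in-flight request or become its leader.
        let role = {
            let mut map = self.in_flight.lock().unwrap();
            match map.get(&key) {
                Some(tx) => Err(tx.subscribe()), // follower: wait for the result
                None => {
                    let (tx, _rx) = broadcast::channel(1);
                    map.insert(key, tx.clone());
                    Ok(tx) // leader: will run the single database query
                }
            }
        };
        match role {
            Ok(tx) => {
                let value = fetch_from_db(key).await; // the one real query
                self.in_flight.lock().unwrap().remove(&key);
                let _ = tx.send(value.clone()); // wake every follower
                value
            }
            Err(mut rx) => rx.recv().await.expect("leader dropped"),
        }
    }
}

/// Placeholder for the actual ScyllaDB read.
async fn fetch_from_db(key: Key) -> String {
    format!("message {:?}", key)
}

#[tokio::main]
async fn main() {
    let coalescer = Coalescer::default();
    // Concurrent callers for the same key share a single in-flight fetch.
    let handles: Vec<_> = (0..10)
        .map(|_| {
            let c = coalescer.clone();
            tokio::spawn(async move { c.get((42, 7)).await })
        })
        .collect();
    for h in handles {
        println!("{}", h.await.unwrap());
    }
}
```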

To support this pattern at scale, the system uses consistent hash-based routing. Requests are routed using a key, typically the channel_id. This allows all traffic for the same channel to be handled by the same instance of the data service. 
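
One way to implement such routing is rendezvous (highest-random-weight) hashing, sketched below. The article only states that routing is consistent-hash based on a key such as channel_id, so the specific algorithm and instance names here are assumptions.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Pick the data-service instance for a key by hashing each (key, instance)
/// pair and taking the highest score, so a given channel consistently maps
/// to the same instance for as long as that instance stays in the pool.
fn route(channel_id: u64, instances: &[&str]) -> usize {
    (0..instances.len())
        .max_by_key(|&i| {
            let mut h = DefaultHasher::new();
            (channel_id, instances[i]).hash(&mut h);
            h.finish()
        })
        .expect("at least one instance")
}

fn main() {
    let instances = ["data-svc-0", "data-svc-1", "data-svc-2"];
    for channel_id in [101u64, 202, 303, 202] {
        println!("channel {channel_id} -> {}", instances[route(channel_id, &instances)]);
    }
}
```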

Ultimately, the Rust-based data services help offload concurrency and coordination away from the database. They flatten spikes in traffic, reduce duplicated load, and provide a stable interface to ScyllaDB. 

The result for Discord was higher throughput, better latency under load, and fewer emergencies during traffic surges.

Migration Strategy

Migrating a database that stores trillions of messages is not a trivial problem. The primary goals of this migration were clear:

- No downtime: messages had to keep flowing throughout the migration.
- Speed: every extra week on the old cluster meant more operational firefighting.
- Data integrity: trillions of messages had to arrive intact, with nothing lost along the way.

The entire migration process was divided into phases:

Phase 1: Dual Writes with a Cutover Point

The team began by setting up dual writes. Every new message was written to both Apache Cassandra® and ScyllaDB. A clear cutover timestamp defined which data belonged to the "new" world and which still needed to be migrated from the "old."

This allowed the system to adopt ScyllaDB for recent data while leaving historical messages intact in Apache Cassandra® until the backfill completed.
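
A simplified sketch of the dual-write pattern is shown below. Every name in it, including the cutover timestamp and the store-specific stubs, is illustrative rather than Discord's actual code.

```rust
#[derive(Debug, Clone)]
struct Message {
    channel_id: u64,
    message_id: u64,
    content: String,
}

/// Hypothetical cutover point: data at or after this moment belongs to ScyllaDB.
const CUTOVER_MS: u64 = 1_640_995_200_000;

async fn write_cassandra(_m: &Message) { /* legacy-cluster INSERT */ }
async fn write_scylla(_m: &Message) { /* new-cluster INSERT */ }
async fn read_cassandra(_channel_id: u64) -> Vec<Message> { Vec::new() }
async fn read_scylla(_channel_id: u64) -> Vec<Message> { Vec::new() }

/// During the migration, every new message is written to both stores.
async fn write_message(m: &Message) {
    write_cassandra(m).await;
    write_scylla(m).await;
}

/// Reads newer than the cutover are served by ScyllaDB; older history
/// stays on Apache Cassandra until the backfill catches up.
async fn read_page(channel_id: u64, before_ms: u64) -> Vec<Message> {
    if before_ms >= CUTOVER_MS {
        read_scylla(channel_id).await
    } else {
        read_cassandra(channel_id).await
    }
}

#[tokio::main]
async fn main() {
    let m = Message { channel_id: 42, message_id: 7, content: "hi".into() };
    write_message(&m).await;
    let _ = read_page(42, u64::MAX).await;
}
```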

Phase 2: Historical Backfill Using Spark

The initial plan for historical migration relied on ScyllaDB’s Spark-based migrator. 

This approach was stable but slow. Even after tuning, the projected timeline was three months to complete the full backfill. That timeline wasn't acceptable, given the ongoing operational risks with Apache Cassandra®.

Phase 3: A Rust-Powered Rewrite

Instead of accepting the delay, the team extended their Rust data service framework to handle bulk migration. This new custom migrator (sketched below):

- Read token ranges from the old cluster concurrently rather than row by row.
- Checkpointed its progress locally (using SQLite), so it could resume after interruptions instead of starting over.
- Wrote the data into ScyllaDB at very high sustained throughput.
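
Here is a conceptual sketch of such a checkpointed, concurrent migrator. The range math, batch size, and file-based checkpoint are stand-ins for brevity (Discord checkpointed via SQLite), and copy_range is a placeholder for the real Cassandra-to-ScyllaDB bulk copy.

```rust
use std::fs;

#[derive(Clone, Copy)]
struct TokenRange {
    start: i64,
    end: i64,
}

/// Copy every message in one token range from the old cluster to the new one.
async fn copy_range(r: TokenRange) {
    let _ = (r.start, r.end); // placeholder for the actual bulk copy
}

/// Persist progress so a crash resumes the migration instead of restarting it.
fn checkpoint(done_up_to: i64) {
    fs::write("migrator.checkpoint", done_up_to.to_string()).expect("checkpoint write failed");
}

fn resume_point() -> i64 {
    fs::read_to_string("migrator.checkpoint")
        .ok()
        .and_then(|s| s.trim().parse().ok())
        .unwrap_or(i64::MIN)
}

#[tokio::main]
async fn main() {
    // Split the token space into many small ranges (sizes here are arbitrary).
    let ranges: Vec<TokenRange> = (0..1024i64)
        .map(|i| TokenRange { start: i * 1_000_000, end: (i + 1) * 1_000_000 })
        .collect();

    let resume_from = resume_point();
    for chunk in ranges.chunks(64) {
        // Migrate one batch of ranges concurrently...
        let tasks: Vec<_> = chunk
            .iter()
            .filter(|r| r.start >= resume_from)
            .map(|&r| tokio::spawn(copy_range(r)))
            .collect();
        for t in tasks {
            t.await.expect("range task panicked");
        }
        // ...and only checkpoint once the whole batch has landed.
        checkpoint(chunk.last().unwrap().end);
    }
}
```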

The result was a dramatic improvement. The custom migrator achieved a throughput of 3.2 million messages per second, reducing the total migration time from months to just 9 days: at that rate, roughly 3.2 million × 86,400 seconds × 9 days ≈ 2.5 trillion messages could be moved. This change also simplified the plan. With fast migration in place, the team could migrate everything at once instead of splitting logic between "old" and "new" systems.

Final Step: Validation and Cutover

To ensure data integrity, a portion of live read traffic was mirrored to both databases, and the responses were compared. Once the system consistently returned matching results, the final cutover was scheduled.
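
A minimal sketch of that mirrored-read validation might look like the following; the sampling rule and the mismatch handling are assumptions, not Discord's actual implementation.

```rust
#[derive(Debug, PartialEq)]
struct Message {
    message_id: u64,
    content: String,
}

async fn read_cassandra(_channel_id: u64) -> Vec<Message> { Vec::new() }
async fn read_scylla(_channel_id: u64) -> Vec<Message> { Vec::new() }

/// Serve from the current primary store, but mirror a slice of live traffic
/// to the new store and compare the responses before trusting the cutover.
async fn read_with_validation(channel_id: u64) -> Vec<Message> {
    let primary = read_cassandra(channel_id).await;
    if channel_id % 100 == 0 {
        // Sample ~1% of channels for shadow reads against ScyllaDB.
        let shadow = read_scylla(channel_id).await;
        if primary != shadow {
            eprintln!("validation mismatch on channel {channel_id}");
        }
    }
    primary
}

#[tokio::main]
async fn main() {
    let _ = read_with_validation(4200).await;
}
```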

In May 2022, the switch was flipped. ScyllaDB became the primary data store for Discord messages.

Post-Migration Results

After the migration, the system footprint shrank significantly. The Apache Cassandra® cluster had grown to 177 nodes to keep up with storage and performance demands. ScyllaDB required only 72 nodes to handle the same workload.

This wasn’t just about node count. Each ScyllaDB node ran with 9 TB of disk space, compared to an average of 4 TB on Apache Cassandra® nodes. The combination of higher density and better performance per node translated into lower hardware and maintenance overhead.

Latency Improvements

The performance gains were clear and measurable. Fetching historical messages, which had a p99 latency of 40-125 ms on Apache Cassandra®, dropped to a steady 15 ms p99 on ScyllaDB, and write latencies became similarly flat and predictable.

Operational Stability

One of the biggest wins was operational calm. The team went from constant firefighting, compaction babysitting, and JVM tuning to a cluster that rarely demanded attention, freeing engineering time for product work.

Conclusion

The real test of any system comes when traffic patterns shift from expected to chaotic. During the 2022 FIFA World Cup Final, Discord’s message infrastructure experienced exactly that kind of stress test and passed cleanly.

As Argentina and France battled through regular time, extra time, and penalties, user activity surged across the platform. Each key moment (goals by Messi, Mbappé, the equalizers, the shootout) created massive spikes in message traffic, visible in monitoring dashboards almost in real time. 

Message sends surged, and read traffic ballooned. The kind of workload that used to trigger hot partitions and paging alerts during the earlier design now ran smoothly. Some key takeaways were as follows:

- The new architecture absorbed a world-scale traffic spike without hot partitions, emergency paging, or manual intervention.
- Request coalescing and consistent routing meant that millions of users reading the same channels produced only a handful of actual database queries.
- Headroom is a feature: a system that stays calm under its worst-case load frees engineers to build instead of firefight.

Note: Apache Cassandra® is a registered trademark of the Apache Software Foundation.

References:


SPONSOR US

Get your product in front of more than 1,000,000 tech professionals.

Our newsletter puts your products and services directly in front of an audience that matters - hundreds of thousands of engineering leaders and senior engineers - who have influence over significant tech decisions and big purchases.

Space Fills Up Fast - Reserve Today

Ad spots typically sell out about 4 weeks in advance. To ensure your ad reaches this influential audience, reserve your space now by emailing sponsorship@bytebytego.com.
