ByteByteGo, March 11
Facebook’s Database Handling Billions of Messages (Cassandra Deep Dive)

 

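This article takes an in-depth look at Cassandra, a distributed database system developed by Facebook to handle massive amounts of data. It explains that Cassandra's design was inspired by Amazon Dynamo and Google Bigtable, and describes its key features, such as distributed storage, high availability, and scalability. It also examines Cassandra's distinctive data model, including the concept of column families and the difference between simple and super column families. The article then covers the structure of Cassandra's API, including the main operations for inserting, retrieving, and deleting data. Finally, it studies Cassandra's system architecture, including how nodes are organized, the replication mechanism, the gossip protocol, and how writes are handled, to give readers a full picture of Cassandra's capabilities and design philosophy.

💡 **Distributed storage and high availability**: Cassandra spreads data across many machines, so the system keeps running even if some machines fail. There is no single point of failure, and capacity can be scaled simply by adding machines.

🗂️ **A distinctive data model**: Cassandra's data model resembles a multi-dimensional map. Data is organized into column families, including simple column families (collections of key-value pairs) and super column families (nested structures). Data is looked up by primary key rather than through complex SQL queries, which improves speed and scalability.

🔄 **Gossip protocol and failure detection**: Cassandra uses a gossip protocol for efficient communication between nodes, passing along node state information. It uses the Scuttlebutt protocol to track node liveness and a probabilistic failure detection mechanism that judges whether a node has failed based on a suspicion level, which adapts to network conditions and prevents false positives.

✍️ **Write path**: When data is written, Cassandra first records it in the Commit Log to ensure durability. The data is then written to an in-memory Memtable. When the Memtable reaches a certain size, it is flushed asynchronously to an SSTable on disk, which optimizes write speed and reliability.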



Disclaimer: The details in this post have been derived from the Cassandra research paper and other sources. All credit for the technical details goes to the Facebook engineering team. The links to the original articles are present in the references section at the end of the post. We’ve attempted to analyze the details and provide our input about them. If you find any inaccuracies or omissions, please leave a comment, and we will do our best to fix them.

Cassandra is a powerful database system designed to store and manage massive amounts of data across many computers. 

Facebook originally developed it to support a feature called Inbox Search, which allows users to quickly search through their messages. The goal was to support billions of messages sent by Facebook users every day.

Storing and efficiently searching through such a massive amount of data is a big challenge. Traditional databases, like MySQL, struggled to handle this workload because they were not designed to scale easily.

To solve this, Facebook engineers took inspiration from two existing technologies: Amazon Dynamo and Google Bigtable.

By combining the best parts of these two systems, Facebook created Cassandra, which became a decentralized, highly scalable, and fault-tolerant database. Later, it was released as open-source software, allowing companies like Netflix, Twitter, and Apple to use and improve it.

In this article, we’ll take a deep dive into Cassandra and understand what makes it special.

The Key Features of Cassandra

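Some key features of Cassandra are as follows: distributed storage, with data spread across many machines; high availability, with no single point of failure, so the system keeps running even when some machines fail; and scalability, with capacity added simply by adding machines.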




Cassandra’s Data Model

Cassandra’s data model is quite different from traditional relational databases like MySQL.

At its core, Cassandra’s data model is like a multi-dimensional map (or dictionary), where each piece of data is indexed by a row key. This means that instead of rigidly defining tables and columns in advance, data can be stored in a way that best suits the needs of the application.

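The data is organized into column families, which come in two types: simple column families, which are collections of key-value pairs, and super column families, which have a nested structure.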

Columns can be sorted by timestamp or name, depending on the application’s needs. Primary key lookup is the main way to retrieve data. Instead of running complex queries like in SQL databases, Cassandra retrieves data by directly accessing the row key.

The structure of a column consists of the following parts: a name, a value, and a timestamp.
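To make the "multi-dimensional map" idea concrete, here is a minimal Python sketch. It assumes each column carries a name, a value, and a timestamp, and it models a table as nested dictionaries keyed by row key, then column family, then column name. The class and method names (Column, Table, put, columns_by_name, columns_by_time) are made up for illustration and are not part of Cassandra.

```python
from dataclasses import dataclass


@dataclass
class Column:
    name: str
    value: str
    timestamp: int  # assumed: time of the write, usable for ordering


class Table:
    """Hypothetical model: row key -> column family -> column name -> Column."""

    def __init__(self):
        self.rows = {}

    def put(self, row_key, family, column):
        # No schema is declared up front; families and columns appear on first write.
        columns = self.rows.setdefault(row_key, {}).setdefault(family, {})
        columns[column.name] = column

    def columns_by_name(self, row_key, family):
        # Primary key lookup, then sort the family's columns by name.
        return sorted(self.rows[row_key][family].values(), key=lambda c: c.name)

    def columns_by_time(self, row_key, family):
        # Same lookup, sorted by timestamp instead.
        return sorted(self.rows[row_key][family].values(), key=lambda c: c.timestamp)
```

Every read starts from the row key, which is the point of the model: there is nothing to join and nothing to scan across rows.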

Cassandra API Overview

Cassandra follows a key-based lookup approach, meaning every operation revolves around the row key. Unlike relational databases that support complex queries (like JOINs or subqueries), Cassandra prioritizes speed and scalability by keeping its API lightweight.

Therefore, Cassandra provides a simple API structure that allows applications to interact with the database using three main operations.

1 - Insert Data

The interface is insert(table, key, rowMutation). This command adds new data to Cassandra. 

The “table” is where the data will be stored and the “key” uniquely identifies the row. The rowMutation represents the changes made to the row, such as adding new columns or updating existing ones.

2 - Retrieve Data

The API interface is get(table, key, columnName). It fetches data from the database.

The “table” specifies where to look and the “key” identifies which row to retrieve. The “columnName” specifies which part of the row is needed.

3 - Delete Data

The interface is delete(table, key, columnName).

This command removes data from the database. It can delete an entire row or just a specific column within a row.
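The three calls can be mimicked with a small in-memory stand-in. This is only a sketch of the call shapes described above, not Cassandra's client API: TinyStore is a hypothetical class, rowMutation is assumed to be a plain dictionary of column names to values, and passing no column name to delete is assumed to mean "remove the whole row".

```python
class TinyStore:
    """Hypothetical in-memory stand-in for the insert/get/delete interface."""

    def __init__(self):
        self._tables = {}  # table name -> row key -> {column name: value}

    def insert(self, table, key, row_mutation):
        # Assumption: a row mutation is a dict of columns to add or overwrite.
        row = self._tables.setdefault(table, {}).setdefault(key, {})
        row.update(row_mutation)

    def get(self, table, key, column_name):
        # Every lookup goes through the row key; missing data yields None.
        return self._tables.get(table, {}).get(key, {}).get(column_name)

    def delete(self, table, key, column_name=None):
        rows = self._tables.get(table, {})
        if column_name is None:
            rows.pop(key, None)               # drop the entire row
        elif key in rows:
            rows[key].pop(column_name, None)  # drop a single column
```

For example, insert("inbox", "user42", {"subject": "hello"}) followed by get("inbox", "user42", "subject") returns "hello", and a later delete("inbox", "user42", "subject") makes the same get return None.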

Cassandra System Architecture

Cassandra is designed as a highly scalable and fault-tolerant distributed database. 

It does not rely on a single central server but instead follows a peer-to-peer model, where all nodes in the system are equal. 

Cassandra organizes its nodes (servers) in a ring structure. Each piece of data is assigned to a node using consistent hashing, which ensures even distribution across all nodes. When new nodes are added, Cassandra automatically rebalances the data without requiring a complete reorganization.

(Diagram: how consistent hashing works.)
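In code, the ring placement can be sketched roughly as follows. This is a simplified illustration, not Cassandra's partitioner: it assumes one position per node, uses SHA-256 as the hash, and the names ring_position, HashRing, add_node, and node_for are invented for the example.

```python
import bisect
import hashlib


def ring_position(value):
    # Map any string to a point on the ring (a 64-bit integer here).
    return int.from_bytes(hashlib.sha256(value.encode()).digest()[:8], "big")


class HashRing:
    def __init__(self, nodes=()):
        self._positions = []  # sorted node positions on the ring
        self._owners = {}     # position -> node name
        for node in nodes:
            self.add_node(node)

    def add_node(self, node):
        pos = ring_position(node)
        bisect.insort(self._positions, pos)
        self._owners[pos] = node

    def node_for(self, key):
        if not self._positions:
            raise LookupError("ring has no nodes")
        # Walk clockwise to the first node at or after the key's position,
        # wrapping around to the start of the ring if needed.
        idx = bisect.bisect_left(self._positions, ring_position(key))
        return self._owners[self._positions[idx % len(self._positions)]]
```

The useful property is that adding a node only takes over keys from its clockwise neighbor. Every other key keeps its owner, which is why new nodes do not force a complete reorganization.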

There is no master node, meaning any node can handle read and write requests. Since all nodes are equal, there is no single point of failure. If a node fails, other nodes in the system can continue handling requests without disruption.

Replication Mechanisms

Cassandra ensures that data is copied across multiple nodes to prevent data loss and improve availability. Developers can choose between different replication strategies:

Gossip Protocols in Cassandra

Cassandra uses a gossip protocol to allow nodes (servers) in the system to communicate with each other efficiently. 

This protocol is inspired by how rumors spread in real life. Instead of requiring a central system to keep track of everything, information is passed from one node to another in small, periodic updates.

Gossip protocols are great because they have a low network overhead. Instead of flooding the system with updates, nodes exchange small bits of information at regular intervals. Even if some nodes go offline, others can still function because they share information across the network.

Cassandra uses Scuttlebutt, a specialized Gossip Protocol, to keep track of which nodes are active or inactive. Each node periodically exchanges information about itself and other nodes with its neighbors, ensuring that the entire cluster remains up to date.
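The basic exchange can be sketched with a toy simulation. Note that this is a plain push-pull gossip of heartbeat counters, assumed here for illustration; it is not an implementation of Scuttlebutt, which is more careful about sending only the differences. GossipNode and gossip_round are hypothetical names.

```python
import random


class GossipNode:
    def __init__(self, name):
        self.name = name
        self.heartbeats = {name: 0}  # what this node believes about everyone

    def tick(self):
        # Bump our own counter so peers can tell we are still alive.
        self.heartbeats[self.name] += 1

    def merge(self, other_view):
        # Keep the freshest (highest) heartbeat seen for each node.
        for node, beat in other_view.items():
            if beat > self.heartbeats.get(node, -1):
                self.heartbeats[node] = beat


def gossip_round(nodes, rng):
    """One round: every node ticks, then swaps views with one random peer."""
    for node in nodes:
        node.tick()
        peer = rng.choice([n for n in nodes if n is not node])
        snapshot = dict(node.heartbeats)  # view before merging, to send to the peer
        node.merge(peer.heartbeats)
        peer.merge(snapshot)
```

Calling gossip_round(nodes, random.Random()) repeatedly spreads each node's heartbeat through the cluster without any central coordinator, and each exchange carries only a small dictionary.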

Instead of a simple "up or down" status, Cassandra assigns a suspicion level to each node. 

In other words, Cassandra’s failure detection is probabilistic, meaning it adapts to network conditions instead of rigid timeout rules. This helps prevent false alarms caused by temporary delays or slow responses.
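One way to picture a suspicion level is the sketch below, written in the style of an accrual failure detector. The formula is an assumption made for illustration: it treats the gaps between heartbeats as exponentially distributed and reports how unlikely the current silence is on a log scale. SuspicionDetector and its methods are hypothetical names, and the exact computation Cassandra uses is not given in this article.

```python
import math
from collections import deque


class SuspicionDetector:
    def __init__(self, window=100):
        self.intervals = deque(maxlen=window)  # recent gaps between heartbeats
        self.last_seen = None

    def heartbeat(self, now):
        # Record the arrival time of a heartbeat from the monitored node.
        if self.last_seen is not None:
            self.intervals.append(now - self.last_seen)
        self.last_seen = now

    def suspicion(self, now):
        # Assumed model: P(silence this long) = exp(-elapsed / mean_gap);
        # suspicion is -log10 of that probability, so it grows with silence.
        if not self.intervals:
            return 0.0
        mean_gap = sum(self.intervals) / len(self.intervals)
        if mean_gap <= 0:
            return 0.0
        elapsed = now - self.last_seen
        return elapsed / (mean_gap * math.log(10))
```

Because the score is scaled by the observed average gap, a node on a slow network earns suspicion more slowly than one that normally answers quickly. The application then picks a threshold above which a node is treated as down.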

Query Execution in Cassandra

Cassandra is designed to handle high-speed data writes and efficient reads while ensuring durability and fault tolerance. 

Instead of updating data in place on disk, as traditional relational databases do, Cassandra follows a log-structured storage model that optimizes speed and reliability.

How Cassandra Handles Writes

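Cassandra follows a multi-step process when writing data. The process consists of three main components: the Commit Log, where each write is recorded first for durability; the Memtable, an in-memory structure that receives the write next; and the SSTable, a file on disk to which the Memtable is flushed asynchronously once it reaches a certain size.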

This write process is efficient because, unlike traditional databases that modify data in place (causing random disk writes), Cassandra writes data sequentially, which is much faster and more efficient. Since SSTables are never modified, Cassandra avoids the overhead of complex locking mechanisms found in relational databases. Also, Cassandra can recover lost data if a node crashes because every write is first recorded in the Commit Log. 
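The write path can be sketched in a few lines of Python. This is a toy model under stated assumptions: a list stands in for the on-disk commit log, the flush happens synchronously rather than in the background, the log is truncated after a flush, and the names WritePath, memtable_limit, flush, and replay_commit_log are invented for the example.

```python
import json


class WritePath:
    def __init__(self, memtable_limit=3):
        self.commit_log = []  # append-only; stands in for a log file on disk
        self.memtable = {}    # in-memory buffer of recent writes
        self.sstables = []    # immutable, sorted snapshots, oldest first
        self.memtable_limit = memtable_limit

    def write(self, key, value):
        # 1. Record the write sequentially in the commit log for durability.
        self.commit_log.append(json.dumps({"key": key, "value": value}))
        # 2. Apply it to the in-memory memtable.
        self.memtable[key] = value
        # 3. Flush once the memtable is large enough.
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        # An SSTable is written once, sorted by key, and never modified.
        self.sstables.append(tuple(sorted(self.memtable.items())))
        self.memtable = {}
        self.commit_log = []  # assumption: flushed entries are no longer needed

    def replay_commit_log(self):
        # After a crash, rebuild the memtable from the writes still in the log.
        self.memtable = {}
        for line in self.commit_log:
            record = json.loads(line)
            self.memtable[record["key"]] = record["value"]
```

Both the log append and the SSTable write are sequential, and the only structure updated in place lives in memory, which is where the write speed comes from.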

How Cassandra Handles Reads

Unlike traditional databases that rely on complex indexing, Cassandra optimizes read performance using a combination of in-memory lookups and efficient disk scans.

Here’s a step-by-step look at the read process:

Facebook Inbox Search Use Case

As mentioned, Cassandra was originally developed at Facebook to solve the challenge of storing and searching billions of messages efficiently. Before Cassandra, Facebook used MySQL for storing these messages, but as the platform grew, MySQL struggled to handle the increasing volume of data and high query load.

To address this, Facebook deployed Cassandra on a 150-node cluster, which stored over 50 terabytes (TB) of messages. The system needed to support fast and scalable searches while handling constant write operations as users sent and received messages.

Facebook’s Inbox Search allows users to find messages using two types of queries:

One of the biggest challenges in Facebook’s messaging system was ensuring low-latency searches across a massive dataset. Cassandra’s highly optimized architecture allowed it to achieve impressive performance:




Conclusion

Cassandra is a highly scalable, distributed database system designed to handle large volumes of data while ensuring fault tolerance and high availability. 

Its peer-to-peer architecture and ring-based design make it particularly well-suited for applications that require continuous uptime and seamless scaling across multiple data centers. One of Cassandra’s key strengths is its ability to handle high write-throughput efficiently, making it ideal for real-time applications, such as messaging platforms, recommendation systems, and IoT data storage. 

However, Cassandra is not a replacement for traditional relational databases. It is not optimized for complex queries, joins, or transactional consistency, which makes it less suitable for applications requiring strong relational integrity.

For businesses and developers building large-scale, distributed systems, Cassandra provides a robust, flexible, and highly available solution that can grow with demand while maintaining performance and reliability.

References:



