Facebook’s Database Handling Billions of Messages (Cassandra Deep Dive)

Google’s 7 predictions on AI, LLM, and Observability (Sponsored)

Read the 7 key takeaways from Google’s Director of AI and Datadog’s VP of Engineering as they break down their predictions of the future:

Smarter AI and LLM strategies for your org

Building customer trust in AI outputs

Scaling your tooling as LLM expertise grows

Read the insights

Disclaimer: The details in this post have been derived from Cassandra Research Paper and other sources. All credit for the technical details goes to the Facebook engineering team. The links to the original articles are present in the references section at the end of the post. We’ve attempted to analyze the details and provide our input about them. If you find any inaccuracies or omissions, please leave a comment, and we will do our best to fix them.

Cassandra is a powerful database system designed to store and manage massive amounts of data across many computers.

Facebook originally developed it to support a feature called Inbox Search, which allows users to quickly search through their messages. The goal was to support billions of messages sent by Facebook users every day.

Storing and efficiently searching through such a massive amount of data is a big challenge. Traditional databases, like MySQL, struggled to handle this workload because they were not designed to scale easily.

To solve this, Facebook engineers took inspiration from two existing technologies:

Amazon Dynamo: A system that makes sure data is always available, even if some nodes fail. It does this by copying data to multiple machines and using a peer-to-peer structure where every node (computer) is equal.

Google Bigtable: A database used by Google to store large amounts of structured data efficiently. It introduced the idea of a column-based storage model, which makes it faster to access specific pieces of data.

By combining the best parts of these two systems, Facebook created Cassandra, which became a decentralized, highly scalable, and fault-tolerant database. Later, it was released as open-source software, allowing companies like Netflix, Twitter, and Apple to use and improve it.

In this article, we’ll take a deep dive into Cassandra and understand what makes it special.

The Key Features of Cassandra

Some key features of Cassandra are as follows:

Distributed Storage: Data is spread across many machines instead of being stored on a single server.

High Availability: Even if some machines fail, Cassandra continues to work without interruption.

No Single Point of Failure: Since there is no central control system, there is no weak spot that can bring everything down.

Scalability: It can easily handle increasing amounts of data by simply adding more machines to the network.

Setting targets for developer productivity metrics — March 24th (Sponsored)

Setting targets for developer productivity metrics takes careful consideration: we need to think through the potential tradeoffs or incentives created (hello Goodhart's law!), whether the targets are realistic, and which goals are appropriate at different levels of the organization. Join Abi Noda and Laura Tacho, DX CEO and CTO, for a discussion about how to properly set targets for productivity metrics so you can continue to push for improvement and accountability from your own teams.

Join this discussion to learn:

How to think about potential pitfalls, like Goodhart’s Law and Campbell’s Law

How to set goals at different levels: frontline team, group, and organization level

Real-world examples of good and bad approaches to setting goals and targets for productivity metrics

Cassandra’s Data Model

Cassandra’s data model is quite different from traditional relational databases like MySQL.

At its core, Cassandra’s data model is like a multi-dimensional map (or dictionary), where each piece of data is indexed by a row key. This means that instead of rigidly defining tables and columns in advance, data can be stored in a way that best suits the needs of the application.

The data is organized into column families that are of two types:

Simple Column Family: It is a collection of standard key-value pairs where each key points to a set of columns. For example, if storing user information, the row key could be the User ID, and the columns could be name, email, phone number, etc.

Super Column Family: It is a more complex, nested structure that groups multiple columns under a "Super Column." This allows hierarchical data organization. For example, if storing a user’s messages, the row key could be User ID. Super Columns could be different conversations, and within each Super Column, individual messages could be stored.

Columns can be sorted by timestamp or name, depending on the application’s needs. Primary key lookup is the main way to retrieve data. Instead of running complex queries like in SQL databases, Cassandra retrieves data by directly accessing the row key.

The structure of a column consists of the following parts:

Name: The identifier for the column.

Value: The actual data stored in the column.

Timestamp: A timestamp that records when the data was written, helping in managing updates and conflict resolution.

Cassandra API Overview

Cassandra follows a key-based lookup approach, meaning every operation revolves around the row key. Unlike relational databases that support complex queries (like JOINs or subqueries), Cassandra prioritizes speed and scalability by keeping its API lightweight.

Therefore, Cassandra provides a simple API structure that allows applications to interact with the database using three main operations.

1 - Insert Data

The interface is insert(table, key, rowMutation). This command adds new data to Cassandra.

The “table” is where the data will be stored and the “key” uniquely identifies the row. The rowMutation represents the changes made to the row, such as adding new columns or updating existing ones.

2 - Retrieve Data

The API interface is get(table, key, columnName). It fetches data from the database.

The “table” specifies where to look and the “key” identifies which row to retrieve. The “columnName” specifies which part of the row is needed.

3 - Delete Data

The interface is delete(table, key, columnName).

This command removes data from the database. It can delete an entire row or just a specific column within a row.

Cassandra System Architecture

Cassandra is designed as a highly scalable and fault-tolerant distributed database.

It does not rely on a single central server but instead follows a peer-to-peer model, where all nodes in the system are equal.

Cassandra organizes its nodes (servers) in a ring structure. Each piece of data is assigned to a node using consistent hashing, which ensures even distribution across all nodes. When new nodes are added, Cassandra automatically rebalances the data without requiring a complete reorganization.

See the diagram below that shows how consistent hashing works.

There is no master node, meaning any node can handle read and write requests. Since all nodes are equal, there is no single point of failure. If a node fails, other nodes in the system can continue handling requests without disruption.

Replication Mechanisms

Cassandra ensures that data is copied across multiple nodes to prevent data loss and improve availability. Developers can choose between different replication strategies:

Rack-Unaware Replication: Data is copied to N-1 successor nodes in the ring, meaning if one node goes down, its neighbors still have a copy. This method works well for small-scale deployments but does not account for hardware or network failures across different locations.

Rack-Aware Replication: Uses Zookeeper, a coordination service, to manage which nodes store replicas. This ensures that data copies are distributed across different racks (physical groupings of servers) in a data center.

Datacenter-Aware Replication: Distributes data copies across multiple data centers to ensure high availability even if an entire data center goes offline.

Gossip Protocols in Cassandra

Cassandra uses a gossip protocol to allow nodes (servers) in the system to communicate with each other efficiently.

This protocol is inspired by how rumors spread in real life. Instead of requiring a central system to keep track of everything, information is passed from one node to another in small, periodic updates.

Gossip protocols are great because they have a low network overhead. Instead of flooding the system with updates, nodes exchange small bits of information at regular intervals. Even if some nodes go offline, others can still function because they share information across the network.

Cassandra uses Scuttlebutt, a specialized Gossip Protocol, to keep track of which nodes are active or inactive. Each node periodically exchanges information about itself and other nodes with its neighbors, ensuring that the entire cluster remains up to date.

Instead of a simple "up or down" status, Cassandra assigns a suspicion level to each node.

If a node stops responding, its suspicion value starts increasing over time.

If the value crosses a certain threshold, the system considers the node as "dead" and reroutes traffic to other nodes.

In other words, Cassandra’s failure detection is probabilistic, meaning it adapts to network conditions instead of rigid timeout rules. This helps prevent false alarms caused by temporary delays or slow responses.

Query Execution in Cassandra

Cassandra is designed to handle high-speed data writes and efficient reads while ensuring durability and fault tolerance.

Instead of storing data like traditional relational databases, which write changes immediately to disk, Cassandra follows a log-structured storage model that optimizes speed and reliability.

How Cassandra Handles Writes?

Cassandra follows a multi-step process when writing data. The process consists of three main components:

Commit Log (Disk): Every time data is written to Cassandra, it is first recorded in the Commit Log. The Commit Log is stored on disk and ensures that data is not lost even if the system crashes before it is fully processed. This step makes Cassandra fault-tolerant.

Memtable (RAM): After writing to the Commit Log, the data is stored in memory in a structure called the Memtable. The Memtable acts as a temporary, in-memory cache for fast access. Because reading from RAM is faster than reading from disk, this speeds up query performance.

SSTables (Disk): When the Memtable reaches a certain size, it is flushed to disk as an SSTable (Sorted String Table). SSTables are immutable, meaning they are never modified after being written. Instead of updating existing files, Cassandra writes new SSTables and merges them later through a compaction process to optimize storage.

This write process is efficient because, unlike traditional databases that modify data in place (causing random disk writes), Cassandra writes data sequentially, which is much faster and more efficient. Since SSTables are never modified, Cassandra avoids the overhead of complex locking mechanisms found in relational databases. Also, Cassandra can recover lost data if a node crashes because every write is first recorded in the Commit Log.

How Cassandra Handles Reads?

Unlike traditional databases that rely on complex indexing, Cassandra optimizes read performance using a combination of in-memory lookups and efficient disk scans.

Here’s a step-by-step look at the read process:

Check Memtable: When a read request comes in, Cassandra first checks the Memtable because it contains the most recent data. If the data is found in the Memtable, the result is returned immediately, making the read extremely fast.

Check SSTables on Disk: If the requested data is not in the Memtable, Cassandra searches for it in the SSTables stored on disk. Since SSTables are never updated, multiple versions of a row may exist in different SSTables, so Cassandra must scan multiple files.

Use Bloom Filters: To improve efficiency, Cassandra uses Bloom Filters, which are probabilistic data structures that help quickly determine if an SSTable might contain the requested data. If the Bloom Filter suggests that an SSTable does not contain the data, Cassandra skips that file entirely, reducing the number of disk reads. If the Bloom Filter indicates that an SSTable might contain the data, Cassandra checks the file for the requested row.

Merge and Return Most Recent Data: Since Cassandra writes new SSTables instead of modifying existing ones, multiple versions of a row might exist across different SSTables. The system merges all versions of the row, applying the latest updates based on timestamps, and returns the final result to the client.

Facebook Inbox Search Use Case

As mentioned, Cassandra was originally developed at Facebook to solve the challenge of storing and searching billions of messages efficiently. Before Cassandra, Facebook used MySQL for storing these messages, but as the platform grew, MySQL struggled to handle the increasing volume of data and high query load.

To address this, Facebook deployed Cassandra on a 150-node cluster, which stored over 50 terabytes (TB) of messages. The system needed to support fast and scalable searches while handling constant write operations as users sent and received messages.

Facebook’s Inbox Search allows users to find messages using two types of queries:

Term Search (Keyword-Based Search): It allows users to search for messages that contain specific words or phrases. For example, if a user searches for "project update," Cassandra retrieves all messages containing those words. The system stores word-to-message mappings using Super Column Families, where the row key is the user ID, the super columns are the words and the sub-columns store message IDs that contain those words.

Interaction Search (User-Based Search): It allows users to find messages exchanged with a specific person. For example, if a user searches for "Alice," Cassandra retrieves all messages exchanged between the user and Alice. This is implemented using another Super Column Family, where the row key is the user ID, the super columns are the contacts, and the sub-columns contain the message IDs exchanged with that contact.

One of the biggest challenges in Facebook’s messaging system was ensuring low-latency searches across a massive dataset. Cassandra’s highly optimized architecture allowed it to achieve impressive performance:

Minimum Latency: As low as 7-8 milliseconds, meaning some searches return almost instantly.

Median Latency: Around 15-18 milliseconds, ensuring consistently fast responses for most queries.

Maximum Latency: In worst-case scenarios, searches could take up to 44 milliseconds, which is still very fast given the large dataset.

Join the NVIDIA GTC Event (Virtual GTC is Free!) [Sponsored]

Join your fellow engineers at GTC25 in San Jose, California (March 17-21). This flagship event by NVIDIA is bringing you more than 1000 session, 400+ exhibits, technical hands-on training, and tons of unique networking events.

Conclusion

Cassandra is a highly scalable, distributed database system designed to handle large volumes of data while ensuring fault tolerance and high availability.

Its peer-to-peer architecture and ring-based design make it particularly well-suited for applications that require continuous uptime and seamless scaling across multiple data centers. One of Cassandra’s key strengths is its ability to handle high write-throughput efficiently, making it ideal for real-time applications, such as messaging platforms, recommendation systems, and IoT data storage.

However, Cassandra is not a replacement for traditional relational databases. It is not optimized for complex queries, joins, or transactional consistency, which makes it less suitable for applications requiring strong relational integrity.

For businesses and developers building large-scale, distributed systems, Cassandra provides a robust, flexible, and highly available solution that can grow with demand while maintaining performance and reliability.

References:

Cassandra - A Decentralized Structured Storage System

A Presentation on Cassandra

Cassandra - A Structured Storage System on a P2P network

SPONSOR US

Get your product in front of more than 1,000,000 tech professionals.

Our newsletter puts your products and services directly in front of an audience that matters - hundreds of thousands of engineering leaders and senior engineers - who have influence over significant tech decisions and big purchases.

Space Fills Up Fast - Reserve Today

Ad spots typically sell out about 4 weeks in advance. To ensure your ad reaches this influential audience, reserve your space now by emailing sponsorship@bytebytego.com.