ByteByteGo, September 10, 2024
How Shopify Manages its Petabyte Scale MySQL Database

Shopify manages its critical MySQL database in several ways to meet the challenges of large-scale data processing, including shard balancing with zero downtime, maintaining read consistency across database replicas, and database backup and restore.

🎯 Shopify uses zero-downtime shard balancing, moving data as shop traffic changes in order to reduce the risk of database failure, improve infrastructure productivity, and ensure buyers can always reach their favorite stores. Shard placement is driven by analysis of historical data and resource requirements, and it is an ongoing optimization process.

🛅 When moving a shop's data, Shopify has to satisfy three constraints: availability, data integrity, and throughput. It uses the Ghostferry tool, which migrates data through batch copying and binlog tailing and operates in phases, so the migration causes no noticeable downtime and no data is lost or corrupted.

📄 Shopify also cares about read consistency under database replication. Read-only replicas share the read workload, but replication lag means a replica may return stale data, so time-sensitive reads still go to the primary server.


Disclaimer: The details in this post have been derived from the article originally published on the Shopify Engineering Blog. All credit for the details about Shopify’s architecture goes to their engineering team. The links to the original articles are present in the references section at the end of the post. We’ve attempted to analyze the details and provide our input about them. If you find any inaccuracies or omissions, please leave a comment, and we will do our best to fix them.

Shopify has revolutionized the e-commerce landscape by empowering small business owners to establish and grow their online presence. 

With millions of merchants relying on their platform globally, Shopify’s infrastructure has evolved to handle the ever-increasing demands of their user base.

At the heart of Shopify’s infrastructure lies their MySQL database, which has grown to an impressive petabyte scale. Managing a database of this magnitude presents significant challenges, especially when considering Shopify’s commitment to providing a zero-downtime service. 

Their direct customers are business owners, who depend on their online stores to generate revenue and sustain their livelihoods. Any downtime or service disruption can have severe consequences for these merchants, potentially leading to lost sales and damaged customer relationships.

In this post, we will look at how Shopify manages its critical MySQL database in three major areas:

- Shard balancing with zero downtime
- Read consistency with database replication
- Database backup and restore

Each area is critical for operating a database at Shopify’s scale. For us, it’s a great opportunity to derive some key learnings.

Shard Balancing with Zero Downtime

Shopify runs a large fleet of MySQL database instances.

These instances are internally known as shards and are hosted within pods. 

Each shard can store the data for one or more shops. See the diagram below, where the MySQL shard within pod 1 contains the data for shops ABC and FOO.

As traffic patterns for individual shops change, certain database shards become unbalanced in their resource utilization and load. 

For example, if both shop ABC and shop FOO launch a mega sale simultaneously, the resulting surge of traffic can cause the database server to struggle. To deal with this, Shopify moves the data for one of the shops to another shard.

This process is known as shard balancing, and it's important for multiple reasons such as:

- Reducing the risk of database failure under load.
- Improving the productivity of the infrastructure.
- Ensuring that buyers can always access their favorite stores.

An interesting takeaway from these reasons is how successful companies are focused on the customer experience even when dealing with largely technical concerns. A well-balanced shard is not directly visible to the end user. However, an unbalanced shard can indirectly impact the user experience negatively.

The second takeaway is a strong focus on cost. This is evident from the idea of improving the infrastructure’s productivity, which ultimately translates to savings.

Let’s now investigate how Shopify runs the shard rebalancing process.

The Concept of Pods

Shopify’s infrastructure is composed of many pods. 

Each pod is an isolated instance of the core Shopify application and a MySQL database shard. There are other data stores such as Redis and Memcached but we are not concerned about them right now.

A pod houses the data for one or more shops. Web requests for shops arrive at the Nginx load balancer that consults a routing table and forwards the request to the correct pod based on the shop.

The concept of pods in Shopify’s case is quite similar to cells in a cell-based architecture. 

Nginx acts as the cell router and the application layer is the same across all pods. It has access to a routing table that maps a shop to a particular shard. See the diagram below:

However, there is also a slight difference from cell-based architecture. The data in each pod varies depending on the shops hosted in a pod’s database instance.

As discussed earlier, each pod consists of a shard or a partition of the data. 

Shopify’s data model works well with this topology since “shop” is an identifier for most tables. The shop acts as a natural partition between data belonging to different customers. They can attach a shop_id field to all shop-owned tables and use it as a sharding key. 
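To make the routing concrete, here is a minimal Go sketch of a routing table keyed by shop_id. It is an illustration only, not Shopify's implementation; the types, IDs, and DSNs are assumptions.

```go
package main

import "fmt"

// Pod identifies an isolated instance of the application plus its MySQL shard
// (Redis and Memcached omitted for brevity).
type Pod struct {
	ID       int
	ShardDSN string // connection string for the pod's MySQL shard
}

// RoutingTable maps a shop_id (the sharding key) to the pod that owns its data.
type RoutingTable map[int64]Pod

// PodForShop returns the pod that should serve requests for the given shop.
func (rt RoutingTable) PodForShop(shopID int64) (Pod, error) {
	pod, ok := rt[shopID]
	if !ok {
		return Pod{}, fmt.Errorf("no pod registered for shop %d", shopID)
	}
	return pod, nil
}

func main() {
	// Illustrative entries: both shops currently live on pod 1's shard.
	rt := RoutingTable{
		1001: {ID: 1, ShardDSN: "mysql-shard-1:3306"}, // shop ABC
		1002: {ID: 1, ShardDSN: "mysql-shard-1:3306"}, // shop FOO
	}
	pod, _ := rt.PodForShop(1001)
	fmt.Printf("route request for shop 1001 to pod %d (%s)\n", pod.ID, pod.ShardDSN)
}
```

In Shopify's setup this lookup happens in front of the application: Nginx consults the routing table and forwards the request to the right pod.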

The trouble starts when multiple shops living on the same pod become too big, resulting in higher database usage for some shards and lower usage for others. There are two problems when this happens:

- Heavily used shards risk degraded performance, or even failure, for every shop they host.
- Lightly used shards waste infrastructure capacity, which drives up cost.

The graph below highlights the variation in database usage per shard that developed over time as merchants came on board and grew. Each line represents the database usage for a unique shard on a given day.

Balancing the Shards

Shopify faces two key challenges when it comes to rebalancing shards for optimal resource utilization:

- Deciding which shops should live on which shards.
- Moving a shop's data between shards without disrupting the merchant or their buyers.

A simplistic approach of evenly distributing shops across shards is not effective due to the varying data sizes and resource requirements of each shop. Some shops may consume a disproportionate amount of resources, leading to an imbalanced shard utilization.

Instead, Shopify employs a data-driven approach to shard rebalancing. 

They analyze historical database utilization and traffic data for each shard to identify usage patterns and classify shops based on their resource requirements. The analysis takes into account factors such as each shop's data size, its query traffic, and how that load varies over time.

Nevertheless, this is an ongoing process that requires continuous optimization. Shopify also uses data analysis and machine learning algorithms to make better decisions.
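Shopify hasn't published the exact placement algorithm, but a simple greedy heuristic over historical load illustrates the idea: score each shop by its observed resource usage and repeatedly place the heaviest unassigned shop on the least-loaded shard. The function and numbers below are purely illustrative.

```go
package main

import (
	"fmt"
	"sort"
)

// ShopLoad is a shop's average resource demand derived from historical
// database utilization and traffic data (values here are made up).
type ShopLoad struct {
	ShopID int64
	Load   float64
}

// balance greedily assigns shops to shards: heaviest shop first, always onto
// the currently least-loaded shard. A real placement also has to account for
// data size, growth forecasts, and the cost of actually moving a shop.
func balance(shops []ShopLoad, shardCount int) map[int][]int64 {
	sort.Slice(shops, func(i, j int) bool { return shops[i].Load > shops[j].Load })

	assignment := make(map[int][]int64, shardCount)
	shardLoad := make([]float64, shardCount)

	for _, s := range shops {
		// Pick the shard with the lowest accumulated load so far.
		target := 0
		for i := 1; i < shardCount; i++ {
			if shardLoad[i] < shardLoad[target] {
				target = i
			}
		}
		assignment[target] = append(assignment[target], s.ShopID)
		shardLoad[target] += s.Load
	}
	return assignment
}

func main() {
	shops := []ShopLoad{{1001, 0.9}, {1002, 0.7}, {1003, 0.2}, {1004, 0.1}}
	fmt.Println(balance(shops, 2))
}
```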

Moving the Shop

Conceptually, moving a shop from one shard to another is straightforward: select all records from all tables having the required shop_id and copy them to another MySQL shard.

However, there are three main constraints Shopify has to deal with:

- Availability: the shop must remain online while its data is being moved.
- Data integrity: no data can be lost or corrupted during the move.
- Throughput: the migration must complete in a reasonable amount of time.

As expected, availability is critical. Shopify doesn't want any visible downtime. Even if a brief pause is unavoidable, the end user should not feel the impact.

Also, data integrity is crucial. Imagine there was a sale that got wiped out because the shop was moving from one shard to another. This would be unacceptable for the business owner.

As you can see, each technical requirement is driven by a strong business need.

Let us now look at each step in the process:

Phase One: Batch Copying and Tailing the Binlog

To perform the data migration, Shopify uses Ghostferry. It’s an in-house tool written in Go.

Later on, Shopify made it open source. At present, Ghostferry's GitHub repository has over 690 stars.

Let’s assume that Pod 1 has two shops - ABC and FOO. Both shops decided to run a sale and expect a surge of traffic. Based on Shopify’s rebalancing strategy, Shop ABC should be moved from Pod 1 to Pod 2 for better resource utilization.

The diagram below shows the initial state where the traffic for Shop ABC is served by Pod 1. However, the copy process has started.

Ghostferry uses two main components to copy over data:

- Batch copying, which iterates over a shop's existing rows and writes them to the target shard.
- Binlog tailing, which replays changes made on the source shard while the copy is in progress.

In batch copying, Ghostferry iterates over the tables on the source shard, selects the relevant rows based on the shop’s ID, and writes these rows to the target shard. Each batch of writes is performed within a separate MySQL transaction to ensure data consistency.

To ensure that the rows being migrated are not modified on the source shard, Ghostferry uses MySQL’s SELECT…FOR UPDATE clause. This statement implements locking reads, which means that the selected rows from the source shard are write-locked for the duration of the transaction.
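Here is a simplified sketch of a single batch-copy iteration using Go's database/sql package. It is not Ghostferry's actual code: the table, columns, and batch size are assumptions, and the real tool iterates over every shop-owned table with retries and throttling.

```go
package shopmover

import (
	"context"
	"database/sql"
)

// copyBatch copies one batch of a shop's rows from the source shard to the
// target shard. The SELECT ... FOR UPDATE write-locks the selected rows on
// the source for the duration of the transaction, so they cannot change
// while the batch is being written.
func copyBatch(ctx context.Context, src, dst *sql.DB, shopID, afterID int64, batchSize int) (lastID int64, err error) {
	srcTx, err := src.BeginTx(ctx, nil)
	if err != nil {
		return 0, err
	}
	defer srcTx.Rollback() // releases the row locks if anything goes wrong

	rows, err := srcTx.QueryContext(ctx,
		`SELECT id, data FROM orders WHERE shop_id = ? AND id > ? ORDER BY id LIMIT ? FOR UPDATE`,
		shopID, afterID, batchSize)
	if err != nil {
		return 0, err
	}
	defer rows.Close()

	// Each batch of writes happens inside its own transaction on the target.
	dstTx, err := dst.BeginTx(ctx, nil)
	if err != nil {
		return 0, err
	}
	defer dstTx.Rollback()

	for rows.Next() {
		var id int64
		var data []byte
		if err := rows.Scan(&id, &data); err != nil {
			return 0, err
		}
		if _, err := dstTx.ExecContext(ctx,
			`INSERT INTO orders (id, shop_id, data) VALUES (?, ?, ?)`,
			id, shopID, data); err != nil {
			return 0, err
		}
		lastID = id
	}
	if err := rows.Err(); err != nil {
		return 0, err
	}
	if err := dstTx.Commit(); err != nil {
		return 0, err
	}
	// Committing the source transaction releases the locks on the copied rows.
	return lastID, srcTx.Commit()
}
```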

Ghostferry also starts tailing MySQL's binlog to track and replicate changes that occur on the source shard to the target shard. The binlog is an ordered record of every modification made to the database, which makes it the authoritative source of truth for changes.

In essence, both batch copying and tailing the binlog take place together.
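And here is a sketch of the binlog-tailing side. The RowEvent type and the events channel are stand-ins for a real binlog stream (Ghostferry has its own binlog streamer); the point is that only events belonging to the migrating shop are re-applied on the target shard.

```go
package shopmover

import (
	"context"
	"database/sql"
)

// RowEvent is a hypothetical, simplified view of a binlog row event. Real
// binlog events carry raw row images rather than ready-made SQL.
type RowEvent struct {
	Table  string
	ShopID int64
	SQL    string        // equivalent DML statement for the change
	Args   []interface{} // arguments for the statement
}

// tailBinlog consumes binlog events from the source shard and re-applies the
// ones that belong to the migrating shop on the target shard, keeping the
// target in sync while the batch copy runs.
func tailBinlog(ctx context.Context, events <-chan RowEvent, dst *sql.DB, shopID int64) error {
	for {
		select {
		case <-ctx.Done():
			return ctx.Err()
		case ev, ok := <-events:
			if !ok {
				return nil // stream closed, e.g. at cutover
			}
			if ev.ShopID != shopID {
				continue // ignore writes for shops that are not being moved
			}
			if _, err := dst.ExecContext(ctx, ev.SQL, ev.Args...); err != nil {
				return err
			}
		}
	}
}
```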

Phase Two: Entering Cutover

The only opportunity for downtime is during the cutover. Therefore, the cutover is designed to be a short process.

Here's what happens during the cutover phase:

- New writes for shop ABC on the source shard are briefly held off so that its dataset stops changing.
- Ghostferry's binlog tailer drains the remaining replication events until the target shard is fully caught up with the source.
- Once both shards hold identical data for the shop, copying stops and the move proceeds to the final phase.

Phase Three: Switch Traffic and Prune Stale Data

In the last phase, the shop mover process updates the routing table to associate the shop with its new pod.

The shop is now served from the new pod. However, the old pod still contains the shop data.

Shopify then verifies that the move was successful. If no issues are found during verification, the stale data for shop ABC on the old pod is deleted.

Read Consistency with Database Replication

The second major learning point from data management at Shopify’s scale is related to database replication.

Read replicas are copies of a primary database that are used to handle read-only queries. They help distribute the read workload across multiple servers, reducing the load on the primary database server. This allows the primary servers to be used for time-sensitive read/write operations.

An interesting point to note here is that read replicas don’t handle all the reads. Time-sensitive reads still go to the primary servers.

Why is this the case?

The unavoidable reason is the existence of replication lag. 

Any database replication process will have some lag. The implication is that applications reading from a replica might read stale data, which is not acceptable for certain reads. For example, a customer who updates their profile information should not see the old information on the profile page afterwards.

Also, reads are not always atomic. There can be a scenario where related pieces of data are assembled from the results of multiple queries. 

For example, consider the below sequence of events:

1. The application needs to show Item B's inventory in two places on the same page.
2. The first query for Item B is sent to replica 1.
3. The second query for Item B is sent to replica 2.

Imagine that just before step 2, the inventory for Item B gets updated on the primary and the item sells out. Replica 1 has a small replication lag and has already applied the change, while replica 2 lags further behind. As a result, replica 1 returns the updated, sold-out inventory while replica 2 returns the outdated, in-stock inventory for Item B.

This can create inconsistency within the application.

The diagram below shows this scenario:

To use replication effectively, Shopify had to solve this read-consistency problem.

There were two potential solutions Shopify considered but did not use:

- Tight consistency, which would guarantee that every replica is fully up to date before serving reads. This sacrifices most of the performance benefit of having replicas in the first place.
- Causal consistency based on global transaction identifiers (GTIDs), which would let applications read their own writes. This was more complex to build and operate than Shopify needed.

Finally, Shopify settled on a solution to implement monotonic read consistency. In this approach, successive reads should follow a consistent timeline even if the data read is not real-time.

This can be ensured by routing a series of related reads to the same server so that successive reads fetch a consistent state even if it’s not the latest state. See the diagram below for reference:

To implement this technically, Shopify had to take care of two points:

- Giving applications a way to declare that a group of queries must be read-consistent with one another.
- Routing every query in such a group to the same replica server.

Any application that requires read consistency within a series of requests supplies a unique identifier common to those requests. This identifier is passed within query comments as a key-value pair.

The diagram below shows the complete process:

The identifier is a UUID that represents a series of related requests. 

The UUID is labeled as consistent_read_id within the comments and goes through an extraction followed by a hashing process to determine the server that should receive all the requests that contain this identifier.
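A minimal sketch of that routing step is below. The comment format and the FNV hash are assumptions for illustration; Shopify's actual implementation lives in its database proxy layer.

```go
package readrouting

import (
	"fmt"
	"hash/fnv"
	"regexp"
)

// consistentReadID matches the key-value pair that an application embeds in a
// SQL comment, e.g. SELECT ... /* consistent_read_id: 6f1c2d-... */
var consistentReadID = regexp.MustCompile(`consistent_read_id:\s*([0-9a-fA-F-]+)`)

// ReplicaFor returns the index of the replica that should serve the query.
// Every query carrying the same consistent_read_id hashes to the same
// replica, which yields monotonic read consistency for that series of reads.
func ReplicaFor(query string, replicaCount int) (int, error) {
	m := consistentReadID.FindStringSubmatch(query)
	if m == nil {
		return 0, fmt.Errorf("query carries no consistent_read_id comment")
	}
	h := fnv.New32a()
	h.Write([]byte(m[1]))
	return int(h.Sum32() % uint32(replicaCount)), nil
}
```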

Shopify's approach to consistent reads was simple to implement and had low processing overhead. Its main drawback is that intermittent server outages can introduce read inconsistencies, but this tradeoff was acceptable to them.

Database Backup and Restore

The last major learning point from Shopify’s data management is related to how they manage database backup and restore.

As mentioned earlier, Shopify runs a large fleet of MySQL servers. These servers are spread across three Google Cloud Platform (GCP) regions.

Initially, Shopify's data backup process was to take full backups of every shard and store them in cloud object storage; restoring a shard meant pulling a backup back down and rebuilding the instance from it.

While the process was robust, it was time-consuming. Backing up a petabyte of data spread across multiple regions simply took too long. Also, the restore time for each shard was more than six hours, which meant Shopify had to accept a very high Recovery Time Objective (RTO).

To bring the RTO down to just 30 minutes, Shopify redesigned the backup and restore process. Since their MySQL servers run on GCP VMs backed by Persistent Disk (PD), they decided to leverage PD's snapshot feature.

Let’s look at each step of the process in detail.

Taking a Backup 

Shopify developed a new backup solution that uses the GCP API to create Persistent Disk snapshots of their MySQL instances.

They deployed this backup tooling as a CronJob within their Kubernetes infrastructure. The CronJob is configured to run every 15 minutes across all clusters in all available regions, snapshotting each shard's disks nearly 100 times a day and producing thousands of snapshots daily across the fleet.
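A sketch of the snapshot step using the Google Cloud Compute Go client (cloud.google.com/go/compute/apiv1) is shown below. Project, zone, and naming conventions are assumptions, and the real tooling also labels snapshots and coordinates across regions.

```go
package backup

import (
	"context"
	"fmt"
	"time"

	compute "cloud.google.com/go/compute/apiv1"
	"cloud.google.com/go/compute/apiv1/computepb"
	"google.golang.org/protobuf/proto"
)

// snapshotDisk creates a Persistent Disk snapshot of one MySQL instance's
// data disk. Deployed as a Kubernetes CronJob running every 15 minutes, a
// routine like this produces roughly 100 snapshots per shard per day.
func snapshotDisk(ctx context.Context, project, zone, disk string) error {
	client, err := compute.NewDisksRESTClient(ctx)
	if err != nil {
		return err
	}
	defer client.Close()

	// Timestamped snapshot name, e.g. "mysql-shard-1-20240910-120000" (naming is an assumption).
	name := fmt.Sprintf("%s-%s", disk, time.Now().UTC().Format("20060102-150405"))

	op, err := client.CreateSnapshot(ctx, &computepb.CreateSnapshotDiskRequest{
		Project:          project,
		Zone:             zone,
		Disk:             disk,
		SnapshotResource: &computepb.Snapshot{Name: proto.String(name)},
	})
	if err != nil {
		return err
	}
	// Block until GCP reports the snapshot operation as done.
	return op.Wait(ctx)
}
```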

The diagram below shows the process:

Retaining Snapshots

Since the backup process generates so many snapshots, Shopify also needed a retention process that keeps only the important ones, primarily to control storage costs.

They built another tool that implements the retention policy and deployed it using another CronJob on Kubernetes.
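Shopify hasn't published the exact retention rules, but the sketch below shows the general shape of such a policy: keep every snapshot from the last few hours, thin older ones out to one per day, and mark the rest for deletion.

```go
package backup

import (
	"sort"
	"time"
)

// SnapshotInfo is a minimal view of a PD snapshot for retention purposes.
type SnapshotInfo struct {
	Name    string
	Created time.Time
}

// toDelete applies an illustrative retention policy (not Shopify's published
// rules): keep everything newer than keepAll, keep one snapshot per day up to
// keepDaily, and mark everything else for deletion.
func toDelete(snaps []SnapshotInfo, now time.Time, keepAll, keepDaily time.Duration) []SnapshotInfo {
	// Newest first, so the first snapshot seen for a given day is the one kept.
	sort.Slice(snaps, func(i, j int) bool { return snaps[i].Created.After(snaps[j].Created) })

	seenDay := map[string]bool{}
	var victims []SnapshotInfo
	for _, s := range snaps {
		age := now.Sub(s.Created)
		switch {
		case age <= keepAll:
			// Recent snapshots are always kept.
		case age <= keepDaily:
			day := s.Created.Format("2006-01-02")
			if seenDay[day] {
				victims = append(victims, s) // already keeping one for this day
			}
			seenDay[day] = true
		default:
			victims = append(victims, s) // older than the retention window
		}
	}
	return victims
}
```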

The diagram below shows the snapshot deletion process based on the retention policy.

Performing a Restore

Having a very recent snapshot readily available enables Shopify to clone replicas with the most up-to-date data possible.

The process of restoring the backup is quite simple. It involves the following steps:

- Select the most recent snapshot for the shard being restored.
- Export the snapshot to a new Persistent Disk volume.
- Attach the new volume to a fresh MySQL instance and start the server.
- Let replication catch up on whatever lag has accumulated since the snapshot was taken.

By exporting a snapshot to a new PD volume, Shopify can restore the database in a matter of minutes. This approach has reduced their RTO to less than 30 minutes, including the time needed to recover from any replication lag.
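Here is a sketch of the disk-creation step of a restore, again using the Compute Go client. The names and the surrounding orchestration (attaching the disk, starting MySQL, catching up replication) are assumptions and omitted.

```go
package restore

import (
	"context"

	compute "cloud.google.com/go/compute/apiv1"
	"cloud.google.com/go/compute/apiv1/computepb"
	"google.golang.org/protobuf/proto"
)

// diskFromSnapshot creates a new Persistent Disk volume from an existing
// snapshot. The new disk can then be attached to a fresh MySQL instance,
// which only has to replay the replication lag accumulated since the
// snapshot was taken.
func diskFromSnapshot(ctx context.Context, project, zone, newDisk, snapshotURL string) error {
	client, err := compute.NewDisksRESTClient(ctx)
	if err != nil {
		return err
	}
	defer client.Close()

	op, err := client.Insert(ctx, &computepb.InsertDiskRequest{
		Project: project,
		Zone:    zone,
		DiskResource: &computepb.Disk{
			Name:           proto.String(newDisk),
			SourceSnapshot: proto.String(snapshotURL),
		},
	})
	if err != nil {
		return err
	}
	return op.Wait(ctx)
}
```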

The diagram below shows the database restore process:

Conclusion

Shopify's database management techniques are a great example of how simple solutions can help organizations achieve the scale they need. They also show that companies like Shopify keep a strong focus on user experience and cost when making technical decisions.

In this post, we've seen a glimpse of how Shopify manages its petabyte-scale MySQL database. Some of the key things we've covered are as follows:

- Shard balancing with zero downtime, using Ghostferry to batch-copy a shop's data and tail the binlog before a short cutover.
- Monotonic read consistency, achieved by routing a series of related reads, identified by a consistent_read_id, to the same replica.
- Fast backup and restore, using Persistent Disk snapshots taken every 15 minutes to bring the RTO under 30 minutes.

References:
