ByteByteGo, October 22, 2024
Uber Reduces Database Lock Time by 94% with Major MySQL Fleet Upgrade

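This article examines Uber's upgrade from MySQL 5.7 to 8.0, covering why the upgrade was needed, the challenges involved, the strategy adopted, and the upgrade process.

Uber upgraded because MySQL 5.7 was reaching end-of-life, which brought security risks and operational instability, while MySQL 8.0 offered performance and concurrency enhancements along with new features such as improved indexing.

Uber's MySQL infrastructure is vast: more than 2,100 clusters and over 16,000 nodes across 19 production zones, handling roughly 3 million queries per second.

The upgrade posed many challenges: the scale demanded a detailed upgrade strategy, high availability had to be maintained, compatibility had to be ensured, comprehensive testing was required, and automated rollback mechanisms had to be in place.

Uber chose a side-by-side upgrade strategy, which offered minimal downtime, easy rollback, and room for thorough testing, and used automated workflows to manage the complexity of the upgrade.

Uber's MySQL upgrade was carefully planned and executed as a four-stage process, minimizing service disruption and enabling a smooth transition of data at scale.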

Disclaimer: The details in this post have been derived from the Uber Engineering Blog. All credit for the technical details goes to the Uber engineering team. The links to the original articles are present in the references section at the end of the post. We’ve attempted to analyze the details and provide our input about them. If you find any inaccuracies or omissions, please leave a comment, and we will do our best to fix them.

MySQL serves as the backbone for Uber’s vast and complex operations. For many years, Uber relied upon MySQL version 5.7 to support business-critical features.

However, in 2023, they decided to upgrade from MySQL version 5.7 to version 8.0.

In this post, we’ll look at the need for this and the challenges Uber faced in such a large-scale upgrade. We will also investigate the solutions Uber used to achieve the upgrade without violating the Service-Level Objective (SLO).

The Need for the Upgrade

The decision to upgrade Uber's MySQL infrastructure from version 5.7 to 8.0 was driven by several critical factors. 

First, MySQL 5.7 was reaching its end-of-life, meaning it would no longer receive security updates or bug fixes, leaving Uber's infrastructure vulnerable to potential security risks and operational instability. Upgrading to MySQL 8.0 mitigated these risks by ensuring ongoing support and security improvements. 

Additionally, MySQL 8.0 offered significant performance and concurrency enhancements.

Beyond performance, MySQL 8.0 introduced new functionality such as improved indexing.

Overall, these performance, security, and operational benefits made the transition to MySQL 8.0 a critical move for Uber's data infrastructure.


The Scale of The Upgrade

Uber’s MySQL infrastructure is vast, operating at a scale that supports its global platform operations. The fleet comprises more than 2,100 clusters and over 16,000 nodes across 19 production zones, and it serves roughly 3 million queries per second.

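Also, to ensure high availability and data redundancy, Uber employs a primary-secondary replication architecture. In each cluster, the primary node handles write operations, while the secondary nodes replicate its data and serve read traffic.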

Challenges with the Upgrade

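Several challenges had to be addressed during the upgrade of Uber’s MySQL fleet from version 5.7 to 8.0. The major ones were the sheer scale of the fleet, which demanded a detailed upgrade strategy, along with maintaining high availability, ensuring compatibility, testing comprehensively, and being able to roll back automatically.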

Uber conducted thorough regression checks and validation tests to ensure all existing systems and applications continued to work seamlessly with the upgraded database. 

This process included testing in a staging environment before making production upgrades. By validating every aspect of the system, Uber was able to mitigate the risk of any unexpected issues after the upgrade.

Finally, Uber implemented automated rollback mechanisms to safeguard the upgrade process. 

In the event of any failures or compatibility issues during the upgrade, these mechanisms could automatically revert the changes, ensuring the maintenance of service continuity and data integrity.

For instance, in the pre-maintenance stage, where the new MySQL 8.0 nodes operated as replicas, if performance issues or system degradation were detected, Uber could instantly roll back to MySQL 5.7 without any risk of data loss. The rollback capability was crucial for addressing any latency, resource consumption, or service degradation issues, allowing Uber to revert to a stable state until the issues were resolved. 

However, once a MySQL 8.0 node was promoted to primary, rolling back to MySQL 5.7 became far more complex, because replicating from an 8.0 primary back to 5.7 replicas is not supported. Uber therefore had to confirm that everything was functioning correctly before promoting the new nodes, to avoid irreversible complications.

Upgrade Strategy

When upgrading its massive MySQL infrastructure from version 5.7 to 8.0, Uber had two possible strategies to choose from: side-by-side upgrade and in-place upgrade.

In-Place Upgrade

An in-place upgrade involves directly upgrading the existing MySQL installation to the new version (MySQL 8.0) on the same nodes. 

The process typically requires stopping the MySQL service, upgrading the software, and restarting it. While this method can be simpler in terms of setup, it also comes with significant drawbacks, starting with the downtime each node incurs while its MySQL service is stopped.

Due to these limitations, Uber decided against the in-place upgrade method.

Side-by-Side Upgrade

Uber chose a side-by-side upgrade approach, which allowed for a smoother and less risky transition. 

In this method, the new MySQL 8.0 nodes were set up and operated alongside the existing MySQL 5.7 nodes. 

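This approach was more suitable for Uber’s infrastructure because it kept downtime to a minimum, made rollback easy, and allowed the new nodes to be tested thoroughly before they took over.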

Scaling the Upgrade Process with Automation

To manage the complexity of upgrading such a large infrastructure, Uber implemented an automated workflow. 

With more than 2,100 clusters and over 16,000 nodes, upgrading each node manually was impractical. Automation made the process scalable and efficient, and far less prone to human error.

Two main aspects of this automation are:

Four-Stage Upgrade Process for MySQL

Uber’s MySQL upgrade from version 5.7 to 8.0 was carefully planned and executed in a four-stage process. 

This approach ensured minimal service disruption and allowed Uber to transition its massive data infrastructure safely. Let’s break down the four stages in simple terms:

1. Pre-Maintenance Stage

In the pre-maintenance stage, new MySQL 8.0 nodes were added as replicas to the existing MySQL 5.7 clusters. A "node" here is a server running a MySQL instance. 

Because they joined as replicas, the MySQL 8.0 nodes could work alongside the old 5.7 nodes without disrupting any operations.

This setup ensured that the old system (MySQL 5.7) continued functioning normally while the new system (MySQL 8.0) was being integrated, allowing Uber to keep everything running smoothly.

2. System Monitoring (Soak Period)

After setting up the MySQL 8.0 nodes, Uber entered the system monitoring stage, also known as the "soak period." This stage lasted for about a week and was crucial for testing the new system under real-world conditions.

During this time, Uber monitored the MySQL 8.0 nodes as they handled real production traffic (read operations), checking for issues such as slow performance, errors, or increased resource usage.

This period was essential to detect potential problems before making the final switch to MySQL 8.0.
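The kind of check a soak period implies can be sketched as a comparison between a MySQL 8.0 replica and its 5.7 baseline. This is a minimal illustration, not Uber's tooling: the `NodeMetrics` fields, the `soak_violations` helper, and every threshold below are assumptions chosen for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class NodeMetrics:
    """Hypothetical per-node summary collected during the soak period."""
    p99_latency_ms: float
    error_rate: float        # fraction of failed queries, 0.0 to 1.0
    cpu_utilization: float   # fraction of CPU in use, 0.0 to 1.0


def soak_violations(
    baseline: NodeMetrics,
    candidate: NodeMetrics,
    max_latency_ratio: float = 1.2,       # assumed tolerance, not from the source
    max_error_rate_delta: float = 0.001,  # assumed tolerance, not from the source
    max_cpu_ratio: float = 1.25,          # assumed tolerance, not from the source
) -> List[str]:
    """Return the names of the checks the candidate node fails."""
    violations = []
    if candidate.p99_latency_ms > baseline.p99_latency_ms * max_latency_ratio:
        violations.append("latency")
    if candidate.error_rate > baseline.error_rate + max_error_rate_delta:
        violations.append("errors")
    if candidate.cpu_utilization > baseline.cpu_utilization * max_cpu_ratio:
        violations.append("cpu")
    return violations


# Usage: an empty list means the replica stayed within tolerance.
baseline_57 = NodeMetrics(p99_latency_ms=10.0, error_rate=0.001, cpu_utilization=0.40)
replica_80 = NodeMetrics(p99_latency_ms=15.0, error_rate=0.010, cpu_utilization=0.45)
print(soak_violations(baseline_57, replica_80))
```

A cluster would only move on to promotion if a check like this stayed clean for the whole soak window.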

3. Maintenance Stage

Once the soak period confirmed that everything was working smoothly, Uber moved to the maintenance stage. 

In this phase, the MySQL 8.0 node was promoted to primary status, meaning it now handled all write operations and became the main database for that cluster.

At the same time, the MySQL 5.7 nodes were demoted and no longer accepted write traffic.

4. Post-Maintenance Stage

Finally, in the post-maintenance stage, Uber removed all the old MySQL 5.7 nodes that were no longer needed. 

At this point, the new MySQL 8.0 nodes were fully operational, and all traffic (both read and write) was being handled by the new system.

By completing this step, Uber successfully transitioned to the new version, ensuring that the system was upgraded without any data loss or significant service disruptions.
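The four stages can be pictured as a simple ordered state machine for each cluster. The sketch below is only an illustration of that ordering and of the rollback boundary described earlier. The `Stage` enum and both helper functions are hypothetical names, and treating the whole maintenance stage as past the rollback boundary is a simplification.

```python
from enum import Enum
from typing import Optional


class Stage(Enum):
    PRE_MAINTENANCE = "add 8.0 nodes as replicas of the 5.7 cluster"
    SOAK = "monitor 8.0 replicas under production read traffic"
    MAINTENANCE = "promote an 8.0 node to primary"
    POST_MAINTENANCE = "remove the remaining 5.7 nodes"


# Enum members iterate in definition order, which is the upgrade order.
STAGE_ORDER = list(Stage)


def next_stage(current: Stage) -> Optional[Stage]:
    """Return the stage that follows `current`, or None when the upgrade is done."""
    position = STAGE_ORDER.index(current)
    if position + 1 < len(STAGE_ORDER):
        return STAGE_ORDER[position + 1]
    return None


def simple_rollback_available(current: Stage) -> bool:
    """True while 5.7 is still primary, so dropping the 8.0 replicas undoes the change."""
    return current in (Stage.PRE_MAINTENANCE, Stage.SOAK)


# Usage: walk a cluster through every stage and report the rollback window.
stage: Optional[Stage] = Stage.PRE_MAINTENANCE
while stage is not None:
    print(stage.name, "| simple rollback:", simple_rollback_available(stage))
    stage = next_stage(stage)
```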

Issues During Upgrade

During the upgrade of Uber’s MySQL infrastructure to version 8.0, several issues were encountered that required careful handling and technical solutions to ensure the system continued to run smoothly. 

Here’s a breakdown of the key problems and how they were addressed:

Query Execution Plan Changes

One of the major issues that Uber faced was related to changes in the query execution plans in MySQL 8.0. 

A query execution plan is the path the database system uses to retrieve data. In some clusters, MySQL 8.0 chose different paths compared to version 5.7, leading to increased latencies (delays) and higher resource consumption.

These changes could slow down certain operations, affecting the performance of dashboards and other tools that relied on quick access to data. For instance, clusters powering key dashboards at Uber experienced noticeable slowdowns.

Uber worked with Percona, a database consulting company, to develop a patch that optimized the execution plans for the affected clusters. By applying this patch, Uber was able to restore performance and reduce resource consumption, bringing the system back to optimal operation.
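One way to spot this kind of regression early is to compare which index each query uses on the two versions, for example by recording the `key` column of `EXPLAIN` output on a 5.7 node and on an 8.0 node. The source does not say how Uber detected the affected queries, so the sketch below is purely illustrative: the `changed_index_choices` helper, the query identifiers, and the index names are all made up for the example.

```python
from typing import Dict, Optional, Tuple


def changed_index_choices(
    plans_57: Dict[str, Optional[str]],
    plans_80: Dict[str, Optional[str]],
) -> Dict[str, Tuple[Optional[str], Optional[str]]]:
    """Map each query seen on both versions to (5.7 index, 8.0 index) when they differ.

    Each input maps a query identifier to the index the optimizer chose,
    with None meaning no index was used.
    """
    shared_queries = sorted(plans_57.keys() & plans_80.keys())
    return {
        query: (plans_57[query], plans_80[query])
        for query in shared_queries
        if plans_57[query] != plans_80[query]
    }


# Usage with made-up data: q1 switched indexes, q2 did not, q3 was only seen on 8.0.
plans_on_57 = {"q1": "idx_user", "q2": "PRIMARY"}
plans_on_80 = {"q1": "idx_created", "q2": "PRIMARY", "q3": None}
print(changed_index_choices(plans_on_57, plans_on_80))
```

A changed index is not a regression by itself, so queries flagged this way would still need a latency comparison before anything is patched or rewritten.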

Unsupported Queries and Configurations

MySQL 8.0 introduced new syntax rules and stricter configurations, which caused some queries that worked in MySQL 5.7 to fail after the upgrade. 

Specifically, some clusters didn’t have the STRICT_TRANS_TABLES SQL mode enabled, which is a default setting in MySQL 8.0. This mode enforces stricter rules on handling invalid or missing data.

Uber had to carefully adjust configurations and rewrite certain queries to align with MySQL 8.0’s new syntax and rules. For example, they enabled the STRICT_TRANS_TABLES and ONLY_FULL_GROUP_BY modes, which made the system more robust but required changes to some of the legacy queries and applications.
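A pre-upgrade audit for this issue could start from each cluster's `sql_mode` value, which MySQL reports as a comma-separated string (for example via `SELECT @@sql_mode`). The following sketch only parses such a string and lists the modes that are missing. The `missing_sql_modes` helper is a hypothetical name, and the required set simply mirrors the two modes mentioned above.

```python
from typing import Iterable, List

# The two modes named in the text; a real audit might check others as well.
REQUIRED_MODES = ("STRICT_TRANS_TABLES", "ONLY_FULL_GROUP_BY")


def missing_sql_modes(sql_mode: str, required: Iterable[str] = REQUIRED_MODES) -> List[str]:
    """Return the required modes that are absent from a comma-separated sql_mode string."""
    enabled = {mode.strip().upper() for mode in sql_mode.split(",") if mode.strip()}
    return sorted(set(required) - enabled)


# Usage: a cluster running with only NO_ENGINE_SUBSTITUTION lacks both modes.
print(missing_sql_modes("NO_ENGINE_SUBSTITUTION"))
print(missing_sql_modes("ONLY_FULL_GROUP_BY,STRICT_TRANS_TABLES,NO_ZERO_DATE"))
```

Clusters that report missing modes are the ones whose legacy queries most likely need review before the stricter settings are switched on.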

Collation and Character Set Changes

MySQL 8.0 also brought changes to the default character set and collation. The character set controls how text is stored, and the collation determines how text is compared. 

On MySQL 5.7, Uber had been using the utf8mb4_unicode_520_ci collation, whereas MySQL 8.0 made the newer utf8mb4_0900_ai_ci the default collation for utf8mb4.

This change in the default character set and collation caused issues with sorting and comparing text data across different clusters, particularly when dealing with different languages or special characters. The system needed consistency in collation settings to function correctly, but this shift created mismatches.

Uber had to align the collation settings across its systems to ensure all nodes used the same character set and collation. This required detailed configuration changes and testing to ensure compatibility and proper sorting behavior across all clusters.
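Finding such mismatches usually starts with an inventory of column collations, which MySQL exposes through `information_schema.COLUMNS` (`TABLE_NAME`, `COLUMN_NAME`, `COLLATION_NAME`). The sketch below assumes those rows have already been fetched as tuples and filters out the columns that deviate from a chosen target. The `mismatched_columns` helper and the sample rows are hypothetical, and the source does not say which collation Uber standardized on.

```python
from typing import Iterable, List, Optional, Tuple

ColumnRow = Tuple[str, str, Optional[str]]  # (table, column, collation)


def mismatched_columns(rows: Iterable[ColumnRow], expected: str) -> List[ColumnRow]:
    """Return the text columns whose collation differs from `expected`.

    Non-text columns report a NULL collation (None here) and are skipped.
    """
    return [
        (table, column, collation)
        for table, column, collation in rows
        if collation is not None and collation != expected
    ]


# Usage with made-up rows; the target collation is an assumption for the example.
sample_rows = [
    ("trips", "id", None),
    ("trips", "city", "utf8mb4_unicode_520_ci"),
    ("riders", "name", "utf8mb4_0900_ai_ci"),
]
print(mismatched_columns(sample_rows, expected="utf8mb4_unicode_520_ci"))
```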

Client Library Incompatibility

Many client libraries that Uber used to connect to the MySQL database were not initially compatible with MySQL 8.0. Client libraries are essential for applications to communicate with the database, and outdated versions of these libraries did not support some of the new features and functions introduced in MySQL 8.0.

Without updating these libraries, Uber’s applications couldn’t fully utilize the benefits of MySQL 8.0, and some applications experienced failures or errors when trying to connect to the upgraded database.

Uber upgraded these client libraries across its systems. This process involved rigorous testing in a staging environment to ensure that all client libraries worked properly with MySQL 8.0 before the full upgrade. Once the testing was complete, the libraries were deployed in production, ensuring a smooth transition.

Improvements After The Upgrade

The upgrade to MySQL 8.0 brought significant performance improvements to Uber’s infrastructure, both on the server side and client side.

Let’s look at both.

Server-Side Performance:

Client-Side Performance:

Conclusion

Through careful planning, automation, and a phased rollout strategy, Uber successfully transitioned its vast data systems with minimal downtime and disruption. 

The new version brought significant benefits in terms of performance, security, and functionality, helping Uber improve its operational efficiency and user experience.

Some key learnings are as follows:

References:

