ByteByteGo, November 6, 2024
How McDonald's Sells Millions of Burgers Per Day With Event-Driven Architecture

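This article explores how McDonald's built a unified event-driven architecture to support its global operations and customer-facing services. The architecture is built on AWS and targets scalability, high availability, performance, security, reliability, and consistency. The article covers the key components, including AWS MSK, the Schema Registry, the Standby Event Store, Custom SDKs, and the Event Gateway, and walks through the event creation and reception flows. It also describes how the McDonald's team uses the Schema Registry to maintain data integrity and an Autoscaler function to scale MSK clusters automatically, addressing challenges common to event-driven architectures.

🤔 **Core of the architecture: AWS Managed Streaming for Kafka (MSK).** As the core component, it manages communication between producers and consumers, topic organization and management, and the distribution of events across the platform.

📖 **Schema Registry ensures data quality:** This component stores all event schemas and is used to validate the events that producers send and consumers receive. It keeps data structures consistent and lets consumers identify which schema to use when processing a message.

🔄 **Standby Event Store prevents event loss:** When MSK is unavailable, this component acts as a fallback that temporarily stores events that could not be published to Kafka. An AWS Lambda function retries publishing them once Kafka is available again.

⚙️ **Custom SDKs simplify development:** The McDonald's team built language-specific libraries that simplify how producers and consumers interact with the platform, with standardized interfaces, built-in schema validation, and error handling.

🌐 **Event Gateway supports external integration:** This component exposes HTTP endpoints, converts requests from external partners into Kafka events, and enforces an authentication and authorization layer so the platform can interoperate with external systems.

📊 **Data governance:** The Schema Registry and data contracts ensure the accuracy of information shared between systems, and real-time validation of messages against schemas improves data quality.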


Disclaimer: The details in this post have been derived from the McDonald’s Technical Blog. All credit for the technical details goes to the McDonald’s engineering team. The links to the original articles are present in the references section at the end of the post. We’ve attempted to analyze the details and provide our input about them. If you find any inaccuracies or omissions, please leave a comment, and we will do our best to fix them.

Over the years, McDonald’s has undergone a significant digital transformation to enhance customer experiences, strengthen its brand, and optimize overall operations. 

At the core of this transformation is a robust technological infrastructure that unifies processes across various channels and touchpoints throughout their global operations.

The need for unified event processing emerged from McDonald's extensive digital ecosystem, where events are utilized across the technology stack. There were three key processing types:

The events were used across use cases such as mobile-order progress tracking and sending customers marketing communications (deals and promotions).

Coupled with the scale of McDonald’s operations, the system needed an architecture that could handle:

In this article, we’re going to look at McDonald’s journey of developing a unified platform enabling real-time, event-driven architectures.

Design Goals of the Platform

McDonald's unified event-driven platform was built with specific foundational principles to support its global operations and customer-facing services. 

Each design goal was carefully considered to ensure the platform's robustness and efficiency. Let’s look at the goals in a little more detail.

Scalability

The platform needed the ability to auto-scale to accommodate demand.

For this purpose, they engineered it to handle growing event volumes through domain-based sharding across multiple MSK clusters. This approach enables horizontal scaling and efficient resource utilization as transaction volumes increase.

High Availability

The platform had to withstand the failure of individual components.

System resilience is achieved through redundant components and failover mechanisms. The architecture includes a standby event store that maintains operation continuity when the primary MSK service experiences issues.

Performance

The goal was to deliver events in real time with the ability to handle highly concurrent workloads.

Real-time event delivery is facilitated through optimized processing paths and schema caching mechanisms. The system maintains low latency while handling high-throughput scenarios across different geographical regions.

Security

The data needed to adhere to data security guidelines.

The platform implements comprehensive security measures, including:

Reliability

The platform must be dependable with controls to avoid losing any events.

Event loss prevention is achieved through:

Consistency

The platform should maintain consistency around important patterns related to error handling, resiliency, schema evolution, and monitoring.

Standardization is maintained using:

Simplicity

The platform should reduce operational complexity so that teams can build on the platform with ease.

Operational complexity is minimized with:



Key Components of the Architecture

The diagram below shows the high-level architecture of McDonald’s event-driven architecture.

The key components of the architecture are as follows:

Event Broker

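The core component of the platform is AWS Managed Streaming for Kafka (MSK), which handles communication between producers and consumers, topic organization and management, and the distribution of events across the platform.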

Schema Registry

A schema registry is a critical component that maintains data quality by storing all event schemas.

This enables schema validation for producers as well as consumers. It also allows the consumers to determine which schema to follow for message processing.
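How a consumer might work out which schema applies to a message can be sketched as follows. This is a minimal illustration, not McDonald's implementation: it assumes each message carries a `schema_id`, and `InMemorySchemaRegistry` and `CachingSchemaClient` are hypothetical names standing in for a real registry service and its client. The cache reflects the schema caching mentioned under the performance goal.

```python
from dataclasses import dataclass, field


@dataclass
class InMemorySchemaRegistry:
    """Hypothetical stand-in for a real schema registry service."""
    schemas: dict = field(default_factory=dict)
    lookups: int = 0  # counts simulated remote calls

    def register(self, schema: dict) -> int:
        schema_id = len(self.schemas) + 1
        self.schemas[schema_id] = schema
        return schema_id

    def fetch(self, schema_id: int) -> dict:
        self.lookups += 1
        return self.schemas[schema_id]


class CachingSchemaClient:
    """Resolves schema IDs and caches results to avoid repeated lookups."""

    def __init__(self, registry: InMemorySchemaRegistry):
        self._registry = registry
        self._cache = {}

    def schema_for(self, message: dict) -> dict:
        # Assumption: the producer attached the schema ID to the message.
        schema_id = message["schema_id"]
        if schema_id not in self._cache:
            self._cache[schema_id] = self._registry.fetch(schema_id)
        return self._cache[schema_id]


registry = InMemorySchemaRegistry()
order_schema_id = registry.register(
    {"name": "OrderPlaced", "fields": ["order_id", "total"]}
)
client = CachingSchemaClient(registry)
message = {"schema_id": order_schema_id, "payload": {"order_id": "A1", "total": 7.5}}

first = client.schema_for(message)   # goes to the registry
second = client.schema_for(message)  # served from the local cache
```

The second lookup never reaches the registry, which is the point of caching on a hot consumer path.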

Standby Event Store

This component helps avoid the loss of messages if MSK is unavailable. 

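It temporarily stores events that could not be published to Kafka, and an AWS Lambda function retries publishing them once Kafka is available again.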
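The fallback idea behind the standby event store can be sketched in a few lines. This is an illustrative sketch under stated assumptions: `FlakyBroker` simulates an unavailable Kafka cluster, an in-memory `deque` stands in for a durable store, and `retry_standby` plays the role that a scheduled retry function would play. None of these names come from the source.

```python
from collections import deque


class BrokerUnavailable(Exception):
    """Raised by the hypothetical broker client when publishing fails."""


class FlakyBroker:
    """Toy broker that can be switched off to simulate an outage."""

    def __init__(self):
        self.available = True
        self.published = []

    def publish(self, topic, event):
        if not self.available:
            raise BrokerUnavailable(topic)
        self.published.append((topic, event))


class ResilientProducer:
    def __init__(self, broker):
        self._broker = broker
        self.standby = deque()  # stand-in for a durable standby event store

    def send(self, topic, event) -> bool:
        """Publish to the broker; park the event in standby on failure."""
        try:
            self._broker.publish(topic, event)
            return True
        except BrokerUnavailable:
            self.standby.append((topic, event))
            return False

    def retry_standby(self) -> int:
        """Replay parked events in order; stop at the first failure."""
        replayed = 0
        while self.standby:
            topic, event = self.standby[0]
            try:
                self._broker.publish(topic, event)
            except BrokerUnavailable:
                break
            self.standby.popleft()
            replayed += 1
        return replayed


broker = FlakyBroker()
producer = ResilientProducer(broker)

broker.available = False                     # simulate an outage
producer.send("orders", {"order_id": "A1"})
producer.send("orders", {"order_id": "A2"})

broker.available = True                      # broker recovers
replayed = producer.retry_standby()
```

An event is removed from the standby queue only after it has been published, so a failure in the middle of a retry run does not drop it.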

Custom SDKs

The McDonald’s engineering team built language-specific libraries for producers and consumers.

These SDKs provide standardized interfaces, built-in schema validation, and error handling.

Event Gateway

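McDonald’s event-based architecture must support both internally generated events and events produced by external partner applications.

The event gateway serves as an interface for external integrations by exposing HTTP endpoints, converting partner requests into Kafka events, and enforcing an authentication and authorization layer.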
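A gateway of this kind can be pictured as a function that authenticates the caller, checks what it may publish, and wraps the payload in an event. The sketch below is hypothetical throughout: the API-key header, the `PARTNER_KEYS` and `ALLOWED_TOPICS` tables, the event envelope fields, and the status codes are assumptions chosen for illustration, and `publish` is any callable that hands the event to the broker.

```python
import json
import time
import uuid

# Hypothetical partner credentials and the topics each partner may write to.
PARTNER_KEYS = {"key-partner-a": "partner-a"}
ALLOWED_TOPICS = {"partner-a": {"delivery-status"}}


def handle_request(headers: dict, body: str, publish) -> tuple:
    """Turn one HTTP-style request into an event; return (status, response)."""
    partner = PARTNER_KEYS.get(headers.get("x-api-key", ""))
    if partner is None:
        return 401, {"error": "unknown API key"}  # authentication failed

    try:
        request = json.loads(body)
        topic, payload = request["topic"], request["payload"]
    except (json.JSONDecodeError, KeyError, TypeError):
        return 400, {"error": "body must be JSON with 'topic' and 'payload'"}

    if topic not in ALLOWED_TOPICS.get(partner, set()):
        return 403, {"error": "partner may not publish to this topic"}  # authorization failed

    event = {
        "id": str(uuid.uuid4()),
        "source": partner,
        "timestamp": time.time(),
        "payload": payload,
    }
    publish(topic, event)
    return 202, {"event_id": event["id"]}


published = []
status, response = handle_request(
    {"x-api-key": "key-partner-a"},
    json.dumps({"topic": "delivery-status", "payload": {"order_id": "A1"}}),
    lambda topic, event: published.append((topic, event)),
)
```

Keeping authentication and authorization in front of the conversion step means a rejected request never produces an event.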

Supporting Utilities

These are administrative tools that offer capabilities such as:

Event Processing Flow

The event processing system at McDonald's follows a sophisticated flow that ensures data integrity and efficient processing. 

The diagram below shows the overall processing flow.

Let’s look at it in more detail by dividing the flow into two major themes: event creation and event reception.

Event Creation and Sharing

Event Reception

Techniques for Key Challenges

The McDonald’s engineering team also used some interesting techniques to solve common challenges associated with the setup.

Let’s look at a few important ones:

Data Governance

Ensuring data accuracy is crucial when different systems share information. If the data is reliable, it makes designing and building these systems much simpler. 

MSK and Schema Registry help maintain data integrity by enforcing "data contracts" between systems.

A schema is like a blueprint that defines what information should be present in each message and in what format. It specifies the required and optional data fields and their types (e.g., text, number, date). Every message is checked against this blueprint in real time. If a message doesn't match the schema, it's sent to a separate area to be fixed.

Here's how schemas work:

See the diagram below for reference:

Using a schema registry to validate data contracts ensures that the information flowing between systems is accurate and consistent. This saves time and effort in designing and operating the systems that rely on this data, especially for analytics purposes.
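The data-contract check described above can be illustrated with a small validator. This is a simplified sketch rather than the registry's actual validation logic: `ORDER_SCHEMA` is a made-up contract, and the `dead_letter` list is an assumed name for the "separate area" where non-conforming messages wait to be fixed.

```python
from datetime import date

# Hypothetical contract: field name -> (expected type, required?)
ORDER_SCHEMA = {
    "order_id": (str, True),
    "total": (float, True),
    "placed_on": (date, True),
    "coupon": (str, False),
}


def violations(message: dict, schema: dict) -> list:
    """Return a list of human-readable contract violations (empty if valid)."""
    problems = []
    for name, (expected_type, required) in schema.items():
        if name not in message:
            if required:
                problems.append(f"missing required field '{name}'")
        elif not isinstance(message[name], expected_type):
            problems.append(f"field '{name}' should be {expected_type.__name__}")
    for name in sorted(message.keys() - schema.keys()):
        problems.append(f"unexpected field '{name}'")
    return problems


def route(message: dict, schema: dict, primary: list, dead_letter: list) -> None:
    """Send valid messages onward; set invalid ones aside with the reasons."""
    problems = violations(message, schema)
    if problems:
        dead_letter.append({"message": message, "problems": problems})
    else:
        primary.append(message)


primary, dead_letter = [], []
route({"order_id": "A1", "total": 7.5, "placed_on": date(2024, 11, 6)},
      ORDER_SCHEMA, primary, dead_letter)
route({"order_id": "A2", "total": "7.50"}, ORDER_SCHEMA, primary, dead_letter)
```

The second message fails on two counts, a string where a number is expected and a missing date, so it is set aside together with the reasons instead of reaching downstream consumers.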

Cluster Autoscaling

MSK is a messaging system that helps different parts of an application communicate with each other. It uses brokers to store and manage the messages. 

As the amount of data grows, MSK automatically increases the storage space for each broker. However, they needed a way to add more brokers to the system when the existing ones got overloaded.

To solve this problem, they created an Autoscaler function. See the diagram below:

Think of this function as a watchdog that keeps an eye on how hard each broker is working. When a broker's workload (measured by CPU utilization) goes above a certain level, the Autoscaler function kicks in and does two things:

This way, the MSK system can automatically adapt to handle more data and traffic without the need to add brokers or move data around manually. 
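The decision the Autoscaler makes can be sketched as a pure function. The two actions it takes are not listed in this text, so the sketch assumes, based on the sentence above, that they are adding a broker and moving partitions onto it. The threshold, the broker cap, and the round-robin reassignment are illustrative choices, not values from the source.

```python
from dataclasses import dataclass

CPU_THRESHOLD = 70.0  # percent; assumed value
MAX_BROKERS = 6       # assumed safety cap


@dataclass
class ScalingPlan:
    add_broker: bool
    assignment: dict  # partition name -> broker id


def rebalance(partitions: list, broker_ids: list) -> dict:
    """Spread partitions round-robin across the given brokers."""
    return {p: broker_ids[i % len(broker_ids)] for i, p in enumerate(partitions)}


def plan_scaling(cpu_by_broker: dict, assignment: dict) -> ScalingPlan:
    """Decide whether to add a broker and how partitions should then be placed."""
    broker_ids = sorted(cpu_by_broker)
    overloaded = any(cpu > CPU_THRESHOLD for cpu in cpu_by_broker.values())
    if overloaded and len(broker_ids) < MAX_BROKERS:
        new_id = max(broker_ids) + 1
        return ScalingPlan(True, rebalance(sorted(assignment), broker_ids + [new_id]))
    # Nothing is overloaded (or the cap is reached): leave placement untouched.
    return ScalingPlan(False, dict(assignment))


plan = plan_scaling(
    {1: 82.0, 2: 55.0},
    {"orders-0": 1, "orders-1": 1, "orders-2": 2},
)
idle = plan_scaling({1: 30.0, 2: 40.0}, {"orders-0": 1})
```

Separating the decision from its execution keeps the logic easy to test. A real function would then call the cluster's management API to carry out the plan.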

Domain-Based Sharding

To ensure that the messaging system can handle a lot of data and minimize the risk of failures, they divide events into separate groups based on their domain. 

Each group has its own dedicated MSK cluster. This is like having separate mailrooms for different departments in a large company. The domain of an event determines which cluster and topic it belongs to. For example, events related to user profiles might go to one cluster, while events related to product orders might go to another.

Applications that need to receive events can choose to get them from any of these domain-based topics. This improves flexibility and helps distribute the workload across the system.
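Routing by domain can be sketched as a lookup from an event's domain to its cluster and topic. The naming convention (`domain.event-name`), the `CLUSTER_BY_DOMAIN` table, and the endpoint addresses below are all placeholders invented for illustration.

```python
# Hypothetical domain -> cluster mapping; the endpoints are placeholders.
CLUSTER_BY_DOMAIN = {
    "user-profile": "msk-profile.example.internal:9092",
    "orders": "msk-orders.example.internal:9092",
}


def resolve_destination(event_type: str) -> tuple:
    """Map 'domain.event-name' to (cluster bootstrap address, topic)."""
    domain, _, name = event_type.partition(".")
    if not name or domain not in CLUSTER_BY_DOMAIN:
        raise ValueError(f"cannot route event type {event_type!r}")
    return CLUSTER_BY_DOMAIN[domain], f"{domain}.{name}"


cluster, topic = resolve_destination("orders.order-placed")
```

Because each domain resolves to its own cluster, a traffic spike or outage in one domain does not directly affect the brokers serving another.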

To make sure the platform is always available and can serve users globally, it is set up to work across multiple regions. In each region, there is a high-availability configuration. This means that if one part of the system goes down, another part can take over seamlessly, ensuring uninterrupted service.

Conclusion

McDonald's event-driven architecture demonstrates a successful implementation of a large-scale, global event processing platform. The system effectively handles diverse use cases from mobile order tracking to marketing communications while maintaining high reliability and performance.

Key success factors include the robust implementation of AWS MSK, effective schema management, and comprehensive error-handling mechanisms. The architecture's domain-based sharding approach and auto-scaling capabilities have proven crucial for handling growing event volumes.

Some best practices established through this implementation include:

Looking ahead, McDonald's platform is positioned to evolve with planned enhancements including:

These improvements will further strengthen the platform's capabilities while maintaining its core design principles of scalability, reliability, and simplicity.

References:

