ByteByteGo, September 28, 2024
EP131: How Uber Served 40 Million Reads with Integrated Redis Cache?

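This issue covers the top Kafka use cases, how Uber used a Redis cache to serve 40 million reads, what makes AWS Lambda fast, and why distributed locks are needed. It also introduces Uber's caching solution CacheFront, the four pillars behind AWS Lambda, and six use cases for distributed locks.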

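📢 **Uber's caching solution, CacheFront:** Uber built CacheFront as an integrated caching solution that combines Redis, Docstore, and MySQL. CacheFront lets Docstore's query engine communicate with Redis to serve read requests. On a cache hit, the query engine fetches the data from Redis; on a cache miss, the request goes to the storage engine and the database. On the write side, Docstore's CDC (change data capture) service, Flux, invalidates the records in Redis. It tails MySQL binlog events to trigger the invalidation.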

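📣 **What makes AWS Lambda fast:** AWS Lambda's speed rests on four pillars: function invocation, the Assignment Service, Firecracker microVMs, and component storage. Function invocation supports synchronous and asynchronous modes: the caller either invokes the Lambda function directly, or the request is placed in an internal SQS queue for asynchronous processing. The Assignment Service manages execution environments and is written in Rust for high performance. Firecracker is a lightweight virtual machine monitor (VMM) for running serverless workloads such as AWS Lambda and AWS Fargate. Component storage uses techniques such as chunking and convergent encryption to manage input data and function code efficiently.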

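📤 **Why distributed locks are needed:** A distributed lock is a mechanism that ensures mutual exclusion in a distributed system. It plays a key role in scenarios including leader election, task scheduling, resource allocation, microservices coordination, inventory management, and session management. Distributed locks ensure that only one node is the leader at any given time, prevent duplicate task execution, guarantee exclusive access to shared resources, keep operations that span multiple microservices orderly, keep inventory levels accurate, and prevent inconsistencies in user sessions.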

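📥 **Uber's multi-region cache warming:** To handle the cache misses and database overload that can occur during a region failover, Uber's engineering team uses cross-region Redis replication. Keys are replicated to the remote region by tailing the Redis write stream. In the remote region, a stream consumer issues read requests to the query engine, which reads the database and updates the cache.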

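📦 **Redis and Docstore sharding:** All teams at Uber use Docstore, and some generate a huge number of requests. To handle the load, both Redis and Docstore instances are sharded, or partitioned. However, the failure of a single Redis cluster can create a hot database shard. To prevent this, Uber partitions the Redis cluster with a scheme different from the database sharding scheme, so that the load stays evenly distributed.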

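This week’s system design refresher:

    Top Kafka Use Cases You Should Know

    How Uber Served 40 Million Reads with Integrated Redis Cache?

    What makes AWS Lambda so fast?

    Why do we need to use a distributed lock?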




Top Kafka Use Cases You Should Know


How Uber Served 40 Million Reads with Integrated Redis Cache?

There are 3 main parts of the implementation:

    CacheFront Reads and Writes with CDC

      Uber built CacheFront, an integrated caching solution with Redis, Docstore, and MySQL.

      Docstore’s query engine, rather than the calling microservice, communicates with Redis for read requests.

      For cache hits, the query engine fetches data from Redis. For cache misses, the request goes to the storage engine and the database.

      In the case of writes, Docstore’s CDC service (Flux) invalidates the records in Redis. It tails MySQL binlog events to trigger the invalidation.

    Multi-Region Cache Warming with Redis Streaming

      A region failover can leave the new region with a cold cache, so reads miss and overload the database.

      To handle this, Uber’s engineering team uses cross-region Redis replication. This is done by tailing the Redis write stream to replicate keys to the remote region.

      In the remote region, the stream consumer issues read requests to the query engine, which reads the database and updates the cache.

    Redis and Docstore Sharding

      All teams at Uber use Docstore, and some generate a huge number of requests.

      Both Redis and Docstore instances are sharded (partitioned) to handle the load. But if the two were partitioned the same way, a single Redis cluster going down would send all of its misses to one DB shard and make it hot.

      To prevent this, Uber partitions the Redis cluster with a scheme different from the DB sharding. The keys of a failed Redis cluster then spread across many DB shards, so the load stays evenly distributed.
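
The read and write paths from the first part can be sketched in a few lines of Python. This is a minimal illustration, not Uber's implementation: `QueryEngine`, `apply_invalidations`, and the list used as a binlog are hypothetical names, and plain dicts stand in for Redis and for the MySQL-backed storage engine. Populating the cache on a miss is an assumption of this sketch.

```python
from dataclasses import dataclass, field


@dataclass
class QueryEngine:
    """Hypothetical stand-in for Docstore's query engine.

    `cache` plays the role of Redis and `database` the role of the
    MySQL-backed storage engine; both are plain dicts in this sketch.
    """
    cache: dict = field(default_factory=dict)
    database: dict = field(default_factory=dict)
    db_reads: int = 0  # counts reads that fell through to the database

    def read(self, key):
        # Cache hit: serve straight from the cache.
        if key in self.cache:
            return self.cache[key]
        # Cache miss: read the database, then populate the cache
        # (an assumption of this sketch).
        self.db_reads += 1
        value = self.database.get(key)
        if value is not None:
            self.cache[key] = value
        return value

    def write(self, key, value, binlog):
        # Writes go to the database only; the cache is not touched here.
        self.database[key] = value
        binlog.append(key)  # the change is recorded in the log


def apply_invalidations(engine, binlog, offset):
    """Hypothetical CDC consumer: tail the log and evict changed keys.

    Returns the new offset so the caller can resume where it stopped.
    """
    for key in binlog[offset:]:
        engine.cache.pop(key, None)
    return len(binlog)


engine = QueryEngine()
binlog = []
engine.write("user:1", "alice", binlog)
offset = apply_invalidations(engine, binlog, 0)

first = engine.read("user:1")   # miss: read from the database
second = engine.read("user:1")  # hit: served from the cache

engine.write("user:1", "alicia", binlog)
stale = engine.read("user:1")   # still cached until the CDC consumer runs
offset = apply_invalidations(engine, binlog, offset)
fresh = engine.read("user:1")   # miss again, now sees the new value
```

In this sketch the cached value stays stale between the write and the next run of the CDC consumer, which is the trade-off of invalidating asynchronously from a change log rather than inside the write path.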

Over to you: Would you have done something differently?




What makes AWS Lambda so fast?

There are 4 main pillars:

    Function Invocation
    AWS Lambda supports synchronous and asynchronous invocation.

    In synchronous invocation, the caller calls the Lambda function directly through the AWS CLI, an SDK, or another service and waits for the response.

    In asynchronous invocation, the caller doesn’t wait for the function’s response. The request is authorized and an event is placed in an internal SQS queue. Pollers read messages from the queue and send them for processing.

    Assignment Service
    The Assignment Service manages the execution environments.

    The service is written in Rust for high performance and is divided into multiple partitions with a leader-follower approach for high availability.

    The state of execution environments is written to an external journal log.

    Firecracker MicroVM
    Firecracker is a lightweight virtual machine monitor (VMM) designed for running serverless workloads such as AWS Lambda and AWS Fargate.

    It uses Linux’s Kernel-based Virtual Machine (KVM) to create and manage secure, fast-booting microVMs.

    Component Storage
    AWS Lambda also has to manage state consisting of input data and function code.

    To make it efficient, it uses multiple techniques:

      Chunking to store the container images more efficiently.

      Convergent encryption to secure the shared data. This involves appending additional data to the chunk to compute a more robust hash.

      The SnapStart feature to reduce cold start latency by pre-initializing the execution environment.
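
To make chunking and convergent encryption concrete, here is a small Python sketch. It is not how Lambda is implemented: `ChunkStore`, `SALT`, the 8-byte chunk size, and the XOR keystream are all assumptions chosen for illustration, and the keystream cipher is a toy that is not secure. The point it shows is that deriving the key from the chunk content (plus extra data) makes identical chunks encrypt to identical ciphertext, so shared chunks are stored once.

```python
import hashlib

CHUNK_SIZE = 8        # tiny, so the example is easy to follow
SALT = b"demo-salt"   # hypothetical extra data appended before hashing


def split_chunks(data, size=CHUNK_SIZE):
    return [data[i:i + size] for i in range(0, len(data), size)]


def derive_key(chunk):
    # Convergent encryption: the key comes from the content itself,
    # so equal chunks always yield equal keys.
    return hashlib.sha256(chunk + SALT).digest()


def keystream(key, length):
    # Toy keystream built from SHA-256 in counter mode. NOT secure;
    # a real system would use a vetted cipher instead.
    stream = b""
    counter = 0
    while len(stream) < length:
        stream += hashlib.sha256(key + counter.to_bytes(4, "big")).digest()
        counter += 1
    return stream[:length]


def xor_cipher(key, data):
    # XOR is its own inverse, so this both encrypts and decrypts.
    return bytes(a ^ b for a, b in zip(data, keystream(key, len(data))))


class ChunkStore:
    """Hypothetical content-addressed store for encrypted chunks."""

    def __init__(self):
        self.blobs = {}  # ciphertext hash -> ciphertext

    def put(self, data):
        manifest = []
        for chunk in split_chunks(data):
            key = derive_key(chunk)
            ciphertext = xor_cipher(key, chunk)
            chunk_id = hashlib.sha256(ciphertext).hexdigest()
            self.blobs[chunk_id] = ciphertext  # identical chunks collapse
            manifest.append((chunk_id, key))
        return manifest

    def get(self, manifest):
        return b"".join(
            xor_cipher(key, self.blobs[chunk_id]) for chunk_id, key in manifest
        )


store = ChunkStore()
image_a = b"base-os!runtime!app-one!"
image_b = b"base-os!runtime!app-two!"
manifest_a = store.put(image_a)
manifest_b = store.put(image_b)
unique_chunks = len(store.blobs)  # the two shared chunks are stored once
```

Each image keeps its own manifest of chunk IDs and keys, so only a holder of the manifest can decrypt, while the store itself deduplicates the chunks the two images have in common.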

Over to you: Which other features do you think make AWS Lambda fast?


Why do we need to use a distributed lock?

A distributed lock is a mechanism that ensures mutual exclusion across a distributed system.

Top 6 Use Cases for Distributed Locks

    Leader Election
    Distributed locks can be used to ensure that only one node becomes the leader at any given time.

    Task Scheduling
    In a distributed task scheduler, distributed locks ensure that a scheduled task is executed by only one worker node, preventing duplicate execution.

    Resource Allocation
    When managing shared resources like file systems, network sockets, or hardware devices, distributed locks ensure that only one process can access the resource at a time.

    Microservices Coordination
    When multiple microservices need to perform coordinated operations, such as updating related data in different databases, distributed locks ensure that these operations are performed in a controlled and orderly manner.

    Inventory Management
    In e-commerce platforms, distributed locks can manage inventory updates to ensure that stock levels are accurately maintained when multiple users attempt to purchase the same item simultaneously.

    Session Management
    When handling user sessions in a distributed environment, distributed locks can ensure that a user session is only modified by one server at a time, preventing inconsistencies.
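
The use cases above share one primitive: acquire a named lock with an expiry, and release it only if you still own it. The Python sketch below models that behavior in a single process. `LockTable` is a hypothetical stand-in for a shared store; in a real deployment the table would live in a system reachable by every node, and the clock is injected here only so the example runs without sleeping.

```python
import threading
import uuid


class LockTable:
    """In-process stand-in for a shared lock store (hypothetical)."""

    def __init__(self, clock):
        self._clock = clock            # callable returning time in seconds
        self._locks = {}               # lock name -> (owner token, expiry)
        self._mutex = threading.Lock() # makes check-and-set atomic locally

    def acquire(self, name, ttl):
        """Return an owner token on success, or None if the lock is held."""
        with self._mutex:
            now = self._clock()
            held = self._locks.get(name)
            if held is not None and held[1] > now:
                return None  # another owner holds an unexpired lock
            token = uuid.uuid4().hex
            self._locks[name] = (token, now + ttl)
            return token

    def release(self, name, token):
        """Release only if the caller still owns an unexpired lock."""
        with self._mutex:
            held = self._locks.get(name)
            if held is not None and held[0] == token and held[1] > self._clock():
                del self._locks[name]
                return True
            return False


now = [0.0]  # fake clock the example can advance by hand
table = LockTable(clock=lambda: now[0])

worker_a = table.acquire("nightly-report", ttl=30)
worker_b = table.acquire("nightly-report", ttl=30)  # None: A holds the lock

now[0] = 31.0  # A stalled and its lease expired
worker_c = table.acquire("nightly-report", ttl=30)
late_release = table.release("nightly-report", worker_a)  # A no longer owns it
released = table.release("nightly-report", worker_c)
```

The expiry keeps a crashed holder from blocking everyone forever, and the owner token keeps a slow holder from releasing a lock that has since passed to another worker.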


