ByteByteGo, the day before yesterday, 23:39
Shopify Tech Stack

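This article takes an in-depth look at Shopify's backend architecture, frontend technologies, programming languages, and tooling, and shows how the platform copes with traffic at massive scale. Through a modular monolith, React Native mobile development, and a range of open-source tools, Shopify achieves fast iteration, high availability, and a developer-friendly workflow. The article details Shopify's technology choices and practical experience, covering core technologies such as Ruby on Rails, React, MySQL, and Kafka, as well as in-house and open-source tools such as YJIT, Sorbet, and Packwerk, and offers a useful reference for building high-load, highly reliable e-commerce platforms.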

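💡 Shopify's backend is built on Ruby on Rails, and the original codebase is still the heart of the system. To meet the demands of scale, Shopify has optimized Rails heavily, for example by using YJIT to improve runtime performance, Sorbet for static type checking, and Rails Engines for modularization.

🧱 Shopify uses a modular monolith: the entire codebase lives in one repository and runs in a single process, but is split into independently deployable components with strict boundaries. Each component defines a public interface, with contracts enforced via Sorbet, which prevents tight coupling, supports safe refactoring, and reduces system complexity.

📱 Shopify's frontend has gone through several shifts. Today the Admin interface is built with React, React Router by Remix, and TypeScript, and is driven entirely by GraphQL. This architecture keeps web and mobile (React Native) consistent and reduces code duplication and differences between platforms.

⚙️ Shopify uses a range of programming languages and tools: Ruby is the core of the backend, TypeScript is used on the frontend, Lua for custom OpenResty scripts, and GraphQL for data exchange across platforms. Technologies such as Kubernetes and Remix support deployment and development of the platform.

🛠️ Shopify has developed a set of internal and open-source tools, such as Packwerk, Tapioca, and Bootsnap, to enforce dependency boundaries between components, automate safety checks, and reduce operational toil. Shopify also contributes actively to open-source projects such as TruffleRuby and Semian to improve Ruby's performance and stability.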

Note: This article is written in collaboration with the Shopify engineering team. Special thanks to the Shopify engineering team for sharing details with us about their tech stack and also for reviewing the final article before publication. All credit for the technical details and diagrams shared in this article goes to the Shopify Engineering Team.

Shopify handles scale that would break most systems.

On a single day (Black Friday 2024), the platform processed 173 billion requests, peaked at 284 million requests per minute, and pushed 12 terabytes of traffic every minute through its edge. 

These numbers aren’t anomalies. They’re sustained targets that Shopify strives to meet. Behind this scale is a stack that looks deceptively simple from the outside: Ruby on Rails, React, MySQL, and Kafka.

But that simplicity hides sharp architectural decisions, years of refactoring, and thousands of deliberate trade-offs.

In this article, we map the tech stack powering Shopify from the modular monolith that still runs the business, to the pods that isolate failure domains, to the deployment pipelines that ship hundreds of changes a day. It covers the tools, programming languages, and patterns Shopify uses to stay fast, resilient, and developer-friendly at incredible scale.

Shopify Backend Architecture

Shopify’s backend runs on Ruby on Rails. The original codebase, written in the early 2000s, still forms the heart of the system. Rails offers fast development, convention over configuration, and strong patterns for database-backed web applications. Shopify also uses Rust as its systems programming language.

While most startups eventually rewrite their early frameworks, Shopify doubled down to help ensure Ruby and Rails are 100-year tools that continue to merit a place in its toolchain of choice. Instead of moving on to another framework, Shopify pushed it further. They invested in YJIT to improve runtime performance, Sorbet for static type checking, and Rails Engines for modularization.

The result is one of the largest and longest-running Rails applications in production.

Modularization Strategy

Shopify runs a modular monolith. That phrase gets thrown around a lot, but in Shopify’s case, it means this: the entire codebase lives in one repository, runs in a single process, but is split into independently deployable components with strict boundaries.

Each component defines a public interface, with contracts enforced via Sorbet. 

These interfaces aren’t optional. They’re a way to prevent tight coupling, allow safe refactoring, and make the system feel smaller than it is. Developers don’t need to understand millions of lines of code. They need to know the contracts their component depends on and trust those contracts will hold.

To manage complexity, components are organized into logical layers:

This layering prevents cyclic dependencies and encourages clean flow across domains. 
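A layering rule of this kind can be checked mechanically. The sketch below is only an illustration: the component names, the layer numbers, and the `find_violations` helper are hypothetical, and it is not Packwerk's API or Shopify's actual layer list. It assumes a component may depend only on components in a strictly lower layer, which is one simple way to make cycles impossible.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    name: str
    layer: int                          # higher number = higher in the stack (assumed convention)
    depends_on: tuple[str, ...] = ()


def find_violations(components: list[Component]) -> list[tuple[str, str]]:
    """Return (component, dependency) pairs that point sideways or upward."""
    by_name = {c.name: c for c in components}
    violations = []
    for comp in components:
        for dep_name in comp.depends_on:
            dep = by_name[dep_name]
            # Only strictly downward edges are allowed, so no cycle can form.
            if dep.layer >= comp.layer:
                violations.append((comp.name, dep.name))
    return violations


# Hypothetical components, for illustration only.
components = [
    Component("platform", layer=0),
    Component("inventory", layer=1, depends_on=("platform",)),
    Component("checkout", layer=2, depends_on=("inventory", "platform")),
    Component("billing", layer=1, depends_on=("checkout",)),  # points upward
]

print(find_violations(components))
```

Running a check like this in CI turns the layering from a convention into something a build can fail on.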

To support this at scale, Shopify maintains a comprehensive system of static analysis tools, exception monitoring dashboards, and differentiated application/business metrics to track component health across the company. 

This modular structure doesn’t make development effortless. It introduces boundaries, which can feel like friction. However, it keeps teams aligned, reduces accidental coupling, and lets Shopify evolve without losing control of its core.

Frontend Technologies

Shopify’s frontend has gone through multiple architectural shifts, each one reflecting changes in the broader web ecosystem and lessons learned under scale.

The early days used standard patterns: server-rendered HTML templates, enhanced with jQuery and prototype.js. As frontend complexity grew, Shopify built Batman.js, its single-page application (SPA) framework. It offered reactivity and routing, but like most in-house frameworks, it came with long-term maintenance overhead.

Eventually, Shopify shifted back to simpler patterns: statically rendered HTML and vanilla JavaScript. However, that also had limits. Once the broader ecosystem matured, particularly around React and TypeScript, the team made a clean move forward.

Today, the Shopify Admin interface is built with React and React Router by Remix, written in TypeScript, and driven entirely by GraphQL. It follows a strict separation: no business logic in the client, no shared state across views. The Admin is one of Shopify’s biggest apps, a Remix-based application that behaves as a stateless GraphQL client. Each page fetches exactly the data it needs, when it needs it.

This discipline enforces consistency across platforms. Mobile apps and web admin screens speak the same language (GraphQL), reducing duplication and misalignment between surfaces. 

Mobile Development with React Native

Mobile development at Shopify follows a similar philosophy: reuse where possible, specialize where needed.

Every major app now runs on React Native. The goal of using a single framework is to share code, reduce drift between platforms, and improve developer velocity across Android and iOS.

Shared libraries power common concerns like authentication, error tracking, and performance monitoring. When apps need to drop into native for camera access, payment hardware, or long-running background tasks, they do so through well-defined native modules.

Shopify teams also contribute directly to React Native ecosystem projects like Mobile Bridge (which lets web content trigger native UI elements), Skia (for fast 2D rendering), WebGPU (which exposes modern GPU APIs and general-purpose GPU computation for AI/ML), and Reanimated (for performant animations). In some cases, Shopify engineers co-captain React Native releases.

Programming Languages and Tooling

Shopify’s language choices reflect its commitment to developer productivity and operational resilience. 

Developer Tooling & Open Source Contributions

A large monolith doesn’t stay healthy without support. Shopify has developed an ecosystem of internal and open-source tools to enforce structure, automate safety checks, and reduce operational toil.

A much more exhaustive list of the open-source software that Shopify supports is also available.

Databases, Caching, and Queuing

There are two main categories here: the primary database, and caching and queues.

Primary Database: MySQL

Shopify uses MySQL as its primary relational database, and has done so since the platform's early days. However, as merchant volume and transactional throughput grew, the limits of a single instance became unavoidable.

In 2014, Shopify introduced sharding. Each shard holds a partition of the overall data, and merchants are distributed across those shards based on deterministic rules. This works well in commerce, where tenant isolation is natural. One merchant’s orders don’t need to query another merchant’s inventory.
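The article does not say which deterministic rule Shopify uses, so the following is a minimal sketch under stated assumptions: a stable hash of the shop ID modulo a fixed shard count. `SHARD_COUNT`, `shard_for_shop`, `shard_scoped_query`, and the `orders` table layout are all hypothetical.

```python
import hashlib

SHARD_COUNT = 8  # hypothetical; the article does not state the real number


def shard_for_shop(shop_id: int, shard_count: int = SHARD_COUNT) -> int:
    """Map a shop to a shard with a stable hash, so every process gets the same answer."""
    digest = hashlib.sha256(str(shop_id).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


def shard_scoped_query(shop_id: int) -> tuple[int, str, tuple[int]]:
    """Every query carries the shop ID, so it touches exactly one shard."""
    sql = "SELECT id, total FROM orders WHERE shop_id = %s"
    return shard_for_shop(shop_id), sql, (shop_id,)


shard, sql, params = shard_scoped_query(42)
print(shard, sql, params)
```

Hash-modulo placement is the simplest deterministic rule, but it makes moving a single merchant awkward. A lookup table from shop to shard is a common alternative when individual tenants need to be rebalanced.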

Over time, Shopify replaced the flat shard model with Pods. A pod is a fully isolated slice of Shopify, containing its own MySQL instance, Redis node, and Memcached cluster. Each pod can run independently, and each one can be deployed in a separate geographic region.

This model solves two problems:

By pushing isolation to the infrastructure level, Shopify contains failure domains and simplifies operational recovery.

Caching and Queues

Shopify relies on two core systems for caching and asynchronous work: Memcached and Redis.

But Redis wasn’t always scoped cleanly. At one point, all database shards shared a single Redis instance. A failure in that central Redis brought down the entire platform. Internally, the incident is still known as “Redismageddon.”

The lesson Shopify took from this incident was clear: never centralize a system that’s supposed to isolate work. Afterward, Redis was restructured to match the pod model, giving each pod its own Redis node. Since then, outages have been localized, and the platform has avoided global failures tied to shared infrastructure.
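To illustrate why per-pod resources keep an outage local, here is a small sketch. The `Pod` and `PodDirectory` classes, the host names, and the shop-to-pod mapping are hypothetical and only model the idea from the text: each pod owns its own MySQL, Redis, and Memcached, so a failure affects only the shops mapped to that pod.

```python
from dataclasses import dataclass


@dataclass
class Pod:
    pod_id: int
    mysql_host: str
    redis_host: str
    memcached_host: str
    healthy: bool = True


class PodDirectory:
    """Maps each shop to exactly one pod; nothing is shared between pods."""

    def __init__(self, pods, shop_to_pod):
        self._pods = {p.pod_id: p for p in pods}
        self._shop_to_pod = dict(shop_to_pod)

    def pod_for(self, shop_id):
        return self._pods[self._shop_to_pod[shop_id]]

    def affected_shops(self):
        """Shops whose pod is currently unhealthy."""
        return sorted(
            shop for shop, pod_id in self._shop_to_pod.items()
            if not self._pods[pod_id].healthy
        )


# Hypothetical hosts and shop IDs, for illustration only.
directory = PodDirectory(
    pods=[
        Pod(1, "mysql-pod-1.internal", "redis-pod-1.internal", "memcached-pod-1.internal"),
        Pod(2, "mysql-pod-2.internal", "redis-pod-2.internal", "memcached-pod-2.internal"),
    ],
    shop_to_pod={101: 1, 102: 1, 201: 2},
)
```

If the Redis node in pod 2 fails, marking that pod unhealthy leaves shops 101 and 102 untouched. With a single shared Redis, every shop would appear in `affected_shops()`.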

Messaging and Communication Between Services

This area falls into two main categories: eventing and streaming, and API interfaces.

Eventing & Streaming

Shopify uses Kafka as the backbone for messaging and event distribution. It forms the spine of the platform’s internal communication layer, decoupling producers from consumers, buffering high-volume traffic, and supporting real-time pipelines that feed search, analytics, and business workflows.

At peak, Kafka at Shopify has handled 66 million messages per second, a throughput level that few systems encounter outside large-scale financial or streaming platforms.

This messaging layer serves several use cases:

By relying on Kafka, Shopify avoids tight coupling between services. Producers don't wait for consumers. Consumers process at their own pace. And when something goes wrong, like a downstream service crashing, the event stream holds the data until the system recovers.
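The decoupling described above comes from the log-plus-offsets model. The sketch below is an in-memory toy, not a Kafka client: the `EventLog` class and its methods are hypothetical and exist only to show that producers append without waiting, each consumer group keeps its own position, and a consumer that was down can catch up later.

```python
class EventLog:
    """Append-only log; each consumer group tracks its own read offset."""

    def __init__(self):
        self._events = []
        self._offsets = {}  # consumer group -> next index to read

    def publish(self, event):
        # The producer returns immediately; it never waits on a consumer.
        self._events.append(event)
        return len(self._events) - 1

    def poll(self, group, max_events=10):
        start = self._offsets.get(group, 0)
        return self._events[start:start + max_events]

    def commit(self, group, count):
        # Advance the group's offset only after it has processed the events.
        self._offsets[group] = self._offsets.get(group, 0) + count


log = EventLog()
for order_id in range(3):
    log.publish({"order_id": order_id})

# A fast consumer processes two events and commits.
batch = log.poll("search", max_events=2)
log.commit("search", len(batch))

# A consumer that was down still sees everything once it recovers.
print(len(log.poll("analytics")))
```

Because the offset advances only on commit, a consumer that crashes mid-batch rereads the same events after restart, which is why consumers in this style are usually written to be idempotent.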

That’s a practical way to build resilience into a fast-moving platform.

API Interfaces

For synchronous interactions, Shopify services communicate over HTTP, using a mix of REST and GraphQL.

However, as the number of services grows, this model starts to strain. Synchronous calls introduce tight coupling and hidden failure paths, especially when one service transitively depends on five others.
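A rough way to see the strain: if a request needs every service in a synchronous chain to succeed, availabilities multiply. The sketch below assumes independent failures and a hypothetical 99.9% availability per service; neither figure comes from Shopify.

```python
def chain_availability(per_service: float, dependency_count: int) -> float:
    """Availability of one service plus its synchronous dependencies, all of which must succeed."""
    return per_service ** (dependency_count + 1)


# One service that transitively depends on five others, each at 99.9% (assumed).
combined = chain_availability(0.999, 5)
print(f"{combined:.4%}")
```

Under these assumptions the combined figure drops to roughly 99.4%, about six times the downtime of any single service in the chain.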

To address this, Shopify is actively exploring RPC standardization and service mesh architectures. The goal is to build a communication layer that’s:

ML Infrastructure at Shopify

The ML infrastructure at Shopify can be divided into two main parts: real-time search with embeddings, and the data pipeline infrastructure.

Real-Time Search with Embeddings

Shopify’s storefront search doesn’t rely on traditional keyword matching. It uses semantic search powered by text and image embeddings: vector representations of product metadata and visual features that enable more relevant, contextual search results.

This system runs at production scale. Shopify processes around 2,500 embeddings per second, translating to roughly 216 million per day. These embeddings cover multiple modalities, including:

Each embedding is generated in near real time and immediately published to downstream consumers that use them to update search indices and personalize results.
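The daily figure quoted earlier follows directly from the per-second rate, assuming that rate is sustained for a full day:

```latex
2{,}500\ \tfrac{\text{embeddings}}{\text{s}} \times 86{,}400\ \tfrac{\text{s}}{\text{day}}
  = 216{,}000{,}000\ \tfrac{\text{embeddings}}{\text{day}}
```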

The embedding system also performs intelligent deduplication. For example, visually identical images are grouped to avoid unnecessary inference. This optimization alone reduced image embedding memory usage from 104 GB to under 40 GB, freeing up GPU resources and cutting costs across the pipeline.
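The article does not describe how identical images are detected. As a minimal sketch, the code below assumes byte-identical files and keys a cache on a content hash, so the expensive model call runs once per distinct image. `DedupingEmbedder` and `fake_embed` are hypothetical names, and `fake_embed` stands in for a real model.

```python
import hashlib


class DedupingEmbedder:
    """Runs inference once per distinct image and reuses the result for duplicates."""

    def __init__(self, embed_fn):
        self._embed_fn = embed_fn   # the expensive model call
        self._cache = {}            # content hash -> embedding
        self.inference_calls = 0

    def embed(self, image_bytes: bytes):
        key = hashlib.sha256(image_bytes).hexdigest()
        if key not in self._cache:
            self.inference_calls += 1
            self._cache[key] = self._embed_fn(image_bytes)
        return self._cache[key]


def fake_embed(image_bytes: bytes):
    """Stand-in for a real model: returns a tiny deterministic vector."""
    return [len(image_bytes), sum(image_bytes) % 256]


embedder = DedupingEmbedder(fake_embed)
first = embedder.embed(b"same-image")
second = embedder.embed(b"same-image")      # duplicate: served from the cache
other = embedder.embed(b"different-image")
print(embedder.inference_calls)
```

An exact hash only catches byte-for-byte duplicates. Grouping images that are visually identical but encoded differently would need a perceptual hash or similar technique, which this sketch does not attempt.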

Data Pipeline Infrastructure

Under the hood, Shopify runs its ML pipelines on Apache Beam, executed through Google Cloud Dataflow. This setup supports:

Inference jobs are structured to process embeddings as quickly and cheaply as possible. The pipeline runs with a reduced number of concurrent threads (down from 192 to 64) to prevent memory contention, so that inference performance remains predictable under load.
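The effect of capping concurrency can be shown with a plain thread pool. This is a generic illustration, not Beam or Dataflow configuration: `run_inference` is a hypothetical stand-in for a model call, and only the value 64 comes from the article.

```python
from concurrent.futures import ThreadPoolExecutor

MAX_WORKERS = 64  # the thread count quoted in the article


def run_inference(batch):
    """Hypothetical stand-in for a model call; returns one fake vector per item."""
    return [[float(len(item))] for item in batch]


def embed_all(batches, max_workers=MAX_WORKERS):
    # The pool never runs more than max_workers calls at once;
    # extra batches wait in its queue instead of competing for memory.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run_inference, batches))


print(embed_all([["a", "bb"], ["ccc"]]))
```

A lower cap trades some peak throughput for a bounded memory footprint per worker, which is the trade-off the paragraph above describes.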

Shopify trades off between latency, throughput, and infrastructure cost. The current configuration strikes that balance carefully:

For offline analytics, Shopify stores embeddings in BigQuery, allowing large-scale querying, trend analysis, and model performance evaluation without affecting live systems.

DevOps, CI/CD & Deployment

This area can be divided into two parts: Kubernetes-based deployment and the CI/CD process.

Kubernetes-Based Deployment

Shopify deploys infrastructure using Kubernetes, running on Google Kubernetes Engine (GKE). Each Shopify pod, an isolated unit containing its own MySQL, Redis, and Memcached stack, is defined declaratively through Kubernetes YAML, making it easy to replicate, scale, and isolate across regions.

The runtime environment uses Docker containers for packaging applications and OpenResty, built on Nginx with embedded Lua scripting, for custom load balancing at the edge. These Lua scripts give Shopify fine-grained control over HTTP behavior, enabling smart routing decisions and performance optimizations closer to the user.

Before Kubernetes, deployment was managed through Chef, a configuration management tool better suited for static environments. As the platform evolved, so did the need for a more dynamic, container-based architecture. The move to Kubernetes replaced slow, manual provisioning with fast, declarative infrastructure-as-code.

CI/CD Process

Shopify’s monolith contains over 400,000 unit tests, many of which exercise complex ORM behaviors. Running all of them serially would take hours, maybe days. To stay fast, Shopify relies on Buildkite as its CI orchestrator. Buildkite coordinates test runs across hundreds of parallel workers, slashing feedback time and keeping builds within a 15–20 minute window.
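Fanning tests out across workers depends on every worker agreeing on who runs what. The sketch below shows one simple, deterministic split; the `partition_tests` helper and the file names are hypothetical, and it is not how Buildkite or Shopify actually assigns tests.

```python
def partition_tests(test_files, worker_count):
    """Round-robin sorted test files across workers so every run splits them identically."""
    buckets = [[] for _ in range(worker_count)]
    for index, path in enumerate(sorted(test_files)):
        buckets[index % worker_count].append(path)
    return buckets


# Hypothetical test files, for illustration only.
files = ["test/orders_test.rb", "test/cart_test.rb", "test/shop_test.rb",
         "test/billing_test.rb", "test/tax_test.rb"]
for worker, bucket in enumerate(partition_tests(files, 2)):
    print(worker, bucket)
```

Round-robin by file name ignores how long each test takes. Splitting by recorded durations usually balances workers better, at the cost of keeping timing data up to date.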

Once the build passes, Shopify's internal deployment tools take over and offer visibility into who's deploying what, and where.

Deployments don’t go straight to production. Instead, ShipIt, Shopify's deployment tool, uses a Merge Queue to control rollout. At peak hours, only 5–10 commits are merged and deployed at a time. This throttling makes issues easier to trace and minimizes the blast radius when something breaks.
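The throttling described above amounts to draining a queue in bounded batches. The `MergeQueue` class below is a hypothetical sketch of that idea and not ShipIt's implementation; only the batch ceiling of 10 is taken from the article.

```python
from collections import deque


class MergeQueue:
    """Holds approved commits and releases them in bounded batches."""

    def __init__(self, max_batch_size=10):
        self._pending = deque()
        self._max_batch_size = max_batch_size

    def enqueue(self, commit_sha):
        self._pending.append(commit_sha)

    def next_batch(self):
        """Pop up to max_batch_size commits, oldest first."""
        batch = []
        while self._pending and len(batch) < self._max_batch_size:
            batch.append(self._pending.popleft())
        return batch


queue = MergeQueue(max_batch_size=5)
for number in range(12):
    queue.enqueue(f"commit-{number}")

print(queue.next_batch())  # the first five commits deploy together
```

If a deploy goes bad, the culprit is one of at most `max_batch_size` commits, which is what keeps tracing and rollback tractable.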

Notably, Shopify doesn’t rely on staging environments or canary deploys. Instead, they use feature flags to control exposure and fast rollback mechanisms to undo bad changes quickly. If a feature misbehaves, it can be turned off without redeploying the code.
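A feature flag of this kind can be sketched as a percentage rollout keyed on the shop. The `FeatureFlags` class, the flag name, and the hashing scheme are assumptions for illustration; the article does not describe Shopify's flag system.

```python
import hashlib


class FeatureFlags:
    """Percentage rollouts that can be changed at runtime, without a deploy."""

    def __init__(self):
        self._rollout = {}  # flag name -> percentage of shops exposed (0-100)

    def set_rollout(self, flag, percent):
        self._rollout[flag] = max(0, min(100, percent))

    def is_enabled(self, flag, shop_id):
        percent = self._rollout.get(flag, 0)  # unknown flags are off
        # A stable hash keeps each shop in the same bucket across requests.
        digest = hashlib.sha256(f"{flag}:{shop_id}".encode("utf-8")).digest()
        bucket = int.from_bytes(digest[:4], "big") % 100
        return bucket < percent


flags = FeatureFlags()
flags.set_rollout("new_checkout", 100)   # hypothetical flag, fully rolled out
print(flags.is_enabled("new_checkout", shop_id=42))

flags.set_rollout("new_checkout", 0)     # kill switch: off for everyone
print(flags.is_enabled("new_checkout", shop_id=42))
```

Setting the rollout to 0 acts as the kill switch the paragraph above describes: the code stays deployed, but no shop reaches it.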

Observability, Reliability, and Security

This area can be divided into two parts: observability infrastructure, and supply chain and security.

Observability Infrastructure

Shopify takes a structured, service-aware approach to observability. At the center of this is ServicesDB, an internal service registry that tracks:

ServicesDB catalogs metadata and enforces good practices. When a service falls out of compliance (for example, due to outdated gems or missing logs), it automatically opens GitHub issues and tags the responsible team. This creates continuous pressure to maintain service quality across the board.
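ServicesDB is internal, so its data model is not public. The sketch below only illustrates the pattern from the text, a registry record checked against rules, with each failure turned into an issue addressed to the owning team. `ServiceRecord`, `compliance_issues`, and the field names are hypothetical, and no GitHub API is called.

```python
from dataclasses import dataclass


@dataclass
class ServiceRecord:
    name: str
    owner_team: str
    outdated_gems: tuple = ()
    has_logging: bool = True


def compliance_issues(service):
    """Return one issue payload per failed check; an empty list means compliant."""
    problems = []
    if service.outdated_gems:
        problems.append("Outdated gems: " + ", ".join(service.outdated_gems))
    if not service.has_logging:
        problems.append("Missing logs")
    return [
        {"title": f"[{service.name}] {problem}", "assignee_team": service.owner_team}
        for problem in problems
    ]


# Hypothetical service, for illustration only.
service = ServiceRecord("payments-api", "payments",
                        outdated_gems=("rack",), has_logging=False)
for issue in compliance_issues(service):
    print(issue)
```

In a real system, each payload would be handed to an issue tracker client rather than printed.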

Incident response isn’t siloed into a single ops team. Shopify uses a lateral escalation model: all engineers share responsibility for uptime, and escalation happens based on domain expertise, not job title. This encourages shared ownership and reduces handoff delays during critical outages.

For fault tolerance, Shopify leans on two key tools:

Supply Chain & Security

Security isn’t an afterthought in Shopify’s stack, but part of the ecosystem investment. Since the company relies heavily on Ruby, it also works actively to secure the Ruby community at large.

Key efforts include:

The goal isn’t just to secure Shopify’s stack, but to strengthen the foundation shared by thousands of developers who depend on the same tools. 

Shopify’s Scale

Shopify's architecture isn’t theoretical. It’s built to withstand real-world pressure—Black Friday flash sales, celebrity product drops, and continuous developer activity across a global platform. These numbers put that scale in context.
