System Design
This cheatsheet attempts to give a high-level overview of system design principles and patterns. Please contact me for corrections/omissions.
Last updated: 1 January 2026
Contents
Problem Solving
- Requirements
- Functional: these are your “users can…” statements and the core features of the system.
- Non-functional: these are your “the system should be…” statements and the system qualities important to users.
- Core Entities: list the core entities that the API will interact with and that the system will persist.
- API: define the contract between the system and its users, including which API protocol to use, e.g. REST, GraphQL, RPC, etc.
- Data Flow: describe the sequence of actions the system performs on the inputs to produce the outputs.
- Architecture: draw boxes and arrows to represent the system’s components and their interactions.
- Challenges: harden the design by ensuring it: (i) meets all the non-functional requirements; (ii) addresses edge cases; (iii) addresses bottlenecks; and (iv) builds on hints from the interviewer.
Concepts
Useful concepts to consider when designing scalable systems.
Concepts: API Design
APIs describe how clients and services communicate.
There are three common API protocols:
- REST (Representational State Transfer): uses standard HTTP verbs to create, read, update, and delete entities via clear URLs. It’s stateless, widely understood, benefits from caching, and works well for most public web APIs.
- RPC (Remote Procedure Call): clients call remote functions as if they were local methods. It’s space- and performance-efficient and provides strong type safety. The trade-off is that it tends to tightly couple clients and servers and is best suited for internal, service-to-service communication in microservice architectures.
- GraphQL: allows clients to request exactly the data they need, consolidating resource endpoints into a single endpoint. It avoids over-fetching and under-fetching and is especially useful when multiple clients have different data needs. The trade-off is added schema design complexity and query execution overhead.
Concepts: Caching
Caching stores the result of an expensive computation so it doesn’t have to be computed again.
There are two common caching patterns:
- Cache-Aside / Lazy Loading: the application first checks the cache, falls back to the expensive computation on a miss, then stores the result back into the cache (see the sketch after this list).
- Data can become stale if there are lots of updates, but this can be handled via TTLs or explicit invalidation on writes.
- If there are lots of cache misses, the database will be under pressure.
- Least recently used (LRU) is a common eviction policy.
- Write-Through / Write-Behind: the application writes directly to the cache and the cache is responsible for writing the data to the database.
- In write-through, the cache synchronously writes to the database before acknowledging the write.
- In write-behind, the cache writes to the database asynchronously, often via a queue.
- Cache may be filled with data that is never read.
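A minimal cache-aside sketch in Python, assuming a single-process in-memory dict with per-entry TTLs; `db_query` is a hypothetical stand-in for the expensive database read:

```python
import time

_cache: dict = {}
TTL_SECONDS = 60

def db_query(user_id: str) -> dict:
    # Hypothetical stand-in for the expensive database read.
    return {"id": user_id, "name": "example"}

def fetch_user(user_id: str) -> dict:
    entry = _cache.get(user_id)
    if entry is not None:
        value, expires_at = entry
        if time.monotonic() < expires_at:
            return value          # cache hit
        del _cache[user_id]       # expired entry: treat as a miss
    value = db_query(user_id)     # cache miss: fall back to the database
    _cache[user_id] = (value, time.monotonic() + TTL_SECONDS)
    return value
```

A shared cache such as Redis adds eviction (e.g. LRU) and access across instances, but the read path is the same: check, miss, compute, store.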
Concepts: Databases
SQL Databases
Overview
- Relational databases are composed of tables, where each row reflects a data entity and each column defines specific information about the entity.
- Use self-balancing B-trees for indexing, which can be slow to write into.
- Favour consistency over availability.
ACID
- Atomicity: transactions are either fully executed or not at all.
- Consistency: transactions maintain database integrity.
- Isolation: transactions run independently and do not affect each other.
- Durability: committed transactions persist even in case of system failure.
Types
- MySQL: popular relational database management system.
- PostgreSQL: feature-rich relational database management system.
- SQLite: relational database management system stored in a single file.
NoSQL Databases
Overview
- Non-relational databases that are designed to handle large volumes of unstructured data.
- Use log-structured merge trees (LSM trees) for indexing, which are faster to write into.
- Favour availability over consistency.
BASE
- Basically Available: always operational and accessible.
- Soft state: system state can change without external input.
- Eventual consistency: guarantees consistency over time but not immediately.
Types
- Columnar: stores data as columns, e.g. Cassandra.
- Document: stores data as documents, e.g. MongoDB.
- Graph: stores data as graphs, e.g. Neo4j.
- Key-Value: stores data as key-value pairs, e.g. Redis.
CAP Theorem
Distributed systems can provide at most two of the following three properties (and because network partitions are unavoidable in practice, the real choice is between consistency and availability):
- Consistency: all nodes see the same data.
- Availability: all nodes are available and responsive to requests.
- Partition Tolerance: the system continues to operate even if communication between nodes is lost.
Consistent Hashing
Effective way to distribute keys to a large number of nodes in a way that stays stable as nodes are added/removed.
Naive Approach
\(\textrm{hash}(\textrm{key})\ \textrm{mod}\ M\) works only when the number of nodes \(M\) is fixed; changing \(M\) reshuffles assignments for most keys, forcing an expensive full rebalance.
Consistent Hashing
Both keys and nodes are hashed into the same fixed “hash space” arranged as a ring; a key is assigned to the first node encountered when moving clockwise from the key’s position. With this setup, adding or removing a node only affects the keys that fall between that node and its predecessor on the ring.
Cascading Failures
If each node has only one position on the ring, losing a node pushes all its keys to the next node, potentially overloading it. The standard fix is virtual nodes: each physical node is represented by many positions on the ring. This spreads each node’s ownership across many small intervals, so when a node joins/leaves, the reassigned keys are distributed more evenly across the remaining nodes, reducing hot spots and improving balance.
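A sketch of a hash ring with virtual nodes, assuming MD5 as the hash function and a sorted list searched with bisect:

```python
import bisect
import hashlib

class HashRing:
    """A minimal consistent-hash ring with virtual nodes."""

    def __init__(self, nodes: list[str], vnodes: int = 100):
        self._ring: list[tuple[int, str]] = []  # (position, physical node)
        for node in nodes:
            self.add_node(node, vnodes)

    @staticmethod
    def _hash(key: str) -> int:
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def add_node(self, node: str, vnodes: int = 100) -> None:
        # Each physical node occupies `vnodes` positions on the ring.
        for i in range(vnodes):
            self._ring.append((self._hash(f"{node}#{i}"), node))
        self._ring.sort()

    def remove_node(self, node: str) -> None:
        self._ring = [(pos, n) for pos, n in self._ring if n != node]

    def get_node(self, key: str) -> str:
        # First node clockwise from the key's position, wrapping around the ring.
        positions = [pos for pos, _ in self._ring]
        idx = bisect.bisect(positions, self._hash(key)) % len(self._ring)
        return self._ring[idx][1]

ring = HashRing(["node-a", "node-b", "node-c"])
print(ring.get_node("user:42"))  # stays stable when unrelated nodes join/leave
```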
Replication
Typically uses a master-slave architecture, where the master is the primary database (only supports write operations), while the slaves are replicas (only support read operations).
- If the only slave goes offline, read operations will be temporarily redirected to the master until a new slave replaces the old one.
- If the only master goes offline, a slave will be promoted to master, although this is complicated as the new master may not have the latest data.
Partitioning
Split a large database into smaller pieces inside a single database instance.
- Horizontal partitioning: split rows across partitions, e.g. by year.
- Vertical partitioning: split columns across partitions, e.g. keeping frequently accessed columns in the same partition.
Sharding
Horizontal partitioning across multiple database instances.
Sharding Key
Consists of one or more columns that determine how data is grouped. It should have high cardinality, promote even distribution and align with query patterns.
Distribution Strategy
- Directory: split the data using a lookup table.
- Hash: split the data by a hash of the key, e.g. `hash(user_id) % 4`.
- Range: split the data by a range of values, e.g. by `user_id`.
Challenges
- Hot spots: excessive access to a single shard can cause it to become a bottleneck, with compound shard keys and dynamic shard splitting being common solutions.
- Joins: it is hard to perform joins across shards, with denormalization being a common solution.
- Transactions: it is hard to maintain consistency for cross-shard transactions, with saga patterns and accepting eventual consistency being common solutions.
Concepts: Load Balancing
Load balancers help distribute the load across multiple machines.
There are four common load balancing algorithms:
- Hash: distribute the load across the machines based on a hash of the key.
- Least connections: distribute the load across the machines with the least connections.
- Least response time: distribute the load across the machines with the least response time.
- Round-robin: distribute the load across multiple machines one by one.
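A sketch of two of these algorithms over a static server list (the addresses are placeholders):

```python
import itertools

servers = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]  # placeholder backends

# Round-robin: hand out servers one by one, cycling forever.
_rr = itertools.cycle(servers)
def round_robin() -> str:
    return next(_rr)

# Least connections: pick the server with the fewest open connections.
connections = {s: 0 for s in servers}
def least_connections() -> str:
    server = min(connections, key=connections.get)
    connections[server] += 1  # the caller decrements when the connection closes
    return server

print([round_robin() for _ in range(4)])  # ['10.0.0.1', '10.0.0.2', '10.0.0.3', '10.0.0.1']
```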
Concepts: Message Queues
A buffer, typically persisted for durability, that supports asynchronous communication between systems.
There are two message queue components:
- Producers: create messages and publish them to the message queue.
- Consumers: subscribe to the message queue and process the messages.
In a message queue, producers and consumers can be decoupled, so:
- Producers can publish messages when the consumer is not available.
- Consumers can process messages when the producer is not available.
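A minimal in-process producer/consumer sketch using Python’s standard library; a real deployment would use a durable broker such as Kafka, RabbitMQ or SQS:

```python
import queue
import threading

q: "queue.Queue[str]" = queue.Queue()

def producer() -> None:
    for i in range(5):
        q.put(f"message-{i}")  # publish without waiting for a consumer
    q.put("STOP")              # sentinel to shut the consumer down

def consumer() -> None:
    while (msg := q.get()) != "STOP":
        print("processed", msg)  # consume at the consumer's own pace
        q.task_done()

threading.Thread(target=producer).start()
threading.Thread(target=consumer).start()
```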
Concepts: Networking
OSI Model
- Physical Layer: transmission and reception of raw bits over a physical medium, e.g. fiber optic cables.
- Data Link Layer: transmission of data frames between two nodes connected by a physical layer, e.g. Ethernet.
- Network Layer: structuring and managing a multi-node network, including addressing, routing and traffic control, e.g. IPv6.
- Transport Layer: reliable transmission of data segments between points on a network, including segmentation, acknowledgement and multiplexing, e.g. TCP.
- Session Layer: managing communication sessions, e.g. RPC.
- Presentation Layer: translation of data between a networking service and an application, e.g. SSL.
- Application Layer: high-level protocols such as for resource sharing or remote file access, e.g. HTTP.
Concepts: Numbers
Availability Numbers
| Availability | Downtime per year |
|---|---|
| 99% | 3.65 days |
| 99.9% | 8.77 hours |
| 99.99% | 52.60 minutes |
| 99.999% | 5.26 minutes |
| 99.9999% | 31.56 seconds |
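These figures follow directly from the unavailable fraction of a year (\(365.25 \times 24 \times 60 \approx 525{,}960\) minutes). For example, at 99.99% availability:

\((1 - 0.9999) \times 525{,}960 \approx 52.6\ \textrm{minutes}\)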
Latency Numbers
| Operation | Latency |
|---|---|
| L1 cache reference | 0.5 ns |
| Branch mispredict | 5 ns |
| L2 cache reference | 7 ns |
| Mutex lock/unlock | 25 ns |
| Main memory reference | 100 ns |
| Compress 1 KB with Zippy | 3 us |
| Send 1 KB over 1 Gbps network | 10 us |
| Read 1 MB sequentially from memory | 250 us |
| Round trip within same datacenter | 500 us |
| Read 1 MB sequentially from SSD | 1 ms |
| Disk seek | 10 ms |
| Read 1 MB sequentially from disk | 20 ms |
| Send packet CA → Netherlands → CA | 150 ms |
Designs
Reference designs for commonly-asked system design problems.
Designs: Link Shortener
Requirements
Functional
- Users can submit a URL and receive a shortened version.
- Users can retrieve the original URL from the shortened version.
- Users can customize the shortened URL.
Non-functional
- The system should ensure that the shortened URL is unique.
- The system should be fast (redirecting should take < 100 ms).
- The system should be fault tolerant (favour availability over consistency).
- The system should be scalable (1B shortened URLs and 100M DAU).
Core Entities
- Original URLs: the original URLs that users want to shorten.
- Shortened URLs: the shortened URLs that users receive.
- Users: the users who submit the original URLs.
API
- `POST /urls`: submit a URL and receive a shortened version.
- `GET /:short_url`: retrieve the original URL from the shortened version.
- `PUT /:short_url`: update the original URL of the shortened URL.
- `DELETE /:short_url`: delete the shortened URL.
Architecture
Diagram
Components
- Client: the user who submits the URL.
- API Gateway: the entry point for the API.
- Write Service: the service that handles the write operations.
- Read Service: the service that handles the read operations.
- Counter: the service that generates the short code.
- Database: the database that stores the original URLs and the shortened URLs.
Challenges
Fast Redirects
- Bad: use a database with an index built on the short code column.
- Good: use an in-memory cache to store the short code and the original URL.
Response Headers
- Bad: use `301 Moved Permanently`, which causes the browser to cache the response, bypassing the link shortening service in the future.
- Good: use `302 Found`, which prevents the browser from caching the response, allowing the link shortening service to track usage statistics.
Scalable Counter
- Bad: access the counter service every time a counter value is needed.
- Good: each write service instance can request a batch of counter values from the counter service and use them locally without any coordination.
Unique Short URLs
- Bad: use a prefix of the original URL as the short code.
- Good: hash the original URL, or base62-encode a global counter, and take the first \(N\) characters as the short code (be wary of modulo bias). To handle collisions, enforce uniqueness with a `UNIQUE` constraint on the `short_url` column and retry a fixed number of times.
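A sketch of the counter-based approach, base62-encoding a counter value into a short code:

```python
import string

ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase  # 62 chars

def base62_encode(n: int) -> str:
    if n == 0:
        return ALPHABET[0]
    code = []
    while n > 0:
        n, r = divmod(n, 62)
        code.append(ALPHABET[r])
    return "".join(reversed(code))

# A 7-character code covers 62**7 ≈ 3.5 trillion URLs.
print(base62_encode(1_000_000_007))  # e.g. a value handed out by the counter service
```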
Designs: News Feed
Requirements
Functional
- Users can create posts.
- Users can follow other users.
- Users can view a feed of posts from other users they follow.
Non-functional
- The system should be fast (creating a post and viewing the feed should take < 500 ms).
- The system should be fault tolerant (favour availability over consistency).
- The system should be scalable (2B users; unlimited follows/followers).
Core Entities
- Users: the users who create posts and follow other users.
- Posts: the content that users create.
- Follows: the uni-directional link between users.
API
- `POST /posts`: create a new post with a `content` field.
- `POST /users/:user_id/follow`: follow a user.
- `GET /feed?page_size=10&cursor={timestamp}`: retrieve the feed of posts from followed users.
Architecture
Diagram
Components
- Client: the user who creates the post.
- API Gateway: the entry point for the API.
- Post Service: the service that handles the post operations.
- Feed Service: the service that handles the feed operations.
- Follow Service: the service that handles the follow operations.
- Database: the database that stores the posts and follows.
Challenges
Popular Users
- Bad: when popular users post, write millions of rows to the precomputed Feeds table.
- Good: when popular users post, fetch from Posts table instead of the precomputed Feeds table; when normal users post, write rows using async workers to the precomputed Feeds table.
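A sketch of this hybrid fan-out, assuming in-memory dicts stand in for the Posts and Feeds tables and `FANOUT_THRESHOLD` is a tunable popularity cutoff:

```python
FANOUT_THRESHOLD = 10_000

feeds: dict[str, list[str]] = {}   # user_id -> precomputed post ids
posts: dict[str, list[str]] = {}   # author_id -> post ids

def on_post_created(author_id: str, post_id: str, followers: list[str]) -> None:
    posts.setdefault(author_id, []).append(post_id)
    if len(followers) < FANOUT_THRESHOLD:
        for follower in followers:  # in practice, done by async workers
            feeds.setdefault(follower, []).append(post_id)

def read_feed(user_id: str, followed_celebrities: list[str]) -> list[str]:
    feed = list(feeds.get(user_id, []))
    for celeb in followed_celebrities:  # merge popular authors at read time
        feed.extend(posts.get(celeb, []))
    return feed
```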
Hot Keys in Posts Table
- Bad: insert a distributed cache between the Feed Service and the Posts table.
- Good: insert multiple independent caches between the Feed Service and the Posts table to prevent hot keys.
Designs: Rate Limiter
Requirements
Functional
- The system can identify clients by user ID, IP address or API key.
- The system can enforce configurable rate limits for clients.
- The system can reject excess requests with `HTTP 429 Too Many Requests`.
Non-functional
- The system should be fast (rate-limit checks should add < 10 ms per request).
- The system should be fault tolerant (favour availability over consistency).
- The system should be scalable (1M requests/second; 100M DAU).
Core Entities
- Rules: rate limit rules that define how many requests are allowed over a time window for specific clients and endpoints.
- Clients: the users, IP addresses or API keys being limited, each with state tracking their request usage.
- Requests: the incoming API calls evaluated against rate-limit rules using client identity, endpoints and timestamps.
Algorithms
Token Bucket
A token bucket is a container with a pre-defined capacity. Each request consumes one token. Tokens are added to the bucket periodically, and once the bucket is full, extra tokens overflow (see the sketch below).
- Pros: simple to implement, handles burst traffic and is memory efficient.
- Cons: the parameters (bucket size and refill rate) need to be carefully tuned.
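A minimal token bucket sketch with lazy refill (tokens are topped up on each call rather than by a background timer):

```python
import time

class TokenBucket:
    def __init__(self, capacity: int, rate: float):
        self.capacity = capacity       # maximum burst size
        self.rate = rate               # tokens added per second
        self.tokens = float(capacity)
        self.last_refill = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill based on elapsed time; extra tokens overflow at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last_refill) * self.rate)
        self.last_refill = now
        if self.tokens >= 1:
            self.tokens -= 1           # each request consumes one token
            return True
        return False

bucket = TokenBucket(capacity=10, rate=5)  # bursts of 10, sustained 5 req/s
print(bucket.allow())
```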
Leaky Bucket
A leaky bucket is a first-in-first-out (FIFO) queue with a pre-defined capacity. It is similar to a token bucket, but requests are pulled from the queue and processed at a fixed rate.
- Pros: memory efficient and suitable for stable outflow rates
- Cons: unsuitable for bursty traffic, and the parameters (bucket size and outflow rate) need to be carefully tuned.
Fixed Window Counter
The timeline is divided into fixed-size windows, each with a counter starting at zero. Each request increments the counter of the current window. If the counter exceeds the limit, the request is rejected.
- Pros: simple to implement and memory efficient.
- Cons: traffic spikes at window boundaries may overwhelm the system.
Sliding Window Log
Request timestamps are recorded in a cache. When a new request comes in, the system removes timestamps older than the window and allows the request if the number of requests in the window is less than the limit, adding the new timestamp to the cache.
- Pros: very accurate and can handle burst requests
- Cons: more complex to implement and requires a cache
Architecture
Diagram
Challenges
Dynamic Rule Configuration
- Bad: store the rate limit rules in the database and allow the API gateway to periodically poll for updates.
- Good: use Zookeeper for distributed configuration management with real-time notification.
Placement
- Bad: place the rate limiter in-process.
- Good: place the rate limiter at the API gateway or load balancer.
Race Condition
- Bad: use a lock to synchronize the counter.
- Good: use atomic Lua scripts or Redis sorted sets to store the timestamps.
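A sliding-window-log sketch using a Redis sorted set of timestamps, assuming the redis-py client and a reachable Redis instance; the key scheme and limits are illustrative:

```python
import time
import redis  # assumes the redis-py client

r = redis.Redis()
WINDOW_SECONDS = 60
LIMIT = 100

def allow_request(client_id: str) -> bool:
    key = f"rate:{client_id}"  # illustrative key scheme
    now = time.time()
    pipe = r.pipeline()  # queued commands run atomically (MULTI/EXEC)
    pipe.zremrangebyscore(key, 0, now - WINDOW_SECONDS)  # drop timestamps outside the window
    pipe.zadd(key, {str(now): now})                      # record this request
    pipe.zcard(key)                                      # count requests in the window
    pipe.expire(key, WINDOW_SECONDS)                     # let idle keys evict themselves
    _, _, count, _ = pipe.execute()
    return count <= LIMIT
```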
Designs: Ticketmaster
Requirements
Functional
- Users can view events.
- Users can search for events.
- Users can book tickets to events.
Non-functional
- The system should ensure that each ticket is only booked once.
- The system should be fast (100 ms for search, 1 s for booking).
- The system should be fault tolerant (favour consistency over availability).
- The system should be scalable (10M users per event).
Core Entities
- Events: the events that are available for booking.
- Tickets: the tickets that are available for booking.
- Users: the users who book tickets.
- Performers: the performers who are associated with events.
- Venues: the venues where events are held.
- Orders: the orders that are created when tickets are booked.
API
- `GET /events`: search for events.
- `GET /events/:event_id`: view an event.
- `POST /events/:event_id/tickets`: book tickets to an event.
- `GET /users/:user_id/orders`: view orders for a user.
- `GET /users/:user_id/payments`: view payments for a user.
Architecture
Diagram
Components
- Client: the user who views events, searches for events and books tickets.
- API Gateway: the entry point for the API.
- Search Service: the service that handles the search operations.
- Event Service: the service that handles the event operations.
- Booking Service: the service that handles the booking operations.
- Database: the database that stores the events, tickets, users, performers, venues and orders.
- Stripe: the payment processor for booking tickets.
Challenges
Concurrent Bookings
- Bad: use Server-Sent Events (SSE) to push updates to the client in real-time.
- Good: use a per-user WebSocket connection to add users to a virtual waiting queue.
Improved Search
- Bad: add database indexes and optimise SQL queries using the `EXPLAIN` keyword.
- Good: add full-text indexes to the database or use a full-text search engine like Elasticsearch.
Ticket Reservations
- Bad: lock the row in the database or add a status field and expiration time to the `Tickets` table.
- Good: use a distributed lock with a TTL to reserve the tickets.
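A sketch of a TTL-based reservation using Redis, assuming the redis-py client; `SET NX EX` acquires the lock atomically, and the Lua script ensures only the owner releases it:

```python
import uuid
import redis  # assumes the redis-py client

r = redis.Redis()

def reserve_ticket(ticket_id: str, ttl_seconds: int = 600) -> str | None:
    token = str(uuid.uuid4())
    # NX: only set if the key doesn't exist; EX: expire after the TTL.
    if r.set(f"ticket:{ticket_id}:lock", token, nx=True, ex=ttl_seconds):
        return token  # ticket held for this user until the TTL expires
    return None       # someone else holds the reservation

def release_ticket(ticket_id: str, token: str) -> None:
    # Compare-and-delete via Lua: release only if we still own the lock.
    script = """
    if redis.call('get', KEYS[1]) == ARGV[1] then
        return redis.call('del', KEYS[1])
    end
    return 0
    """
    r.eval(script, 1, f"ticket:{ticket_id}:lock", token)
```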
Designs: Typeahead
Requirements
Functional
- Users can receive autocomplete results as they type.
Non-functional
- The system should return relevant results sorted by popularity.
- The system should be fast (100 ms for search).
- The system should be fault tolerant (favour availability over consistency).
- The system should be scalable (10M users per day).
Data Structures
Use a Trie to store the autocomplete results (a sketch follows the complexity analysis below).
Autocompletion
- Find the prefix.
- \(O(p)\), where \(p\) is the length of the prefix.
- Traverse the subtree from the prefix to get all valid children.
- \(O(c)\), where \(c\) is the number of children.
- Sort the children and get the top \(K\) results.
- \(O(c \log c)\), where \(c\) is the number of children.
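A minimal trie sketch covering the three steps above; a real system would sort by popularity and cache top-\(K\) results at each node:

```python
class TrieNode:
    def __init__(self):
        self.children: dict[str, "TrieNode"] = {}
        self.is_word = False

class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, word: str) -> None:
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.is_word = True

    def autocomplete(self, prefix: str) -> list[str]:
        node = self.root
        for ch in prefix:            # O(p): walk down to the prefix node
            if ch not in node.children:
                return []
            node = node.children[ch]
        results: list[str] = []
        def dfs(n: TrieNode, path: str) -> None:  # O(c): visit the subtree
            if n.is_word:
                results.append(prefix + path)
            for ch, child in n.children.items():
                dfs(child, path + ch)
        dfs(node, "")
        return sorted(results)       # lexicographic here; by popularity in practice

trie = Trie()
for w in ["car", "card", "care", "cat"]:
    trie.insert(w)
print(trie.autocomplete("ca"))  # ['car', 'card', 'care', 'cat']
```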
Optimisations
- Limit the minimum and maximum length of a prefix.
- Use a cache to store the results of the most common prefixes at each node.
Architecture
Diagram
Components
- Client: the user who types in the prefix.
- API Gateway: the entry point for the API.
- Server: the server that handles the autocomplete requests.
- Cache: the in-memory cache that stores the results of the most common prefixes at each node.
- Database: the database that stores the autocomplete results.
Designs: Unique ID Generator
Requirements
Functional
- Clients can generate unique IDs that are sortable by time.
Non-functional
- The system should be fast (10,000 IDs per second).
- The system should be fault tolerant (favour consistency over availability).
- The system should be scalable.
ID Format
Use a 64-bit integer to store the ID:
- Sign bit: 1 bit
- Timestamp: 41 bits
- Datacenter ID: 5 bits
- Machine ID: 5 bits
- Sequence number: 12 bits
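The five fields sum to 64 bits. A sketch of a Snowflake-style generator that packs them with bit shifts (the epoch and thread-safety details are illustrative):

```python
import threading
import time

class SnowflakeGenerator:
    """41-bit timestamp | 5-bit datacenter | 5-bit machine | 12-bit sequence."""

    EPOCH_MS = 1_288_834_974_657  # Twitter's custom epoch; any fixed epoch works

    def __init__(self, datacenter_id: int, machine_id: int):
        assert 0 <= datacenter_id < 32 and 0 <= machine_id < 32
        self.datacenter_id = datacenter_id
        self.machine_id = machine_id
        self.sequence = 0
        self.last_ms = -1
        self.lock = threading.Lock()

    def next_id(self) -> int:
        with self.lock:
            now = int(time.time() * 1000)
            if now == self.last_ms:
                self.sequence = (self.sequence + 1) % 4096  # 12-bit sequence
                if self.sequence == 0:  # exhausted this millisecond; wait
                    while now <= self.last_ms:
                        now = int(time.time() * 1000)
            else:
                self.sequence = 0
            self.last_ms = now
            return ((now - self.EPOCH_MS) << 22) | (self.datacenter_id << 17) \
                   | (self.machine_id << 12) | self.sequence

gen = SnowflakeGenerator(datacenter_id=1, machine_id=1)
print(gen.next_id())  # 64-bit, sortable by time
```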
Challenges
Clock Synchronization
- Bad: use a single clock source for all machines.
- Good: use a distributed clock source with Network Time Protocol (NTP).
Designs: Web Crawler
Requirements
Functional
- The system can crawl the web starting from a given set of seed URLs.
- The system can extract and store text from each web page.
Non-functional
- The system should adhere to `robots.txt` files.
- The system should be fast and scalable (crawl the entire web in 5 days).
- The system should be fault tolerant (favour consistency over availability).
Data Flow
- Take a seed URL from the frontier queue and request the IP address from DNS.
- Fetch the HTML from the external server using the IP address.
- Extract text from the HTML.
- Store the text in an S3 bucket.
- Extract any linked URLs from the web pages and add them to the frontier queue.
- Repeat the process until all URLs have been crawled.
Architecture
Diagram
Components
- Webpage: the page that is fetched from the internet.
- DNS: resolves domain names to IP addresses so that the crawler can fetch the web pages.
- Frontier Queue: a queue of URLs that need to be crawled, starting with a set of seed URLs.
- Crawler: fetches web pages, extracts text and new URLs to add to the frontier queue.
- S3 Text Store: stores the text extracted from the web pages.
Challenges
Duplicate URL Detection
- Bad: check if the URL has already been crawled in-memory.
- Good: use a Bloom filter in-memory to check if the URL has already been crawled. A Bloom filter is a probabilistic data structure that can test membership with a controlled false-positive rate.
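A minimal Bloom filter sketch using \(k\) hash positions derived from SHA-256; `m` and `k` trade memory for false-positive rate:

```python
import hashlib

class BloomFilter:
    def __init__(self, m: int = 1 << 20, k: int = 5):
        self.m = m                     # number of bits
        self.k = k                     # number of hash functions
        self.bits = bytearray(m // 8)

    def _positions(self, item: str):
        # Derive k positions by salting the item with the hash index.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: str) -> bool:
        # False means definitely not seen; True may be a false positive.
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))

seen = BloomFilter()
seen.add("https://example.com")
print(seen.might_contain("https://example.com"))  # True
print(seen.might_contain("https://example.org"))  # False (with high probability)
```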
Retry Mechanism
- Bad: in-memory retry mechanism that is not fault tolerant.
- Good: use SQS which supports retries with exponential backoff.
Designs: YouTube
Requirements
Functional
- Users can upload videos.
- Users can watch videos.
Non-functional
- The system should support uploading (with resume) and watching large videos (up to 10 GB).
- The system should be fast.
- The system should be fault tolerant (favour availability over consistency).
- The system should be scalable (1M videos uploaded and 100M videos watched daily).
Core Entities
- Users: the users who upload and watch videos.
- Videos: the videos that are uploaded and watched.
- Metadata: the metadata for videos (title, description, tags, etc.).
API
- `POST /videos`: upload a video.
- `GET /videos/:video_id`: view a video.
- `GET /videos/:video_id/metadata`: view the metadata for a video.
Architecture
Diagram
Components
- Client: the user who uploads and watches videos.
- API Gateway: the entry point for the API.
- Video Service: the service that handles the video operations.
- S3: the object storage that stores the videos.
- Metadata Database: the database that stores the metadata for videos.
Challenges
Resumable Uploads
- Bad: have the client chunk the video file, upload the chunks to S3 and send a `PATCH` request to update the metadata.
- Good: use ETags to implement server-side verification of the uploaded chunks.
Video Storage
- Bad: store the raw video files without post-processing.
- Good: store different video formats and split them into 5-second segments.
Video Streaming
- Bad: download the entire video file to the client.
- Good: use adaptive bitrate streaming to download the video file in chunks.