Up, up & away: Designing systems that scale

System design is one of those topics that I’ve heard a lot about but don’t yet know a lot about. How do we go from personal projects to businesses and tools that serve thousands or millions of people?

There is an apparent gap in my understanding of scaling applications. I’m seeking to rectify that, at least at a high level. I’ve spent some time this week familiarizing myself with some of the buzzwords — namely horizontal and vertical scaling, load balancing, and caching. Here are my notes :)

Vertical vs. Horizontal scaling

I’m also a talented artist as you can see by my elaborate drawing above ^

Vertical scaling involves increasing the power of your machine — CPU or RAM — to increase the load that the server is able to handle. There are several benefits to storing all data on a single server. Firstly, inter-process communication is faster than network communication between servers. Secondly, with this model, there’s no need to worry about inconsistent data across servers.

There are, of course, limits to vertical scaling — your server can only get so big. Also, a single server is a single point of failure: if that server goes down, your entire application goes down. Horizontal scaling addresses some of these issues. You can always add new machines to handle an ever-increasing load. And in the event that one server crashes, you can redirect requests to a different machine to keep your application running.

Load Balancing

Load balancers can use ‘consistent hashing’ to determine which server to send requests to. We know that the load will be distributed roughly evenly based on the number of servers. And the number of servers could change — we are talking about scaling, after all. But ideally, we’d like requests to mostly be sent to the same server each time. Consistent hashing is a hashing scheme designed so that adding an additional server makes as small a disruption as possible in the distribution of requests.

My artistic talent strikes again

The above pie charts are intended to help explain this. Say you have a load balancer with 4 servers (chart A) and you want to add a fifth. Charts B and C show different methods of doing so. You can shift all of the pieces to make room for the fifth (chart B), or you can take a small slice from every other piece that will in total form the fifth piece (chart C).

The blue shading represents the amount of “disruption” from adding that fifth piece of pie. By “disruption” I mean data that was previously hashed to one server but is now being hashed to a different one due to the addition of a new server. Chart C clearly shows less “disruption,” as just a small proportion of each server’s requests are relocated (compared to B). This represents consistent hashing.
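To make the pie-chart intuition concrete, here’s a small Python sketch comparing a naive mod-N scheme (chart B’s behavior) against a basic consistent-hash ring (chart C’s behavior) when a fifth server joins. The server names, key names, MD5-based hash, and virtual-node count are all illustrative choices on my part, not a production recipe:

```python
import hashlib
from bisect import bisect_right

def h(s: str) -> int:
    """Hash a string to a point on the ring (MD5, for illustration only)."""
    return int(hashlib.md5(s.encode()).hexdigest(), 16)

def mod_assign(key, servers):
    # Naive scheme: hash modulo the server count. Changing the server
    # count changes the modulus, so most keys get remapped.
    return servers[h(key) % len(servers)]

def build_ring(servers, vnodes=100):
    # Consistent hashing: each server owns many small arcs ("virtual
    # nodes") on a ring, so the load spreads roughly evenly.
    points = sorted((h(f"{s}#{i}"), s) for s in servers for i in range(vnodes))
    return [p for p, _ in points], [s for _, s in points]

def ring_assign(key, ring):
    # A key belongs to the first server point clockwise from its hash.
    hashes, owners = ring
    return owners[bisect_right(hashes, h(key)) % len(owners)]

keys = [f"user-{i}" for i in range(5000)]
four = [f"server-{i}" for i in range(1, 5)]
five = four + ["server-5"]

moved_mod = sum(mod_assign(k, four) != mod_assign(k, five) for k in keys)

ring4, ring5 = build_ring(four), build_ring(five)
moved_ring = sum(ring_assign(k, ring4) != ring_assign(k, ring5) for k in keys)

print(f"mod-N remapped {moved_mod} of {len(keys)} keys")
print(f"consistent hashing remapped {moved_ring} of {len(keys)} keys")
```

With mod-N, on the order of 4 in 5 keys land on a different server after the change, while the ring only remaps roughly the share of keys that the new server takes over — the small blue slivers in chart C.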


Caching

Caching can save time by A) reducing network calls and B) storing results of expensive calculations so they don’t need to run multiple times. A load balancer with consistent hashing is useful because if we know that user 1 will always hit server A, we can choose to store user 1’s data on a cache connected to server A.
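As a sketch, here’s the common “cache-aside” pattern in Python. The `fetch_user_from_db` function is a hypothetical stand-in for a slow network call; the names and shapes here are my own, not from any particular framework:

```python
import time

# Hypothetical stand-in for a slow query over the network to a database.
def fetch_user_from_db(user_id):
    time.sleep(0.01)  # simulate network + query latency
    return {"id": user_id, "name": f"user-{user_id}"}

cache = {}

def get_user(user_id):
    """Cache-aside: try the cache first, fall back to the source of truth."""
    if user_id not in cache:
        cache[user_id] = fetch_user_from_db(user_id)  # expensive, runs once per user
    return cache[user_id]

first = get_user(1)   # slow path: hits the "database"
second = get_user(1)  # fast path: served straight from the cache
```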

This sounds pretty great! Maybe we should just cache everything!

Alas, as with anything, there’s a limit to the effectiveness of a cache. Cache memory is fast but limited and relatively expensive, so there’s a balance to be struck with what and how much to cache. A popular design is the Least Recently Used (LRU) cache, which uses a doubly linked list and a hash map under the hood.

Fittingly, an LRU cache optimizes by eliminating the least recently used data points as new ones are added.

Letters represent data stored. Timestamps represent most recent time of data use.

Diagram 1 shows a cache with data A-F stored per timestamp. When a request for data “B” comes in, the cache notes that it already has “B” saved in memory, and brings “B” to the top (as it is now the most recently used). There are no insertions or deletions — just a rearrangement based on timestamp.

When a new request comes in for “G” in diagram 2, the cache notices that it does not have “G” already saved in memory, and it also does not currently have extra space. Per the LRU principle, the cache will eliminate the least recently used data, which is “A” in this case. “G” is placed on top per diagram 3. Understanding this principle can help us to make informed decisions about when and where to use caching.
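The walkthrough above can be sketched in code. Here’s a minimal LRU cache in Python using the hash map + doubly linked list combination mentioned earlier, replaying the A–G scenario from the diagrams (the class and method names are my own):

```python
class Node:
    """One entry in the doubly linked list."""
    __slots__ = ("key", "value", "prev", "next")

    def __init__(self, key, value):
        self.key, self.value = key, value
        self.prev = self.next = None

class LRUCache:
    """Hash map gives O(1) lookup; linked list gives O(1) recency updates."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.map = {}                 # key -> Node
        self.head = Node(None, None)  # sentinel: most recently used end
        self.tail = Node(None, None)  # sentinel: least recently used end
        self.head.next, self.tail.prev = self.tail, self.head

    def _unlink(self, node):
        node.prev.next, node.next.prev = node.next, node.prev

    def _push_front(self, node):
        node.next, node.prev = self.head.next, self.head
        self.head.next.prev = node
        self.head.next = node

    def get(self, key):
        if key not in self.map:
            return None
        node = self.map[key]
        self._unlink(node)       # "bring B to the top": now most recently used
        self._push_front(node)
        return node.value

    def put(self, key, value):
        if key in self.map:
            self._unlink(self.map[key])
        node = Node(key, value)
        self.map[key] = node
        self._push_front(node)
        if len(self.map) > self.capacity:
            lru = self.tail.prev  # least recently used entry
            self._unlink(lru)
            del self.map[lru.key]

cache = LRUCache(capacity=6)
for letter in "ABCDEF":       # A inserted first, so A is least recently used
    cache.put(letter, f"data-{letter}")

cache.get("B")                # diagram 1: B moves to the top
cache.put("G", "data-G")      # diagrams 2-3: cache is full, so A is evicted
```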

So there you have it — 3 of many fundamental concepts in the world of system design. The more I dig into this topic, the more I realize just how vast it is…(as is the case with pretty much every topic in this field). I’m going to give myself time to digest what I’ve gathered thus far. There’s more where this came from!

Developer in New York, NY.
