Distributed Software Systems at Scale

One of the most potent skills a senior engineer can possess in a modern software engineering org is the ability to navigate distributed software systems at a global scale. Designing, constructing, and maintaining such systems can be an immensely challenging yet profoundly satisfying intellectual exercise. At some of the top players in the industry, like Google or Meta/Facebook, the responsibilities of lead engineers are akin to those of commanders overseeing a global battlefield. The business relies on you and your peers to operate massive global systems with a myriad of wild variables.

Distributed software systems are a well-documented topic. Any engineer interviewing for a senior position at a FAANG-like org must do well in what is known in the industry as the “system design interview”. Most of the material I’ve seen, however, is optimized for passing interviews rather than for the practical knowledge you’d need to comprehend and manage distributed systems in real life.

The purpose of this writing is to offer newcomers to the field, as well as mid-level engineers, useful knowledge about massively distributed software systems. This started as a single article, but considering the number of topics worth covering, I decided to break it into a series of articles. My hope is that it is deep enough to add to your knowledge, but not so deep as to be exhausting.

Throughout this series, we’re going to follow a narrative in which you are a key tech lead who has risen through the ranks to become responsible for the backend of a hypothetical social media app. Let’s start!

The Big Picture 

At a high level, a mature distributed software system can be defined as numerous software services deployed across the globe, communicating with one another to form a cohesive system serving the purposes of the business.


Let’s take a social media app as an example. You need software services that enable users to post new content. You need databases to store this content. You need services that can construct a user’s feed based on the friends of said user & their posted content. You need databases that can store graphs of friends, their friends, and so on. If this seems like a lot of services to you, then it’d be my pleasure to share that we have barely scratched the surface. What about services that handle content moderation? What about ads, the livelihood of almost all top software corporations? What about video processing & streaming?

All of the services described above must work well together, so that the app can deliver the experience your users need to stay engaged. It’s a fairly complex system. If the social media app serves a global user base hailing from different countries & regions, then your services need to be deployed to servers as physically close as possible to said countries & regions.

Latency

Someone may ask: why not just deploy in one place and serve all users from there? One key metric to consider is latency. In distributed software, latency is simply the amount of time between you making a request to the software and the software responding to your request. Latency is usually measured in milliseconds or less. It is worth noting that latency is used all over the place in computing; for example, the speed of RAM access is measured in latency terms.
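To make the definition concrete, here is a minimal sketch of measuring request latency from the client’s point of view. The endpoint URL is hypothetical, and a real measurement would aggregate percentiles over many samples rather than timing a single request:

```python
import time
import urllib.request

# Hypothetical endpoint; substitute any URL reachable from your machine.
FEED_URL = "https://api.example-social.com/feed"

def measure_latency_ms(url: str) -> float:
    """Time a single request/response round trip in milliseconds."""
    start = time.perf_counter()
    with urllib.request.urlopen(url) as response:
        response.read()  # Include receiving the full body in the measurement.
    return (time.perf_counter() - start) * 1000

# One sample; in practice you'd look at p50/p95/p99 over thousands of requests.
print(f"round trip took {measure_latency_ms(FEED_URL):.1f} ms")
```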

Going back to our social media app: if you scroll through your feed and it’s slow and barely responsive, you’ll turn your attention elsewhere. Over time, you’ll likely abandon the app, or worse, move to a competitor. Latency correlates with user retention, which makes it a critical business need. A key factor that affects latency is the round-trip time it takes for your phone to send a message & receive a response.

Obviously, if you, and hence your device, are in Europe, and the servers are across the Atlantic Ocean in the United States, then your round trip will take a bit more time. This is why all mature software orgs have software deployed all over the world. It is also why high-frequency traders deploy servers very close to the exchanges: the more latency a high-frequency trader encounters, the more money they lose.

Latency does not only occur between you and the social media app; it also occurs between the myriad of software services that communicate together to form your app. For example, the service responsible for constructing your feed (or timeline) needs to communicate with the graph database service that holds your friends’ information. If the latency between them is too high, your social media feed will take a long time to form, no matter how low the latency is between the app’s front-end servers and your device. For instance, if the phone-to-frontend round trip takes 80 ms and the frontend then makes two 100 ms calls to the graph database, the user waits at least 280 ms. Because of this, not only do you need to deploy closer to your users, you also need to ensure that services that communicate often are not deployed too far from each other.

Latency for the user is the sum of all the round trips

In real life, cache storage is used heavily to limit the need for a request to propagate through numerous services. In our social media app, your recently constructed feed would get stored in a cache. If you request to view your feed again within a short amount of time, the stored feed is retrieved from the cache instead of going through the lengthy construction process. Caching strategies are a fairly deep topic.
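Here is a minimal sketch of that pattern (often called cache-aside), assuming a hypothetical `construct_feed` helper and an in-process dictionary standing in for a real cache service like Memcached or Redis:

```python
import time

# Hypothetical in-process cache; production systems use a dedicated
# cache service shared across many instances.
_cache: dict[str, tuple[float, list]] = {}
FEED_TTL_SECONDS = 60  # Assumed freshness window for a cached feed.

def construct_feed(user_id: str) -> list:
    """Stand-in for the expensive multi-service feed construction."""
    return [f"post for {user_id}"]  # Placeholder result.

def get_feed(user_id: str) -> list:
    entry = _cache.get(user_id)
    if entry is not None:
        stored_at, feed = entry
        if time.time() - stored_at < FEED_TTL_SECONDS:
            return feed  # Cache hit: skip the lengthy construction.
    feed = construct_feed(user_id)  # Cache miss: build it the slow way.
    _cache[user_id] = (time.time(), feed)
    return feed
```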

Another factor that can affect latency is data load. If you handle massive amounts of data, then your latency needs to be very low, because several round trips are needed to handle this data, or else your whole system slows down. Data batching strategies are used to ensure that you don’t execute more round trips than necessary. Data batching is the practice of putting as much data as possible in one request before sending it out, in order to reduce the frequency of requests. Heavier requests may suffer a bit more latency individually, but reducing the frequency of requests reduces the total latency over time. The amount of data that can be sent within one second is bounded by the networking hardware. For example, if the network card has a speed of 100 Mbps, then regardless of how clever the code is, you are limited to 100 megabits (12.5 megabytes) per second. Data compression can be used to squeeze more data in, but there is a processing cost for compressing and decompressing data. Managing complex systems is all about the tradeoffs.
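As a sketch of the batching idea, the snippet below buffers individual items and flushes them in one request once the batch is full or a deadline passes. The `send_batch` function and the size/time thresholds are hypothetical stand-ins:

```python
import time

MAX_BATCH_SIZE = 500     # Assumed threshold: flush once this many items accumulate.
MAX_WAIT_SECONDS = 0.05  # Assumed deadline: flush at least every 50 ms.

def send_batch(items: list) -> None:
    """Stand-in for a single network request carrying the whole batch."""
    print(f"sending {len(items)} items in one request")

class Batcher:
    def __init__(self) -> None:
        self._items: list = []
        self._last_flush = time.monotonic()

    def add(self, item) -> None:
        self._items.append(item)
        full = len(self._items) >= MAX_BATCH_SIZE
        overdue = time.monotonic() - self._last_flush >= MAX_WAIT_SECONDS
        if full or overdue:
            self.flush()

    def flush(self) -> None:
        if self._items:
            send_batch(self._items)  # One round trip for many items.
            self._items = []
        self._last_flush = time.monotonic()
```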

Stateful Services

There are several types of software services in any distributed software system. Two of the key types that you will encounter often are stateful and stateless services. These terms are used quite often in books, talks, and workplaces; however, they are not always explained. The good news is that they are pretty simple concepts.

A stateful service is a service that holds state. The word state here usually means data. In other words, a database service is stateful, a cache service is stateful, and so on. When a service holds state/data, there is more complexity to operating it.


Back to our hypothetical app. If a European user of our social media app decides to take a vacation in Hawaii, data that the user generated and accessed in Europe should also be accessible to them in Hawaii with no latency penalties. This means that wherever state/data is generated, you need to ensure that it gets copied to other key regions so that the user experience is seamless.
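Here is a highly simplified sketch of that idea: a write lands in the user’s home region first, then gets queued for copying to the other regions. Real replication protocols are far more involved, and every name below is hypothetical:

```python
from queue import Queue

REGIONS = ["eu-west", "us-east", "asia-pacific"]  # Hypothetical regions.

# One store per region, plus a shared replication queue; plain dicts
# stand in for what would be separate database deployments.
stores = {region: {} for region in REGIONS}
replication_queue: Queue = Queue()

def write(user_id: str, key: str, value, home_region: str) -> None:
    stores[home_region][(user_id, key)] = value  # Fast local write.
    for region in REGIONS:
        if region != home_region:
            replication_queue.put((region, user_id, key, value))

def replicate_once() -> None:
    """Drain the queue; in production this runs continuously in the background."""
    while not replication_queue.empty():
        region, user_id, key, value = replication_queue.get()
        stores[region][(user_id, key)] = value

write("alice", "last_post", "Aloha!", home_region="eu-west")
replicate_once()
assert stores["asia-pacific"][("alice", "last_post")] == "Aloha!"
```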

Also, with stateful services, you need to build mechanisms into your system that map users to where their data is stored. If your system is global in scale & deployed across numerous locations, the mapping gets complex. There is also the fact that data is not only generated by human users; it is also generated by other services. For example, in a resilient architecture, a service that constructs a new feed will ask a stateful service to store a snapshot of the newly constructed feed for caching, future analysis, or disaster recovery.
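One common shape for such a mapping is a directory that resolves a user ID to the location holding their data. The sketch below uses a plain lookup table with a hash-based fallback for users who have no explicit placement; all names here are hypothetical:

```python
import hashlib

REGIONS = ["eu-west", "us-east", "asia-pacific"]  # Hypothetical regions.

# Explicit placements, e.g. recorded when a user signs up or migrates.
user_directory: dict[str, str] = {"alice": "eu-west"}

def region_for(user_id: str) -> str:
    """Resolve which region holds a user's data."""
    placed = user_directory.get(user_id)
    if placed is not None:
        return placed
    # Fallback: derive a stable region from a hash of the user ID.
    digest = hashlib.sha256(user_id.encode()).digest()
    return REGIONS[digest[0] % len(REGIONS)]

print(region_for("alice"))  # eu-west, from the directory
print(region_for("bob"))    # deterministic hash-based placement
```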

Stateless Services

Stateless services are the opposite of stateful services. They are services that hold no state/data. They just receive requests, execute some business logic based on the request, communicate with other services (stateless and stateful) to form a response, then send out the response without saving any data.

In real-life distributed software systems, you’ll find far more stateless services than stateful ones. Having said that, stateless services cannot deliver any value without their stateful peers; otherwise your application would never retain any data, like user names or friend lists, for example.

In our social media app, let’s assume that the service that constructs the user timeline is a stateless service, which is usually the case. Its core functionality is to receive a request to construct the timeline of a user. It starts by asking a number of other services for key data related to that user: their list of friends, their friends’ posts, and any relevant ads to insert into the timeline. After collecting the answers, our service constructs the user timeline in the right format, then sends it to the original requester as a response. Our service does not store any of that data on its own.
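A minimal sketch of such a stateless handler might look like the following. The three downstream client functions are hypothetical stand-ins for network calls to other services:

```python
# Hypothetical downstream clients; each stands in for a network call.
def fetch_friends(user_id: str) -> list[str]:
    return ["bob", "carol"]

def fetch_posts(friend_ids: list[str]) -> list[dict]:
    return [{"author": f, "text": f"hello from {f}"} for f in friend_ids]

def fetch_ads(user_id: str) -> list[dict]:
    return [{"author": "sponsor", "text": "an ad"}]

def build_timeline(user_id: str) -> list[dict]:
    """Stateless: everything needed comes in with the request or from
    downstream services; nothing is saved here between requests."""
    friends = fetch_friends(user_id)
    posts = fetch_posts(friends)
    ads = fetch_ads(user_id)
    # Append ads to the posts; real ranking and interleaving is far richer.
    return posts + ads

print(build_timeline("alice"))
```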

Stateless services are so popular in distributed software because they offer software architects a lot of flexibility. The same service binary works the same wherever it is deployed, because there is no need to move any data with it; that is the concern of stateful services. Usually, there are thousands of copies of stateless services deployed per location. You can swat many of them like flies and tolerate many of them going down at random without losing crucial data.

Stateless services also allow architects to implement efficient data retrieval strategies. Our feed/timeline construction service can make smart decisions about whether to use cache services or database services. Cache services offer much faster access to data, but they can’t hold all the data at once, which is why database services are still needed.

Instances

As established in the overview section, services need to be deployed in various locations that meet the latency needs of your users. This means that if your users are distributed across oceans & continents, you need “copies” of your services deployed in several regions. The industry term for each copy is an instance.

Not only that, but you’ll also need many instances in regions with high load, so that the load can be distributed across many server nodes within the region. If you work for a FAANG-like org, every single region will be high load. Hence, every single location will host thousands of instances of each one of your services.

Scaling your service by dividing the load amongst several instances is known as horizontal scaling. It is the opposite of vertical scaling, where the load is handled by increasing the compute power of your servers, i.e. more CPU/RAM, etc., per server. Vertical scaling is never enough in the real world unless your user base is still small & localized.
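To illustrate the horizontal side, here is a toy round-robin dispatcher that spreads requests across instances. Real systems use dedicated load balancers, and the instance addresses below are hypothetical:

```python
import itertools

# Hypothetical addresses of identical instances of one stateless service.
INSTANCES = [
    "10.0.0.1:8080",
    "10.0.0.2:8080",
    "10.0.0.3:8080",
]

# Cycle through instances forever; each request goes to the next one.
_next_instance = itertools.cycle(INSTANCES)

def route(request_id: int) -> str:
    instance = next(_next_instance)
    return f"request {request_id} -> {instance}"

for i in range(5):
    print(route(i))  # Load spreads evenly across the three instances.
```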

In the real world, we use a mix of horizontal & vertical scaling, but with heavy reliance on horizontal scaling, to achieve the performance expectations of a global business that relies on software.

Shards

If the software you need to scale is a stateful service, like a database or cache, horizontal scaling gets a bit more complicated. Should you store data belonging to your European users on a server located in North America, or only on your servers in Europe? For high-load regions, where you can’t store your entire database on one server, how would you divide this data across several servers?

The answer to almost all of these questions depends on your business needs. This is not only a technical question, but also a regulatory one: are you allowed to store said data in regions different from the region where the data originated?

For stateful services like databases and caches, we scale horizontally over shards. Each shard is simply a subset of the data, deployed onto a server/VM/container. When you query for data, hash functions are typically used to decide which shard holds the data you seek and can therefore handle your request. DB scaling is a very deep topic that can span entire books. Suffice it to say that there are usually several teams dedicated to this in mature software shops.
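Here is a minimal sketch of hash-based shard routing as described above. The shard count and record layout are hypothetical, and real systems often prefer consistent hashing so that adding shards doesn’t reshuffle every key:

```python
import hashlib

NUM_SHARDS = 4  # Hypothetical; real deployments have many more.

# Each shard is a subset of the data living on its own server/VM/container;
# plain dicts stand in for them here.
shards = [dict() for _ in range(NUM_SHARDS)]

def shard_for(key: str) -> int:
    """Hash the key to pick the shard that owns it."""
    digest = hashlib.sha256(key.encode()).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SHARDS

def put(key: str, value) -> None:
    shards[shard_for(key)][key] = value

def get(key: str):
    return shards[shard_for(key)].get(key)

put("user:alice", {"name": "Alice"})
print(shard_for("user:alice"), get("user:alice"))
```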
