19 August 2025

Scaling in System Design: From Tea Stall to Global Café

by Nilesh Hazra

When I was a kid, there was a tea stall near my school. Every morning, ten to fifteen people gathered there, sipping hot chai before rushing to work. The stall had one kettle, one helper, and one wooden bench. Life was smooth.

Then one day, a newspaper wrote about the stall’s “legendary chai.” Suddenly, there were hundreds of customers every morning. The same kettle, the same helper, and the same bench couldn’t handle the rush. People got frustrated, chai ran out, and the owner was overwhelmed.

This is exactly what happens to systems when they grow. Scaling is the art of preparing your system not just for today’s customers, but for tomorrow’s crowd.

Let’s explore the core ideas of scaling in system design, through that tea stall story.


1. Vertical Scaling – A Bigger Kettle

The first thing the chaiwala did was buy a bigger kettle and a stronger stove. This worked. He could serve more people without changing much.

In tech, this is vertical scaling (scaling up): adding more CPU, RAM, or storage to a single machine. It’s simple and effective for a while.

But here’s the problem: no matter how big the kettle, it will always have a limit.


2. Horizontal Scaling – More Stalls, More Helpers

When the crowd kept growing, he realized one kettle wasn’t enough. So, he set up two more stalls down the street and hired extra helpers. Now, instead of one giant kettle, he had many smaller ones working in parallel.

This is horizontal scaling (scaling out): adding more machines or servers that share the load. It’s harder to manage than vertical scaling, but it can handle massive growth.
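To make the "many smaller kettles working in parallel" idea concrete, here is a minimal Python sketch: instead of one worker processing every order in sequence, a small pool of workers shares the queue. The `brew_chai` function and the order counts are illustrative, not from any real system.

```python
from concurrent.futures import ThreadPoolExecutor

def brew_chai(order_id: int) -> str:
    # Each worker ("stall") handles one order independently.
    return f"chai #{order_id} served"

orders = range(1, 101)  # 100 orders in the morning rush

# Scale out: four workers share the load instead of one.
with ThreadPoolExecutor(max_workers=4) as pool:
    served = list(pool.map(brew_chai, orders))

print(len(served))  # all 100 orders handled
```

In a real deployment the "workers" would be separate servers behind a shared entry point, but the principle is the same: throughput grows by adding units, not by enlarging one.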


3. Load Balancing – Who Gets Which Stall?

But a new problem appeared: customers crowded around one stall while the other two sat empty. Chaos returned.

So, the chaiwala introduced a helper who directed people to whichever stall had space. This is what we call a load balancer—a traffic cop that makes sure requests are distributed evenly across servers.
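The simplest load-balancing policy, round robin, just hands each new request to the next server in a fixed rotation. A hypothetical sketch (the `stall-*` server names are made up for illustration):

```python
import itertools

class RoundRobinBalancer:
    """Directs each incoming request to the next server in turn."""

    def __init__(self, servers):
        self._cycle = itertools.cycle(servers)

    def route(self, request):
        # Pick the next server in rotation and assign the request to it.
        server = next(self._cycle)
        return server, request

balancer = RoundRobinBalancer(["stall-1", "stall-2", "stall-3"])
assignments = [balancer.route(f"customer-{i}")[0] for i in range(6)]
print(assignments)
# ['stall-1', 'stall-2', 'stall-3', 'stall-1', 'stall-2', 'stall-3']
```

Real load balancers also track server health and capacity, but round robin captures the core job: no stall sits empty while another is mobbed.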


4. Caching – Pre-Made Cups for the Rush Hour

Some customers just wanted simple cutting chai—nothing fancy. The stall began preparing dozens of cups in advance, ready to serve instantly during the morning rush.

This is caching: storing frequently used data closer to the user so you don’t have to “brew” it from scratch every time. It reduces waiting time and frees up resources.
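In Python, the pre-made-cups idea can be sketched with a memoizing cache: the first request actually "brews" the result, and repeat requests are served instantly from memory. The slow `brew` function here is a stand-in for any expensive computation or database query.

```python
import functools
import time

@functools.lru_cache(maxsize=128)
def brew(order: str) -> str:
    time.sleep(0.1)  # pretend brewing is slow
    return f"hot {order}"

start = time.perf_counter()
brew("cutting chai")            # first call: actually brews
first = time.perf_counter() - start

start = time.perf_counter()
brew("cutting chai")            # second call: served from the cache
second = time.perf_counter() - start

print(second < first)  # True: the cached call skips the brewing
```

The trade-off, as with pre-made cups, is freshness: cached data can go stale, so real caches pair this with an expiry (TTL) or invalidation policy.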


5. Database Scaling – The Recipe Book Problem

At first, the chaiwala had one notebook where he tracked supplies. But when multiple stalls opened, that single notebook became a bottleneck. Everyone wanted to read and update it at the same time.

That’s a database scaling challenge. To solve it, he created:

- Copies of the notebook that helpers could read from freely (read replicas), while all updates went into one master copy so the numbers never conflicted.
- Separate notebooks for separate stalls, so each one tracked only its own supplies (sharding, or partitioning the data).

This way, the system stayed organized even as demand exploded.
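A common fix for the single-notebook bottleneck is read/write splitting: all writes go to one primary, while reads are spread across replicas. A minimal sketch, assuming in-memory stand-ins for real database connections (all names here are illustrative):

```python
import itertools

class Database:
    """Routes writes to a single primary and spreads reads across replicas."""

    def __init__(self, primary, replicas):
        self.primary = primary
        self._reads = itertools.cycle(replicas)

    def write(self, key, value):
        # All updates go to one authoritative notebook: the primary.
        return (self.primary, key, value)

    def read(self, key):
        # Reads rotate across replica notebooks, sharing the load.
        return (next(self._reads), key)

db = Database("primary", ["replica-1", "replica-2"])
print(db.write("milk", "20L"))   # ('primary', 'milk', '20L')
print(db.read("milk")[0])        # 'replica-1'
print(db.read("milk")[0])        # 'replica-2'
```

The catch in real systems is replication lag: a replica may briefly serve slightly stale data after a write, which is why this pattern suits read-heavy workloads.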


6. Elastic Scaling – Flexibility Matters

Some days, crowds were huge (festivals, cricket matches nearby). Other days, it was quiet. Hiring 10 permanent helpers wasn’t sustainable.

So, the chaiwala started calling extra helpers only on busy days. That’s elastic scaling—adding resources when needed, releasing them when not. Cloud platforms like AWS and Azure do this beautifully.
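An autoscaling policy can be as simple as sizing the team to the queue. A toy sketch of such a rule, where the thresholds (one helper per ten waiting customers, a floor of 1 and a cap of 10) are invented for illustration:

```python
def desired_helpers(queue_length: int, per_helper: int = 10,
                    minimum: int = 1, maximum: int = 10) -> int:
    """Return how many helpers to have on duty for the current queue:
    one helper per `per_helper` waiting customers, clamped to
    [minimum, maximum] so we never staff zero or overspend."""
    needed = -(-queue_length // per_helper)  # ceiling division
    return max(minimum, min(maximum, needed))

print(desired_helpers(3))    # quiet morning -> 1
print(desired_helpers(85))   # festival rush -> 9
print(desired_helpers(500))  # huge spike, capped at the maximum -> 10
```

Cloud autoscalers work on the same principle, just with metrics like CPU utilization or request latency instead of a visible queue of customers.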


The Key Learning

Scaling isn’t about serving infinite chai with one kettle. It’s about building a system that grows gracefully:

- Scale up when a bigger machine is enough; scale out when it isn’t.
- Balance the load so no single server drowns while others sit idle.
- Cache what people ask for most, so you don’t brew it from scratch every time.
- Relieve the database with replicas and partitions before it becomes the bottleneck.
- Stay elastic: add capacity for the rush, release it when things are quiet.

Takeaway: Scaling is less about technology and more about foresight. If you only plan for today, you’ll collapse tomorrow. But if you design with growth in mind, your system can grow from a small tea stall to a global café—serving millions without ever running out of chai.



Have comments or questions? Join the discussion on the original GitHub Issue.

tags: system design