A lot of tech discussion these days is focused around scaling web app infrastructures to handle huge traffic.
Hacker News abounds with posts about Kubernetes, distributed systems and database replication; the System Design Primer on GitHub is extremely well-loved (113k stars) and chock-full of advice on Memcached clusters and DB sharding.
In all the heady excitement of getting your web-based notepad app ready to handle 10 billion daily users, it's important not to forget the power of modern computing hardware and how far simple optimizations can go.
You're probably not Facebook. Setting up FAANG-level infrastructure won't make your company into a FAANG, any more than a cargo cult can summon supply-bearing planes by building fake runways and wooden ATC towers.
There's a place for big, scalable cluster infrastructures, but it's a larger scale than many organizations think.
It may sound obvious, but - optimizing your app to fulfill a request in 1/10 the time is like adding 9 servers to a cluster. Optimizing to 1/100 the time (reducing requests from say 1.5 sec to 15ms) is like adding 99 servers.
That's a 1U server doing the work of two 42U server racks, formerly busy turning inefficient code into heat.
That may be an extreme case - but unnecessary bloat is common, and these kinds of gains could be as simple as adding a well-chosen index to speed up a common query by 10x or 100x, or caching some seldom-changing response in memory rather than re-rendering it every time.
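The caching idea is worth making concrete. Here's a minimal sketch of a TTL cache decorator for a seldom-changing response; `render_sidebar` is a hypothetical stand-in for whatever expensive rendering your app repeats on every request:

```python
import time
from functools import wraps

def ttl_cache(seconds: float):
    """Cache a zero-argument function's result in memory for `seconds`."""
    def decorator(fn):
        cached = {"value": None, "expires": 0.0}
        @wraps(fn)
        def wrapper():
            now = time.monotonic()
            if now >= cached["expires"]:
                cached["value"] = fn()      # recompute only after expiry
                cached["expires"] = now + seconds
            return cached["value"]
        return wrapper
    return decorator

@ttl_cache(seconds=60)
def render_sidebar():
    # hypothetical expensive render; imagine templates + DB queries here
    return "<aside>...</aside>"
```

Every call within the 60-second window returns the stored string instead of re-rendering; web frameworks and libraries offer more robust versions of the same idea, but the principle fits in twenty lines.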
Optimizing is good for the world, too: fewer servers running means less energy burned and less hardware manufactured and discarded.
Of course there's a tradeoff between the cost of cloud infrastructure and the cost of paying developers to optimize; but paying developers to set up app clusters and database replication costs money too, and has fewer of the benefits and cost savings.
Writing the game engine in C++ for Weapon Hacker, a side project of mine, taught me a lot about how powerful modern hardware really is.
A 60 FPS target gives a game about 16ms to do all the work it needs in a frame. Weapon Hacker doesn't even compare to what modern AAA engines do, but it generates its random worlds - each of which lasts for roughly an hour-long playthrough - in about 15ms, and it handles frames with thousands of particles, physics objects, textures, font glyphs and graphical primitives, plus sound mixing and music streaming, in about 5ms on a 2010-era CPU.
Compared to a game engine, the job of most web apps is incredibly mundane: take a request (some HTTP text) and return a response (some HTML or JSON or some such). And yet, 100ms or 500ms request processing times are common. Sometimes the info returned is expensive to generate, but it pays not to forget that in the end it's just inputting some bytes and outputting some bytes.
Given the relative simplicity of the task, 16ms of processing per request (after subtracting network latency) should be achievable in most web applications.
On a cheap $5/month dual-core cloud VM, that would allow about 7,500 requests per minute. Or 7,500 pageviews, if you offload static stuff to a CDN. That means 10.8M per day, or 324M pageviews per month.
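The arithmetic is worth checking explicitly - at 16ms per request with two cores working in parallel:

```python
# Back-of-envelope capacity of a dual-core VM at 16ms per request.
ms_per_request = 16
cores = 2

per_second = cores * (1000 / ms_per_request)  # 125 requests/s
per_minute = per_second * 60                  # 7,500
per_day = per_minute * 60 * 24                # 10.8M
per_month = per_day * 30                      # 324M

print(f"{per_minute:,.0f}/min, {per_day/1e6:.1f}M/day, {per_month/1e6:.0f}M/month")
# → 7,500/min, 10.8M/day, 324M/month
```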
Of course traffic wouldn't be evenly spread throughout the day, and cloud computing is generally priced for burst loads. But, let's be honest - your traffic isn't anywhere near 324M pageviews, right?
If not, condolences that Notepadly.io isn't a market behemoth yet - but congratulations, you can divert some time from multi-master replication and spot instance auto-scaling, and add a cool Clippy virtual assistant instead. Your users will thank you for it.
Server-side optimization is a big topic, and there are likely people much better at it (and at explaining it) than I am, so here are just a few general pointers.
First - measure.
This can be as simple as putting timers around sections of code to see how many milliseconds they take to execute. Or, look at TTFB (time to first byte) in the Network tab of your browser's dev tools.
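A timer like that can be a ten-line context manager - this is just one way to sketch it, with a list comprehension standing in for your real work:

```python
import time
from contextlib import contextmanager

@contextmanager
def timed(label: str):
    """Print how many milliseconds the wrapped block took."""
    start = time.perf_counter()
    try:
        yield
    finally:
        ms = (time.perf_counter() - start) * 1000
        print(f"{label}: {ms:.1f} ms")

# usage:
with timed("expensive section"):
    rows = [i * i for i in range(100_000)]  # stand-in for real work
```

Wrap suspect sections one at a time and the slow spot usually announces itself quickly.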
To human perception, a 2ms request and a 200ms request feel much the same, despite the massive 100x difference, so it pays to have hard data.
Run simple stress tests to see how much traffic your app can handle. A simple one-page python script on a nearby VM or a dev machine can easily spin up a bunch of threads to make requests as fast as possible. Count the number of completed requests in, say, 30 seconds and watch your server's CPU, memory and IO usage to see where they max out.
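A one-page stress tester along those lines might look like this - a sketch using only the standard library, assuming a staging URL you're allowed to hammer (never point it at production):

```python
import threading
import time
import urllib.request

def stress_test(url: str, duration_s: float = 30, threads: int = 16) -> int:
    """Hammer `url` from `threads` threads until `duration_s` elapses;
    return the number of successfully completed requests."""
    completed = 0
    lock = threading.Lock()
    deadline = time.monotonic() + duration_s

    def worker():
        nonlocal completed
        while time.monotonic() < deadline:
            try:
                with urllib.request.urlopen(url, timeout=5) as resp:
                    resp.read()
                with lock:
                    completed += 1
            except OSError:
                pass  # count only successful requests

    pool = [threading.Thread(target=worker) for _ in range(threads)]
    for t in pool:
        t.start()
    for t in pool:
        t.join()
    return completed

# usage (the URL is a hypothetical staging endpoint):
# n = stress_test("http://staging.example.com/", duration_s=30)
# print(f"{n} requests in 30s ({n / 30:.0f} req/s)")
```

Run it while watching `top` (or your cloud dashboard) on the server to see which resource saturates first.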
Compare your stress test results to your day-to-day traffic, and aim for enough headroom that a normal day's traffic peaks at ~1% capacity (or whatever makes sense for your business).
Once you have some baseline measurements, a few common improvements apply: well-chosen indexes on frequent queries, in-memory caching of seldom-changing responses, and - when you need raw capacity quickly - simply a bigger machine.
Cloud hosts usually let you upgrade to a higher-powered VM pretty easily in the event of some viral flood of traffic. In AWS, on my clients' projects, we have been able to launch more powerful EC2 instances (and, vice versa, consolidate to smaller, cheaper ones) in about 5 minutes, by cloning to an AMI image, launching a new instance, and rerouting the same Elastic IP. If you freeze data changes, you can even stay online during the transition.
That could buy you time to think about clustering if you reach the scale where you need it.
There's a time and a scale for big distributed computing setups. But modern CPUs and cloud VMs are insanely fast, and the job of most web apps is incredibly mundane - you can get very far with some basic optimizations, and will likely do the world, your developers and your users some favors in the process.
Update: some interesting discussion on Hacker News.