A lot of tech discussion these days is focused around scaling web app infrastructures to handle huge traffic.
Hacker News abounds with posts about Kubernetes, distributed systems and database replication; the System Design Primer on GitHub is extremely well-loved (113k stars) and chock-full of advice on Memcached clusters and DB sharding.
In all the heady excitement of getting your web-based notepad app ready to handle 10 billion daily users, it's important not to forget the power of modern computing hardware and how far simple optimizations can go.
You're probably not Facebook. Setting up FAANG-level infrastructure won't make your company into a FAANG, any more than a cargo cult can summon supply-bearing planes by building fake runways and wooden ATC towers.
There's a place for big, scalable cluster infrastructures, but it's a larger scale than many organizations think.
It may sound obvious, but - optimizing your app to fulfill a request in 1/10 the time is like adding 9 servers to a cluster. Optimizing to 1/100 the time (reducing requests from say 1.5 sec to 15ms) is like adding 99 servers.
That's a 1U server doing the work of two 42U server racks, formerly busy turning inefficient code into heat.
That may be an extreme case - but unnecessary bloat is common, and these kinds of gains could be as simple as adding a well-chosen index to speed up a common query by 10x or 100x, or caching some seldom-changing response in memory rather than re-rendering it every time.
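The caching idea is worth making concrete. Here's a minimal sketch of a TTL cache decorator for a seldom-changing response; `render_sidebar` is a hypothetical stand-in for whatever expensive rendering your app repeats on every request:

```python
import time
from functools import wraps

def ttl_cache(seconds: float):
    """Cache a zero-argument function's result in memory for `seconds`."""
    def decorator(fn):
        cached = {"value": None, "expires": 0.0}
        @wraps(fn)
        def wrapper():
            now = time.monotonic()
            if now >= cached["expires"]:
                cached["value"] = fn()      # recompute only after expiry
                cached["expires"] = now + seconds
            return cached["value"]
        return wrapper
    return decorator

@ttl_cache(seconds=60)
def render_sidebar():
    # hypothetical expensive render; imagine templates + DB queries here
    return "<aside>...</aside>"
```

Every call within the 60-second window returns the stored string instead of re-rendering; web frameworks and libraries offer more robust versions of the same idea, but the principle fits in twenty lines.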
Optimizing is good for the world, too: fewer servers running means less energy burned and less hardware manufactured and discarded.
Of course there's a tradeoff between the cost of cloud infrastructure and the cost of paying developers to optimize; but paying developers to set up app clusters and database replication costs money too, and has fewer of the benefits and cost savings.
Writing the game engine in C++ for Weapon Hacker, a side project of mine, taught me a lot about how powerful modern hardware really is.
A 60 FPS target gives a game about 16ms to do all the work it needs in a frame. Weapon Hacker doesn't even compare to what modern AAA engines do, but it generates its random worlds - each of which lasts for roughly an hour-long playthrough - in about 15ms, and it handles frames with thousands of particles, physics objects, textures, font glyphs and graphical primitives, plus sound mixing and music streaming, in about 5ms on a 2010-era CPU.
Compared to a game engine, the job of most web apps is incredibly mundane: take a request (some HTTP text) and return a response (some HTML or JSON or some such). And yet, 100ms or 500ms request processing times are common. Sometimes the info returned is expensive to generate, but it pays not to forget that in the end it's just inputting some bytes and outputting some bytes.
Given the relative simplicity of the task, 16ms of processing per request (after subtracting network latency) should be achievable in most web applications.
On a cheap $5/month dual-core cloud VM, that would allow about 7,500 requests per minute. Or 7,500 pageviews, if you offload static stuff to a CDN. That means 10.8M per day, or 324M pageviews per month.
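The arithmetic is worth checking explicitly - at 16ms per request with two cores working in parallel:

```python
# Back-of-envelope capacity of a dual-core VM at 16ms per request.
ms_per_request = 16
cores = 2

per_second = cores * (1000 / ms_per_request)  # 125 requests/s
per_minute = per_second * 60                  # 7,500
per_day = per_minute * 60 * 24                # 10.8M
per_month = per_day * 30                      # 324M

print(f"{per_minute:,.0f}/min, {per_day/1e6:.1f}M/day, {per_month/1e6:.0f}M/month")
# → 7,500/min, 10.8M/day, 324M/month
```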
Of course traffic wouldn't be evenly spread throughout the day, and cloud computing is generally priced for burst loads. But, let's be honest - your traffic isn't anywhere near 324M pageviews, right?
If not, condolences that Notepadly.io isn't a market behemoth yet - but congratulations, you can divert some time from multi-master replication and spot instance auto-scaling, and add a cool Clippy virtual assistant instead. Your users will thank you for it.
Server-side optimization is a big topic, and there are likely people much better at it (and at explaining it) than I am, so here are just a few general pointers.
First - measure.
This can be as simple as putting timers around sections of code to see how many milliseconds they take to execute. Or, look at TTFB (time to first byte) in the Network tab of your browser's dev tools.
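A timer like that can be a ten-line context manager - this is just one way to sketch it, with a list comprehension standing in for your real work:

```python
import time
from contextlib import contextmanager

@contextmanager
def timed(label: str):
    """Print how many milliseconds the wrapped block took."""
    start = time.perf_counter()
    try:
        yield
    finally:
        ms = (time.perf_counter() - start) * 1000
        print(f"{label}: {ms:.1f} ms")

# usage:
with timed("expensive section"):
    rows = [i * i for i in range(100_000)]  # stand-in for real work
```

Wrap suspect sections one at a time and the slow spot usually announces itself quickly.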
To human perception, a 2ms request and a 200ms request feel much the same, despite the massive 100x difference, so it pays to have hard data.
Run simple stress tests to see how much traffic your app can handle. A simple one-page python script on a nearby VM or a dev machine can easily spin up a bunch of threads to make requests as fast as possible. Count the number of completed requests in, say, 30 seconds and watch your server's CPU, memory and IO usage to see where they max out.
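A one-page stress tester along those lines might look like this - a sketch using only the standard library, assuming a staging URL you're allowed to hammer (never point it at production):

```python
import threading
import time
import urllib.request

def stress_test(url: str, duration_s: float = 30, threads: int = 16) -> int:
    """Hammer `url` from `threads` threads until `duration_s` elapses;
    return the number of successfully completed requests."""
    completed = 0
    lock = threading.Lock()
    deadline = time.monotonic() + duration_s

    def worker():
        nonlocal completed
        while time.monotonic() < deadline:
            try:
                with urllib.request.urlopen(url, timeout=5) as resp:
                    resp.read()
                with lock:
                    completed += 1
            except OSError:
                pass  # count only successful requests

    pool = [threading.Thread(target=worker) for _ in range(threads)]
    for t in pool:
        t.start()
    for t in pool:
        t.join()
    return completed

# usage (the URL is a hypothetical staging endpoint):
# n = stress_test("http://staging.example.com/", duration_s=30)
# print(f"{n} requests in 30s ({n / 30:.0f} req/s)")
```

Run it while watching `top` (or your cloud dashboard) on the server to see which resource saturates first.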
Compare your stress test results to your day-to-day traffic, and aim for enough headroom that a normal day's traffic peaks at ~1% capacity (or whatever makes sense for your business).
Once you have some baseline measurements, a few common improvements apply: well-chosen indexes on frequent queries, in-memory caching of seldom-changing responses, and - when you need raw capacity quickly - simply a bigger machine.
Cloud hosts usually let you upgrade to a higher-powered VM pretty easily in the event of some viral flood of traffic. In AWS, on my clients' projects, we have been able to launch more powerful EC2 instances (and, vice versa, consolidate to smaller, cheaper ones) in about 5 minutes, by cloning to an AMI image, launching a new instance, and rerouting the same Elastic IP. If you freeze data changes, you can even stay online during the transition.
That could buy you time to think about clustering if you reach the scale where you need it.
There's a time and a scale for big distributed computing setups. But modern CPUs and cloud VMs are insanely fast, and the job of most web apps is incredibly mundane - you can get very far with some basic optimizations, and will likely do the world, your developers and your users some favors in the process.
Update: some interesting discussion on Hacker News.