cloud waste

Waste not, want not. That was one of the well-healed quips of one the United States’ Founding Fathers, Benjamin Franklin. It couldn’t be more timely advice in today’s cloud computing world – the world of cloud waste. (When he was experimenting with static electricity and lightning, I wonder if he saw the future of Cloud? :^) )

Organizations are moving to the Cloud in droves. And why not? The shift from CapEx to monthly OpEx, the elasticity, the reduced deployment times and faster time-to-market: what’s not to love?

The good news: the public cloud providers have made it easy to deploy their services. The bad news: the public cloud providers have made it easy to deploy their services…really easy.

And, experience over the past decade has shown that leads to cloud waste. What is “cloud waste” and where does it come from? What are the consequences? What can you do to reduce it?

What is Cloud Waste?

“Cloud waste” occurs when you consume more cloud resources than you actually need to run your business.

It takes several forms:

  • Resources left running 24×7 in development, test, demo, and training environments where they don’t need to be running 24×7. (Thoughts of parents yelling at children to “turn the lights out” if they are the last one in a room.) I believe this is bad habit that was reinforced by the previous era of on premise data centers. The thinking: It’s a sunk cost any, why bother turning it off? Of course, it’s not a sunk cost anymore.

This manifests itself in various ways:

    • Instances or VMs which are left running, chewing up $/CPU-Hr costs and network charges
    • Orphaned volumes (volumes not attached to any servers), which are not being used and incurring monthly $/GB charges
    • Old snapshots of those or other volumes
    • Old, out-of-date machine images

However, cloud consumers are not the only ones to blame. The public cloud providers are also responsible when it comes to their PaaS (platform as a service) offerings for which there is no OFF switch (e.g., AWS’ RDS, Redshift, DynamoDB and others). If you deliver a PaaS offering, make sure it has an OFF switch.

  • Resources that are larger than needed to do the job. Many developers don’t know what size instance to spin up to do their development work, so they will often spin up larger ones. (Hey, if 1 core and 4 GB of RAM is good, then 16 cores and 64 GB of RAM must be even better, right?) I think this habit also arose in the previous era of on-premise data centers: “We already paid for all this capacity anyway, so why not use it?” (Wrong again.)

This, too, rears its ugly head in several ways:

    • Instances or VMs which are much larger than they need to be
    • Block volumes which are larger than they need to be
    • Databases which are way over-provisioned compared to what their actual IOPS or sequential throughput requirements actually are.

Who is Affected by Cloud Waste?

The consequences of cloud waste are quite apparent. It is killing everyone’s business bottom line. For consumers, it erodes their return on assets, return on equity and net revenue. All of these ultimately impact earnings per share for their investors as well.

Believe it or not, it also hurts the public cloud providers and their bottom line. Public cloud providers are most profitable when they can oversubscribe their data centers. Cloud waste forces them to build more, very expensive data centers than they need to, killing their oversubscription rates and hurting their profitability as well. This is why you see cloud providers offering certain types of cost cutting solutions. For example, AWS offers Reserved Instances, where you can pay up front for break in on-demand pricing. They also offer Spot Instances, Auto-Scaling Groups and Lambda. Azure also offers price breaks to their ELA customer and Scale Sets (the equivalent of ASGs).

How to Prevent Cloud Waste

So, what can you do to address this? Ultimately, the solution to this problem exists between your ears. Most of it is common sense: It requires rethinking… rewiring your brain to look at cloud computing in a different way. We all need to become honorary Scotsmen (short arms and deep pockets… with apologies to my Scottish friends).

  • When you turn on resources in non-production environments, turn on the minimum size needed to get the job done and only grudgingly move up to the next size.
  • Turn stuff off in non-production environments, when you are not using it. And for Pete’s sake, when it comes to compute time, don’t waste your time and money writing your own scripts…that just exacerbates the waste. Those DevOps people should spend that time on your bread and butter applications.
  • Clean up old volumes, snapshots and machine images.
  • Buy Reserved Instances for your production environments, but make sure you manage them closely, so that they actually match what your users are provisioning, otherwise you could be double paying.
  • Investigate Spot fleets for your production batch workloads that run at night. It could save you a bundle.

These good habits, over time, can benefit everyone economically: Cloud consumers and cloud producers alike.