According to most organizations, the biggest drivers to cloud are elasticity and agility. In other words, it allows you to instantly provision and de-provision resources based on the needs of the business. You no longer have to build the church for Sunday. Once in the cloud though, 80% of companies report receiving bills 2-3 times what they expected. The truth is, that while the promise of cloud is that you only pay for what you use, the reality is that you pay for what you allocate. The gap between consumption and allocation is what causes the large and unexpected bills.

Cost isn’t the only challenge. While most organizations report cost being their biggest problem in managing a public cloud environment, you cannot truly separate performance from cost, the two are tightly coupled. If an organization was optimizing for cost alone, moving all applications to the smallest instance type would be the way to go, but no one is willing to take the performance hit. In the cloud, more than ever, cost and performance are tied together.

To guarantee their SLAs, applications require access to all the resources they need. Developers, in an effort to make sure their applications behave as expected, allocate resources based on peak demand to ensure they have access to those resources if they need them. Without constantly monitoring and adjusting the resources allocated to each application, over-allocation is the only way to assure application performance. Overprovisioning of virtualized workloads is so prevalent, that it’s estimated that more than 50% of data centers are over-allocated.

On-premises, over-allocation of resources, while still costly, is significantly less impactful to the bottom line. On-premises the over provisioning is masked by over-allocated hardware and hypervisors that allow for sharing resources. In the cloud, where resources are charged by the second or minute, this over provisioning is extremely costly, resulting in bills much larger than expected.

The only way to solve this problem is to find a way to calibrate the allocation of resources continuously based on demand, or in other words, match supply and demand. This would result in TRULY only paying for the resources you need when you need them, the holy grail of cost efficiency. The ideal state is to have the right amount of resources at the right time, no more, and the only way to achieve that is through automation.

So why doesn’t everyone do that?

This is a complicated problem to solve. To achieve that we must look at all resources required by each application and match them to the best instance type, storage tier and network configuration in real time.

Let’s take a simple application running a front end and a back end on AWS EC2 in the Ohio region using EBS storage. There are over 70 instance types available. Each instance type defines the allocated memory, CPU, the benchmarked performance of the CPU to be expected (not all CPU cores perform equally), the available bandwidth for network and IO, the amount of local disk available and more. On top of that, there are 5 storage tiers on EBS that would further define the IOPS and IO throughput capabilities of the applications. This alone results in over 350 options for each component of the application.

Taking a closer look at network complicates matters even further.

Placing the two components across AZs will result in costly communication costs back and forth between the AZs. In addition, the latency in communication across AZs, even in the same region, is larger than within the same AZ, so depending on the latency sensitivity of the application the decision on which AZ to place the app on impacts the performance of the application, not just the cost. Placing them on the same AZ is not a great option either – it increases the risk to the organization in case of an outage on that zone. Cloud providers would only guarantee five 9s (99.99999%) up time when instances are spread across more than a single zone. In the Ohio region, there are 5 availability zones which brings us up to the need to evaluate 1,750 options for each component of the applications. Each of these options need to be evaluated against the memory, CPU, IO, IOPS, Network throughput and so on.

The problem is just as complicated on Azure, with over X instance types and different levels of premium and standard storage tiers and the recent introduction of availability zones.

Where you get the data to back up your decisions is important as well. When looking at the monitored data at the IaaS layer alone neither performance or efficiency can be guaranteed. Let’s take a simple JVM as an example. When looking at the memory monitored at the IaaS layer it will always report using 100% of the heap, but is it utilizing it? Is the application garbage collecting every minute or once a day? The heap itself should be adjusted based on that to make sure the application gets the resources it needs, when it needs them. CPU isn’t better. If the IaaS layer is reporting an application consuming 95% of a single CPU core, most would argue that it needs to be moved to a 2 core instance type. Looking into the application layer allows you to understand how the application is using that CPU. If a single thread is responsible for the bulk of the resource consumption adding another core wouldn’t help but moving to an instance family with stronger CPU performance would be a better solution.

To sum it up, assuring application performance while maintaining efficiency is more difficult than ever. The only way to truly only pay for what you use you must match supply and demand across multiple resources, from the application layer down to the IaaS layer in real time.