Cloud disaster recovery turned the world of business continuity upside down. Instead of investing millions in setting up and maintaining a backup site 250 miles away—or settling for nothing, which is what many companies did—it became possible to replicate data or entire applications to the cloud and run a DR site completely on demand.
I’ll review the three common cloud DR models, explain the advantages of setting up DR with the big three cloud vendors (Amazon, Microsoft and Google) and provide a snapshot (no pun intended) of each cloud’s home-grown DR offering.
3 cloud disaster recovery models
The following models are commonly used to create disaster recovery environments in the cloud:
- Backup and restore data only—synchronize data with a cloud service like Amazon S3, and plan to pipe it back on-premises if a disaster occurs.
- Replicate and fail over entire VMs—at regular intervals, package machine images of your critical applications and save them to the cloud. They can be dormant and started only upon disaster; running in a warm-backup “pilot light” mode; or fully active and participating in a multi-site setup.
- Fail over entire applications / managed DR—packaging an entire on-premise business application with multiple VMs, network configuration, etc. and moving it to the cloud. On-premise and cloud sites will be frequently synchronized to enable immediate failover to the cloud site.
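The more of your application you move to the cloud, the lower the RTO (recovery time objective, how long it takes to get running again) and RPO (recovery point objective, how much recent data you can afford to lose) you’ll be able to achieve.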
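To make the trade-off concrete, here is a minimal sketch in Python. It assumes a deliberately simple model that does not come from any vendor's documentation: worst-case data loss is the interval between replications plus the time one replication takes to finish, and recovery time is the sum of the steps needed to bring the cloud site up. The function names, step names and sample numbers are illustrative assumptions, not measurements.

```python
from datetime import timedelta


def worst_case_rpo(replication_interval: timedelta, transfer_time: timedelta) -> timedelta:
    """Upper bound on data loss for interval-based replication.

    If the site fails just before the current copy finishes, the newest
    usable copy reflects the state one full interval plus one transfer ago.
    """
    return replication_interval + transfer_time


def estimated_rto(recovery_steps: dict[str, timedelta]) -> timedelta:
    """Recovery time when the steps run strictly one after another."""
    return sum(recovery_steps.values(), timedelta())


# Hypothetical figures for a nightly "backup and restore" setup.
nightly_backup = worst_case_rpo(timedelta(hours=24), timedelta(hours=2))

# Hypothetical figures for failing over to dormant VMs.
dormant_vm_steps = {
    "detect outage": timedelta(minutes=10),
    "start dormant VMs": timedelta(minutes=15),
    "restore latest data": timedelta(minutes=30),
    "switch DNS": timedelta(minutes=5),
}

print("worst-case RPO:", nightly_backup)
print("estimated RTO:", estimated_rto(dormant_vm_steps))
```

Shortening the replication interval lowers the RPO, and keeping more of the stack already running in the cloud removes steps from the RTO sum, which is the pattern the three models follow.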
Why work with the big three cloud providers?
There are dozens of providers of Cloud DR services today, ranging from large cloud providers who give you the tools to do it yourself, to independent vendors who promise to package everything in a Disaster Recovery as a Service (DRaaS).
Small DRaaS vendors have matured over the past few years and can provide an excellent solution for many use cases. However, working with the big three cloud providers—Amazon, Microsoft and Google—has a few distinct advantages:
- Trust: because disaster recovery is such critical business infrastructure, relying on a small vendor can raise concerns. Either do your due diligence and prefer vendors backed by strong partners, or use the big three, where trust is much less of an issue.
- More control: in most cases DRaaS providers use one of the major clouds as the underlying infrastructure. They are essentially reselling those cloud resources with a management layer on top. Using the cloud directly gives you more control to achieve exactly what you need (although it may be more complicated).
- Cost: for the same reason, in some use cases working directly with an IaaS vendor will lower DR costs (get the lettuce directly from the farm).
What do the big three cloud providers offer?
Amazon Web Services Disaster Recovery
Amazon suggests that you build your DR solution on top of its plain vanilla cloud services. AWS does not provide a purpose-built DR service, nor does it offer a fully managed DRaaS solution. The assumption is that for those already running on AWS, setting up DR would be easy enough.
How it works
Amazon provides a reference architecture for four levels of disaster recovery:
- Backup/restore—syncing data to S3 and loading back if disaster strikes.
- Pilot light—setting up a minimal environment on Amazon which you can scale up upon disaster.
- Warm standby—setting up a full environment on Amazon, but one which is not actively accessed by users, and switching to it in case of disaster.
- Multi site—setting up a full environment on Amazon and load balancing users between main and backup site.
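The first level, backup/restore, is simple enough to sketch in a few lines of Python. This is only an illustration, assuming an S3 client object such as the one returned by boto3.client("s3"); the only method it relies on is upload_file(Filename, Bucket, Key). The function name backup_directory and its parameters are hypothetical.

```python
from pathlib import Path


def backup_directory(s3_client, root: Path, bucket: str, prefix: str = "") -> list[str]:
    """Upload every file under `root` and return the object keys written.

    `s3_client` is assumed to behave like boto3.client("s3"); it is passed
    in so the function can be exercised without AWS credentials.
    """
    keys = []
    for path in sorted(root.rglob("*")):
        if path.is_file():
            # Keep the directory layout in the key so a restore can rebuild it.
            key = prefix + path.relative_to(root).as_posix()
            s3_client.upload_file(str(path), bucket, key)
            keys.append(key)
    return keys
```

With boto3 installed and credentials configured, a call might look like backup_directory(boto3.client("s3"), Path("/var/backups"), "my-dr-bucket", "nightly/"), where the path and bucket name are placeholders. A real setup would also need retention rules and a tested restore path, which this sketch leaves out.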
Cost
No additional costs for DR. You are charged at the regular Amazon pricing, which has per-usage fees for compute instances, bandwidth and special data operations.
The good
It’s AWS—if you love it you love it. You’ll get the tools to do exactly what you need, but go and get yourself an Amazon-certified IT pro.
The bad
While the backup/restore option is easy, all other options require you to model your application on the Amazon cloud and ensure it is identical to the on-premise environment, which is not trivial to achieve.
Google Cloud Platform Disaster Recovery
Like Amazon, Google did not go out of its way to create a packaged DRaaS offering. Instead users are encouraged to build their own disaster recovery solution using Google’s regular cloud services.
How it works
Google’s disaster recovery cookbook explains how to set up disaster recovery in several tiers:
- Replicate storage to Google Cloud Storage using the Carrier Interconnect or Direct Peering service.
- Replicate application data by creating a Google machine image of your database server and running it on Google Compute Engine.
- Google suggests running the database server at all times on a minimal machine instance, and creating additional machine images for application servers, which can be run on demand when disaster occurs (like the Amazon “pilot light” model).
- For more aggressive RTO/RPO values, the application servers can be run at all times.
Cost
Like Amazon, Google does not charge a separate fee for DR setup. Consult the general pricing for Cloud Platform. Google’s Carrier Interconnect provides a dedicated uplink with 1 Gbps bandwidth for $1700 per month. Direct Peering costs $0.04/GB in North America.
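As a rough worked example, the Python sketch below takes the two prices quoted above at face value and asks two questions: at what monthly volume does the flat-rate link cost the same as per-GB pricing, and how long would a given amount of data take to move over a 1 Gbps link. It assumes decimal gigabytes and ignores every other fee (storage, regular egress, carrier charges), so treat the result as an order-of-magnitude guide only. The function names are made up for this example.

```python
INTERCONNECT_USD_PER_MONTH = 1700.0  # 1 Gbps dedicated uplink, figure from the text
PEERING_USD_PER_GB = 0.04            # Direct Peering in North America, figure from the text
LINK_GBPS = 1.0                      # nominal link speed in gigabits per second


def break_even_gb_per_month() -> float:
    """Monthly volume at which both options cost the same."""
    return INTERCONNECT_USD_PER_MONTH / PEERING_USD_PER_GB


def transfer_hours(data_gb: float, utilization: float = 1.0) -> float:
    """Hours needed to move `data_gb` decimal gigabytes over the link.

    `utilization` is the fraction of nominal bandwidth actually achieved
    (an assumption the caller has to supply).
    """
    gigabits = data_gb * 8
    seconds = gigabits / (LINK_GBPS * utilization)
    return seconds / 3600


print(f"break-even volume: {break_even_gb_per_month():,.0f} GB per month")
print(f"1 TB at full speed: {transfer_hours(1000):.1f} hours")
print(f"1 TB at 50% utilization: {transfer_hours(1000, 0.5):.1f} hours")
```

Under these assumptions the break-even point is about 42,500 GB per month, so per-GB pricing looks cheaper for smaller replication volumes and the dedicated link wins above that.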
The good
Compared to Amazon, setting up your environment within GCP should be easier. It’s also easy to convert existing virtual machines or cloud machine images from other providers into Google’s format.
The bad
Do it yourself, but with less configuration and optimization flexibility than Amazon provides. Like Amazon, you have the challenge of creating an identical environment on and off the cloud to enable smooth failover and failback.
Microsoft Azure Disaster Recovery
Azure goes further than AWS and Google and provides a full-fledged DRaaS product called Azure Site Recovery (ASR). ASR is a no-brainer if your on-premise infrastructure uses LDAP, SharePoint or other heavyweight Microsoft infrastructure.
How it works
In a nutshell:
- You create an Azure account and install the Mobility Service (an agent) on each on-premise server you want to protect.
- You create a Vault in Azure Recovery Services, set up a Source Environment and assign an on-premise server as a Configuration Server. A special master agent is installed on this machine.
- Via the Configuration Server, ASR discovers other VMs in your on-premise environment and you can add them to your Source Environment.
- Define a Target Environment composed of Azure machine instances and other services.
- Define a replication policy.
While this is simplified, most of the process is guided by a point-and-click interface and is not rocket science. As soon as you’ve verified connectivity and installed the Configuration Server, you are most of the way there.
Cost
For customer-owned sites, Azure pricing is $16 per month per protected instance. While a detailed price comparison is beyond our scope, for many use cases this will be vastly cheaper than per-hour pricing on AWS.
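A quick back-of-the-envelope calculation in Python shows where that claim comes from. It uses the $16 figure above, assumes 730 billable hours in a month, and compares against a hypothetical always-on instance at $0.10 per hour, a rate chosen purely for illustration and not taken from any price list. Storage, bandwidth and compute consumed during an actual failover are left out on both sides.

```python
ASR_USD_PER_INSTANCE_MONTH = 16.0  # figure from the text
HOURS_PER_MONTH = 730              # common billing approximation (assumption)


def break_even_hourly_rate() -> float:
    """Hourly price at which an always-on instance costs the same as the flat fee."""
    return ASR_USD_PER_INSTANCE_MONTH / HOURS_PER_MONTH


def monthly_cost_always_on(hourly_rate_usd: float, instances: int) -> float:
    """Monthly cost of keeping standby instances running around the clock."""
    return hourly_rate_usd * HOURS_PER_MONTH * instances


def monthly_cost_flat_fee(instances: int) -> float:
    """Monthly cost at the flat per-instance protection fee."""
    return ASR_USD_PER_INSTANCE_MONTH * instances


HYPOTHETICAL_HOURLY_RATE = 0.10  # illustrative only

print(f"break-even hourly rate: ${break_even_hourly_rate():.4f}")
print(f"10 always-on instances: ${monthly_cost_always_on(HYPOTHETICAL_HOURLY_RATE, 10):.2f}")
print(f"10 protected instances: ${monthly_cost_flat_fee(10):.2f}")
```

The flat fee only loses if the standby instances cost less than roughly two cents an hour, or if they are kept dormant most of the time, as in the pilot light model.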
The good
Great for Microsoft shops, and also applicable and relatively easy to set up for other infrastructure. Inexpensive.
The bad
Somewhat restrictive prerequisites for on-premise machines, including a limit of 1TB for hard disks and support for VMware virtualization only (at the time of this writing).
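Before committing, it may be worth screening your server inventory against these two prerequisites. The Python sketch below is a hypothetical helper, not part of any Azure tooling: the Machine record, its field names and the sample data are invented for illustration, and the limits simply encode the two restrictions mentioned above, which may have changed since.

```python
from dataclasses import dataclass

MAX_DISK_GB = 1024                  # the 1 TB per-disk limit mentioned above
SUPPORTED_HYPERVISORS = {"vmware"}  # VMware only, per the text


@dataclass
class Machine:
    name: str
    hypervisor: str
    disk_sizes_gb: list[int]


def prerequisite_problems(machine: Machine) -> list[str]:
    """Return human-readable reasons why a machine fails the checks (empty if none)."""
    problems = []
    if machine.hypervisor.lower() not in SUPPORTED_HYPERVISORS:
        problems.append(f"unsupported hypervisor: {machine.hypervisor}")
    for index, size in enumerate(machine.disk_sizes_gb):
        if size > MAX_DISK_GB:
            problems.append(f"disk {index} is {size} GB, over the {MAX_DISK_GB} GB limit")
    return problems


# Made-up inventory for demonstration.
inventory = [
    Machine("web01", "VMware", [100, 200]),
    Machine("db01", "VMware", [500, 2048]),
    Machine("files01", "Hyper-V", [800]),
]

for machine in inventory:
    issues = prerequisite_problems(machine)
    print(machine.name, "OK" if not issues else "; ".join(issues))
```

Machines that fail the screen would need resizing, migration or a different DR approach before they could be protected this way.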
Conclusion
Amazon and Google make it difficult to set up your cloud-based DR environment. They provide cookbooks and ask you to step into the kitchen. On Amazon at least you have extreme flexibility for special configurations—on Google less so.
Unless you already have your on-premise environment replicated on either of these clouds, the effort required to set up full DR (not just data replication) would be large. Consider and compare costs of out-of-the-box DRaaS services, such as N2WS for Amazon or CloudEndure for Google.
As for Microsoft, the built-in DRaaS offering, Azure Site Recovery, works and will pull your on-premise environment into the cloud with a guided point-and-click process. If you meet the prerequisites for on-premise services (note a limit of 1TB for local disks), this looks like the best option.