One site that wasn’t outwardly impacted by the February 28, 2017 AWS us-east-1 S3 outage was Netflix.

The reason? While Netflix relies on AWS, it also plans for failure (that is, it expects failure, unlike most organizations), architects for it and tests it.

Netflix practices what it refers to as multi-region, active-active replication – replicating data between different AWS regions for a resilient architecture. Netflix recognizes that a complete Region outage is unlikely (or at least seemed so until February 28th), but still possible.
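To make the idea concrete, here is a toy, in-memory sketch of active-active replication. It is not Netflix's implementation: the `Region` and `ActiveActiveStore` classes, the key name and the synchronous copy are all assumptions made for illustration (real cross-region replication is typically asynchronous and has to handle conflicts). It only shows the core property: every region accepts writes, and reads survive the loss of a region.

```python
class Region:
    """One region's local copy of the data (an in-memory stand-in)."""

    def __init__(self, name):
        self.name = name
        self.data = {}
        self.healthy = True


class ActiveActiveStore:
    """Toy active-active store: any healthy region accepts writes,
    and each write is copied to every other healthy region."""

    def __init__(self, region_names):
        self.regions = {name: Region(name) for name in region_names}

    def fail(self, region_name):
        # Simulate a region outage.
        self.regions[region_name].healthy = False

    def write(self, region_name, key, value):
        if not self.regions[region_name].healthy:
            raise RuntimeError(f"{region_name} is down")
        # Simplification: copy synchronously to all healthy regions.
        for region in self.regions.values():
            if region.healthy:
                region.data[key] = value

    def read(self, key, preferred):
        # Try the preferred region first, then fail over to the others.
        order = [preferred] + [n for n in self.regions if n != preferred]
        for name in order:
            region = self.regions[name]
            if region.healthy and key in region.data:
                return region.data[key], name
        raise KeyError(key)


# Hypothetical usage: write in one region, lose it, read from the other.
store = ActiveActiveStore(["us-east-1", "us-west-2"])
store.write("us-east-1", "profile:42", "resume at 00:17:03")
store.fail("us-east-1")
value, served_by = store.read("profile:42", preferred="us-east-1")
print(value, "served by", served_by)
```

Because both regions hold the data and both can serve traffic, losing one of them changes where a request is answered, not whether it is answered.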

A complete discussion of Netflix’s architectural approach can be found in their December 2, 2013 blog post, “Active-Active for Multi-Regional Resiliency.”

But more importantly, Netflix tests failure – on their live, production environments!

Would you do that?

Netflix does!

In fact, they created a suite of testing tools, the Simian Army, that they routinely use to invoke and test failure and failover. They started with what they call Chaos Monkey, a service running in AWS that randomly terminates EC2 instances within Auto Scaling Groups to test resiliency. However, their suite also includes Chaos Gorilla, which takes out an entire Availability Zone, and Chaos Kong, which simulates the outage of a Region. Many of these tools were released into the wild on GitHub so you can test your own architecture and ability to deal with failure.
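The core idea behind Chaos Monkey is simple enough to sketch in a few lines. The following is a minimal illustration, not Netflix's code: `pick_victims`, `unleash`, the group names and the instance IDs are all hypothetical, and `terminate` is a stand-in callable where a real tool would call the cloud provider's SDK.

```python
import random


def pick_victims(groups, rng, probability=1.0):
    """Pick at most one random instance per group to terminate."""
    victims = {}
    for group_name, instances in groups.items():
        # Skip empty groups; otherwise act with the given probability.
        if instances and rng.random() < probability:
            victims[group_name] = rng.choice(instances)
    return victims


def unleash(groups, terminate, rng=None, probability=1.0):
    """Terminate the chosen instances via the supplied callable."""
    if rng is None:
        rng = random.Random()
    victims = pick_victims(groups, rng, probability)
    for instance_id in victims.values():
        terminate(instance_id)
    return victims


# Hypothetical groups and instance IDs; nothing real is terminated here.
groups = {
    "web-asg": ["i-web-1", "i-web-2", "i-web-3"],
    "api-asg": ["i-api-1", "i-api-2"],
    "empty-asg": [],
}
terminated = []
victims = unleash(groups, terminate=terminated.append, rng=random.Random(7))
print(victims)
```

Passing `terminate` in as a parameter keeps the selection logic testable without touching any real infrastructure, and a `probability` below 1.0 lets a run leave some groups alone.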

As we know, redundancy and resiliency come at a cost, not only in the complexity of the architecture but also in paying for additional services. Adrian Cockcroft, former cloud architect at Netflix, estimated that Netflix’s active-active architecture added about 25% to its costs, with most of that extra cost going to storage replication.
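As a quick back-of-the-envelope illustration of that estimate, the snippet below applies a 25% overhead to a baseline bill. The $100,000 monthly baseline is a made-up figure, not a Netflix number, and `active_active_cost` is a hypothetical helper.

```python
def active_active_cost(baseline_monthly, overhead=0.25):
    """Return (total, extra) cost after adding the resiliency overhead."""
    extra = baseline_monthly * overhead
    return baseline_monthly + extra, extra


# Hypothetical baseline of $100,000 per month.
total, extra = active_active_cost(100_000)
print(f"extra: ${extra:,.0f}  total: ${total:,.0f}")
```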

The bottom line is that it is technically possible to plan for, architect for and test the possibility of an AWS region failure. The Simian Army is more than willing to join in your fight.

In today’s tech world there’s no reason for a startup to even consider purchasing hardware when there are so many cloud hosting options available. The real question that needs to be answered is WHICH hosting provider you should use.

From a startup’s perspective, I think you need to look at the following major areas:

• Cost – For your particular requirements and reasonable growth expectations going forward, how do the costs for the providers you are considering compare? Compute resource costs are generally more than 80% of your total cloud costs, so it is best to start there. However, differences in RAM and CPUs, and whether you will be able to take advantage of discounts, complicate the analysis. To make this even more difficult, cloud providers are continually lowering prices, with Azure attempting to match AWS pricing. In the end, pricing may not be the determining factor in the selection process.
• Grow with your business – While all of the major hosting providers offer autoscaling and load balancing, both extremely important as you grow, some of the minor providers may not. Also, what are your geographical needs? Of the major players, Azure and AWS have a good worldwide distribution of data centers, while Google got a late start and is working to catch up. How do the providers handle redundancy across regions and availability zones within regions? Are there contracts involved, or can you just add and remove resources as your business needs dictate?
• Features and Functionality – The first things everyone thinks about are computing and storage, but what else might you need or use? Consider VPN, Virtual Private Cloud, Direct Connection to on-premises resources, Search, Transcoding, IoT, Caching, etc. If you’re a Microsoft shop, using Azure might be the best choice. These may not be the first issues you think about, but will they be important to you in the longer term?
• Databases – Which databases are you going to use? What is supported by the provider?
• Government Contracts – Will you have government contracts and need ITAR and/or FedRAMP compliance?
• Security – How easy are the security features to use? What security features are offered? What security features do you need? Do you need Active Directory integration?
• Documentation – How extensive is the documentation? Is it easy to use?
• Administration – Is the administrative platform simple, intuitive and easy to use?
• Support – What support is available? What are the costs of that support?
• Partners – What 3rd party partners are working with the provider? What value are they adding?
• Experienced people – Can you readily find people experienced with the provider’s services? AWS has over 26,000 Certified Solutions Architects, but it is much harder to find people with certified skills in Azure or Google, and harder still for the minor providers.
• Provider’s commitment and investment in Cloud Services – What is the provider’s roadmap? Where do they fit on Gartner’s Magic Quadrant for Cloud IaaS? Gartner’s August 2016 report states that the cloud market has undergone significant consolidation around Azure and AWS, leaving an uncertain future for other service providers and their customers.

[Figure: Gartner Magic Quadrant for Cloud IaaS]

The bottom line is that the choice of a cloud provider needs to be looked at through the lens of the individual organization. Many factors need to be considered and the provider that is right for one startup may not be the best choice for another.
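One way to make that lens explicit is a simple weighted scoring matrix over the areas listed above. The sketch below is only an illustration: the providers ("Provider A", "Provider B"), the 1–5 ratings and the weights are all invented, and `score_providers` is a hypothetical helper. The point is that the same ratings can produce a different winner once the weights reflect a different startup's priorities.

```python
def score_providers(weights, ratings):
    """Rank providers by the weighted average of their criterion ratings."""
    total_weight = sum(weights.values())
    scores = {}
    for provider, provider_ratings in ratings.items():
        weighted = sum(weights[c] * provider_ratings[c] for c in weights)
        scores[provider] = weighted / total_weight
    # Highest score first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Made-up ratings on a 1-5 scale (5 is best) for two fictional providers.
ratings = {
    "Provider A": {"cost": 3, "scalability": 5, "features": 5, "talent": 5},
    "Provider B": {"cost": 5, "scalability": 4, "features": 3, "talent": 2},
}

# Two hypothetical startups that weigh the same criteria differently.
cost_sensitive = {"cost": 5, "scalability": 2, "features": 1, "talent": 1}
hiring_focused = {"cost": 1, "scalability": 3, "features": 2, "talent": 4}

ranking_cost = score_providers(cost_sensitive, ratings)
ranking_hiring = score_providers(hiring_focused, ratings)
print("cost-sensitive startup:", ranking_cost[0][0])
print("hiring-focused startup:", ranking_hiring[0][0])
```

With these invented numbers the cost-sensitive startup lands on Provider B and the hiring-focused one on Provider A. The scores themselves matter less than the exercise of agreeing on the weights.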