Amazon pulled out all the stops to throw their first Amazon Web Services (AWS) developers conference last week. Here are three key themes I noticed from the talks they scheduled to fill the time between the laser light shows, delicious food and abundant drinks.
I lost count of how many speakers stressed the need to be in multiple availability zones. This is easy to do, and even easier if you design for it from the beginning rather than trying to migrate later on. The last few EC2 outages resulted in downtime for many popular sites and some bad publicity for AWS. In response, the team is making it clear that machine failures are expected and a redundant architecture is required.
More importantly, regularly testing failure cases avoids surprises when failures happen in real life. Two strategies:
The obvious advantage of cloud computing is the ability to scale capacity to meet demand. The hidden advantage is the ability to do this automatically. For example, Auto Scaling allows you to automatically launch and terminate new EC2 instances based on defined metrics or on a recurring schedule. This creates an architecture that can respond to unexpected user access patterns the same way that a redundant architecture can automatically respond to unexpected machine failures.
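To make the metric-based case concrete, here is a minimal sketch of the kind of threshold rule a scaling policy encodes. It is plain Python, not the Auto Scaling API: `ScalingPolicy`, `desired_capacity`, and every threshold and limit below are hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class ScalingPolicy:
    """Hypothetical threshold policy; all defaults are illustrative."""
    scale_out_above: float = 70.0  # average CPU % above which capacity is added
    scale_in_below: float = 30.0   # average CPU % below which capacity is removed
    min_instances: int = 2
    max_instances: int = 10
    step: int = 1                  # instances added or removed per decision


def desired_capacity(current: int, avg_cpu: float, policy: ScalingPolicy) -> int:
    """Return the instance count the group should move toward."""
    if avg_cpu > policy.scale_out_above:
        target = current + policy.step
    elif avg_cpu < policy.scale_in_below:
        target = current - policy.step
    else:
        target = current
    # Clamp so the group never leaves its configured bounds.
    return max(policy.min_instances, min(policy.max_instances, target))
```

Calling `desired_capacity(2, 85.0, ScalingPolicy())` asks for a third instance, while the same load at ten instances stays at ten because of the upper bound. The minimum of two reflects the multi-zone advice above: even an idle service keeps a redundant machine.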
Another example of how to utilize swing capacity is for deployment. Rather than update existing machines, launch a new cluster of machines with the new code and point traffic to them. Rolling back is as simple as pointing traffic back to the old machines. If everything is working, tear down the old ones.
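The deployment flow above can be sketched as a pointer swap. This is a toy model, assuming the "pointer" is a load balancer or DNS record; `TrafficRouter` and its methods are hypothetical and do not correspond to any AWS API.

```python
from typing import Optional


class TrafficRouter:
    """Hypothetical stand-in for a load balancer or DNS record."""

    def __init__(self, live_cluster: str) -> None:
        self.live: str = live_cluster
        self.previous: Optional[str] = None

    def switch_to(self, new_cluster: str) -> None:
        # Keep the old cluster running so a rollback is a single swap.
        self.previous, self.live = self.live, new_cluster

    def roll_back(self) -> None:
        if self.previous is None:
            raise RuntimeError("no previous cluster to roll back to")
        # The failed cluster becomes 'previous' so it can be retired afterwards.
        self.live, self.previous = self.previous, self.live

    def retire_previous(self) -> Optional[str]:
        # Returns the name of the cluster that is now safe to tear down.
        retired, self.previous = self.previous, None
        return retired
```

The point of the design is that no machine is modified in place: both clusters exist until one is explicitly retired, so a rollback never depends on undoing an upgrade.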
This has two implications for monitoring:
Immediate access to large amounts of processing and storage capability has always appealed to data enthusiasts, but it's clear that companies are now making serious progress in this area. For example, Netflix has built a data processing architecture in which S3 is their "data warehouse" and EMR jobs run when they need to interact with the data.
There are many best practices around the extraction and transformation stages of a data warehouse pipeline (essentially, save everything to S3 and use EMR). I'm excited to see more progress in the latter two stages: warehousing and analytics.
If you want to relive the brainwashing magic, here are some of the highlight sessions: