Amazon AWS announces load balancing

On May 17, 2009, Amazon announced a new set of services. Among them are two features I had always wanted to see in their portfolio: auto scaling of an entire web application and load balancing. Unsurprisingly, Amazon calls these features Auto Scaling and Elastic Load Balancing.

Load balancing was something that, from a developer’s point of view, I didn’t want to be bothered with. Sure, I wanted it to work, but I didn’t want to have to get it to work. In the past, developing an application on top of an Infrastructure as a Service (IaaS) provider like Amazon often involved setting up your own load balancer, for example by creating a new instance from an Amazon Machine Image (AMI) with some suitable load balancing software. This was where the offerings of a Platform as a Service (PaaS) provider like Microsoft’s Azure platform became quite compelling (for the differences between IaaS and PaaS, see the “Cloud Computing demystified” post further down). Hosting an application on Azure already includes (or will include) load balancing, auto scaling and self-healing. That’s what PaaS is about: it is a more specialized stack than IaaS, taking away some of your freedom of choice (Azure currently supports .NET and PHP apps, while IaaS lets you use virtually anything), but in turn offering features that IaaS normally doesn’t provide out of the box.

This is why this offering affects Microsoft: it increases the pressure on the development of the Azure platform. Amazon just took away one of the reasons to choose a tightly integrated PaaS provider over an IaaS provider. Of course, the feature comes with a price; otherwise Amazon would lose the money it currently makes on all the instances that do the work of a load balancer. I think the price is reasonable: currently it is “$0.025 per hour for each Elastic Load Balancer, plus $0.008 per GB of data transferred through an Elastic Load Balancer”.

Let’s compare both approaches. Without Elastic Load Balancing, one had to pay for (at least) one instance running the load balancer (assuming this was an extra instance), but no traffic costs, since traffic between EC2 instances within the same availability zone is free of charge. We could use a reserved instance (Linux), so we would pay the one-year fee plus the hourly charges for one year, leaving us with

325 + (365 * 24 * 0.03) = 587.80 $

With Elastic Load Balancing, we have to pay

365 * 24 * 0.025 = 219 $ plus traffic charges, which make it hard to put down a single number here. Still, we can tell how many GB we could serve through Elastic Load Balancing before it gets more expensive than our first solution:

(587.8 - 219) / 0.008 = 46,100 GB. That’s right, that’s about 45 terabytes. I’d say that’s fair enough to get anyone started 🙂
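
To make the comparison easy to re-run with different prices, here is the same arithmetic as a small Python sketch. The prices are the 2009 list prices quoted above; adjust them to whatever is current:

    # Back-of-the-envelope comparison: self-managed load balancer instance
    # versus Elastic Load Balancing, using the 2009 prices quoted above.
    HOURS_PER_YEAR = 365 * 24

    # Self-managed: one reserved Linux instance (one-year fee + hourly rate)
    reserved_fee = 325.00     # one-time fee for a 1-year reserved instance
    reserved_hourly = 0.03    # USD per hour
    self_managed = reserved_fee + HOURS_PER_YEAR * reserved_hourly

    # Elastic Load Balancing: hourly fee + per-GB fee for traffic through the ELB
    elb_hourly = 0.025        # USD per ELB-hour
    elb_per_gb = 0.008        # USD per GB through the ELB
    elb_fixed = HOURS_PER_YEAR * elb_hourly

    # GB per year that can flow through the ELB before it costs more
    # than the self-managed instance
    break_even_gb = (self_managed - elb_fixed) / elb_per_gb

    print("Self-managed: %.2f USD/year" % self_managed)       # 587.80
    print("ELB, fixed part: %.2f USD/year" % elb_fixed)        # 219.00
    print("Break-even traffic: %.0f GB/year" % break_even_gb)  # 46100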

We have to keep three things in mind here:

  1. Normally, we wouldn’t use only one load balancer in the first solution because it would be a single point of failure. But if we trust Amazon’s Elastic Load Balancer to be inherently fault tolerant, we can use a single Elastic Load Balancer instead of two self-managed ones, which makes this solution even better value.
  2. You can keep the amount of data flowing through the load balancer small. You do not have to serve all your pictures, videos etc. through your web servers; use another server, S3 or CloudFront for them instead. Separating media files like this is no special trick to reduce your Amazon bill; it’s a common approach anyway.
  3. In a small company or startup, I would definitely go for Elastic Load Balancing because it has this nice aaS aspect: it hides complexity from me, saving me time and nerves. It also integrates nicely with other Amazon AWS services such as Auto Scaling and monitoring.

Combined with Auto Scaling, Amazon makes it possible for an application to react to increasing and decreasing traffic, distribute load across several servers and even heal itself by replacing unresponsive instances with freshly booted ones that register themselves with the load balancer.
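
For the programmatically inclined, here is a rough sketch of how that wiring could look using the Python boto library (my own illustration, not something from Amazon’s announcement; it assumes a boto version with ELB and Auto Scaling support, the AMI id, key pair and resource names are placeholders, and credentials come from the usual boto configuration):

    # Sketch: create an Elastic Load Balancer plus an Auto Scaling group whose
    # instances register themselves with it. Names and AMI id are placeholders.
    import boto
    from boto.ec2.autoscale import LaunchConfiguration, AutoScalingGroup

    elb_conn = boto.connect_elb()
    as_conn = boto.connect_autoscale()

    # One load balancer listening on port 80, forwarding to port 80 on the instances
    lb = elb_conn.create_load_balancer('my-web-lb', zones=['us-east-1a'],
                                       listeners=[(80, 80, 'HTTP')])

    # Launch configuration: which image to boot whenever the group scales out
    lc = LaunchConfiguration(name='web-lc', image_id='ami-12345678',
                             key_name='my-keypair', instance_type='m1.small')
    as_conn.create_launch_configuration(lc)

    # Auto Scaling group tied to the load balancer: new instances register with
    # the ELB, unresponsive ones are replaced automatically
    group = AutoScalingGroup(group_name='web-asg', load_balancers=['my-web-lb'],
                             availability_zones=['us-east-1a'],
                             launch_config=lc, min_size=2, max_size=8)
    as_conn.create_auto_scaling_group(group)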

This will probably also give a headache to the folks over at RightScale and Scalr, who filled the gap by providing monitoring, auto scaling, load balancing and self-healing on top of Amazon AWS. Of course, their services are more comprehensive than Amazon’s beta features, but they are also more expensive. Over at RightScale it says “The RightScale Platform is available in editions starting at $500 a month with a one-time integration and access fee of $2,500.” For small companies, Amazon just saved the day.

Wrap-up: I believe the new offerings from Amazon (Auto Scaling, CloudWatch monitoring and Elastic Load Balancing) are very attractive to developers with limited knowledge of infrastructure and networking, as well as to start-ups and other small businesses that do not want to pay for full-blown equivalents like RightScale. Amazon extends a core concept of cloud computing (grow dynamically, pay dynamically) to new features.

Azure Storage Manager

Recently I’ve been playing around with Windows Azure and wanted to get the log files for my hosted app.

I tried to get the logs using PowerShell, and that worked in one case; on another box I got errors with PowerShell and couldn’t quite tell why. Either way, I found it tedious to set up. What I wanted was a point-and-click solution to get all my logs onto my hard drive. Another time I realized I had created a lot of tables with Azure Table Storage and wanted to clean them up. I was missing a simple tool to help me with these tasks, so I sat down and fired up Visual Studio.

I gave this app the humble name “Azure Storage Manager”, as it can deal with tables as well and, at some later point in time, possibly even with queues.
Currently it can do this with tables:

  • List
  • Delete

ahem, that’s about it. Now for blobs:

  • List + show properties (well, some)
  • Delete
  • Copy to hard drive
  • All of the above also applies to whole blob containers. This is important because it allows you to grab a container full of blobs with one click. Multi-selection of blobs and containers is also supported.

It can store and use different account settings, which might come in handy if you happen to have different storage projects on Windows Azure.

This app is completely standalone in that it does NOT require PowerShell or the Azure SDK to be installed. It worked for me under XP, Vista and Windows 7, which is hardly surprising, as that is what the .NET runtime is for. Just wanted to make the point 🙂

To install / download the application, please head over to its ClickOnce installer site.

The following walkthrough shows how the application works.

When you start the application for the first time, it will complain that there are no settings stored. You will be presented with this screen, where you can enter the information you were given when you created your Azure storage project. You need to fill in your account name and shared key; the actual endpoints are inferred from that information. After you press Save, your account settings will be persisted as XML files, so you should first specify where to put them using Set Folder.

Set Settings Folder
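
For the curious: inferring the endpoints boils down to building the well-known storage URLs from the account name. A minimal sketch (in Python for brevity; the app itself is .NET, and local development storage is ignored here):

    # Minimal sketch: derive the storage service endpoints from the account name.
    def storage_endpoints(account_name):
        return {
            'blob':  'https://%s.blob.core.windows.net' % account_name,
            'queue': 'https://%s.queue.core.windows.net' % account_name,
            'table': 'https://%s.table.core.windows.net' % account_name,
        }

    print(storage_endpoints('myaccount')['blob'])
    # -> https://myaccount.blob.core.windows.net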

In case you already have a folder with settings files in it, simply set the folder and it will show all the settings in the list on the left. Double-clicking on an item in the list will apply these settings to the application, just like pressing Use Setting does.

Settings loaded

After that initial setup, you can switch to the Blob tab, where you will find a listing of your blob containers and blobs. You have to set a path in order to be able to copy blobs or containers. In this screenshot a container is selected, so any delete or copy operation will apply to the whole container including all contained blobs. Copying at container level appends the container name to the path you specified; in this example it would create a folder C:\AzureBlobs\production to put all the listed blobs in.

Container scope

Here, we have selected multiple blobs. As you can see, copy and delete now work at blob level. In this case, the container name is not included in the save path.

Blob Scope
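
To make the difference between the two scopes concrete, here is a small sketch of how the local save path can be derived (again Python for illustration, not the actual app code): at container scope the container name is appended to the base path, at blob scope it is not.

    import os

    def local_save_path(base_path, blob_name, container=None):
        # Container scope: append the container name to the base path.
        # Blob scope (container is None): save directly under the base path.
        parts = [base_path]
        if container is not None:
            parts.append(container)
        parts.append(blob_name)
        return os.path.join(*parts)

    print(local_save_path(r'C:\AzureBlobs', 'logs.txt', container='production'))
    # container scope -> C:\AzureBlobs\production\logs.txt
    print(local_save_path(r'C:\AzureBlobs', 'logs.txt'))
    # blob scope      -> C:\AzureBlobs\logs.txt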

I might post the code for this app in a more technical follow-up; however, I have to warn you that, according to Phil Haack’s Sample Code Taxonomy, this is still prototype code. It works for me and does its job if you treat it well, but the amount of error handling that is not included in this code is tremendous 😉

What do you think about it?

Cloud Computing demystified

When I put my head into The Cloud for the first time, I found it very confusing to understand what Cloud Computing means as a term and what it means to me as a developer, specifically:

  • What are the core concepts of Cloud Computing?
  • What are the different types of Cloud Computing?

A short while ago, I happened to dig into this topic because I was giving a talk on Cloud Computing. I’ll post a quick roundup in case anybody is interested.

I like to compare the term Cloud Computing to AJAX. Just like with AJAX, the technology behind Cloud Computing is not completely new or even revolutionary. And just like with AJAX, it actually makes a fair bit of sense to give a bunch of known technologies a new name once you herd them together for a specific purpose, because now whenever someone uses this new term everybody else will know what the heck he is talking about, including

  • What technology is being used
  • What kind of problems are likely being addressed
  • That he is indeed very cool

Or so it should be. I have to say, with Cloud Computing having become such a hype and everybody doing Cloud right now, that doesn’t exactly help in understanding what is actually going on.

Enough talk, you might say, and you’d be right. Let’s get started.

Core Concepts

The most important idea behind Cloud Computing is <Anything> as a Service. Again, the idea of aaS is not brand new. As Rob Conery once pointed out (can’t quite remember where it was), by taking a taxi you’re already leveraging the idea of aaS. Rob called this Car as a Service, and we will quickly look at what aaS means and how we can apply it to cars.

  • No acquisition costs, no overhead costs: You did not have to buy a car to get you to the train station, and you do not have to pay a fixed amount of money per month for the right to call a taxi whenever you need one.
  • Variable costs: You only pay for what you consume. In the case of a taxi, this is calculated by distance and time.
  • Hidden complexity and maintenance: Some of the major headaches of using a system are abstracted away so you don’t have to care about them. In this case, you don’t care how difficult it is to start that old engine or how often, why and when they have to take it to a workshop to keep it from falling apart, as long as you still catch your train.
  • Economies of scale: Okay, maybe our example doesn’t work too well for this one, but imagine that they’re buying lots of taxis, therefore getting an amazingly low price, and you as a customer benefit from that. Right?
  • Scalability: In some scenarios, you need very few resources most of the time, and suddenly there is a peak and you need lots of them. For example, one day you need to get your whole family (really the whole family: children, parents, relatives) to the train station. You certainly don’t want to buy four cars and have them standing around the whole time just to be prepared for this rare occasion. Just get four cabs instead of one and you’ll be fine. That’s what I call scalability.

That CaaS comparison worked pretty well for me to show what aaS is about. Then I started to wonder what it means when applied to computers instead of cars.

Different Types

As mentioned above, we can apply this Service principle to virtually anything by just sticking one letter in front of aaS, and you’re done; let anyone else figure out what you meant by it. There are a lot of services in the area of Cloud Computing, and that’s one reason why it looks like such a big mess at first. I found it helpful to distinguish between three different types of Cloud Computing, although there are more and one size doesn’t fit all.

Infrastructure as a Service (IaaS)

In this simplified three-tier view of Cloud Computing, IaaS is at the bottom of our stack and the most basic layer. It basically provides abstraction over hardware (in fact, IaaS was formerly called Hardware as a Service) by applying the principles of aaS to bare metal. I like to see Amazon Web Services (AWS) as a good example of IaaS. Their services include:

  • Elastic Compute Cloud (EC2): provides you with virtual machines (they are using Xen virtualisation) that run in Amazon’s datacenter once you fire them up (the VMs, not Amazon). The EC2 pricing model says you pay on an hourly basis for that. That’s it! Here is our aaS principle again: no registration fees, no monthly fees, no overhead costs.
  • Simple Storage Service (S3): This is a persistent storage service (see, we could already call that Storage as a Service). It goes like this: once you’ve uploaded your file, they will replicate it so you don’t have to think about backups. You pay based on your usage, which is, in this case, measured by 1) storage size, 2) time and 3) data transfer (in and out).

They have a lot of other services and are constantly adding new ones; however, these two are what I believe to be the most commonly used. All of their services are directly accessible as web services (useful for programmatic control) as well as through their management console.
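
As a hedged illustration of that programmatic access, here is what firing up an instance and storing a file could look like with the Python boto library (my own sketch, not part of the original services description; the AMI id and bucket name are placeholders, and credentials come from the usual boto configuration):

    # Sketch: talk to EC2 and S3 programmatically via the boto library.
    import boto

    # EC2: fire up a virtual machine and pay by the hour
    ec2 = boto.connect_ec2()
    reservation = ec2.run_instances('ami-12345678', instance_type='m1.small')
    instance = reservation.instances[0]
    print("%s %s" % (instance.id, instance.state))

    # S3: durable storage, billed by GB stored, requests and data transfer
    s3 = boto.connect_s3()
    bucket = s3.create_bucket('my-unique-bucket-name')
    key = bucket.new_key('backups/site.tar.gz')
    key.set_contents_from_filename('site.tar.gz')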

In IaaS, you are the admin of every virtual machine you create. You can install any software you need (web server, database and so on), and you can use the programming language of your choice to implement the application you want to host. This gives you great flexibility, but also great responsibility, because at this point it’s only the hardware you do not have to care about. You still have to patch your OS and all the software you installed, for example. In theory, this gives you near-infinite scalability in a matter of minutes, since all resources are virtualized and at any given point in time you can step up and say “give me two more of these”.

Now let’s move on in the stack.

Platform as a Service (PaaS)

PaaS provides more abstraction than IaaS because, in addition to hardware, it also provides an abstraction layer over the OS as well as commonly used components like the web server, database, load balancer and the like. One example of PaaS is Microsoft’s Azure platform. Although we’ve been told that Azure is an OS for the cloud, we never actually see this OS, because we’re using PaaS.

We’re not admins; we don’t get access to the OS, which means we can’t choose what to install, but we also don’t have to. The responsibility for patching and maintaining all necessary components now lies with the PaaS provider. For every programming language it supports (Azure currently supports .NET and PHP), a PaaS provider needs another set of tools to run your app. You as a developer take your application, upload it to the PaaS provider and say ‘run it’.

See the difference from IaaS? PaaS is a more specialised stack which takes away work as well as decisions. If this doesn’t limit you too much (do they support your programming language?), PaaS will get you started more quickly.

Time for the next step.

Software as a Service (SaaS)

With SaaS, we’re leaving the Cloud Computing domain for developers and entering the one for end users. What can Cloud Computing do for end users? All the taxi principles from above should apply here, too.

With SaaS, you don’t buy the software. You only pay for it when you use it. You don’t have to install it on your machine, it’s running somewhere else and someone else takes care of it. You just use it. Sometimes it’s even free until you need more advanced functionality. Picnik, for example, lets you edit your pictures online, right in your browser. It’s free and gives you enough options for the occasional picture manipulation session, removing the need to install a full-blown image manipulation solution on your machine.

To summarize …

There are three manifestations of Cloud Computing:

  • Infrastructure as a Service (IaaS)
  • Platform as a Service (PaaS)
  • Software as a Service (SaaS)

With Cloud Computing, you get different levels of SEP (Somebody Else’s Problem). Depending on which layer of Cloud Computing you’re using, someone else will take care of the necessary hardware, the operating system or even a complete application.

The goal is to reduce upfront investments, complexity and time to market and to allow for great scalability and economies of scale using virtualized resources.

Usage in Cloud Computing is typically metered along dimensions such as the following (a toy billing example follows this list):

  • Time (CPU time or real time)
  • Data transfer (inbound and outbound)
  • Storage (measured in MB or GB)
  • Number of transactions (like GET and PUT)
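
To see how such metering adds up, here is a toy billing calculation in Python. All prices below are made up purely for illustration; they are not any provider’s actual rates:

    # Toy example: a usage-based bill along the dimensions listed above.
    # All prices are invented for illustration only.
    PRICE_PER_CPU_HOUR    = 0.10   # time
    PRICE_PER_GB_OUT      = 0.15   # data transfer (outbound)
    PRICE_PER_GB_STORED   = 0.12   # storage, per GB and month
    PRICE_PER_1K_REQUESTS = 0.01   # transactions (GET/PUT)

    def monthly_bill(cpu_hours, gb_out, gb_stored, requests):
        return (cpu_hours * PRICE_PER_CPU_HOUR
                + gb_out * PRICE_PER_GB_OUT
                + gb_stored * PRICE_PER_GB_STORED
                + requests / 1000.0 * PRICE_PER_1K_REQUESTS)

    # A small web app: one server running all month, modest traffic and storage
    print("%.2f USD" % monthly_bill(cpu_hours=720, gb_out=50,
                                    gb_stored=20, requests=200000))  # 83.90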

I hope this helped somebody to see Cloud Computing in a less foggy way.