I often get asked about VM reliability and how Azure users can design their VM infrastructure to stay online when Microsoft Azure conducts planned maintenance, or if an unplanned event causes downtime.

There are a few things to consider when it comes to VM uptime in Azure:

  • Physical network and power supply
  • Storage
  • Platform maintenance
  • OS maintenance

What if the power cuts out?

Network links and power supplies can sometimes go down. When setting up VM infrastructure, especially if it is intended for production use, you should always make sure that your VMs are in availability sets.

When two or more VMs are in an availability set, Azure will assume that they serve the same purpose and that they are failovers of each other.

The availability set lets you load balance traffic across all VMs within the set. At the same time, Azure ensures that the host machines of the virtual machines in the set are connected to different power supplies and network switches, so that the impact of an outage is kept to a minimum.

Note that you need at least two virtual machines in an availability set. It is, however, recommended to use more than two, as this makes on-demand scaling easier and adds further redundancy in case of a power problem or outage.

It is also important to understand that simply placing a machine in a set does not enable the additional redundancy features. A second machine needs to be created, added to the set, and running at the time of the problem for the failover to work as described.
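
To make the point about the second machine concrete, here is a minimal Python sketch. It is a simplified model, not Azure's actual placement logic: the VM class, the round-robin assignment, and the default of two groups are my own assumptions for illustration. Azure's documentation calls these groups of hosts that share power and network "fault domains".

```python
from dataclasses import dataclass


@dataclass
class VM:
    name: str              # hypothetical VM name
    running: bool = True   # whether the VM is switched on


def assign_fault_domains(vms, fault_domain_count=2):
    """Spread VMs round-robin across fault domains (assumed, simplified model)."""
    return {vm.name: index % fault_domain_count for index, vm in enumerate(vms)}


def survives_fault_domain_outage(vms, placement, failed_domain):
    """Return True if at least one running VM sits outside the failed domain."""
    return any(vm.running and placement[vm.name] != failed_domain for vm in vms)


# A set with a single VM has nothing to fail over to.
single = [VM("web-0")]
print(survives_fault_domain_outage(single, assign_fault_domains(single), 0))

# Two running VMs land in different domains, so one outage leaves one VM up.
pair = [VM("web-0"), VM("web-1")]
print(survives_fault_domain_outage(pair, assign_fault_domains(pair), 0))
```

If the second VM exists but is switched off (running=False), the check fails again, which is the situation described above.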

You should have one availability set per application tier (for example: one web tier availability set, one application tier availability set, etc.).

More information on availability sets: https://azure.microsoft.com/en-in/documentation/articles/virtual-machines-windows-manage-availability/

But that means I need to create several machines instead of one? Isn’t this expensive?

Yes and no. You will end up creating more than one machine for each tier of your application, but because the traffic will be balanced across them, you will be able to reduce the size of each individual VM in most cases.

Availability sets also enable you to flexibly scale out and in – if required – which can actually lead to cost savings in some scenarios.
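
As a rough illustration of the sizing trade-off, the sketch below works out how large each VM needs to be if the tier must still carry its full load with one VM down. This "survive one VM" rule is my own assumption, not an Azure requirement, and the 8-core workload is a made-up figure.

```python
import math


def cores_per_vm(required_cores, vm_count):
    """Cores each VM needs so the tier still copes with one VM offline.

    Assumes the load is spread evenly and that one VM may be down at a time.
    """
    if vm_count < 2:
        raise ValueError("need at least two VMs to tolerate losing one")
    return math.ceil(required_cores / (vm_count - 1))


# Hypothetical workload that needs 8 cores in total.
for count in (2, 3, 5):
    size = cores_per_vm(8, count)
    print(f"{count} VMs x {size} cores = {count * size} cores provisioned")
```

Under these assumptions two VMs need 8 cores each (16 in total), while five VMs need only 2 cores each (10 in total), which is why more, smaller machines can work out cheaper than a pair of large ones.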

Where should the virtual machine disks go?

Every VM should have its own storage account. These storage accounts should be hosted in different storage locations (not Azure locations).

If the VMs are then arranged in an availability set, you can compensate more easily for a potential storage problem.

So does putting my machines into availability sets also protect my machines from being down at the same time during Azure updates?

If it is a platform update (Azure updates software on the Hyper-V host or, more generally, on the platform), then yes. Azure makes sure that the hosts behind machines in an availability set do not go down at the same time. Updating them in sequence means that part of the availability set is always online.

To find out more about this, check out this link: https://azure.microsoft.com/en-us/documentation/articles/virtual-machines-windows-manage-availability/#configure-multiple-virtual-machines-in-an-availability-set-for-redundancy
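
The sequential update can be pictured with a short Python sketch. It is only a simulation under my own assumptions: the grouping of VMs into numbered "update domains" follows the terminology in Azure's documentation, but the VM names and the function are hypothetical and no Azure API is called.

```python
def rolling_update(update_domains):
    """Yield (domain_being_updated, vms_still_online), one domain at a time."""
    for domain in sorted(update_domains):
        online = [
            vm
            for other in sorted(update_domains)
            if other != domain
            for vm in update_domains[other]
        ]
        yield domain, online


# Hypothetical availability set with three VMs spread over two update domains.
domains = {0: ["web-0", "web-2"], 1: ["web-1"]}
for domain, online in rolling_update(domains):
    print(f"updating domain {domain}, still online: {online}")
```

As long as the VMs sit in at least two domains, every step of the loop leaves at least one VM online.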

That being said, if the described update is an activity on the guest (for example: Windows Update requiring a restart), Azure will not be able to protect you against downtime. A restart due to a guest operating system update cycle is classed as a user-initiated restart, something that the availability set is not designed for.

You can, however, configure your guest operating system to install updates at a time you choose. Desired state configuration tools can also be used to install updates in an ordered sequence.
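
One simple way to plan such an ordered sequence is to give every VM in a tier its own non-overlapping update window. The sketch below only computes a schedule; it does not talk to Windows Update or to any configuration tool, and the VM names, start time, and one-hour window are assumptions for illustration.

```python
from datetime import datetime, timedelta


def stagger_update_windows(vm_names, first_start, window=timedelta(hours=1)):
    """Give each VM its own non-overlapping update window, in list order."""
    schedule = {}
    for position, name in enumerate(vm_names):
        start = first_start + position * window
        schedule[name] = (start, start + window)
    return schedule


# Hypothetical web tier, updated one VM at a time starting at 02:00.
plan = stagger_update_windows(["web-0", "web-1", "web-2"], datetime(2024, 1, 1, 2, 0))
for name, (start, end) in plan.items():
    print(f"{name}: {start:%H:%M} to {end:%H:%M}")
```

Each VM would then be configured to install updates and restart only inside its own window, so that no two machines in the same tier restart at the same time.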