ESX 4 Fault Tolerance

In the next step of virtualization's evolution, VMware is introducing vLockStep. This technology is the beginning of true active, near-zero-downtime clusters. David Davis reports on TechTarget that vLockStep will create a standby running image of the VM. This standby runs in step with the primary VM: any change on the primary is completed nearly instantaneously on the standby image. If the ESX node hosting the primary image suffers a hardware failure, the standby becomes active and continues servicing clients as if nothing had happened. A video on VMware’s website demonstrates this active/standby connection very effectively.

There are some additional requirements that Scott Lowe wrote about in his blog. One of the biggest is that the FT pair must use thick provisioned vmdks; thin provisioned disks will be expanded to thick. That is a real cost on an NFS based datastore, where one of the benefits is the ability to leverage thin provisioning. Scott also writes that a minimum of 4 NICs is required to support vLockStep: Service Console, Clients, FT, and vMotion. Since redundancy is a given in a production environment, we are now up to a minimum of 8 NICs, and at least 2 more if an NFS datastore is used.
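The NIC arithmetic can be sketched in a few lines of Python. This is only an illustration of the counting: the function name and role labels are made up for this post, and it assumes that redundancy means one extra NIC per role and that NFS traffic gets a dedicated role of its own.

```python
# Hypothetical helper: counts physical NICs per ESX host from the figures in the text.
BASE_ROLES = ["Service Console", "Clients", "FT", "vMotion"]  # the 4 roles listed


def nics_required(redundant=True, nfs_datastore=False):
    """Return the minimum NIC count for one host under the stated assumptions."""
    roles = list(BASE_ROLES)
    if nfs_datastore:
        roles.append("NFS")  # assumption: NFS traffic gets its own role
    per_role = 2 if redundant else 1  # assumption: redundancy means a pair per role
    return len(roles) * per_role


print(nics_required(redundant=False))                     # 4
print(nics_required(redundant=True))                      # 8
print(nics_required(redundant=True, nfs_datastore=True))  # 10
```

Ten ports per host is a lot of cabling and switch capacity, which is worth budgeting for before enabling FT on an NFS backed cluster.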

Also, there are limitations to vLockStep. One such limitation is that the VM guest can only have a single vCPU. A second limitation is that the secondary VM must reside on the same datastore as the primary. The third limitation is an assumption on my part: it would appear that the secondary VM will need to run on the same VI cluster as the primary.
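To make these limitations concrete, here is a minimal Python sketch of a pre-flight check. It does not use any VMware API; the VmPlacement class, its fields, and the ft_blockers function are hypothetical names invented for illustration, and the cluster rule encodes my assumption rather than a confirmed requirement.

```python
from dataclasses import dataclass


@dataclass
class VmPlacement:
    """Hypothetical description of a VM and where its FT secondary would live."""
    vcpus: int
    primary_datastore: str
    secondary_datastore: str
    primary_cluster: str
    secondary_cluster: str


def ft_blockers(vm):
    """Return the limitations from the text that this placement would violate."""
    problems = []
    if vm.vcpus != 1:
        problems.append("guest must be single vCPU")
    if vm.primary_datastore != vm.secondary_datastore:
        problems.append("secondary must share the primary's datastore")
    if vm.primary_cluster != vm.secondary_cluster:
        # Assumed limitation, not confirmed by the sources cited here.
        problems.append("secondary must run in the same cluster (assumed)")
    return problems


# Example: a 4 vCPU database server with an otherwise valid placement.
db_server = VmPlacement(4, "nfs01", "nfs01", "prod", "prod")
print(ft_blockers(db_server))
```

An empty list means none of the three limitations stands in the way; the example database server is blocked only by its vCPU count.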

The first limitation, a single vCPU, is the largest shortcoming of vLockStep. Most of the critical applications that would benefit from a truly active cluster are going to be database servers and mail servers. These are critical servers that the entire business depends on, and most of them are going to be multi vCPU. Lifting the second and third limitations would have been a nicety. Placing a vLockStep enabled VM across multiple datastores and multiple clusters would have truly enhanced the disaster recovery capabilities of VMware. Firms would be able to survive not only a single ESX host failure, but also entire storage failures, power failures, and natural disasters. Hopefully, the vLockStep technology will be enhanced in future releases of ESX.

