Mordtech’s Blog

General Technology Blog

Imaging VMware ESX Guest using ImageX

VMware ESX can clone VMs from templates, which is a great feature when using iSCSI or FC datastores. When you clone onto NFS datastores, however, you lose thin provisioning on those datastores. One way around this is to use third-party imaging software.

Microsoft provides a free imaging package called ImageX. You can read about it on TechNet here. A quick synopsis: unlike other imaging software, ImageX works on files instead of disk blocks. This allows ImageX to leverage a Single Instance Store (SIS). ImageX compresses the first image to around 33 to 50% of its on-disk size and stores it in a file with a WIM extension. The WIM file holds the SIS along with an index of which blocks of data are associated with each image. The benefit of the SIS shows up when you append a second image to the WIM. ImageX creates a second index in the WIM file and then starts imaging the machine. For each file it finds, it checks the SIS. If the file is already there, it adds a pointer to the new index and moves on. If the file is not there, ImageX adds the file to the SIS, adds a pointer to the new index, and moves on.
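The append behaviour described above can be sketched in a few lines of Python. This is only a toy model under stated assumptions: the class and method names are hypothetical, and hashing file contents with SHA-1 is my choice for the illustration. It is not the actual WIM file format or the ImageX implementation.

```python
import hashlib


class SingleInstanceStore:
    """Toy model of a WIM-like container: one shared file store plus one index per image."""

    def __init__(self):
        self.store = {}    # content hash -> file bytes (each unique file is kept once)
        self.indexes = []  # one dict per captured image: path -> content hash

    def append_image(self, files):
        """Capture an image given as {path: bytes} and return its 1-based index number."""
        index = {}
        for path, data in files.items():
            digest = hashlib.sha1(data).hexdigest()
            if digest not in self.store:
                # File not seen before: add it to the store
                self.store[digest] = data
            # Either way, the new index only records a pointer to the stored file
            index[path] = digest
        self.indexes.append(index)
        return len(self.indexes)

    def apply_image(self, number):
        """Rebuild the {path: bytes} view of image `number` from its index."""
        return {path: self.store[d] for path, d in self.indexes[number - 1].items()}


# Two hypothetical images that share one identical file
xp = {"windows/notepad.exe": b"notepad", "windows/xp.dll": b"xp only"}
w2k3 = {"windows/notepad.exe": b"notepad", "windows/srv.dll": b"server only"}

wim = SingleInstanceStore()
first = wim.append_image(xp)
second = wim.append_image(w2k3)
print(first, second, len(wim.store))
```

The two images contain four files between them, but the store ends up holding only three, because the shared file is stored once and referenced from both indexes.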

To use ImageX, you will need to download the Windows Automated Installation Kit (WAIK). After installing the WAIK, you can follow the instructions found here, on svrops.com, to create a WinPE boot CD. Before you create the CD using OSCDIMG, you will need to inject the network and SCSI drivers required for ESX. To do this, you first need to get the correct drivers. You can either scour the internet for them, or you can just select Install VMware Tools from a guest VM. On the guest VM, open My Computer, open the CD-ROM, browse to \program files\VMware\VMware Tools\Drivers\ and copy the vmxnet and SCSI folders. Now, on the machine where you installed the WAIK, run the following commands:

peimg /inf=DRIVE:<location you copied the network driver>vmxnet.inf /image=DRIVE:<mount location of wimfile>

peimg /inf=DRIVE:<location you copied the SCSI driver>vmscsi.inf /image=DRIVE:<mount location of wimfile>

You can use the same commands to inject other drivers if you plan to use the boot CD on other hardware as well. After you’ve injected all of the drivers you need, go ahead and complete the instructions found on svrops.com. After the OSCDIMG command, you will have a bootable WinPE ISO. One point: when you run the unmount command, make sure that you use the /commit switch. If you don’t, all of your changes will be lost, and you get to do it all again.

Image Capture

So now that you have a bootable WinPE ISO with ImageX and the correct drivers, what do you do? First, build gold image(s) of Windows XP, Vista, 2003 and/or 2008. Patch everything with the latest service packs, security patches, etc. Also, it’s best to build the initial gold image with multiple processors. That way you can use the same image for single- and multi-processor VMs without needing to change the HAL. Next, sysprep the gold image. You can find instructions for running sysprep here. After running sysprep, start the VM with the ISO you created mounted as its CD-ROM.

After booting, you will need to mount a network share. Use a command similar to: net use m: \\<servername>\share

Next, go to x:\program files\imagex. The command depends on whether you are creating a new WIM or adding to an existing one. If new, type imagex /capture <driveletter> M:\<wimfilename>.wim "<description of capture>". If existing, change /capture to /append. This is important: if you do a capture into an existing WIM file, it will overwrite that file. Bad juju!!! If you have multiple drives, after the initial capture, just change <driveletter> to the next drive letter and repeat (using /append, so the first capture is not overwritten).

Image Apply

Create a new VM manually. It doesn’t need to be identical, but make sure that the hard disks are large enough to hold the uncompressed data from the gold image. Next, boot from the ImageX WinPE ISO created earlier. After booting, you will need to run diskpart. There are websites that detail everything about diskpart, but to create a basic C drive, run the following commands:

Diskpart.exe

Select disk 0

Create partition primary size=<size of disk in Mbytes>

Select partition 1

Format fs=NTFS label="Sys" Quick

Active

Now, for each additional disk, select disk <disk> and run the same commands as above, except change the label to a description of the drive. Only run the active command on the sys drive. After you have configured all of the drives, type exit to get back to the command prompt. From the command line, run the net use command again. Next, change to X:\program files\imagex\. Type imagex /apply m:\<wimfile>.wim <index number> c: /verify. After the image is applied, you can rerun the imagex /apply command for each additional drive, changing the index number and the drive letter. When everything is complete, unmount the CD-ROM and reboot the VM. You should now be greeted by the Windows mini-setup.

While not as fast as VMware’s built-in clone from template, this approach does allow you to keep the thin provisioning inherent in NFS datastores. In our environment, it takes roughly 20 minutes to build a Windows 2003 VM this way, versus about 10 minutes to build the same VM from a template.

December 14, 2008 Posted by mordtech | ESX, Microsoft, NFS, VMware

Microsoft Licensing 3 – Clusters

In this post, I’ll discuss licensing when working in a clustered Microsoft environment. I’ll pick some of the more common Microsoft apps and detail what is required to license them properly. Windows 2003 and 2008 support clusters of up to eight nodes. In a two-node cluster, you can technically run active/active. However, Microsoft does not consider this best practice and recommends an active/passive configuration. Three-, four- and five-node configurations must have one passive node; the other nodes can be active. A cluster can have at most four active nodes, so nodes five through eight must be passive. All nodes must be licensed with either Windows Server Enterprise or Windows Server Datacenter. In most cases, Windows Server Enterprise makes the most financial sense. Enterprise has an MSRP of $4,000 per server with up to eight physical processors, versus $3,000 per physical processor for the Datacenter SKU. If a server has more than eight physical processors, you must use Datacenter.

Hyper-V cluster

The Datacenter SKU makes the most sense as the basis of your cluster in most Hyper-V environments. With the free unlimited guest OS licensing on each server, the break-even point is 8 guest OSes across a two-node cluster of dual-processor servers. You receive the right to run 4 instances of the Windows OS with each license of Enterprise. In a cluster, during a failover, more than 4 VMs might be running on a single node. Therefore, you would need to buy an additional 4 licenses of Standard. That would put Enterprise and Datacenter both at $12,000 for a two-node cluster. Above 8 guests on the cluster, or when you are running copies of Enterprise on the guest VMs, Datacenter is the cheaper option. If your servers have four physical processors each, you would need to run roughly 20 guest VMs to break even.
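The break-even figures can be checked with a short calculation. This is a sketch of one reading of the paragraph above, not an official licensing calculator: it assumes one Enterprise license per node, that only 4 guests are covered by Enterprise once everything fails over to a single node, and that every further guest needs a Standard license at the $1,000 MSRP used elsewhere on this blog. The function names are mine.

```python
ENTERPRISE_MSRP = 4000          # per server
STANDARD_MSRP = 1000            # per guest not covered by Enterprise
DATACENTER_MSRP_PER_CPU = 3000  # per physical processor
GUESTS_PER_ENTERPRISE = 4       # instances covered on the node left after a failover


def enterprise_cost(nodes, guests):
    """Enterprise on every node, plus Standard for each guest beyond the covered four."""
    extra_guests = max(0, guests - GUESTS_PER_ENTERPRISE)
    return nodes * ENTERPRISE_MSRP + extra_guests * STANDARD_MSRP


def datacenter_cost(nodes, cpus_per_node):
    """Datacenter is priced per physical processor and covers unlimited guests."""
    return nodes * cpus_per_node * DATACENTER_MSRP_PER_CPU


def break_even_guests(nodes, cpus_per_node):
    """Smallest guest count at which the Enterprise route costs as much as Datacenter."""
    guests = 0
    while enterprise_cost(nodes, guests) < datacenter_cost(nodes, cpus_per_node):
        guests += 1
    return guests


for cpus in (2, 4):
    print(f"2 nodes x {cpus} CPUs: Datacenter ${datacenter_cost(2, cpus):,}, "
          f"break-even at {break_even_guests(2, cpus)} guests")
```

Under these assumptions the dual-processor cluster breaks even at 8 guests and the four-processor cluster at 20, which matches the numbers in the text.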

This is a good point to add a quick discussion about hardware for virtualized environments. Should you buy bigger 4- or 8-processor multi-core machines, or go wide in your cluster with dual-processor boxes? When I’m designing clusters for a virtualized environment, whether VMware Infrastructure or Hyper-V, I go wide first and then scale up. My reasoning is simple: both the VMware licensing model and Microsoft’s Datacenter licensing model charge per socket. Whether you have 4 cluster nodes with dual processors or 2 cluster nodes with 4 processors, both VMware and Microsoft will charge the same. Normally, though, two quad-processor servers cost more to purchase than four dual-processor servers. You also gain a higher utilization rate when going wide. In a two-node cluster, each node can only run at 50% capacity if the cluster is to survive a node failure. With a four-node cluster, all four nodes can run at 75%. Another issue is that with the larger hardware, a single physical server failure takes down roughly 50% of your environment until the VMs restart on the other node. When going wide, only about 25% of your environment goes down.
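The 50% and 75% figures come from keeping one node’s worth of capacity free. A minimal sketch of that arithmetic, assuming identical nodes and that the cluster only has to survive a single node failure (the function names are mine):

```python
def max_safe_utilization(nodes):
    """Highest average load per node that the survivors can absorb if one node fails."""
    if nodes < 2:
        raise ValueError("a cluster needs at least two nodes to survive a failure")
    return (nodes - 1) / nodes


def share_lost_on_failure(nodes):
    """Fraction of the environment that goes down when a single node fails."""
    return 1 / nodes


def licensed_sockets(nodes, sockets_per_node):
    """Per-socket licensing only cares about the total socket count."""
    return nodes * sockets_per_node


# Scale-up (2 nodes x 4 sockets) versus go-wide (4 nodes x 2 sockets)
for nodes, sockets in ((2, 4), (4, 2)):
    print(f"{nodes} nodes x {sockets} sockets: "
          f"{licensed_sockets(nodes, sockets)} sockets to license, "
          f"run at up to {max_safe_utilization(nodes):.0%}, "
          f"lose {share_lost_on_failure(nodes):.0%} on a node failure")
```

Both layouts license the same 8 sockets, but the wide layout can be run hotter and loses less when a host dies.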

SQL Server

With SQL Server, you would more than likely use the Windows 2003 Enterprise license, unless your servers have more than eight processors, and if they do, you probably don’t need this blog entry to explain licensing. Starting with SQL 2005, SQL Enterprise is no longer a requirement for a SQL cluster. The Enterprise license now gives you additional features such as data warehousing. Microsoft is generous with SQL licensing in the cluster: you do not need to purchase a license for the passive node. License the active node with either per-device or per-processor licenses and go. As in the Hyper-V scenario, the licensing is the same whether you have two eight-processor servers or eight dual-processor servers. Again, if you need more processing power for your database, you probably don’t need this blog. I should be talking to you :).

Exchange

In a clustered Exchange environment, you must run Exchange Server Enterprise, per the Exchange Server 2007: Platforms, Editions and Versions web page. You also need one copy of Exchange for each node in the cluster. You do not get the benefit of not licensing passive nodes as you would with the SQL cluster. Exchange Enterprise licenses have an MSRP of $4,000 per server. As previously stated, and supported by Dell and IBM tests, Exchange does not scale well above 2 processors. So again, take the cluster wide.

There are whitepapers by Dell and VMware showing that Exchange actually scales better in a virtual environment than on physical hardware. On a quad-processor, quad-core IBM server, VMware was able to scale to 16,000 mailboxes. This was done with eight dual-vCPU VMs, each hosting 2,000 users. A blog discussing this can be found here. Dell wrote a similar paper using a dual quad-core server.

My suggestion here: skip Microsoft clustering, get a couple of dual quad-core servers and two licenses of VMware ESX 3.5. Load the servers up with as much RAM as they will take and buy Exchange 2007 Standard at $700 per VM. Build two Exchange servers using Standby Continuous Replication (SCR) between the two. Configure a rule in the VMware cluster to keep the two servers on different physical ESX hosts. VMware HA will protect you from a physical hardware failure; SCR will minimize the impact of an OS or application failure on the primary Exchange server. Of course, you will still need additional servers for the other roles in Exchange 2007: Edge Transport, Client Access, Hub Transport and Unified Messaging. But with the cost savings from not buying additional physical servers, you can build standalone VMs to provide each piece of the Exchange environment.

SharePoint Server

With SharePoint Server, the best play would be to run a network load balancing cluster for the SharePoint front ends and place the databases on the SQL cluster described above. This will be a significantly cheaper solution, as it will not require clusterable hardware and would only require Windows Server Standard instead of Windows Server Enterprise. It would also provide uptime as good as, if not better than, a clustered front end.

 

SQL Server Pricing: http://www.microsoft.com/sqlserver/2005/en/us/pricing.aspx

Exchange Server licensing: http://www.microsoft.com/exchange/howtobuy/default.mspx

Exchange licensing comparison: http://technet.microsoft.com/en-us/library/bb232170.aspx

December 4, 2008 Posted by mordtech | ESX, Exchange 2007, Microsoft, VMware, Windows

Microsoft licensing

We are beginning a usage audit to true up our Microsoft licensing. For the most part, the licensing is straightforward. Use a product, get a license. Don’t use the product, don’t get a license. Where confusion creeps in is around items such as virtualization, public web access and clustering. In this post I’ll discuss Microsoft licensing in the virtualization arena. I’ll write another entry on public web access and clustering within the next day.

Licensing in the virtualization arena:

You have three options for licensing the Windows Server operating system. The first is to buy a license for each virtual machine based on whether it is running Windows 2003 Standard or Enterprise. Easy enough. Option two is a bit trickier: according to the Microsoft licensing for Virtualization web page, a Windows Server Enterprise license allows “…you to run up to four software instances at a time in virtual operating system environments (OSEs) on a server under a single server license.” The third option is to purchase a license of Windows 2003 Datacenter, which is licensed per socket, for each of your physical hypervisor hosts. This allows you to run an unlimited number of Windows Server based guest VMs on that particular host.

Let’s look at a quick cost-benefit analysis of each licensing type. We will use a two-node cluster of dual-processor, quad-core servers. We will exclude networking, storage, electrical and cooling costs, since those would be similar under any of the three licensing options. I also won’t even begin to do a hardware cost comparison between physical and virtual, as there is enough information on the web to make an accountant cry about how much you will save by virtualizing your environment. We will use a server vCPU to pCPU ratio of 5:1, which gives us roughly 40 vCPUs on the 8 cores of a single host. Because we need the headroom to survive a hardware failure, we will not count the capacity of the second host node. We’ll break down the license usage as 34 Windows 2003 Standard and 6 Windows 2003 Enterprise guests.

License option 1: (one license for each Guest VM)

         MSRP      Amount   Option 1 Cost
Std      $1,000    34       $34,000
Ent      $4,000    6        $24,000
Total                       $58,000

As you can see in the table above, the MSRP of those 40 servers would be approximately $58,000.

License option 2: (Windows Server Enterprise – 4 free on the same server)

         MSRP      Amount   Option 2 Cost
Ent      $4,000    12       $48,000

 

Here is where it can get a little dicey. The license states that you can run 4 instances of the OS on one server. When you license a two-node environment, especially when using a product such as VMware Infrastructure DRS, you cannot be sure how many VMs will reside on one physical host at any one time. It might be 20-20 or it might be 22-18, etc. While it would look like you only need 10 Enterprise licenses to cover those 40 servers, you would probably need at least 1 extra for each node to ensure that you never have more guest VMs running on one node than it is licensed for. Even with two extra licenses of Windows Enterprise (12 in total, or $48,000), you still save $10,000 over the one-license-per-guest option. Another benefit is that you can run either Standard or Enterprise in the guests and still be in the good graces of Microsoft.

License option 3: (Windows Server Datacenter – run what you brung!)

 

             MSRP      Amount   Option 3 Cost
DataCenter   $3,000    4        $12,000

 

Here is where Microsoft licensing in the virtualized arena begins to shine. Microsoft Datacenter licensing has an MSRP of $2,999 per physical processor. Not per core, per physical socket. That means that for each dual-processor node in the cluster, we need roughly $6,000 worth of Microsoft OS licensing to cover everything. This licensing option also lets us load whatever OS the business unit needs. Or we can just standardize on Windows Enterprise for the virtualized servers and not worry about any features that are disabled in the Standard version.
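To put the three options side by side, here is a small sketch that recomputes the totals from the tables in this post. The prices are the rounded MSRPs used above, the one-spare-Enterprise-license-per-node rule is the assumption from option 2, and the function names are mine.

```python
import math

STANDARD_MSRP = 1000
ENTERPRISE_MSRP = 4000
DATACENTER_MSRP_PER_SOCKET = 3000
INSTANCES_PER_ENTERPRISE = 4


def option1_cost(standard_guests, enterprise_guests):
    """One license per guest VM."""
    return standard_guests * STANDARD_MSRP + enterprise_guests * ENTERPRISE_MSRP


def option2_cost(total_guests, nodes, spare_per_node=1):
    """Enterprise licenses at 4 guests each, plus spares so DRS moves stay covered."""
    licenses = math.ceil(total_guests / INSTANCES_PER_ENTERPRISE) + nodes * spare_per_node
    return licenses * ENTERPRISE_MSRP


def option3_cost(nodes, sockets_per_node):
    """Datacenter on every physical socket of every host, unlimited guests."""
    return nodes * sockets_per_node * DATACENTER_MSRP_PER_SOCKET


costs = {
    "Option 1 (per guest)": option1_cost(34, 6),
    "Option 2 (Enterprise)": option2_cost(40, nodes=2),
    "Option 3 (Datacenter)": option3_cost(nodes=2, sockets_per_node=2),
}
for name, cost in costs.items():
    print(f"{name}: ${cost:,}")
```

Changing the guest mix or the socket count in the calls at the bottom is a quick way to see where the options cross over for a different cluster.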

 

 

Microsoft Licensing for Virtualization: https://www.microsoft.com/licensing/highlights/virtualization.mspx

Microsoft Windows Server 2008 Pricing:

December 3, 2008 Posted by mordtech | Microsoft, VMware, Windows

ESX 4 Fault Tolerance

In the next step of virtualization evolution, VMware is introducing vLockStep. This technology is the beginning of truly active, near-zero-downtime clusters. David Davis reports on TechTarget that vLockStep will create a running standby image of the VM. This VM runs in step with the primary VM; any change on the primary VM is nearly instantaneously completed on the standby image. If the ESX node hosting the primary image suffers a hardware failure, the standby becomes active and continues servicing clients as if nothing happened. A video on VMware’s website demonstrates this active/standby connection very effectively.

There are some additional requirements that Scott Lowe wrote about on his blog at blog.scottlowe.org. One of the biggest is that the FT pair must use thick provisioned VMDKs. Thin provisioned disks will be expanded to thick. This matters on an NFS-based datastore, where one of the benefits is thin provisioning. Scott also writes that a minimum of 4 NICs is required to support vLockStep: service console, clients, FT, and vMotion. Since redundancy is a given in a production environment, we are now up to a minimum of 8 NICs, and at least 2 more if an NFS datastore is used.

Also, there are limitations to vLockStep. One such limitation is that the VM guest can only be single vCPU. A second limitation is that the secondary VM must reside on the same datastore as the primary. I’m making an assumption on the third limitation: It would appear that the secondary VM will need to run on the same VI cluster as the primary.

The first limitation, single vCPU, is the largest shortcoming of vLockStep. Most of the critical applications that would benefit from a truly active cluster are going to be database servers and mail servers. These are critical servers that the entire business depends on, and most of them are going to be multi-vCPU. Lifting the second and third limitations would have been a nicety. Placing a vLockStep-enabled VM across multiple datastores and multiple clusters would have truly enhanced the disaster recovery capabilities of VMware. Firms would be able to survive not only an ESX host failure, but also entire storage failures, power failures and natural disasters. Hopefully, the vLockStep technology will be enhanced in future releases of ESX.

David Davis discussing vLockStep Technology: http://searchvmware.techtarget.com/video/0,297151,sid179_gci1336986,00.html?track=NL-915&ad=670476&asrc=EM_NLN_4862015&uid=2353064

VMware Site demonstrating vLockStep Technology: http://download3.vmware.com/vdcos/demos/FT_Demo_800×600.html

Scott Lowe’s Blog on vLockStep: http://blog.scottlowe.org/2008/09/16/bc2621-fault-tolerant-vms-in-vi-operations-and-best-practices/


 

November 29, 2008 Posted by mordtech | VMware

Virtualization Server Sprawl

When a firm is contemplating virtualization, there are many positives: server consolidation, improved DR, reduced energy consumption, reduced infrastructure costs, etc…. There are also negatives: increased risk due to single point of failure, additional complexity, server sprawl…

Server sprawl has long been a part of the Windows Server realm, due to the prevailing mindset of one application, one server. One of the few things that kept server sprawl in check was the cost of procuring another server. The firm would need to purchase a new server, which normally entails:

  • Research (find systems that meet the application requirements)
  • Verify datacenter can absorb additional server (are there enough network ports, are power and cooling sufficient, is there space in the rack?)
  • Requesting quotes (must work with multiple vendors to ensure best value proposition)
  • Select quote (best value proposition, not always lowest cost)
  • Submitting the PO to purchasing (wait out the steeping period)
  • Purchasing department orders the server
  • Waiting for vendor to ship (anywhere from a week to a month)
  • Request storage and networking ports from the groups responsible.
  • Waiting for equipment to arrive in datacenter (we work in a union facility where we are not allowed to move equipment, this usually takes a week)
  • Submit change request to rack mount the server and bring it on the network (this usually must happen after hours when the server is being installed in racks that have production servers)

Now, the administrator can begin configuring the server.

  • Install OS
  • Install Service Packs
  • Patch
  • Install VirusScan
  • Install backup software
  • Install monitoring software
  • Configure monitoring
  • Install support features for the application
  • Patch support features
  • Install application
  • Test
  • Verify

Now, what does this have to do with server sprawl? After virtualization, you skip most of the steps required to purchase a server. Instead of taking approximately two months to purchase new hardware, the VM can be built almost immediately after the request is made, and at almost zero apparent cost to the firm. Of course there are costs: OS licensing, backup, virus scanning, monitoring, storage. But the only costs immediately seen by the purchasing department are for the application.

When your firm decides to move forward on its virtualization project, do not underestimate server sprawl. My previous two firms experienced significant server sprawl once the virtualization infrastructure was in place. Projects that were originally slated for two servers grew to 10. Applications that were on the bubble immediately became a go. Applications that previously would have been denied because only a few people in the company would use them became feasible.

November 26, 2008 Posted by mordtech | Microsoft, VMware, Windows

Poor performance when using ESX SMP

Our development environment consists of two HP ProLiant DL380 G5s. Each node has two dual-core processors and 36GB of RAM. We have DRS enabled and are running about 32 vCPUs. We started getting complaints from the application owners that their dev environments were becoming extremely sluggish. Looking at the VirtualCenter server, each node was using roughly 60% CPU. We looked into it further and realized that we had 5 VMs running dual vCPU.

All five of these boxes were the machines being reported as sluggish by the application owners. The issue appears to be in the way that VMware allocates CPU cycles when a guest VM needs to perform work. ESX locks a core when a guest VM requests a CPU cycle. While that VM is using the core, no other VM can access it. When the work is complete, the core is released for the next VM awaiting processing cycles. The problem with a dual-vCPU guest is that when it needs to do processing work, it locks two cores. Even if only a single thread is awaiting processing, it still locks two cores. If two cores are not available, the guest VM must wait until it is granted access to two cores. On our dev cluster, we are hitting 4:1 vCPU to pCPU, so our dual-vCPU systems were nearly always waiting for two cores to become available. After dropping all 5 guests to single vCPU, our cluster nodes dropped to roughly 35% CPU utilization. Only one of the guests appears to need the dual vCPU. It is a Lotus Notes dev server that is pegging its single vCPU at 99%. Apparently that is a common occurrence for Notes, and we have our administrator looking into how he can reduce the CPU utilization of a server that doesn’t host any production databases.

We do not leverage resource pools at this time. On the memory side, we are still under 26GB of allocated memory when the guests are spread across both nodes. As all but 3 of these machines run Windows 2003 SP2, we are seeing a nearly 10:1 memory sharing ratio. On the CPU side, I don’t believe that resource pools would have helped us. If the dual-vCPU guests had been placed into a High resource pool, they would have starved out the remaining 26 guests.

Our lesson learned: do not allocate more multi-vCPU guests than you have physical cores in the cluster. We are going to test with resource pools. We’ll place one of the dual-vCPU guests in a High resource pool and run tests to see how it and the other guests respond. For now, we are going to limit our multi-vCPU guest count to 1/4 of our physical core count. For example, we have 8 cores in our cluster, so we can run 2 dual-vCPU guests. We have 2 additional boxes awaiting licensing; with those we could increase our count to 4 dual-vCPU guests. We might find that resource pools do help, and we’ll revisit this at that time.
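The rule of thumb above is easy to turn into a quick check. This is just a sketch of our own 1/4 rule, not VMware guidance. It assumes the two additional boxes have the same two dual-core sockets as the existing nodes, and the function names are mine.

```python
def physical_cores(nodes, sockets_per_node, cores_per_socket):
    """Total physical cores in the cluster."""
    return nodes * sockets_per_node * cores_per_socket


def vcpu_to_pcpu_ratio(total_vcpus, cores):
    """Overcommit ratio, e.g. 4.0 means 4 vCPUs per physical core."""
    return total_vcpus / cores


def multi_vcpu_guest_budget(cores, divisor=4):
    """Our rule of thumb: allow multi-vCPU guests up to 1/4 of the physical core count."""
    return cores // divisor


today = physical_cores(nodes=2, sockets_per_node=2, cores_per_socket=2)
expanded = physical_cores(nodes=4, sockets_per_node=2, cores_per_socket=2)

print(f"Today: {today} cores, {vcpu_to_pcpu_ratio(32, today):.0f}:1 vCPU to pCPU, "
      f"budget of {multi_vcpu_guest_budget(today)} dual-vCPU guests")
print(f"With two more boxes: {expanded} cores, "
      f"budget of {multi_vcpu_guest_budget(expanded)} dual-vCPU guests")
```

With the 5 dual-vCPU guests we started with, the current cluster was well over the budget of 2 that this rule allows.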

November 25, 2008 Posted by mordtech | VMware

VMware ESX 4 Storage vMotion

One of the features being updated in the next release of VMware ESX is Storage vMotion. Storage vMotion was released in ESX 3.5 and, while great for administrators using VMFS datastores on iSCSI or FC, it was of limited use to administrators using NFS datastores.

One benefit of using NFS datastores is that VMDKs are created thin by default. Administrators can create larger VMDKs at VM creation and not waste space on the storage unit. I consider this a set-it-and-forget-it option. We only need to monitor the datastore and increase the size of the NFS mount when required. While thin provisioning is the default when using NFS datastores, it was not preserved when moving VMDKs from one storage unit to another. This is a limitation of the underlying ESX hypervisor, as we could not even move the VMDKs through the service console without losing the thin provisioning.

Enter ESX 4. According to the VMware demo at http://download3.vmware.com/vdcos/demos/Storage_VMotion_800×600.html, we now have the option to live migrate from thin to thin. It also appears that we can now move VMDKs from thick to thin, thin to thick, even RDM to either thick or thin, or any of the above to RDM. One of the selling points of Storage vMotion in 3.5 was the ability to move a VM live between storage tiers. You could build your server on tier 2 storage and, when ready to go to production, move the VM to tier 1 storage with no downtime to the VM. Now, NFS datastore administrators can fully leverage this technology.

What are some benefits of the enhanced storage vMotion?

  1. Build VMs on tier 2 storage and move to tier 1 storage when ready for production.
  2. Remove our lock-in to FC, iSCSI or NFS. Build using one technology now and migrate in the future based on performance requirements.
  3. Migrate to new storage when replacing aged storage heads.

While on the topic of storage, one feature that I would like to see implemented by VMware is the ability to leverage VCB for NFS datastores. While many might question why you would want to use VCB for backing up NFS when you can use the same storage backup solution as the NFS host, there are good reasons. Backing up VMs becomes more difficult as the numbers increase. By leveraging VCB, the stress of backups is moved from the virtual infrastructure to a separate Windows server. This could provide options such as backing up outside of the normal nightly backup window, or adding additional incremental or differential backups.

November 12, 2008 Posted by mordtech | NFS, VMware, vmotion

ESX host not responding while VMs still up

While doing a P2V, one of my ESX hosts started reporting as offline. The hostd daemon on the service console had hung. Below are the steps taken to restart the daemon. As soon as mgmt-vmware was restarted, the host immediately reappeared in Virtual Infrastructure. These steps did not cause any downtime for the virtual machines running on the ESX host.

One point is that the VMs did not restart on other nodes in the cluster, even though the node was no longer responsive. This is probably because the ESX host’s service console still responded to pings. Oh well, no downtime for the guest VMs means no upset business units.

  • ps -auxwww|grep -i hostd
  • kill -9 <process ID>
  • ps -auxwww|grep -i hostd
  • cd /var/run/vmware
  • rm watchdog-hostd.PID vmware-hostd.PID
  • ps -auxwww|grep -i hostd
  • service mgmt-vmware start

November 5, 2008 Posted by mordtech | VMware

VMware ESXi guest unknown (invalid)

First, a little background on my home network: I have a two-system network consisting of Dell workstations. The first runs ESXi 3.5 Update 2 and the second runs OpenSolaris 2008.05. I use ZFS for both the root and the data drives, and I use an NFS share for my ESX datastore. Well today, while I was working from home, we experienced a brownout, and down came both servers and my Vista workstation. I waited to boot the ESXi server until the Solaris box had successfully booted. I then started the ESXi server and was greeted by my two VMs showing as unknown (invalid).

I tried browsing the datastore and was able to see my VMs. So I thought, OK, I’ll try to remove one of the VMs from inventory and re-import it. ESX imported it and lo and behold… unknown (invalid). During this time, I checked my music folder on the workstation, which is an iSCSI mount on the same server. It looked fine. So I rebooted the ESX server to see if that helped. After the reboot, still no dice. I then logged into the Solaris server and decided to reboot it, and then the ESXi server again. This time, we had a winner. My VMs were there with the correct names, and I was able to start them up. The VMware lock files were still in place and weren’t cleared until after the storage reboot. Next time, I might try unmounting the NFS share and remounting it.

Live and learn.

November 3, 2008 Posted by mordtech | NFS, Solaris, VMware, ZFS