Mordtech's Blog

General Technology Blog

Poor performance when using ESX SMP

Our development environment consists of two dual-socket HP ProLiant DL380 G5s. Each node has two dual-core processors and 36GB of RAM. We have DRS enabled and are running about 32 vCPUs. We started getting complaints from the application owners that their dev environments were becoming extremely sluggish. Looking at the VirtualCenter server, each node was using roughly 60% CPU. We looked into it further and realized that we had 5 VMs running dual vCPU.

All five of these boxes were the machines being reported as sluggish by the application owners. The issue appears to be in the way that VMware allocates CPU cycles when a guest VM needs to perform work. ESX locks a core when a guest VM requests CPU time. While that VM is using the core, no other VM can access it. When the work is complete, the core is released for the next VM awaiting processing cycles. The problem with a dual vCPU guest is that when it needs to do processing work, it locks two cores. Even if only a single thread is awaiting processing, it still locks two cores. If two cores are not available at the same time, the guest VM must wait until it is granted access to two cores.

On our dev cluster, we are hitting 4:1 vCPU to pCPU. Our dual vCPU systems were nearly always waiting for two cores to become available. After dropping all 5 guests to single vCPU, our cluster nodes dropped to roughly 35% CPU utilization. Only one of the guests appears to need the dual vCPU. It is a Lotus Notes dev server that is pegging its single vCPU at 99%. Apparently that is a common occurrence for Notes, and we have our Administrator looking into how he can reduce the CPU utilization of a server that doesn’t host any production databases.
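To get a rough feel for why needing two cores at once hurts so much, here is a toy calculation in Python. It is only a sketch under a big simplifying assumption: each of the 4 cores in a node is treated as busy independently, 60% of the time, which is not how the real ESX scheduler behaves. The function name `prob_at_least_free` is made up for this illustration.

```python
from math import comb


def prob_at_least_free(cores: int, busy: float, needed: int) -> float:
    """Chance that at least `needed` of `cores` are idle at a given instant.

    Assumes every core is busy independently with probability `busy`.
    This is a toy model, not a description of the ESX scheduler.
    """
    idle = 1.0 - busy
    return sum(
        comb(cores, k) * idle**k * busy ** (cores - k)
        for k in range(needed, cores + 1)
    )


CORES_PER_HOST = 4  # two dual-core processors per node, as in the post
BUSY = 0.60         # roughly the 60% host CPU we saw in VirtualCenter

single = prob_at_least_free(CORES_PER_HOST, BUSY, 1)
dual = prob_at_least_free(CORES_PER_HOST, BUSY, 2)

print(f"1 vCPU guest finds a free core:  {single:.2f}")
print(f"2 vCPU guest finds two at once:  {dual:.2f}")
```

Under those assumptions, a single vCPU guest finds a free core most of the time, while a dual vCPU guest finds two free cores only about half the time. The real numbers on a loaded host will differ, but the direction matches what we saw.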

We do not leverage resource pools at this time. On the memory side, we are still under 26GB of allocated memory when the guests are spread across both nodes. As all but 3 of these machines are Windows 2003 SP2, we are seeing a nearly 10:1 memory sharing ratio. On the CPU side, I don’t believe that resource pools would have helped us. If the dual vCPU guests had been placed into the High resource pool, they would have starved out the remaining 26 guests.

Our lesson learned: do not allocate more multi vCPU guests than you have physical cores in the cluster. We are going to do testing with the resource pool. We’ll place one of the dual vCPU guests in the High resource pool and run tests to see how it and the other guests respond. For now, we are going to limit our multi vCPU guest count to 1/4 of our physical core count. For example, with 8 cores in our cluster, we can run 2 dual vCPU guests. We have 2 additional boxes awaiting licensing; once they are added, we could increase our count to 4 dual vCPU guests. We might find that the resource pools do help, and we’ll revisit this at that time.
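The 1/4 rule of thumb is simple arithmetic, sketched here in Python. The helper names `cluster_cores` and `max_multi_vcpu_guests` are hypothetical, and the sketch assumes every box is identical to our current ones (two dual-core processors each).

```python
def cluster_cores(hosts: int, sockets_per_host: int = 2, cores_per_socket: int = 2) -> int:
    """Total physical cores in a cluster of identical hosts."""
    return hosts * sockets_per_host * cores_per_socket


def max_multi_vcpu_guests(physical_cores: int, divisor: int = 4) -> int:
    """Our rule of thumb: allow one multi vCPU guest per `divisor` physical cores."""
    return physical_cores // divisor


current = cluster_cores(hosts=2)   # the two DL380 G5s we run today
expanded = cluster_cores(hosts=4)  # after the 2 additional boxes are licensed

print(current, max_multi_vcpu_guests(current))    # 8 cores -> 2 dual vCPU guests
print(expanded, max_multi_vcpu_guests(expanded))  # 16 cores -> 4 dual vCPU guests

# Overcommit on the current cluster: about 32 vCPUs on 8 cores
print(32 / current)  # 4.0, the 4:1 vCPU to pCPU ratio mentioned above
```

The divisor of 4 is just our own conservative starting point, not a VMware recommendation, and may change once we finish testing with resource pools.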

Updated: November 25, 2008 — 7:48 am