VMware VMotion, how fast can we go?

Lately, while I was testing specific failover behaviors in vSphere, I accidentally discovered that VMotion speeds (MB/s) are logged in /var/log/vmkernel. Now that’s cool!

Issue the command tail -f /var/log/vmkernel and then initiate a VMotion. You should get output like this:

Host VMotioning to (receiving)

Nov  7 21:13:14 xxxxxxxx vmkernel: 10:06:06:18.104 cpu3:9131)VMotionRecv: 226: 1257624621919023 D: Estimated network bandwidth 280.495 MB/s   during pre-copy
Nov  7 21:13:15 xxxxxxxx vmkernel: 10:06:06:18.756 cpu2:9131)VMotionRecv: 1078: 1257624621919023 D: Estimated network bandwidth 280.050 MB/s during page-in

Host VMotioning from (sending)

Nov  9 17:44:00 xxxxxxxx vmkernel: 12:01:47:02.229 cpu12:11150)VMotionSend: 2909: 1257781936902648 S: Sent all modified pages to destination (network bandwidth ~287.381 MB/s)

The last message, “Sent all modified pages to destination (network bandwidth ~xxx.xxx MB/s)”, is the overall figure that rates the whole VMotion action.
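If you don’t want to watch the whole log stream, you can filter for just these lines; a minimal sketch for the classic ESX service console (the pattern simply matches the VMotionSend/VMotionRecv entries shown above):

tail -f /var/log/vmkernel | grep -E "VMotion(Send|Recv)"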

Seeing these MB/s counters, I wondered whether there is any speed limit on VMotion other than, obviously, the network speed limit. Second, I wanted to know whether we are using the full 7 Gb that I configured in our current vSphere environment.

So…… Testing Time! 

I got myself 2 vSphere hosts with 64 GB each and VMotioned a newly created Windows 2003 32 GB VM around (the VM uses 30 GB of memory thanks to the memory allocation tool).

So the first conclusions:

1) VMotion uses a “send buffer size of 263536” bytes, which is roughly 256 KB;
2) Average VMotion Speed was 285.871 MB/s on the 7 Gb link;
3) I don’t see anything near 750 MB/s (875 MB/s would be the theoretical speed of the 7 Gb link).

Next step: I created a VMotion Virtual Machine Port Group on the vSwitch serving VMotion, so I could attach my test VMs to it.

vSwitch VMotion Configuration

I took 2 other Windows 2003 VMs (with vmxnet3) and ran iperf.exe (a network bandwidth tester for Windows) on them, using the same 256 KB buffer size VMotion uses, to measure the bandwidth between those two VMs.
The first tests were done on the default VM Network (1 Gb); the second tests were done on the VMotion vSwitch (with the Virtual Machine Port Group created on it).
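For anyone who wants to reproduce this, an iperf 2 run of this kind could look roughly like the lines below (a sketch, not my exact command lines; the IP address is made up, -l sets the 256 KB buffer length, -t the duration in seconds and -f M reports the result in MBytes/s):

On the receiving VM:  iperf -s -l 256K
On the sending VM:    iperf -c 192.168.1.20 -l 256K -t 90 -f M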

Speeds on 1 Gb (Windows 2003)

Speeds on 7 Gb (Windows 2003)

The results show that my speed on the 1 Gb network is 114 MB/s, which is good since theoretically it could be 125 MB/s.
The 7 Gb network, however, only gives me an average of 220 MB/s, which is bad since I would expect something around ~750 MB/s.
Please note again that VMotion is running at ~285 MB/s, which is roughly the same as the maximum I’m measuring with iperf (~220 MB/s, excluding the VM network stack overhead).
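To put those numbers in perspective (simple line-rate math, 8 bits per byte, ignoring protocol overhead):

1 Gb/s ÷ 8 = 125 MB/s, so 114 MB/s is about 91% of line rate;
7 Gb/s ÷ 8 = 875 MB/s, so 220 MB/s (iperf) is only about 25%, and ~285 MB/s (VMotion) about 33%, of line rate.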

So, why can’t I go any faster?

Running the iperf test on the 7 Gb link with 2 Windows 2008 VMs (vmxnet3) gave much better results (almost twice as fast)!
To make this test more complete I ran longer iperf tests of 90 seconds, roughly as long as an average VMotion of the 32 GB VM takes.

Speeds on 7 Gb (Windows 2008)

So it’s obvious that my link can do better!

The last thing needed to make this test complete is to deploy a Windows 2008 32 GB VM (again with the memory allocation tool filling it up to 30 GB), which brings us to the following conclusions (wrap-up):

1) Average VMotion Speed was 285.871 MB/s on the 7 Gb link using Windows 2003 VM’s
2) Average VMotion Speed was 251.839 MB/s on the 7 Gb link using Windows 2008 VM’s
3) iperf tests with the Windows 2003 VM’s measure ~220 MB/s on the 7 Gb link
4) iperf tests with the Windows 2008 VM’s measure ~497 MB/s on the 7 Gb link

So despite all my tests I have to conclude that I can’t get VMotion to run any faster, even though it’s proven that my link can reach much higher speeds. I’ve even played around with the Advanced Network Settings from within the ESX Host Configuration, without luck.
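For reference: those advanced settings can also be read and changed from the service console with esxcfg-advcfg. A minimal sketch, with one network-related option picked purely as an example (not a tuning recommendation):

esxcfg-advcfg -g /Net/TcpipHeapSize      (show the current value)
esxcfg-advcfg -s 32 /Net/TcpipHeapSize   (set it to 32 MB)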

Credits to whoever solves this puzzle for me 😉

— Edit 12 November 2009 —

As described in my own comments: the night after publishing this post I was thinking that maybe the CPU has something to do with it. I can remember that during the ESX courses (yeaaars ago, so very old info from either ESX 2.5 or 3.0) they told me that transferring at 1 Gb speed would take up 1000 MHz of CPU cycles.
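If that old rule of thumb were still roughly true, the numbers would actually line up: back-of-the-envelope, assuming a single 2.26 GHz core does all the VMotion work, 2260 MHz ≈ 2.26 Gb/s, and 2.26 Gb/s ÷ 8 ≈ 283 MB/s, which is suspiciously close to the ~285 MB/s I was measuring.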

So, testing time again to get an answer on “VMotion, how fast can we go?”

First of all I wanted to know what my cores were doing while VMotioning my VM around (I took the 32 GB allocated VM again, since that takes the longest).
Speed: 253 MB/s
Conclusion: all cores are used, but only at an average of 20 to 30%

Core behavior while VMotioning
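For anyone who wants to capture the same kind of per-core data themselves, esxtop on the service console can log it; a minimal sketch (batch mode, 5-second samples, 24 iterations, so roughly the duration of one of these VMotions; the file name is just an example):

esxtop -b -d 5 -n 24 > vmotion-cpu.csv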

Next I wanted to know what my cores would do with the faster iperf tests:
Speed: 532 MB/s

Cores Behavior with iperf tests

Next I wanted to make sure that the “switching between cores” (image above) was done by the scheduler handing my VM its CPU cycles, so I used CPU affinity to bind this 1 vCPU VM to 2 cores:
Speed: 531 MB/s

Core Behavior while using CPU Affinity within the iperf tests
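For completeness: scheduling affinity is set in the vSphere Client under Edit Settings > Resources > Advanced CPU, which ends up in the VM’s .vmx file as a line like the one below (the core numbers are just an example, not necessarily the ones I used):

sched.cpu.affinity = "2,3"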

Now that it’s proven that the “switching between cores” is done by the scheduler, I wanted to know if adding a second vCPU would speed up my performance:
Average Speed on multiple tests: 603 MB/s

Cores Behavior with a 2vCPU VM while doing iperf tests

And for the last test, what happens if I VMotion multiple VMs at the same time:
Average Speed: 282 MB/s

Multiple VMotions

So for a final wrap-up I can make the following conclusions:

– VM network performance comes down to pure processor power; adding a second vCPU is recommended if you have very high bandwidth needs;
– Using CPU affinity for pure network performance doesn’t win you anything;
– And last but not least, it appears that the vmkernel is somehow limited in using its CPU resources for VMotion, and I guess I will never be able to use the full 7 Gb for VMotion in our current environment 🙁

20-07-2010 update for vSphere 4.1: the new features list of vSphere 4.1 states:

vMotion Enhancements. In vSphere 4.1, vMotion enhancements significantly reduce the overall time for host evacuations, with support for more simultaneous virtual machine migrations and faster individual virtual machine migrations. The result is a performance improvement of up to 8x for an individual virtual machine migration, and support for four to eight simultaneous vMotion migrations per host, depending on the vMotion network adapter (1GbE or 10GbE respectively). See the vSphere Datacenter Administration Guide.


13 Comments

  1. Nice post Kenneth !!!

  2. Kenneth van Ditmarsch / November 9, 2009

    Thanks, but without a real explanation YET…. 😉

  3. Steve Cleveland / November 9, 2009

    Sorry if it’s a dumb question, but how are you getting a 7Gbit network? And I assume the troubles you’re having will translate to 10GBit as well, so definitely something that needs to be solved. Good research on this!

  4. Very interesting…
    Nothing on the page size side ?

  5. iguy / November 10, 2009

    Seems like the network device in the vmkernel is limited. Is this network stack based on the vmx or the linux stack?

  6. Kenneth van Ditmarsch / November 10, 2009

    Hi Steve,

    If you read my post about Flex-10 it should clear up how I got the 7 Gb link. (it’s actually a 10Gb network that is shared into pieces from within the HP hardware)

  7. Kenneth van Ditmarsch / November 10, 2009

    Not that I’m aware of. The transferred VM isn’t doing anything and I use the .vswp file on the datastore stored with the VM itself. (no reservations, no limits)

  8. Kenneth van Ditmarsch / November 10, 2009

    Well I’m assuming this is based on the linux stack. Whenever I’m testing the iperf speed between VM’s it would be based on the VMX since obviously the OS (and thus the OS overhead) is in between the transfers.

    Last night I was thinking about this, and maybe the CPU has something to do with it. I can remember that during the ESX courses (yeaaars ago, so very old info from either ESX 2.5 or 3.0) they told me that transferring at 1 Gb speed would take up 1000 MHz of CPU cycles. If this information is still relevant (which I doubt), I could explain it this way: I’m running 2.26 GHz procs (8 cores); assuming only 1 core is serving my VMotion, the math goes like this: 2260 MHz = ~2.2 Gb speed = approximately the values I see when VMotioning.

    Anyone any thoughts on that? I haven’t checked esxtop to verify this behavior yet, BTW.

  9. anon / November 17, 2009

    are jumbo frames enabled? vmotion nic’s directly connected between hosts or via a switch?

  10. Kenneth van Ditmarsch / November 17, 2009

    No, Jumbo Frames aren’t enabled. However, I did the same test on a native 10 Gb line (with Jumbo Frames enabled) and my speed didn’t come up much higher.
    The VMotion NICs are connected via a switch, but as stated, my line can reach higher speeds VM to VM, so the line obviously isn’t the bottleneck in this test.

  11. Does vMotion go faster with a higher-clocked CPU?

  12. Todd Mottershead / November 4, 2015

    I have been doing some testing with vMotion and found some interesting elements:

    1) Jumbo frames SIGNIFICANTLY improved my performance
    2) In addition to adding bandwidth dedicated to vMotion, you also need to beef up your management LAN bandwidth

    To test this, I used the Vmmark benchmark approach where I loaded 4 tiles (8 VM’s each) onto each server until I had 4 servers fully loaded (128 VM’s). I then started up the benchmark and let it settle in until each server was >90% processor utilization (E5-5630 v3’s). I had setup QOS on each of the 2 10Gb/s NIC’s on each server with the following settings:

    1-mgmt 15% guaranteed, 100% max
    1-data 15% guaranteed, 100% max
    1-iSCSI 40% guaranteed, 100% max
    1-vMotion 30% guaranteed, 100% max

    Without Jumbo frames, it took ~5min to move all 128 workloads to a second enclosure. With Jumbo frames this was reduced to ~2min. Thinking I could do better, I moved the vMotion NIC up to 40% and reduced the Mgmt NIC to 5% (total of 1Gb/s across 2 NIC’s) but instead of going faster, the vMotion times jumped back up to 5min so clearly, the resulting bottleneck on the mgmt lan was having an impact. With the original settings I produced a decent vmmark score and all cpu’s were running full out so the data network didn’t appear impacted but it became obvious that I needed to balance vmotion bandwidth with mgmt bandwidth. HTH


