Recently, while testing specific failover behaviors in vSphere, I accidentally discovered that VMotion speeds (MB/s) are logged in /var/log/vmkernel. Now that's cool!
Issue the command tail -f /var/log/vmkernel and then initiate a VMotion. You should get info like this:
Host VMotionning to (receiving)
Nov 7 21:13:14 xxxxxxxx vmkernel: 10:06:06:18.104 cpu3:9131)VMotionRecv: 226: 1257624621919023 D: Estimated network bandwidth 280.495 MB/s during pre-copy
Nov 7 21:13:15 xxxxxxxx vmkernel: 10:06:06:18.756 cpu2:9131)VMotionRecv: 1078: 1257624621919023 D: Estimated network bandwidth 280.050 MB/s during page-in
Host VMotionning from (sending)
Nov 9 17:44:00 xxxxxxxx vmkernel: 12:01:47:02.229 cpu12:11150)VMotionSend: 2909: 1257781936902648 S: Sent all modified pages to destination (network bandwidth ~287.381 MB/s)
The last message, “Sent all modified pages to destination (network bandwidth ~xxx.xxx MB/s)”, is the overall figure that rates the whole VMotion action.
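If you want to collect these numbers instead of reading them off the screen, something like the following could work. It is only a minimal sketch: the regular expression is derived from the three sample lines above and nothing else, so other ESX builds may log a different format, and the names `BANDWIDTH_RE` and `vmotion_speeds` are my own.

```python
import re

# Pattern derived only from the sample log lines above (an assumption, not a
# documented format). It captures which side logged the line and the MB/s value.
BANDWIDTH_RE = re.compile(
    r"(?P<side>VMotionRecv|VMotionSend):.*?"
    r"network bandwidth ~?(?P<mbps>\d+(?:\.\d+)?) MB/"
)


def vmotion_speeds(lines):
    """Yield (side, MB/s) for every line that reports a VMotion bandwidth."""
    for line in lines:
        match = BANDWIDTH_RE.search(line)
        if match:
            yield match.group("side"), float(match.group("mbps"))


# The three sample lines from this post, used as test input.
SAMPLE = [
    "Nov 7 21:13:14 xxxxxxxx vmkernel: 10:06:06:18.104 cpu3:9131)VMotionRecv: 226: "
    "1257624621919023 D: Estimated network bandwidth 280.495 MB/s during pre-copy",
    "Nov 7 21:13:15 xxxxxxxx vmkernel: 10:06:06:18.756 cpu2:9131)VMotionRecv: 1078: "
    "1257624621919023 D: Estimated network bandwidth 280.050 MB/s during page-in",
    "Nov 9 17:44:00 xxxxxxxx vmkernel: 12:01:47:02.229 cpu12:11150)VMotionSend: 2909: "
    "1257781936902648 S: Sent all modified pages to destination "
    "(network bandwidth ~287.381 MB/s)",
]

if __name__ == "__main__":
    for side, mbps in vmotion_speeds(SAMPLE):
        print(f"{side}: {mbps} MB/s")
```

To run it against a real log, pass an open file instead of `SAMPLE`, for example `vmotion_speeds(open("/var/log/vmkernel"))`.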
Seeing these MB/s counters, I wondered whether there is any speed limit on VMotion other than, obviously, the network speed limit. Second, I wanted to know whether we are using the full 7 Gb that I configured in our current vSphere environment.
So…… Testing Time!
I took 2 vSphere hosts with 64 GB each and VMotioned a newly created Windows 2003 32 GB VM back and forth between them. (The VM uses 30 GB of memory thanks to the memory allocation tool.)
So the first conclusions:
1) I see that VMotion is using a “send buffer size of 263536”, which is roughly 256 KB (256 KB is exactly 262,144 bytes);
2) Average VMotion speed was 285.871 MB/s on the 7 Gb link;
3) I don’t see anything near 750 MB/s (875 MB/s would be the theoretical speed of the 7 Gb link).
Next step: I created a VMotion Virtual Machine Port Group on the vSwitch serving VMotion, so I could attach my test VMs to it.
I took 2 other Windows 2003 VMs (with vmxnet3) and ran iperf.exe (a network bandwidth tester for Windows) on them with the same frame size VMotion uses (256 KB) to measure bandwidth between those 2 VMs.
The first tests were on the default VM Network (1 Gb) and the second tests were done on the VMotion vSwitch (with the Virtual Machine Port Group created on it).
Speeds on 1 Gb Windows 2003
Speeds on 7 Gb Windows 2003
The results show that my speed on the 1 Gb network is 114 MB/s, which is good since theoretically it could be 125 MB/s.
The 7 Gb network, however, only gives me an average of 220 MB/s, which is bad since I would expect something around ~750 MB/s.
Please note again that VMotion is running at ~285 MB/s, which is in the same range as the maximum I’m measuring with iperf (~220 MB/s excluding the VM network stack overhead).
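To make the gap between measured and theoretical speed explicit, here is the arithmetic as a small sketch. It assumes decimal units (1 Gb = 1000 Mb, 8 bits per byte), which is what gives the 125 MB/s and 875 MB/s figures used in this post. The function names are my own.

```python
def link_mb_per_s(gbit_per_s):
    """Theoretical link throughput in MB/s (decimal units, 8 bits per byte)."""
    return gbit_per_s * 1000 / 8


def utilisation_pct(measured_mb_per_s, gbit_per_s):
    """Measured speed as a percentage of the theoretical link speed."""
    return round(measured_mb_per_s / link_mb_per_s(gbit_per_s) * 100, 1)


if __name__ == "__main__":
    print(link_mb_per_s(1))             # 125.0 MB/s for the 1 Gb link
    print(link_mb_per_s(7))             # 875.0 MB/s for the 7 Gb link
    print(utilisation_pct(114, 1))      # iperf on 1 Gb: 91.2 %
    print(utilisation_pct(220, 7))      # iperf on 7 Gb (Windows 2003): 25.1 %
    print(utilisation_pct(285.871, 7))  # VMotion on 7 Gb: 32.7 %
```

So the 1 Gb link is used at roughly 91% of its theoretical speed, while both iperf and VMotion stay at or below about a third of the 7 Gb link.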
So, why can’t I go any faster?
Running the iperf test on the 7 Gb link with 2 Windows 2008 VMs (vmxnet3) gave much better results (almost twice as fast)!
To make this test more complete I ran longer iperf tests, i.e. 90 seconds, which is about as long as an average VMotion of the 32 GB VM takes.
Speeds on 7Gb Windows 2008
So it’s obvious that my link can do better!
The last thing needed to make this test complete was to deploy a Windows 2008 32 GB VM (again with the memory allocation tool filling it up to 30 GB), which brings us to the following conclusions (wrap-up):
1) Average VMotion speed was 285.871 MB/s on the 7 Gb link using Windows 2003 VMs
2) Average VMotion speed was 251.839 MB/s on the 7 Gb link using Windows 2008 VMs
3) iperf tests with the Windows 2003 VMs measure ~220 MB/s on the 7 Gb link
4) iperf tests with the Windows 2008 VMs measure ~497 MB/s on the 7 Gb link
So despite all my tests I have to conclude that I can’t get VMotion to run any faster, even though it’s proven that my link can reach much higher speeds. I’ve even played around with the Advanced Network Settings in the ESX Host Configuration, without luck.
Credit to whoever solves this puzzle for me 😉
Edit, 12 November 2009:
As described in my own comments: the night after publishing this blog I started thinking that maybe CPU has something to do with it. I remember that during the ESX courses (yeaaars ago, so very old info from either ESX 2.5 or 3.0) they told me that transferring at 1 Gb speed would take up 1000 MHz of CPU cycles.
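As a back-of-the-envelope check of what that rule of thumb would imply, here is a short sketch. The 1000 MHz per 1 Gb figure is the old course number quoted above, not something I have verified for vSphere 4, and `mhz_needed` is just a name I made up.

```python
# Rule of thumb from the old ESX course quoted above: ~1000 MHz of CPU per
# 1 Gb/s of traffic. Treat it as an unverified assumption.
MHZ_PER_GBIT = 1000


def mhz_needed(mb_per_s):
    """Estimated CPU cost in MHz for a given throughput in MB/s."""
    gbit_per_s = mb_per_s * 8 / 1000  # MB/s to Gb/s, decimal units
    return gbit_per_s * MHZ_PER_GBIT


if __name__ == "__main__":
    print(round(mhz_needed(285.871)))  # measured VMotion speed: 2287 MHz
    print(round(mhz_needed(875)))      # a saturated 7 Gb link: 7000 MHz
```

If the rule still held, saturating the 7 Gb link would cost about 7000 MHz, which is more than a single core can deliver, so it made sense to look at what the cores were actually doing.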
So, testing time again to get an answer on “VMotion, how fast can we go?”
First of all I wanted to know what my cores were doing while VMotioning my VM around (I took the VM with 32 GB allocated again, since that one takes the longest).
Speed: 253 MB/s
Conclusion: All cores are used but at an average of 20 to 30%
Next I wanted to know what my cores would do with the faster iperf tests:
Speed: 532 MB/s
Next I wanted to make sure that the “switching between cores” (image above) was done by the scheduler that gives my VM its CPU cycles, so I used CPU Affinity to bind 2 cores to this 1 vCPU VM:
Speed: 531 MB/s
So now that it’s proven that the “switching between cores” is done by the scheduler, I wanted to know if adding a second vCPU would improve my performance:
Average Speed on multiple tests: 603 MB/s
And the last test: what happens if I VMotion multiple VMs at once?
Average Speed: 282 MB/s
So for a final wrap-up I can draw the following conclusions:
– VM network performance depends on pure processor power; adding a second vCPU is recommended if you have very high bandwidth needs;
– Using CPU Affinity for pure network performance doesn’t win you anything;
– And last but not least, it appears that the vmkernel is somehow limited in the CPU resources it can use for VMotion, and I guess I will never be able to use the full 7 Gb for VMotion in our current environment 🙁
20-07-2010 Update for vSphere 4.1: The list of new features in vSphere 4.1 states:
vMotion Enhancements. In vSphere 4.1, vMotion enhancements significantly reduce the overall time for host evacuations, with support for more simultaneous virtual machine migrations and faster individual virtual machine migrations. The result is a performance improvement of up to 8x for an individual virtual machine migration, and support for four to eight simultaneous vMotion migrations per host, depending on the vMotion network adapter (1GbE or 10GbE respectively). See the vSphere Datacenter Administration Guide.