Just a short blog post about Microsoft Exchange 2010 in combination with VMware vMotion. We are running this combination on the vSphere platform and noticed that whenever we vMotion an Exchange 2010 Mailbox server that is a member of a DAG (Database Availability Group), the DAG fails.
25-05-2011 Update for Exchange 2010 SP1: Thanks to Totie for pointing this out to me: Exchange 2010 SP1 does support vMotion with DAG.
I’m not an Exchange guru but in short this is what the Database Availability Groups look like. The green databases are active and the blue databases are the passive databases which are spread across the rest of the mailbox servers.
Anyway, the DAG fails because it relies on Windows Failover Clustering, which doesn’t work with vMotion and, more importantly, isn’t supported with it (the same goes for Microsoft Hyper-V Live Migration).
VMware’s Setup for Failover Clustering and Microsoft Cluster Service manual states:
Before you set up MSCS, review the list of functionality that is not supported for this release, and any requirements and recommendations that apply to your configuration.
The following environments and functionality are not supported for MSCS setups with this release of vSphere:
– Clustering on iSCSI or NFS disks.
– Mixed environments, such as configurations where one cluster node is running a different version of ESX/ESXi than another cluster node.
– Clustered virtual machines as part of VMware clusters (DRS or HA).
– Use of MSCS in conjunction with VMware Fault Tolerance.
– Migration with VMotion of clustered virtual machines.
I knew that Microsoft Cluster wasn’t supported with vMotion, but when I heard about Database Availability Groups it didn’t ring a bell, since I hadn’t done any fancy stuff in the VM definition like shared RDMs or SCSI bus sharing.
So, to be honest, to me this VM was just like the others, except that we had excluded it from DRS and HA because of the Exchange 2010 design.
That brings us to the next unsupported thing that is stated in the VMware Manual: Clustered virtual machines as part of VMware clusters (DRS or HA).
I assumed that manually excluding a VM from a DRS/HA cluster was a supported scenario, until I contacted Duncan Epping this morning while reading his blog post about vSphere Update 1.
vSphere Update 1 states:
Enhanced Clustering Support for Microsoft Windows – Microsoft Cluster Server (MSCS) for Windows 2000 and 2003 and Windows Server 2008 Failover Clustering is now supported on an VMware High Availability (HA) and Dynamic Resource Scheduler (DRS) cluster in a limited configuration. HA and DRS functionality can be effectively disabled for individual MSCS virtual machines as opposed to disabling HA and DRS on the entire ESX/ESXi host
Manually excluding a VM from a DRS/HA cluster, as I did, apparently isn’t a supported configuration! If you want to do this, you need vSphere Update 1.
20-07-2010 Update for vSphere 4.1: The list of new features in vSphere 4.1 states:
Windows Failover Clustering with VMware HA. Clustered Virtual Machines that utilize Windows Failover Clustering/Microsoft Cluster Service are now fully supported in conjunction with VMware HA. See Setup for Failover Clustering and Microsoft Cluster Service.
Dennis Agterberg
/ November 20, 2009
Nice posting, Kenneth. I have a little question. I don’t know anything about Exchange 2010 yet; I haven’t had the time to look into it. But shouldn’t you bring a little more nuance to the blog title, like “Exchange 2010 with DAG not supported with VMotion”? As I understand from the text you wrote, it all has to do with MSCS.
Kenneth van Ditmarsch
/ November 23, 2009
Thanks!
Yeah, I guess you are right. I assume that you can also run Exchange 2010 without DAG (I’m actually not sure of that).
Michael All
/ January 6, 2010
Kenneth:
Good article! I’m doing a design right now for a customer that will be exactly the same as yours, for the most part, but with only three Exchange 2010 servers instead of five. With Database Mobility on Multi-Role Exchange servers (M/C/H), such as your configuration, MS does not support running Windows Network Load Balancing Services on a server with Cluster Services on it. The only mention I’ve seen so far from MS on this is that you should use a hardware NLB in this scenario. Is that what you’re doing here, or are you using another type of solution to load balance the traffic to the CAS server interfaces?
Thanks in advance for your response!
Michael All
Kenneth van Ditmarsch
/ January 6, 2010
Hi Michael,
Thanks! 🙂 I’m not the Exchange architect on the current project, but I do know that they are using hardware load balancers. In addition, they are currently building a CAS array, since in the first setup everything was redundant except for the Outlook profiles, which pointed to one of the CAS servers (which obviously created a single point of failure on that side 😉).
Edward Walton
/ January 27, 2010
DAGs aren’t required to install and run Exchange 2010. A DAG is just an option to introduce some hardware and mail database high availability and DR.
Edward Walton
Senior Technical Advisor
Softchoice Corporation
Tim H.
/ January 29, 2010
Thank you for the article Kenneth.
Your article states that DAGs rely on Windows clustering. Just to be clear, MS states that although some components of clustering are still present, they have been integrated into the core architecture of Exchange 2010. The idea of a server level failover cluster is gone. Failover is now at the database level and there is no need to build a Windows cluster before installing Exchange and creating a DAG.
In my mind, this should render any recommendations – from VMware or otherwise – regarding setting up MSCS in a virtual environment a moot point as it pertains to Exchange 2010 and database availability groups.
You are correct about Microsoft not supporting DAGs in a root clustered or vmotion/HA environment. See the section on Hardware Virtualization at http://technet.microsoft.com/en-us/library/aa996719.aspx.
Another thing that is often overlooked by both architects and customers alike is the 2:1 virtual-to-physical processor support limit imposed by MS. That means that for a 2-way quad-core machine, you can have a maximum of 16 virtual processors across all VMs on that host. I’m not sure if that limit is specific to just Exchange.
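As a rough illustration of the 2:1 arithmetic above, here is a minimal Python sketch. The function names are made up for this example, and it assumes the limit is counted against physical cores (hyperthreads not included), which is worth verifying against Microsoft's current support statement.

```python
from typing import Iterable


def max_virtual_processors(sockets: int, cores_per_socket: int, ratio: int = 2) -> int:
    """Return the vCPU ceiling for one host under a ratio:1 virtual-to-physical limit.

    Assumption: only physical cores count, so the ceiling is
    sockets * cores_per_socket * ratio.
    """
    if sockets < 1 or cores_per_socket < 1 or ratio < 1:
        raise ValueError("sockets, cores_per_socket and ratio must all be >= 1")
    return sockets * cores_per_socket * ratio


def within_limit(vcpus_per_vm: Iterable[int], sockets: int,
                 cores_per_socket: int, ratio: int = 2) -> bool:
    """Check whether the vCPUs of all VMs on a host fit under the ceiling."""
    return sum(vcpus_per_vm) <= max_virtual_processors(sockets, cores_per_socket, ratio)


# The 2-way quad-core host from the comment above: 2 * 4 * 2 = 16
print(max_virtual_processors(sockets=2, cores_per_socket=4))
```

For example, four VMs with 4 vCPUs each would just fit on that host, while adding one more 2-vCPU VM would push it over the limit.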
No snapshots and no thin-provisioned VMDKs either.
Tim Hollingworth
National Technical Lead – Systems
ePlus Technology Inc.
Ståle Hansen
/ February 17, 2010
Hi Kenneth. I posted on the Exchange 2010 TechNet forum about this and asked if Exchange 2010 would be supported if we manually disabled a VM from participating in a DRS/HA cluster in vSphere Update 1.
Apparently it’s still not supported by Microsoft.
My question is what consequences it has when we run an Exchange 2010 DAG and exclude it from a DRS/HA cluster. Is it just that MS has not tested it and it will be supported later, or does it not work and break the DAG?
What do you think?
Kenneth van Ditmarsch
/ February 17, 2010
Hi Ståle,
I guess the only reason for this not being supported is the fact that MS hasn’t tested it.
At our site it’s running fine. Exchange availability and failover are designed from an Exchange perspective, so “pinning” each mailbox server to one ESX host seems like a fair solution IMHO, and it works fine over here.
Whenever an ESX host fails, the Exchange Mailbox server that resides on it will fail as well, which is covered by the Exchange availability design as stated.
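Since this design relies on each DAG member sitting on a different ESX host, a quick placement check can be sketched as follows. This is only an illustration in Python: the VM and host names are hypothetical, and the placement mapping is assumed to come from your own inventory export rather than from any particular VMware API.

```python
from collections import defaultdict
from typing import Dict, List


def hosts_with_multiple_dag_members(placement: Dict[str, str]) -> Dict[str, List[str]]:
    """Return every host that carries more than one DAG member.

    `placement` maps a DAG member VM name to the ESX host it is pinned to.
    An empty result means a single host failure takes down at most one member.
    """
    members_by_host = defaultdict(list)
    for vm, host in placement.items():
        members_by_host[host].append(vm)
    return {host: sorted(vms) for host, vms in members_by_host.items() if len(vms) > 1}


# Hypothetical inventory: two mailbox servers ended up on the same host.
placement = {
    "mbx01": "esx-a",
    "mbx02": "esx-b",
    "mbx03": "esx-a",
}
print(hosts_with_multiple_dag_members(placement))
```

With the sample data the check reports `esx-a`, because losing that host would take out two DAG members at once.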
Cheers,
Kenneth
Samantha
/ February 19, 2010
Nice blog, I like it, it’s informative.
I will visit this blog more often.
I especially like your article about VMotion and Exchange 2010 not being supported.
Cheers
Kenneth van Ditmarsch
/ February 19, 2010
Hi Samantha,
Thanks! I’m busy finding other subjects to blog about.
Stay tuned 😉
Kenneth
Adam
/ April 28, 2010
I don’t think the problem is in DRS/HA, but in the fact that you have all 3 roles on the same server. Configuring NLB on VMware is not simply a matter of enabling it; there are router settings that need to be made, and you have to make exceptions on the NLB. I have done NLB with a CAS role on ESX, and it was a pain with all the router configuration; we ended up just putting in a hardware NLB instead. I would almost bet that if you put the CAS role on a separate server and left the HUB and Mailbox roles mixed, it would work. There is no reason why DRS/HA would not work on VMware as long as NLB is not enabled.
Kenneth van Ditmarsch
/ April 29, 2010
Hi Adam,
The problem I experienced here was that I was unable to VMotion an Exchange Mailbox server since it was using DAG (Microsoft Failover Clustering).
Microsoft Failover Clustering was the one causing the problems.
BTW, we also had hardware load balancers in place for the CAS servers.
Kenneth
Andrew
/ May 12, 2010
While VMotion support is required for VMware HA, the failure of an ESX host would trigger Exchange’s own HA features if the DAG members are spread across different ESX servers, as you said.
Two different ways of achieving the same goal of protecting against the potential failure of physical hardware, that just happen to not work well together.
What I would consider more important is what benefits do we lose by not having VMotion enabled for Exchange 2010 server roles? Is disabling VMotion only required for Mailbox servers that are part of a DAG, or does it also apply to CAS, HUB and Edge?
Kenneth van Ditmarsch
/ May 12, 2010
Hi Andrew,
Why is VMotion support dependent on VMware HA?
Our experience was that we weren’t able to VMotion the DAG Mailbox Servers but that we could VMotion all other roles like CAS, HUB or Edge.
Kenneth
Scott Lowe
/ May 12, 2010
I’m researching this right now as a part of our migration to Exchange 2010. We’re small – 1600 or so mailboxes – and I’d like to do this as virtual as possible. Our only physical component will be the UM server.
We have availability needs, but do not yet have a DR site.
Is there anything, other than MS’ official support policy, that would keep us from setting up two mailbox servers in a DAG and disabling HA/DRS/etc (two virtual servers) and then using a single CAS/Hub virtual machine that does have HA enabled? That would mean I’d need three virtual machines, wouldn’t need load balancing at all and, in my head at least, I’d have the availability that I need by virtue of the DAG for the Mailbox servers and HA/Vmotion for the Hub/CAS.
Scott
Kenneth van Ditmarsch
/ May 17, 2010
Hi Scott,
This is exactly the way I did it with the customer. We did, however, initially have 2 virtual CAS servers that were load-balanced because of the size of the environment that would be migrated (20,000+ mailboxes).
The only thing you have to take into account is that disabling HA/DRS for an MSCS VM isn’t supported prior to vSphere Update 1.
vSphere Update 1 states:
Enhanced Clustering Support for Microsoft Windows – Microsoft Cluster Server (MSCS) for Windows 2000 and 2003 and Windows Server 2008 Failover Clustering is now supported on an VMware High Availability (HA) and Dynamic Resource Scheduler (DRS) cluster in a limited configuration. HA and DRS functionality can be effectively disabled for individual MSCS virtual machines as opposed to disabling HA and DRS on the entire ESX/ESXi host
Further, I assume that you are using an FC environment?
Kenneth
Nick
/ May 19, 2010
This policy from MS really needs to be thought through. This is not a direct overlap of functionality. If there are specific technical concerns then MS should state them and allow the 3rd party vendors to come up with solutions.
DAGs are going to be used for backups as well as a HA option so this could impact a large number of deployment scenarios.
On the hypervisor side, clearly you might cluster for a host of reasons that again do not touch on HA. I might want to move other servers off the box to allow the mailbox server to have more cycles, you may be doing a general hardware refresh, or you may want to deploy Exchange onto some form of virtual DC where the choice of hypervisor and its configuration are out of your hands.
If Exchange continues to demand specific “hardware” in the DC and thus become a support island, then it will contribute to people evaluating other solutions.
Eamonn Deering
/ July 8, 2010
Hi Kenneth,
Would this work without breaking the DAG (using vMotion)?
I want the freedom to move EXC1 to another host while live. User mailbox count is under 80.
I don’t have a test environment to do testing but an IT policy like this might work (after some refinement).
EXC1-08 = H,C,M EXC2-08 =M (DAG)
Using vSphere U1, disable DRS and HA on the two Exchange servers. I’m assuming vMotion will still work at this point.
IT policy for vMotion of Exchange servers.
Any IT person wishing to vMotion Exchange 2010 must follow these steps.
First check that the live Exchange databases are on EXC1-08 (the person needs to know Exchange DAG).
Power down the backup server EXC2-08.
If EXC2-08 needs to move to another Host then move it now while offline.
vMotion EXC1-08 while online (not sure what effect this will have).
Power back on EXC2-08.
Check that database replication is functioning.
Kenneth van Ditmarsch
/ July 20, 2010
Hi Eamonn,
The procedure we had in place was to remove the VM (that we needed to VMotion) from the DAG, VMotion the VM, and then re-add it to the DAG so that the VM would participate in replication again.
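The order of those steps can be sketched in Python as below. This is a hedged outline only: every callable passed in is a hypothetical placeholder for whatever tooling you use (Exchange Management Shell, vSphere client, scripts), none of them is a real Exchange or VMware API, and the final replication check is an extra step borrowed from the policy proposed in the comment above.

```python
from typing import Callable


def vmotion_dag_member(
    vm: str,
    remove_from_dag: Callable[[str], None],
    vmotion: Callable[[str], None],
    rejoin_dag: Callable[[str], None],
    replication_healthy: Callable[[str], bool],
) -> bool:
    """Run the remove -> vMotion -> rejoin sequence for one DAG member.

    All four callables are hypothetical hooks supplied by the caller.
    Returns True when replication is reported healthy after the rejoin.
    """
    remove_from_dag(vm)   # step 1: take the VM out of the DAG
    vmotion(vm)           # step 2: migrate the VM while it is not a DAG member
    rejoin_dag(vm)        # step 3: add the VM back so replication resumes
    return replication_healthy(vm)  # step 4: confirm replication is working


# Dry run with print statements standing in for the real actions.
ok = vmotion_dag_member(
    "EXC2-08",
    remove_from_dag=lambda vm: print("remove", vm, "from DAG"),
    vmotion=lambda vm: print("vMotion", vm),
    rejoin_dag=lambda vm: print("re-add", vm, "to DAG"),
    replication_healthy=lambda vm: True,
)
print("replication healthy:", ok)
```

Passing the steps in as callables keeps the ordering explicit and lets you do a dry run before wiring in anything that touches production.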
Why does your procedure call for completely powering down the VM?
Kenneth
Totie
/ May 25, 2011
Kenneth, Exchange 2010 SP1 now supports vMotion with DAG… the UM role as well… here’s the source:
http://technet.microsoft.com/en-us/library/aa996719.aspx
http://justincockrell.com/?p=166
Greg Davis
/ June 10, 2011
Hi Kenneth,
Nice article, this has been very useful to me today in clearing up some support queries about E2K10 DAG with vMotion. And I hope all is well!
Greg Davis
Kenneth van Ditmarsch
/ June 10, 2011
Hi Mister Singapore!
ha, good stuff, thanks 🙂