Migrating to an embedded PSC: mind the decommission order.

I recently migrated a few vSphere environments with external PSCs to vSphere 7 and noticed some PSC leftovers afterwards. A lot has already been written about the deprecation of the External Platform Services Controller (PSC) deployment model, so I will not go into that here.

What I do want to highlight is the importance of the decommission order for the PSCs and how you can check whether there are stale records under the hood.

Let’s start with some key commands:

Check which PSC vCenter is pointing to:
/usr/lib/vmware-vmafd/bin/vmafd-cli get-ls-location --server-name localhost

Show the PSC replication partners (showpartners parameter):
/usr/lib/vmware-vmdir/bin/vdcrepadmin -f showpartners -h localhost -u administrator -w <password>

Show all PSC servers (showservers parameter):
/usr/lib/vmware-vmdir/bin/vdcrepadmin -f showservers -h localhost -u administrator -w <password>
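A small wrapper can run all three checks in one pass. This is a sketch only: the binary paths are the standard VCSA locations from the commands above, the prompt-based password handling is my own addition, and the script simply reports when it is not run on an appliance.

```shell
#!/bin/sh
# Sketch: run the three PSC topology checks in one pass from a VCSA shell.
# Binary paths are the standard VCSA locations used in the commands above.
VMAFD=/usr/lib/vmware-vmafd/bin/vmafd-cli
VDCREP=/usr/lib/vmware-vmdir/bin/vdcrepadmin

if [ -x "$VMAFD" ]; then
    echo "== Lookup Service location =="
    "$VMAFD" get-ls-location --server-name localhost
else
    echo "vmafd-cli not found - run this on the VCSA itself"
fi

if [ -x "$VDCREP" ]; then
    # Prompt for the SSO administrator password without echoing it.
    printf 'administrator@vsphere.local password: '
    stty -echo; read PASS; stty echo; echo
    echo "== Replication partners =="
    "$VDCREP" -f showpartners -h localhost -u administrator -w "$PASS"
    echo "== All PSC servers =="
    "$VDCREP" -f showservers -h localhost -u administrator -w "$PASS"
fi
```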

Now let me elaborate a bit more on one of my recent migrations.
The environment consisted of:

  • 2x VCSA 6.7 U3 (Enhanced Linked Mode)
  • 4x External PSC
  • 2x External Load Balancer

Steps taken in the process:

  1. Upgrade the first VCSA 6.7 to VCSA 7 and, within Phase 2, select “This is the first vCenter Server in the topology that I want to converge”.
  2. After a successful migration and logon* to the new VCSA 7 you will notice that Linked Mode is still active between the VCSA 7 and the VCSA 6.7.

    (*) In my case the previous PSCs were joined to the AD domain (Integrated Windows Authentication) and were facilitating SSO. Pre-migration, however, the VCSA 6.7 wasn’t joined to AD, so I had to manually join the upgraded VCSA 7 to AD to re-enable SSO. (/opt/likewise/bin/domainjoin-cli join <domain> <username>)
  3. Upgrade the second VCSA 6.7 to VCSA 7 and, within Phase 2, select “Subsequent vCenter Server” and point it to the first VCSA 7.
  4. After a successful migration (this VCSA 7 needed to be joined to AD as well) you will notice the following components within the System Configuration:
  5. Now it is important to decommission the remaining PSCs in the correct order using cmsso-util. I will elaborate on this step below, so continue reading.

What I initially did was check which PSCs were replicating with both VCSA 7 appliances, using the showpartners parameter described above.

The output was as follows:

VCSA7-01 connected to:
VCSA7-02
PSC67-01

VCSA7-02 connected to:
VCSA7-01

So it is clear that my new VCSA 7s are replicating with each other and that one of them is also linked to one of the old PSCs. Following the cmsso-util instructions, I powered down PSC67-01 and ran the unregister command:
cmsso-util unregister --node-pnid PSC67-01 --username administrator@vsphere.local --passwd 'password'

Now the showpartners parameter only shows replication between the VCSA 7s. The showservers parameter, however, still listed 3 of the 4 old PSCs.

I contacted VMware support about this as I could not get them removed with cmsso-util. Apparently this happens quite often and, long story short, they pointed me to the following command to remove the remaining PSCs from the list:
vdcleavefed -h <psc.domain.lan> -u Administrator -w <password>

This worked for me, and the showservers parameter now only shows two servers (the VCSA 7s). All good, right? Well, not completely.

The following commands can be executed to find out how many component registrations are made under the hood.

vCenter 7:
/usr/lib/vmware-lookupsvc/tools/lstool.py list --url https://localhost/lookupservice/sdk --no-check-cert >/tmp/psc.txt

grep -i 'service type:' /tmp/psc.txt |sort |uniq -c

vCenter 6.x:
/usr/lib/vmidentity/tools/scripts/lstool.py list --url https://localhost/lookupservice/sdk --no-check-cert >/tmp/psc.txt

grep -i 'service type:' /tmp/psc.txt |sort |uniq -c

The output looks similar to this:

5 Service Type: applmgmt
2 Service Type: certificateauthority
5 Service Type: certificatemanagement

etc..

and basically tells me that I have 5 registrations for most services.
In my case, post-migration, these were: the 2 VCSA 7s and the 3 remaining external PSCs, despite the vdcleavefed command that ran earlier.
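To make that counting concrete, here is the same grep/sort/uniq pipeline run against a small fabricated sample of lstool output; the service-type names are real, but the lines and counts are illustrative only:

```shell
# Fabricated sample of 'Service Type:' lines, for illustration only.
# On a real system /tmp/psc.txt comes from the lstool.py command above.
cat > /tmp/psc-sample.txt <<'EOF'
Service Type: applmgmt
Service Type: certificateauthority
Service Type: applmgmt
Service Type: certificatemanagement
Service Type: applmgmt
EOF

# Count the registrations per service type, exactly as above.
grep -i 'service type:' /tmp/psc-sample.txt | sort | uniq -c
```

Here applmgmt shows 3 registrations in the sample. In a healthy topology the count per service type should match the number of live nodes (2 in my two-vCenter case); anything higher points at stale registrations.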

Now the good news is that these stale records can be corrected by VMware Support; the bad news is that this takes some time, as you need to go through each service.

Let’s rewind, as I wanted to know how I could get things migrated without the stale records appearing. What I noticed after upgrading both VCSAs to 7.0 is that the PSC replication somehow got disturbed.

Looking at the topology I found this to be the case after migration:

So I reran the complete migration with one difference: the order in which cmsso-util was used.

Steps taken:

  1. Power off PSC67-04
  2. From PSC67-03: cmsso-util unregister PSC67-04
  3. Power off PSC67-03
  4. From PSC67-01: cmsso-util unregister PSC67-03
  5. From PSC67-01: cmsso-util unregister PSC67-02
  6. Power off PSC67-01
  7. From VCSA7-01: cmsso-util unregister PSC67-01
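The sequence above can be captured in a small dry-run script. This is a sketch only: it prints each step instead of executing it (the node names are specific to this topology, and each cmsso-util must be run on the node indicated), but it makes the pattern explicit: power off a PSC first, then unregister it from its remaining partner.

```shell
#!/bin/sh
# Dry-run sketch of the decommission order: prints each step instead of
# executing it. Node names are from this article's topology.
step() { echo "+ $*"; }

step "power off PSC67-04"
step "on PSC67-03: cmsso-util unregister --node-pnid PSC67-04 --username administrator@vsphere.local --passwd '<password>'"
step "power off PSC67-03"
step "on PSC67-01: cmsso-util unregister --node-pnid PSC67-03 --username administrator@vsphere.local --passwd '<password>'"
step "on PSC67-01: cmsso-util unregister --node-pnid PSC67-02 --username administrator@vsphere.local --passwd '<password>'"
step "power off PSC67-01"
step "on VCSA7-01: cmsso-util unregister --node-pnid PSC67-01 --username administrator@vsphere.local --passwd '<password>'"
```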

While going through this process everything got cleaned up successfully and no stale records appeared.

So what about these stale records? It may well be that during normal operations you will never notice them and everything works correctly. However, you will run into issues with future upgrades.

Batch migrate VM’s with VMware Cross vCenter vMotion (Fling)

VMware has featured Cross vCenter migrations since vSphere 6.0. The biggest constraint for this feature is that you can’t use it via the vSphere Web Client if the vCenter Server instances are not in Enhanced Linked Mode and within the same vCenter Single Sign-On domain. (A full list of requirements can be found here.)

If the vCenter Server instances are in different SSO domains, VMware does give you the ability to use the vSphere API/SDK, but for most of us the API/SDK is hard to work with.

Read the full post »

Filtering and Auditing within vRealize Log Insight

The article below describes one way to implement vRealize Log Insight in an environment where a (security department’s) syslog server is already available and where this server ideally only receives filtered events, plus auditing information around changes to the filters.

Let’s start off by saying that with the introduction of ESXi 6, VMware made it possible to do log filtering directly on the syslog service. This, however, is not recommended, as it will make troubleshooting potential future issues impossible.

The overview below shows a basic setup where Log Insight is used by the operations team and the security department’s syslog server only receives the filtered events relevant to its purpose. This filtering can be set up in numerous ways, such as “everything, except …” or “only these specific matches …”. So you could easily filter out all vpxa, vMotion, SCSI, etc. events that are not relevant for the security department but are useful for the operations team.

  1. Devices log towards Log Insight;
  2. Log Insight applies filtering;
  3. Filtered events get forwarded to the security department’s syslog server.

Read the full post »

New-DeployRule : Cannot process argument transformation

While doing some vSphere 6 Auto Deploy development I found that, when using PowerCLI 6.3 R1 (the latest release at the time of writing) with the New-DeployRule command, you will receive an error message stating: New-DeployRule : Cannot process argument transformation on parameter ‘Item’. Unsupported version URI urn:rbd1/3.0

[Image: UnsupportedVersion]

Read the full post »

Reset VMware Auto Deploy Database to default

One of the things the VMware Auto Deploy database contains is the ESXi images that have been uploaded and their relation to the DeployRules that have been created. For instance, if I add an ESXi image to the software depot and create a new DeployRule using the following commands:

Add-EsxSoftwareDepot "VMware-ESXi-6.0.0.update01-3380124.x86_64-Dell_Customized-offline-bundle-A04.zip"
Get-EsxImageProfile | ft -autosize 
New-DeployRule -Name "Cluster01" -Item "Dell-ESXi-6.0U1-3380124-A04"  -Pattern "model=PowerEdge M630"

Read the full post »

The OVF package is invalid and cannot be deployed.

While deploying a random OVF image within vSphere I received the following error message: The provided network mapping between OVF networks and the system network is not supported by any host.

[Image: DeployOVF-Error]

Read the full post »

NSX bug while using Logical Switch as a Destination in the Edge Firewall

While working on a cool VMware NSX project we discovered a bug in the Edge Firewall when using a “Logical Switch” as a “Destination”.

Let’s first make troubleshooting easier by adding the “Rule Tag” and “Log” information to the Firewall view.

[Image: Firewall RuleTag]

Read the full post »

Creating a single bootable ISO with HP SPP and MSB

This blog post will guide you through the process of creating a single bootable ISO file consisting of the latest HP Service Pack for ProLiant (SPP) ISO and an additional HP Maintenance Supplement Bundle (MSB), which is delivered in ZIP format. You can also add single “Firmware Supplemental Updates”.

The HP Service Pack for ProLiant is a complete system software and firmware solution that is delivered several times a year, mainly driven by the release of a new server. The SPP is delivered as a bootable ISO and can be used to do offline firmware upgrades. Because the SPP is only delivered several times a year, it will most likely not contain the most recent updates; that’s why HP also lets you download a Maintenance Supplement Bundle (MSB). The SPP combined with an MSB contains a fully supported set of software.

Read the full post »

Unresponsive/BSOD VM’s on ESXi 5.1 and 5.5

Over the past few days I have had several customers complaining about unresponsive or blue-screening VMs (both Windows 2008 and 2012) on ESXi 5.1 and 5.5 environments. Troubleshooting at the customer sites pointed out that the vnetflt.sys driver was causing these issues. This driver is part of the vShield Endpoint components, which are installed whenever you a) explicitly installed them or b) installed VMware Tools with the “Complete Setup” option.

According to this VMware KB article there appears to be a memory leak in vShield Endpoint, and consequently this resolution is described:

This is a known issue affecting VMware Tools 5.1 and can impact ESXi 5.1 and 5.5.
This issue is resolved in:
Currently, there is no resolution in VMware ESXi 5.1.

To resolve this issue when you are using vShield Endpoint Protection in your virtual environment, uninstall and reinstall VMware Tools with the Custom or Complete setup option.

For ESXi 5.5 this is an easy solution, but what about ESXi 5.1 environments that depend on the vShield Endpoint components? (like environments running Trend Micro Deep Security or Symantec Endpoint Protection Manager)

There’s another VMware KB article which gives this as a resolution:

This issue is resolved in:

ESXi 5.5 Update 2, available at VMware Downloads. For more information, see the VMware ESXi 5.5 Update 2 Release Notes.

ESXi 5.1 Patch 04, available at VMware Download Patches. For more information, see VMware ESXi 5.1, Patch Release ESXi510-201404001 (2070666).

So be advised of these patches, since it’s currently unclear to me why all these customers recently experienced these issues.


Failed to attach filter ‘pxdacf_filter’

During a recent customer visit we had a testing environment available where some VMs couldn’t be powered on or vMotioned to some of the ESXi hosts. The error message:

An error was received from the ESX host while powering on VM xxxxx.
Failed to start the virtual machine.
Module DevicePowerOn power on failed.
Unable to create virtual SCSI device for scsi1:0, ‘/vmfs/volumes/39dfa56f-83350d20/xxxxxx/xxxxxx.vmdk’
Failed to attach filter ‘pxdacf_filter’ to scsi1:0: Not found (195887107).

The error message is similar to the one which VMware is describing in this KB article around vShield Endpoint: The virtual machine power-on operation fails with this error when a virtual machine that was earlier protected by vShield Endpoint is either moved or copied to a host that is not protected by the vShield Endpoint security solution.

However, this customer wasn’t using vShield in this test environment, and Google didn’t return any hits for “pxdacf_filter”. Troubleshooting eventually pointed out that some of the ESXi hosts had Proximal Data AutoCache installed, and VMs that are accelerated by Proximal contain the following line in their .vmx file:

scsix:x.filters = "pxdacf_filter"

This obviously caused the VM to fail to power on on an ESXi host without Proximal Data AutoCache installed.
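A quick way to identify affected VMs before such a move is to scan the datastores for .vmx files that reference the filter. This is a sketch assuming the standard /vmfs/volumes layout on an ESXi shell; on a host without that path it simply reports nothing found.

```shell
# List .vmx files that reference the Proximal Data AutoCache filter.
# Run from an ESXi shell; /vmfs/volumes is the standard datastore mount.
affected=$(grep -l 'pxdacf_filter' /vmfs/volumes/*/*/*.vmx 2>/dev/null || true)
if [ -n "$affected" ]; then
    echo "VMs still referencing pxdacf_filter:"
    echo "$affected"
else
    echo "no .vmx files referencing pxdacf_filter found"
fi
```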