vSphere Authentication Proxy: Failed to bind CAM website with CTL

While troubleshooting the vSphere Authentication Proxy (vSphere 5.1) I ran into a bug which I will highlight in this article.

After successfully installing the Authentication Proxy I discovered an error message in the C:\ProgramData\VMware\vSphere Authentication Proxy\logs\camadapter.log:

2012-xx-xx 14:03:35: Failed to bind CAM website with CTL
2012-xx-xx 14:03:35: Failed to initialize CAMAdapter.

Read the full post »

Unable to enable the vSphere Authentication Proxy Plug-in

I recently had some issues enabling the vSphere Authentication Proxy Plug-in within vCenter Server 5.1. This is the error message that was displayed: “The server could not interpret the client’s request. (The remote server returned an error: (404) Not Found.)”

 

Read the full post »

View a .vpf file (VMware Profile Format)

This article describes an easy way to view a VMware Profile Format file (.vpf), as used by VMware Host Profiles, in a readable form.

Why would you want to read the .vpf file?
In certain troubleshooting scenarios it makes your work easier if you can search for specific content within the Host Profile.

So how do we view this .vpf file?

  1. Export the Host Profile from the vCenter Server, this will give you the .vpf file;
  2. Rename the profile.vpf to profile.xml (so change the extension);
  3. Open the profile.xml with, for instance, Google Chrome or Internet Explorer.
Google Chrome renders the XML as a collapsible tree, which gives you an easy overview of the profile.
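
If you would rather search the renamed file from a script than click through a browser, the same idea can be sketched in Python. This is only a minimal sketch: the element names in the sample are made up for illustration and are not the real .vpf schema, and find_in_profile is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET


def find_in_profile(root, needle):
    """Return (path, text) for every element whose tag, attributes or text contain needle."""
    needle = needle.lower()
    hits = []

    def walk(node, path):
        here = f"{path}/{node.tag}"
        haystack = [node.tag, node.text or ""] + list(node.attrib.values())
        if any(needle in part.lower() for part in haystack):
            hits.append((here, (node.text or "").strip()))
        for child in node:
            walk(child, here)

    walk(root, "")
    return hits


# Made-up stand-in for an exported profile; a real .vpf will look different.
SAMPLE = """<profile>
  <network><vswitch name="vSwitch0"><mtu>1500</mtu></vswitch></network>
  <storage><nfs><server>nas01</server></nfs></storage>
</profile>"""

sample_root = ET.fromstring(SAMPLE)
for path, text in find_in_profile(sample_root, "nas01"):
    print(path, text)
```

For a real export you would load the renamed file with ET.parse("profile.xml").getroot() and pass that root to the same function.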

 

Migrate to new vCenter Server while using dvSwitches

During vSphere 4.x to 5.x upgrades I occasionally run into the situation where dvSwitches are used on the current vCenter 4 and the customer wants to install a fresh copy of vCenter 5 (with a new dvSwitch).

Within this article I will describe a minimum downtime (1 ping-loss or less) procedure to cover this, guided by pictures to get a clear understanding.

Step 1 – Starting point
First of all, we have the ESXi host and VMs connected to the dvSwitch.

Read the full post »

Random ESXi LUN loss in an 8Gb FC Brocade infrastructure

Recently Dennis Agterberg, a fellow contractor, experienced some major issues with random LUN loss on all his ESXi Hosts. The environment consists of HP Virtual Connect FC modules connected to 8Gb Brocade switches.

From a vCenter perspective repetitive “lost access to volume … due to connectivity issues. Recovery attempt is in progress and outcome will be reported shortly” messages appear while the vmkernel log is showing messages like:

2012-05-02T01:34:02.805Z cpu3:4099)<3>lpfc820 0000:02:00.2: 0:(0):0717 FCP command x12 residual underrun converted to error Data: xff xff x24

2012-05-02T01:34:02.805Z cpu3:4099)NMP: nmp_ThrottleLogForDevice:2318: Cmd 0x12 (0x412400d086c0) to dev "naa.60060160d8901f00e859cf520978e111" on path "vmhba0:C0:T7:L3" Failed: H:0x7 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0.Act:EVAL

2012-05-02T01:34:02.805Z cpu3:4099)WARNING: NMP: nmp_DeviceRequestFastDeviceProbe:237:NMP device "naa.60060160d8901f00e859cf520978e111" state in doubt; requested fast path state update...
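
When these messages repeat across many hosts, it helps to see which devices and paths are affected. The following is a rough Python sketch that pulls the device, path and the H/D/P (host, device, plugin) status codes out of the nmp_ThrottleLogForDevice lines. The regular expression is derived only from the sample line above, so treat it as an assumption that may need adjusting for other builds; parse_nmp_failure and failures_per_path are hypothetical helper names.

```python
import re
from collections import Counter

# Accepts straight or typographic quotes around the device and path names.
NMP_FAILURE = re.compile(
    r'Cmd (?P<cmd>0x[0-9a-fA-F]+) .*?'
    r'to dev ["“](?P<device>[^"”]+)["”] '
    r'on path ["“](?P<path>[^"”]+)["”] '
    r'Failed: H:(?P<host>0x[0-9a-fA-F]+) '
    r'D:(?P<dev>0x[0-9a-fA-F]+) '
    r'P:(?P<plugin>0x[0-9a-fA-F]+)'
)


def parse_nmp_failure(line):
    """Return a dict with cmd, device, path and status codes, or None if the line does not match."""
    match = NMP_FAILURE.search(line)
    return match.groupdict() if match else None


def failures_per_path(lines):
    """Count failed commands per (device, path) pair."""
    counts = Counter()
    for line in lines:
        parsed = parse_nmp_failure(line)
        if parsed:
            counts[(parsed["device"], parsed["path"])] += 1
    return counts


SAMPLE_LINE = (
    '2012-05-02T01:34:02.805Z cpu3:4099)NMP: nmp_ThrottleLogForDevice:2318: '
    'Cmd 0x12 (0x412400d086c0) to dev "naa.60060160d8901f00e859cf520978e111" '
    'on path "vmhba0:C0:T7:L3" Failed: H:0x7 D:0x0 P:0x0 '
    'Possible sense data: 0x0 0x0 0x0.Act:EVAL'
)

print(parse_nmp_failure(SAMPLE_LINE))
```

Feeding the lines of a saved vmkernel log to failures_per_path gives a quick per-path count, which makes it easier to see whether the errors follow one HBA or are spread over all of them.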

In short, Dennis traced the problem to an incorrect portCfgFillWord mode setting on the Brocade FC switches.

So what’s this Fill-Word stuff? A Fill-Word is a primitive signal that is needed to maintain bit and word synchronization between two adjacent ports, and it is more error-prone at the increased clock speed of 8Gb links. Read the complete article about Fill-Words here.

The HP Virtual Connect Release Notes indicate that the portCfgFillWord mode needs to be set to 3 whenever you are connecting to a Brocade 8Gb FC Switch:

When VC 8Gb 20-port FC Module and VC FlexFabric 10Gb/24-port Module Fibre Channel uplink ports are configured to operate at 8Gb speed and connect to HP B-series (Brocade) Fibre Channel SAN switches, the minimum supported version of the Brocade Fabric OS (FOS) is v6.4.x. In addition, the Fill Word on those switch ports must be configured with option Mode 3 to prevent connectivity issues at 8Gb speed.

On HP B-series (Brocade) FC switches, use the portCfgFillWord (portCfgFillWord <Port#><Mode>) command to configure this setting.

Mode     Link Init/Fill Word
Mode 0   IDLE/IDLE
Mode 1   ARBF/ARBF
Mode 2   IDLE/ARBF
Mode 3   If ARBF/ARBF fails, use IDLE/ARBF

Modes 2 and 3 are compliant with FC-FS-3 specifications (standards specify the IDLE/ARBF behavior of Mode 2, which is used by Mode 3 if ARBF/ARBF fails after 3 attempts). For most environments, Brocade recommends using Mode 3, as it provides more flexibility and compatibility with a wide range of devices. In the event that the default setting or Mode 3 does not work with a particular device, contact your switch vendor for further assistance.
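
If a lot of switch ports need the same setting, the command lines can be generated rather than typed. A small Python sketch follows; it assumes the port number and mode are passed as two space-separated arguments (as the syntax quoted above suggests), and the port numbers 0 to 3 are purely hypothetical. Review the output against your own switch documentation before pasting it into a session.

```python
# Mode descriptions as listed in the HP release notes quoted above.
FILL_WORD_MODES = {
    0: "IDLE/IDLE",
    1: "ARBF/ARBF",
    2: "IDLE/ARBF",
    3: "If ARBF/ARBF fails, use IDLE/ARBF",
}


def fill_word_commands(ports, mode=3):
    """Build one portCfgFillWord command line per port (assumed argument order: port, mode)."""
    if mode not in FILL_WORD_MODES:
        raise ValueError(f"unknown fill word mode: {mode}")
    return [f"portCfgFillWord {port} {mode}" for port in ports]


# Hypothetical example: the four ports facing the Virtual Connect uplinks.
for command in fill_word_commands(range(4)):
    print(command)
```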

Basically the key takeaway here is to mind the portCfgFillWord mode whenever setting up or migrating towards an 8 Gb FC Brocade infrastructure.

Datacenters vs Clusters within the vCenter Inventory

Recently I got involved in a discussion about “Creating multiple Datacenters with one Cluster each rather than one Datacenter with multiple Clusters”.

First of all, let’s summarize the definitions of these objects (as stated in the vSphere Datacenter Administration Guide):

Clusters

A collection of ESX/ESXi hosts and associated virtual machines intended to work together as a unit. When you add a host to a cluster, the host’s resources become part of the cluster’s resources. The cluster manages the resources of all hosts. VMware Features like EVC, DRS, DPM and HA are enabled on a per Cluster basis.

Read the full post »

Running scripts on the ESXi shell with correct time-stamps

While working on the ESXi shell you might notice that the “date” command returns the UTC time instead of displaying the correct time zone like ESX Classic does.

This is by design within ESXi and stated in this VMware KB article: ESXi uses UTC time and does not support changing time zones

Although changing the time zone is not supported, it can come in handy to get the correct time-stamps for custom scripts within the ESXi Shell.
To get the correct time-stamps you only need to add the following line at the beginning of your script:

export TZ=MET<-/+><no of hours>

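In my case local time is one hour ahead of UTC, which means that I need: export TZ=MET-1. Mind the sign: in a POSIX-style TZ value the number is what you add to local time to get UTC, so a negative value means a zone ahead of UTC.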

Please note that this command only changes the time zone for the current Shell session as long as it’s active, so it doesn’t change the system time zone.
Also don’t forget to check the output with daylight saving time if applicable to your current time zone.
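
The sign convention of the TZ value is easy to get wrong, so here is a small Python sketch that checks it on any Unix-like system (time.tzset is not available on Windows). It is only meant to illustrate how a value such as MET-1 is interpreted; it is not something the ESXi scripts themselves need, and tz_offset_seconds is a hypothetical helper name.

```python
import os
import time


def tz_offset_seconds(tz_value):
    """Apply a TZ value to this process and return the local offset from UTC in seconds."""
    os.environ["TZ"] = tz_value
    time.tzset()  # re-read TZ; Unix only
    # time.timezone counts seconds WEST of UTC, so flip the sign.
    return -time.timezone


offset = tz_offset_seconds("MET-1")
print("local time is UTC%+d hour(s)" % (offset // 3600))
print(time.strftime("%Y-%m-%d %H:%M:%S %Z", time.localtime(0)))
```

Like the export in the shell, this only affects the running process and leaves the system time zone untouched.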


Dutch vBeers – 14 July 2011

Another vBeers gathering will be held on Thursday 14th of July starting from 6:00pm in ‘Cafe de Omval’ which is located near the Amsterdam Amstel station. This venue is selected since it’s easy to reach for people not coming from Amsterdam and serves a fine selection of beers along with soft drinks and bar food.

Note that Frank Denneman, one of the two great authors of the VMware vSphere Clustering Technical Deepdive (available now!) will also be attending. This is a great moment to meet up with him!

Drinks will not be paid for, there will not be a tab. When you buy a drink please pay for it as no one else will be paying for your drinks.

  • Location: ‘Cafe de Omval’ Amsterdam
  • Address: Weesperzijde 250 (hoek Omval), 1097 EB Amsterdam
  • Nearest Train Station: Amsterdam Amstel
  • Time: 6:00pm

VM Swapfile (.vswp) placement with SRM

VMware Site Recovery Manager (SRM) is being implemented more and more these days and, like vSphere, it needs a good architectural design before you start.

One of the design considerations is the placement of the Virtual Machine Swap File (.vswp), which I want to cover in more detail in this article.

Let’s first take a look at the VM Swap File (.vswp). By default this file is placed in the VM “working directory”, which also contains all the other VM files. The .vswp is created every time the VM is started, and its size equals the unreserved memory configured on the VM. If the VM is configured with 2 GB and the memory reservation is set to 0 MB (default), the VM Swap File will be 2 GB. If the memory reservation in this example were 1 GB, then the .vswp file would be 2 GB – 1 GB = 1 GB in total.
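
The sizing rule is simple enough to put in a few lines of Python, which can be handy when estimating how much swap space a set of VMs needs. This is a minimal sketch of the arithmetic only; vswp_size_mb is a hypothetical helper name and the values are the ones from the example.

```python
def vswp_size_mb(configured_mb, reservation_mb=0):
    """Swap file size = configured memory minus the memory reservation (both in MB)."""
    if not 0 <= reservation_mb <= configured_mb:
        raise ValueError("reservation must be between 0 and the configured memory")
    return configured_mb - reservation_mb


print(vswp_size_mb(2048))        # 2 GB VM, no reservation
print(vswp_size_mb(2048, 1024))  # 2 GB VM, 1 GB reserved
```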

The design considerations are about:

  1. Keeping the .vswp file in its default “working directory”;
  2. Placing the .vswp on a separate non-replicated datastore.

Keeping the .vswp file in its default “working directory” means that the .vswp file will be replicated to the recovery site as indicated in the next overview:

Pros:

  • Ease of manageability: all VM files stay together, and it is the default;

Cons:

  • More replication bandwidth is needed for files (.vswp) that aren’t used at the recovery site;
  • Cost is higher since more replicated storage space is used;
  • Increases the recovery time of both the test and real failover. This is due to the fact that SRM explicitly deletes the useless .vswp files on the recovery site before starting the VMs.

Placing the .vswp on a separate non-replicated datastore involves some manual work, possibly even reconfiguration of all the currently available Virtual Machines. The following overview shows the configuration:

Pros:

  • Does not spend storage replication traffic on data that is never used;
  • Uses less replicated storage, which could be more expensive than non-replicated storage.

Cons:

  • Can be more difficult to manage because some parts of a virtual machine reside on a separate datastore;
  • Requires additional configuration and management processes within SRM, since the VM would be detected with a non-replicated datastore, which consequently causes SRM to remove the VM from its protection group.

An important note needs to be made about NFS storage. As indicated, one of the drawbacks of keeping the .vswp file in the “working directory” is that it increases the recovery time, since SRM deletes the replicated, useless .vswp file before starting the VM.

Deleting the .vswp file from a newly recovered NFS datastore can take some time, since ESX needs to wait for the replicated file lock to expire (default 35 seconds). A quote from the Best Practices on NAS Whitepaper:

Once a lock file is created, VMware periodically (every NFS.DiskFileLockUpdateFreq seconds) send updates to the lock file to let other ESX hosts know that the lock is still active. Changing any of the NFS locking parameters will change how long it takes to recover stale locks. The following formula can be used to calculate how long it takes to recover a stale NFS lock:

(NFS.DiskFileLockUpdateFreq * NFS.LockRenewMaxFailureNumber) + NFS.LockUpdateTimeout

If any of these parameters are modified, it’s very important that all ESX hosts in the cluster use identical settings. Having inconsistent NFS lock settings across ESX hosts can result in data corruption!
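
To see how the formula adds up to the 35 seconds mentioned earlier, here is a short Python sketch. The three input values (10, 3 and 5) are my assumption of the default advanced settings, chosen because they are consistent with the 35-second default; check the actual values on your own hosts. stale_lock_recovery_seconds is a hypothetical helper name.

```python
def stale_lock_recovery_seconds(update_freq, max_failures, update_timeout):
    """(NFS.DiskFileLockUpdateFreq * NFS.LockRenewMaxFailureNumber) + NFS.LockUpdateTimeout"""
    return update_freq * max_failures + update_timeout


# Assumed defaults, consistent with the 35 seconds quoted in this article.
ASSUMED_DEFAULTS = {
    "update_freq": 10,    # NFS.DiskFileLockUpdateFreq (seconds)
    "max_failures": 3,    # NFS.LockRenewMaxFailureNumber
    "update_timeout": 5,  # NFS.LockUpdateTimeout (seconds)
}

print(stale_lock_recovery_seconds(**ASSUMED_DEFAULTS))
```

Multiply the result by the number of .vswp files SRM has to delete one after another and it becomes clear why this delay is worth considering in the design.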

This timeout isn’t applicable on VMFS datastores because the auto-resignaturing process drops the file locks automatically.

As with every design decision it’s all about knowing the pros/cons of the available options you have and as such select the best option for your environment.

Self-employed, delivering Virtualization Consultancy

After working for VMware for over a year I decided to take the next step in my career: to start Van Ditmarsch Consultancy.

As a contractor I will be available to do Virtualization Consultancy, primarily focused on designing, implementing and/or validating VMware Virtualization solutions with a strong focus on business continuity, high availability and disaster recovery.

I want to thank VMware for their understanding and I want to thank all the great people I met while working for VMware since they definitely have the best people around!

Thanks guys, all the best and we will certainly stay in touch!