Tuesday, May 7, 2013

Workaround for the vCenter Server appliance 5.1U1 update delay


The update process from 5.1.x to 5.1 Update 1 contains a serious flaw: the update may take more than 45 minutes, and some users report more than an hour. VMware even mentions this in the release notes:
Update of vCenter Server Appliance 5.1.x to vCenter Server Appliance 5.1 Update 1 halts at web UI while showing update status as installing updates*
When you attempt to upgrade vCenter Server Appliance 5.1.x to vCenter Server Appliance 5.1 Update 1, the update process halts for nearly an hour and the update status at Web UI shows as installing updates. However, eventually, the update completes successfully after an hour.

Workaround: None.

The generic update documentation KB article 2031331 "Updating vCenter Server Appliance 5.x" mentions even longer durations:
The update process can take approximately 90 to 120 minutes. Do not reboot until the update is complete.

Well, there is a workaround, even a very simple one:
  • log in to the appliance via SSH as root
  • execute "rm /usr/lib64/.lib*.hmac"
  • perform the update using the web UI
The update then takes only a few minutes, in my case less than 10. The appliance needs to be rebooted and runs fine afterwards. Don't worry about these files; they are deleted during the update anyway.
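For anyone who prefers to see what gets deleted, the second step can be wrapped in a small helper. This is only a sketch: remove_hmac_files is a name made up for illustration, and the directory defaults to /usr/lib64 as in the steps above.

```shell
#!/bin/sh
# Sketch of the workaround step: list the FIPS checksum files, then delete them.
# remove_hmac_files is a hypothetical helper name, not part of the appliance.
remove_hmac_files() {
    dir="${1:-/usr/lib64}"
    for f in "$dir"/.lib*.hmac; do
        # If the glob matched nothing, the literal pattern is skipped here.
        [ -e "$f" ] || continue
        echo "removing $f"
        rm -f -- "$f"
    done
}

# On the appliance, as root, before starting the update in the web UI:
#   remove_hmac_files
```

Passing a different directory as the first argument makes it possible to try the helper on a scratch copy first.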

The .hmac files contain hashes of /usr/lib64/libcrypto.so.0.9.8 and /usr/lib64/libssl.so.0.9.8, which are used for FIPS compliance. When the corresponding packages are updated, these files are not deleted immediately, so the old hashes end up next to the new libraries:

-r-xr-xr-x 1 root root 1685176 Jul 10  2012 /usr/lib64/libcrypto.so.0.9.8
-r-xr-xr-x 1 root root  343040 Jul 10  2012 /usr/lib64/libssl.so.0.9.8

-rw-r--r-- 1 root root      65 Jan 11  2012 /usr/lib64/.libcrypto.so.0.9.8.hmac
-rw-r--r-- 1 root root      65 Jan 11  2012 /usr/lib64/.libssl.so.0.9.8.hmac
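The listing above shows libraries that are newer than their checksum files. A quick way to spot that situation is to compare modification times. This is a heuristic sketch only: it does not recompute or verify any HMAC, find_stale_hmac is a hypothetical helper name, and the -nt test is a common shell extension (bash, dash, ksh) rather than strict POSIX.

```shell
#!/bin/sh
# Heuristic: report checksum files whose library has a newer modification time.
# find_stale_hmac is a hypothetical helper name.
find_stale_hmac() {
    dir="${1:-/usr/lib64}"
    for hmac in "$dir"/.lib*.hmac; do
        [ -e "$hmac" ] || continue
        name=$(basename "$hmac" .hmac)   # e.g. .libssl.so.0.9.8
        lib="$dir/${name#.}"             # strip the leading dot
        if [ -e "$lib" ] && [ "$lib" -nt "$hmac" ]; then
            echo "$hmac is older than $lib"
        fi
    done
}

# On the appliance:
#   find_stale_hmac
```

No output means no library is newer than its checksum file; it does not prove that the checksums are valid.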

The mismatch between the libraries (binaries) and the hashes causes every application that uses OpenSSL to fail with messages like
fips.c(154): OpenSSL internal error, assertion failed: FATAL FIPS SELFTEST FAILURE
During the appliance update, the vami-sfcb service fails to start for this reason, which stalls the whole update until the maximum retry limit for the service is reached. If the appliance is rebooted before this timeout, the postinstall phase has not been executed and vCenter will no longer start, either because of the OpenSSL error above or because vpxd refuses to start with the error message
Database version id '510' is incompatible with this release of VirtualCenter.
I was able to revive the appliance in my lab, but this is of course neither supported nor recommended. It runs fine again, but the state is not consistent, and I would recommend booting it only one more time, to save the configuration and data and migrate to a fresh installation. Depending on when the update was interrupted, your results may vary.

If the appliance itself no longer starts properly, boot it from a Linux live CD (GParted or Parted Magic is sufficient), mount the filesystem, and delete the .hmac files. Perform a normal boot afterwards.
If the web UI then lets you run a normal update, do so, and you should be fine.
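From the live CD, the deletion could look like the sketch below. It assumes the appliance root filesystem is already mounted somewhere (the mount point /mnt/vcsa and the device name in the comment are placeholders that depend on your disk layout), and rescue_remove_hmac is a hypothetical helper name. The directory check is there to catch the case where the wrong partition was mounted.

```shell
#!/bin/sh
# Sketch: delete the .hmac files on an appliance root filesystem that is
# mounted from a live CD. rescue_remove_hmac is a hypothetical helper name.
rescue_remove_hmac() {
    root="$1"
    libdir="$root/usr/lib64"
    if [ -z "$root" ] || [ ! -d "$libdir" ]; then
        echo "no usr/lib64 under '$root', wrong partition mounted?" >&2
        return 1
    fi
    # rm -f stays quiet if the files are already gone.
    rm -f -- "$libdir"/.lib*.hmac
}

# Example (placeholders, adjust to your layout):
#   mkdir -p /mnt/vcsa
#   mount /dev/sdXN /mnt/vcsa
#   rescue_remove_hmac /mnt/vcsa
#   umount /mnt/vcsa
```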

Otherwise, try it manually. The following steps assume you are familiar with Linux; check the prerequisites first:
  • Log in to the appliance via SSH as root
  • cd /opt/vmware/var/lib/vami/update/data/job
  • cd to the latest subdirectory, which should have the highest number
  • Check if the update belongs to 5.1U1
    head manifest.xml
    You should see build
  • Attach the updaterepo ISO to the VM
  • mount /dev/sr0 /media/cdrom   (create the mount point /media/cdrom if necessary)
  • cd /opt/vmware/var/lib/vami/update/data/package-pool
  • ln -s /media/cdrom/update/package-pool package-pool
  • cd back to the job subdirectory
  • ./pre_install '' ''
  • ./test_command   (may report "failed dependencies")
  • cp -p run_command run_repair
  • vi run_repair and change the first command from "rpm -Uv" to "rpm -Uv --nodeps --replacepkgs"
  • ./run_repair   (ignore "insserv: script jexec is broken" etc)
  • Check if a duplicate vfabric-tc-server-standard package exists
    rpm -q vfabric-tc-server-standard
  • If yes (more than one line of output), delete the older version, otherwise /usr/lib/vmware-vpx/rpmpatches.sh will fail
    rpm -e vfabric-tc-server-standard-2.6.4-1   (in my case)
  • ./post_install '' '' 0
  • ./manifest_update
  • That's basically it; now just the cleanup:
    cd /opt/vmware/var/lib/vami/update/data
    rm -r job/*
    rm cache/*
    umount /media/cdrom
  • reboot
Be aware that old versions of some packages will most likely still be installed. Again: this is not a stable state, just (hopefully) enough to save your data... good luck!
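Two of the manual steps above involve a judgment call that is easy to get wrong by eye: picking the job subdirectory with the highest number, and deciding whether rpm -q printed more than one line. The helpers below are a sketch of both checks. The function names are made up for illustration, and the first one assumes that the job subdirectories have purely numeric names.

```shell
#!/bin/sh
# latest_job_dir: print the numerically highest subdirectory name.
# Assumption: job subdirectories are named with digits only.
latest_job_dir() {
    base="${1:-/opt/vmware/var/lib/vami/update/data/job}"
    ls "$base" | grep -E '^[0-9]+$' | sort -n | tail -n 1
}

# has_duplicate_package: succeed if the "rpm -q" output read from stdin
# contains more than one non-empty line.
has_duplicate_package() {
    [ "$(grep -c .)" -gt 1 ]
}

# Example use on the appliance:
#   cd "/opt/vmware/var/lib/vami/update/data/job/$(latest_job_dir)"
#   if rpm -q vfabric-tc-server-standard | has_duplicate_package; then
#       echo "duplicate package, remove the older version first"
#   fi
```

The numeric sort matters: a plain alphabetical listing would put a directory named 10 before one named 2.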
