Managing VIC Container Hosts (VCH)

December 13, 2016

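To reboot or power off a vSphere Integrated Containers virtual container host (VCH), enter “debug” mode using the vic-machine CLI from a Docker client machine. Debug mode enables SSH access to the VCH appliance, from which you can shut it down. Do not use the vSphere client.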

“IMPORTANT: Do not use the vSphere Web Client to perform operations on VCH appliances or container VMs.
Specifically, using the vSphere Web Client to power off, or delete VCH appliances or container VMs can
cause vSphere Integrated Containers Engine to not function correctly. Always use vic-machine to perform
operations on VCHs. Always use Docker commands to perform operations on containers.”

Taken from GitHub:
https://vmware.github.io/vic-product/assets/files/pdf/0.8/vic_admin.pdf

Prerequisite: make sure all containers running on the Virtual Container Host are stopped or exited.

docker@ubuntu:~/vic$ sudo ./vic-machine-linux debug --target=192.168.1.3 --user=administrator@vsphere.local --password=xxxxxx --thumbprint=11:41:53:20:A1:60:81:CB:2B:9B:C3:1D:AF:42:76:AF:A4:35:C3:63 --name vmvicvch001

Sample output: entering debug mode and then shutting down a VCH

docker@ubuntu:~/vic$ sudo ./vic-machine-linux debug --target=192.168.1.3 --user=administrator@vsphere.local --password=xxxxxxxx --thumbprint=11:41:53:20:A1:60:81:CB:2B:9B:C3:1D:AF:42:76:AF:A4:35:C3:63 --name vmvicvch002
INFO[2016-12-10T22:45:00-08:00] ### Configuring VCH for debug ####
INFO[2016-12-10T22:45:00-08:00]
INFO[2016-12-10T22:45:00-08:00] VCH ID: VirtualMachine:vm-24
INFO[2016-12-10T22:45:01-08:00]
INFO[2016-12-10T22:45:01-08:00] Installer version: v0.8.0-7315-c8ac999
INFO[2016-12-10T22:45:01-08:00] VCH version: v0.8.0-7315-c8ac999
INFO[2016-12-10T22:45:01-08:00]
INFO[2016-12-10T22:45:01-08:00] SSH to appliance:
INFO[2016-12-10T22:45:01-08:00] ssh root@192.168.1.11
INFO[2016-12-10T22:45:01-08:00]
INFO[2016-12-10T22:45:01-08:00] VCH Admin Portal:
INFO[2016-12-10T22:45:01-08:00] https://192.168.1.11:2378
INFO[2016-12-10T22:45:01-08:00]
INFO[2016-12-10T22:45:01-08:00] Published ports can be reached at:
INFO[2016-12-10T22:45:01-08:00] 192.168.1.11
INFO[2016-12-10T22:45:01-08:00]
INFO[2016-12-10T22:45:01-08:00] Docker environment variables:
INFO[2016-12-10T22:45:01-08:00] DOCKER_HOST=192.168.1.11:2375
INFO[2016-12-10T22:45:01-08:00]
INFO[2016-12-10T22:45:01-08:00] Connect to docker:
INFO[2016-12-10T22:45:01-08:00] docker -H 192.168.1.11:2375 info
INFO[2016-12-10T22:45:01-08:00] Completed successfully
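The values you usually need from this output are the SSH target and the Docker endpoint. As a rough illustration only, the Python sketch below pulls both out of captured vic-machine output. The function name parse_vch_endpoints and the SAMPLE text (lines copied from the output above) are assumptions made for this example; nothing here is part of vic-machine itself.

```python
import re

# A few lines copied from the debug output above.
SAMPLE = """\
INFO[2016-12-10T22:45:01-08:00] SSH to appliance:
INFO[2016-12-10T22:45:01-08:00] ssh root@192.168.1.11
INFO[2016-12-10T22:45:01-08:00]
INFO[2016-12-10T22:45:01-08:00] Docker environment variables:
INFO[2016-12-10T22:45:01-08:00] DOCKER_HOST=192.168.1.11:2375
"""


def parse_vch_endpoints(text):
    """Return the SSH target and DOCKER_HOST found in vic-machine output."""
    found = {}
    for line in text.splitlines():
        # Drop the "INFO[timestamp]" prefix and keep only the message.
        message = re.sub(r"^INFO\[[^\]]*\]\s*", "", line).strip()
        if message.startswith("ssh "):
            found["ssh_target"] = message.split(None, 1)[1]
        elif message.startswith("DOCKER_HOST="):
            found["docker_host"] = message.split("=", 1)[1]
    return found


print(parse_vch_endpoints(SAMPLE))
```

Feeding it the full captured output works the same way, since lines that carry neither value are ignored.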

Notice that SSH is now enabled on the VCH: “ssh root@192.168.1.11”

docker@ubuntu:~/vic$ sudo ssh root@192.168.1.11
The authenticity of host '192.168.1.11 (192.168.1.11)' can't be established.
ECDSA key fingerprint is SHA256:OVBVYt/Bzc3HrnMpDfTqpevxk5tXLmyGiXGCa/7y7DM.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added '192.168.1.11' (ECDSA) to the list of known hosts.
Password:
root@vmvicvch002 [ ~ ]# shutdown -h now
root@vmvicvch002 [ ~ ]# Connection to 192.168.1.11 closed by remote host.
Connection to 192.168.1.11 closed.

Use the vSphere Web Client to verify that the VCH has powered off.

To power on a VCH, use the vSphere Web Client and power on the VCH vApp, not the container VMs. There is currently no power-on command in vic-machine; see the vic-machine help output below. Containers must be stopped and started using Docker commands.

See wiki here: vSphere Integrated Containers – Docker stop and start commands

docker@ubuntu:~/vic$ sudo ./vic-machine-linux --help
NAME:
vic-machine-linux - Create and manage Virtual Container Hosts

USAGE:
vic-machine-linux [global options] command [command options] [arguments...]

VERSION:
v0.8.0-7315-c8ac999

COMMANDS:
create   Deploy VCH
delete   Delete VCH and associated resources
ls       List VCHs
inspect  Inspect VCH
version  Show VIC version information
debug    Debug VCH

GLOBAL OPTIONS:
--help, -h     show help
--version, -v  print the version

Rob Shaw – 12/13/2016


vSphere Integrated Containers – Docker stop and start commands

December 13, 2016

To list containers, including stopped ones, use “docker ps -a”.
The output is useful for finding container names, IDs, and the IP addresses and ports of web containers and applications.

Example 1:

docker@ubuntu:~/vic$ docker ps -a
CONTAINER ID        IMAGE               COMMAND             CREATED             STATUS              PORTS               NAMES
5f19f6793ffa        busybox             "sh"                5 hours ago         Up 5 hours                              naughty_goldberg

Example 2:

docker@ubuntu:~/vic$ docker ps -a
CONTAINER ID        IMAGE               COMMAND                  CREATED              STATUS              PORTS                     NAMES
f23b1f10281f        nginx               "nginx -g daemon off;"   About a minute ago   Up 54 seconds       192.168.1.12:80->80/tcp   condescending_euclid

With the container ID, we can issue “docker start” or “docker stop” to manage the power state of the container VM in vCenter.

docker@ubuntu:~/vic$ docker start 5f19f6793ffa
5f19f6793ffa


To delete a container, use docker rm {container ID}:
docker@ubuntu:~/vic$ docker rm 5f19f6793ffa
5f19f6793ffa

“docker info” is another useful command for seeing details about your container host

docker@ubuntu:~/vic$ docker info
Containers: 1
Running: 1
Paused: 0
Stopped: 0
Images: 1
Server Version: v0.8.0-7315-c8ac999
Storage Driver: vSphere Integrated Containers v0.8.0-7315-c8ac999 Backend Engine
VolumeStores: default
vSphere Integrated Containers v0.8.0-7315-c8ac999 Backend Engine: RUNNING
VCH mhz limit: 7800 Mhz
VCH memory limit: 13.18 GiB
VMware Product: VMware vCenter Server
VMware OS: linux-x64
VMware OS version: 6.5.0
Plugins:
Volume:
Network: bridge
Swarm:
NodeID:
Is Manager: false
Node Address:
Security Options:
Operating System: linux-x64
OSType: linux-x64
Architecture: x86_64
CPUs: 7800
Total Memory: 13.18 GiB
Name: vmvicvchAdmiral
ID: vSphere Integrated Containers
Docker Root Dir:
Debug Mode (client): false
Debug Mode (server): false
Registry: registry-1.docker.io

To get details of a virtual container host, run vic-machine inspect. The Docker endpoint environment variable (DOCKER_HOST) is what you would give to your developers.

After you install the VIC UI plug-in, the endpoint info also appears in the vSphere Web Client, in the Virtual Container Host portlet described after the inspect output:

docker@ubuntu:~/vic$ sudo ./vic-machine-linux inspect --target=192.168.1.3 --user=administrator@vsphere.local --password=xxxxxx --thumbprint=11:41:53:20:A1:60:81:CB:2B:9B:C3:1D:AF:42:76:AF:A4:35:C3:63 --name vmvicvch001
INFO[2016-12-10T21:47:58-08:00] ### Inspecting VCH ####
INFO[2016-12-10T21:47:59-08:00]
INFO[2016-12-10T21:47:59-08:00] VCH ID: VirtualMachine:vm-22
INFO[2016-12-10T21:47:59-08:00]
INFO[2016-12-10T21:47:59-08:00] Installer version: v0.8.0-7315-c8ac999
INFO[2016-12-10T21:47:59-08:00] VCH version: v0.8.0-7315-c8ac999
INFO[2016-12-10T21:47:59-08:00]
INFO[2016-12-10T21:47:59-08:00] VCH Admin Portal:
INFO[2016-12-10T21:47:59-08:00] https://192.168.1.8:2378
INFO[2016-12-10T21:47:59-08:00]
INFO[2016-12-10T21:47:59-08:00] Published ports can be reached at:
INFO[2016-12-10T21:47:59-08:00] 192.168.1.8
INFO[2016-12-10T21:47:59-08:00]
INFO[2016-12-10T21:47:59-08:00] Docker environment variables:
INFO[2016-12-10T21:47:59-08:00] DOCKER_HOST=192.168.1.8:2375
INFO[2016-12-10T21:47:59-08:00]
INFO[2016-12-10T21:47:59-08:00] Connect to docker:
INFO[2016-12-10T21:47:59-08:00] docker -H 192.168.1.8:2375 info
INFO[2016-12-10T21:47:59-08:00] Completed successfully

[Figure: Virtual Container Host portlet added to the vSphere Web Client by the VIC plug-in, showing the Docker endpoint]

Rob Shaw – 12/13/2016


vRealize Automation 7.x – Cannot remove storage

August 4, 2016

vRealize Automation 7.x and “You cannot remove storage with name [Your Datastore Name] because there are managed machines still attached to it.”

Synopsis:

I had a virtual machine in vRealize Automation 7 that showed up in the UI and in the IaaS database but did not actually exist anywhere in vSphere. The VM in question had been deleted in vRA using the user’s tenant entitlement options. It was removed from vSphere but stayed stuck in vRA’s interface as “Finalized”, and I could no longer perform any actions on it.

Removing the VM from the IaaS database allowed me to successfully remove the stuck Datastore from the reservation.

Caveat: if the VM still shows up in the vRA UI, you may need to do some further cleanup.

If at all possible, try using the VMware Cloud Client 4.x (4.1 is current) and remove the VM before touching the IaaS database. Try deleting the VM and removing the Datastore object after purging any VM objects tied to it.

If unsuccessful, try deleting it from the database.

https://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2144269

SELECT * FROM VirtualMachine WHERE VirtualMachineName = 'Your VM Name'

Note the Primary key in the IaaS database in the dbo.VirtualMachine table – ‘VirtualMachineID’

Verify the ID is a match on the VM in question by comparing the name in the VirtualMachineName column:

SELECT * FROM VirtualMachine WHERE VirtualMachineId = '3B23C483-C39D-40DE-AF7E-5D9BBDBBFE2D'

DELETE FROM [dbo].[VirtualMachine] where VirtualMachineId = '3B23C483-C39D-40DE-AF7E-5D9BBDBBFE2D'
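The verify-then-delete sequence above can be sketched as a single guarded operation. The Python example below is only an illustration: it uses an in-memory SQLite table as a stand-in for the MS SQL IaaS database, it models just the two columns used in the queries above, and the VM name "stuck-vm-01" and the helper delete_vm_if_name_matches are hypothetical.

```python
import sqlite3

# Stand-in for the IaaS database: an in-memory SQLite table holding only
# the two columns used in the queries above (the real table is in MS SQL).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE VirtualMachine ("
    "VirtualMachineId TEXT PRIMARY KEY, VirtualMachineName TEXT)"
)
conn.execute(
    "INSERT INTO VirtualMachine VALUES (?, ?)",
    ("3B23C483-C39D-40DE-AF7E-5D9BBDBBFE2D", "stuck-vm-01"),  # hypothetical name
)


def delete_vm_if_name_matches(conn, vm_id, expected_name):
    """Delete the row only when the ID resolves to the expected VM name."""
    row = conn.execute(
        "SELECT VirtualMachineName FROM VirtualMachine WHERE VirtualMachineId = ?",
        (vm_id,),
    ).fetchone()
    if row is None or row[0] != expected_name:
        return False  # ID not found, or it belongs to a different VM
    with conn:  # commits on success, rolls back if the DELETE raises
        conn.execute(
            "DELETE FROM VirtualMachine WHERE VirtualMachineId = ?", (vm_id,)
        )
    return True
```

Checking the name inside the same helper means a mistyped ID cannot remove the wrong machine. Take a database backup before running any real delete.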

Run an inventory data collection in vRA; you should then be able to remove the Datastore object from the compute resource’s reservation.
Once the managed machine was deleted from the database, vRA allowed the removal of the Datastore from the reservation. The removal succeeds once no managed machines remain attached to the datastore, which frees any vRA VM to Datastore dependencies. To check for machines still tied to the datastore, query by storage path:
SELECT * FROM VirtualMachine WHERE StoragePath = 'Datastore Name'

Rob Shaw – 8/4/16


vRealize Automation 7.x Fix Snapshot operation in progress

August 4, 2016

vRealize Automation 7.x hung process cleanup Snapshot operation in progress…

1) Stop all DEM related services on the IaaS server (so no new requests can enter IaaS)

2) Back up the vRA database – in SQL Server Management Studio, you can right-click the vRA database and choose the backup task

3) Run query to check for pending actions:

Select * from [dbo].[WorkflowOperations]

4) View results: validate IDs

5) After validation, delete the workflow operation by its ID.

Delete from [dbo].[WorkflowOperations] where Id = WorkflowOperationId

Example:

Delete from [dbo].[WorkflowOperations] where Id = 6297

6) Check pending actions:

Select * from [dbo].[WorkflowOperations]

7) Review the failed or hung action item(s) in the vRA tenant’s UI and validate that the message has cleared.

8) Restart the DEM worker services on the IaaS server. I typically do a gratuitous reboot of both vRA and IaaS servers when done.

Complete

Refer to VMware KB 2137019 – note that this KB is documented for vRA 6.x

https://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2137019

Rob Shaw – 8/4/16


vRealize Automation 7.0.1 Upgrade

July 8, 2016

vRealize Automation 7.0.0 to 7.0.1 Upgrade Process

High level steps:
Backup data
Upgrade vRA
Upgrade IaaS
Test/Validate

1) Stage upgrade binaries in a software repository, IIS or Apache – http://your-software-repository-host/repo/vcac/7.0.1/ – note if using IIS, you need to add MIME extensions. See method from VMware – https://kb.vmware.com/kb/1019288
Or use the CDROM ISO method by mounting the upgrade ISO image to the vRA appliance.

2) Backup vRA data from vRA core:
 /etc/vcac/
 /etc/vco/
 /etc/apache2/
 /etc/rabbitmq/

3) Backup vRA Postgres database – vRA core

4) Backup IaaS MS SQL database

5) Power off (in order)
5A) IaaS Windows Server
5B) vRealize Automation appliance – vRA core

6) Clone the vRA core appliance as a backup (delete the clone after the upgrade is successful). Because you cannot increase a VM disk that has snapshots associated with it, the clone is taken before the disk is extended and protects against partition corruption on the vRA appliance.

7) Extend the vRA core appliance "Hard Disk 1" to 50GB from 18GB – required space for the upgrade per VMware HW requirements. Also, validate that the vRA core appliance has 18GB RAM and 4 vCPUS on 1 Socket.

8) Snapshot vms (vRA core and IaaS)

9) Power on vms – IaaS and then vRA core

10) Log into IaaS server and shutdown vRA services (in order)
10A) All VMware vCloud Automation Center agents – for each vCenter endpoint
10B) All VMware DEM workers
10C) VMware DEM orchestrator
10D) VMware vCloud Automation Center Service

Leave the IaaS server console open and do not reboot the server once the services are shut down; the installer restarts them later on.

11) Validate that the IaaS service hosted in IIS is functional by testing its URL – https://your-IaaS-host/Repository/Data/MetaModel.svc
This should display an XML manifest page

12) Validate the IaaS repository log located at "C:\Program Files (x86)\VMware\vCAC\Server\Model Manager Web\Logs" on the IaaS server and verify that its status is OK in the logs – "context=""  token="" (22) Response: OK 0:00.013"

13) SSH into the vRA core appliance and stop all vRealize Automation services (in order)
13A) service vcac-server stop
13B) service vco-server stop
13C) service vpostgres stop

Example:
your-vra-host:~ # service vcac-server stop
Stopping tcServer
--------------------------------------------------------------------------------
=== Checking vCAC Configuration…
Found in /etc/vcac/security.properties: csp.host
Found in /etc/vcac/security.properties: vmidentity.websso.host
Found in /etc/vcac/security.properties: vmidentity.websso.tenant
Found in /etc/vcac/security.properties: vmidentity.websso.solution
Found in /etc/vcac/security.properties: certificate.store.file
Found in /etc/vcac/security.properties: certificate.store.password
Found in /etc/vcac/security.properties: csp.component.registry.url
Found in /etc/vcac/security.properties: certificate.store.websso.alias
Found in /etc/vcac/security.properties: certificate.store.ssl.alias
websso, Jan 12, 2016, PrivateKeyEntry,
Certificate fingerprint (SHA1): XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
apache, Jan 12, 2016, PrivateKeyEntry,
Certificate fingerprint (SHA1): XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
VCAC configuration seems to be ok.
VCAC stop is allowed.
--------------------------------------------------------------------------------
Instance is running as PID=5804, shutting down…
Instance is running PID=5804, sleeping for up to 60 seconds waiting for shutdown
Instance shut down gracefully
your-vra-host:~ # service vco-server stop
Stopping tcServer
Instance is running as PID=6120, shutting down…
Instance is running PID=6120, sleeping for up to 60 seconds waiting for shutdown
Instance shut down gracefully
your-vra-host:~ # service vpostgres stop
Stopping VMware vPostgres: Last login: Wed Jul  6 19:44:00 CDT 2016 on console
ok

Special Note: While executing the commands in steps 14 – 20, you will receive a message from the system that you must perform a reboot before continuing. Disregard it and continue exactly as described below. DO NOT REBOOT UNTIL STEP 21.

14) Unmount the vRA swap partition with "swapoff -a"

15) Delete the existing Hard Disk 1 partitions and create a 44 GB root partition and a 6 GB swap partition (1 command)
(echo d; echo 2; echo d; echo 1; echo n; echo p; echo ; echo ; echo '+44G'; echo n; echo p; echo ; echo ; echo ; echo w; echo p; echo q) | fdisk /dev/sda

16) Change the swap partition type
(echo t; echo 2; echo 82; echo w; echo p; echo q) | fdisk /dev/sda

17) Set the Disk 1 bootable flag
(echo a; echo 1; echo w; echo p; echo q) | fdisk /dev/sda

18) Register the partition changes with the Linux kernel
partprobe

19) Format the new swap partition
mkswap /dev/sda2

20) Mount the swap partition
swapon -a

21) Reboot vRA core appliance

22) Wait for the appliance to boot, then grow the filesystem on the Disk 1 root partition to fill the resized partition – this may take a couple of minutes
resize2fs /dev/sda1

Example:
your-vra-host:~ # resize2fs /dev/sda1
resize2fs 1.41.9 (22-Aug-2009)
Filesystem at /dev/sda1 is mounted on /; on-line resizing required
old desc_blocks = 1, new_desc_blocks = 3
Performing an on-line resize of /dev/sda1 to 11534336 (4k) blocks.
The filesystem on /dev/sda1 is now 11534336 blocks long.

After resizing your vRA appliance, it should look like this using the disk free command df -h
sda1 should now be 44G

your-vra-host:~ # df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda1        44G   11G   31G  27% /
udev            8.9G  112K  8.9G   1% /dev
tmpfs           8.9G   12K  8.9G   1% /dev/shm
/dev/sdb1       6.9G 1015M  5.6G  16% /storage/log
/dev/sdb2       7.9G  147M  7.4G   2% /storage/ext
/dev/sdc1        25G  173M   24G   1% /storage/artifactory
/dev/sdd1        50G  1.4G   46G   3% /storage/db
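As a quick sanity check, the block count reported by resize2fs can be converted to a size and compared with the 44G that df -h shows for sda1. This short Python calculation uses only the numbers from the resize2fs output above and assumes that "(4k) blocks" means 4096-byte blocks.

```python
BLOCK_SIZE = 4 * 1024   # resize2fs reported "(4k) blocks"; assumed to be 4096 bytes
blocks = 11534336       # block count from the resize2fs output above

size_bytes = blocks * BLOCK_SIZE
size_gib = size_bytes / 1024**3

print(size_gib)  # 44.0
```

The 44 GiB root filesystem plus the 6 GB swap partition accounts for the 50 GB that "Hard Disk 1" was extended to in step 7.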

23) Prepare for the upgrade. Go to the vRA VAMI – https://your-vra-host:5480
log in as root, go to the Update tab, click settings and add your repository URL – http://your-software-repository-host/repo/vcac/7.0.1/, then click save.

24) On the Services tab of the VAMI (https://your-vra-host:5480), verify that all vRA services except "iaas-service" are running, including at least 2 vCO services, before launching the upgrade.

25) Go back to the Update Status tab and click "Check Updates" – you should see "Available Updates" appear in green font.


26) Click "Install Updates" and proceed, you will see an "installing updates…" screen

27) Optional: SSH into the vRA core and watch logs related to the upgrade progress from the following locations:
/opt/vmware/var/log/vami/updatecli.log
/opt/vmware/var/log/vami/vami.log
/var/log/vmware/horizon/horizon.log

28) Once the installer completes, reboot the vRA core appliance and wait for all vRA core services to start in the VAMI except IaaS- https://your-vra-host:5480

29) Download the newly upgraded IaaS server components and stage them on the IaaS server-  https://your-vra-host:5480/installer – the "IaaS installer" link

30) Log in to the IaaS server with the administrative service account (the admin/SQL DBO account) and upgrade the IaaS server components: run the installer from the Desktop of the IaaS server and make sure that "Upgrade" is selected.
When prompted for the admin account, enter "root". After the installer runs its prerequisite check, leave all of the defaults selected on the "Detailed Components" screen, except unselect "Use SSL for database connection".
On the "Component Registry" screen, click the "Load" and "Download" buttons to auto-populate the fields; the SSO admin is "administrator". Click the "Test" links to validate; each should display a "Passed" result.
Finally, click the "Upgrade" button and watch the output window for progress. The IaaS installer process takes approximately 1.5 hours including data input. From clicking "Upgrade", the installer reported "Execution time:86.656" to complete.
Note that the prerequisite checks should all pass without you having to modify the IaaS configuration.

31) Validate the upgrade version and functionality – test logging in, provisioning, logging and monitoring, vCO configuration, tenant settings such as endpoints, business groups, compute resources, naming prefixes, network pools/profiles and blueprints.

From VAMI post upgrade: Appliance Version: 7.0.1.150 Build 3622989

This is a condensed version of the steps included within the nicely written VMware document, "Upgrading from vRealize Automation 7.0 to 7.0.1" located here:

http://pubs.vmware.com/vra-70/topic/com.vmware.ICbase/PDF/vrealize-automation-701-upgrading.pdf

Thank you, Rob Shaw

7/8/2016


Determining the list of users logged in to vRealize Automation 7

March 9, 2016
 
I was searching for a way to discover which users were currently logged in to our vRealize Automation 7 suite so we could do some maintenance. I found this VMware KB article, which was slightly dated but pointed me in the right direction: the IaaS database, of course.
 
VMware kb article referenced:
https://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2034809
 
I did a quick sanity check and found that the tables referenced no longer exist in the vRA 7 IaaS database, but there is a table named "UserLog" with some interesting columns in it.
 
Note: the "Users" table seems to be a placeholder for all the users that have ever logged into vRA, not all of the Users allowed to log into vRA.
 
Running this SQL query against the vRA 7 IaaS database returns the user log entries sorted newest first, which lets you find recently active users along with the actions they performed.
 
SELECT UserName,Message,Timestamp AS 'Date' from UserLog ORDER BY Timestamp DESC;
 
Happy hunting!
Rob Shaw
3/8/16

Coffee XaaS, VMware Orchestrator Create Coffee Workflow – Arduino Project

January 16, 2015

Create Coffee Workflow:

I used VMware Orchestrator's REST APIs to consume my coffee pot as an XaaS (Anything as a Service) element to let the Java flow.

I developed an Orchestrator workflow called Create Coffee to make me a cup of Java for any operational tasks that may take an extended amount of time.

Part of my motivation was wanting to challenge the notion of being able to really consume anything as a Service using VMware Orchestrator. I thought, if I can consume or provision a cup of coffee in my physical world, then I should be able to consume most anything using vCO Dynamic Types, provided that thing has a SOAP or REST interface to leverage 🙂

The Create Coffee workflow element can be used in a workflow wrapper. For demonstration purposes, I chose to wrap up my Create Coffee within a workflow that puts an ESXi host into Maintenance Mode, a task that can sometimes take a few minutes depending on workloads.

Using the Arduino Yun board simply allowed me to present a REST interface in front of my coffee maker.

To achieve this, I used an Arduino Yun microcontroller, a PowerSwitch Tail II, a coffee maker and VMware Orchestrator.

I connected the Arduino Yun's digital pin #13 for the positive "+in" and the GND port for the negative "-in" connection to the PowerSwitch Tail II.

Sketch uploaded to the Arduino Yun:

Using RESTClient, a FireFox add-in, I tested my REST commands against the Arduino digital pins.


In Orchestrator, I added a REST host, REST operation, and then generated a workflow from the REST operation to communicate with my Arduino Yun microcontroller.

REST Operation created:

Creating a workflow from the REST operation:

Create Coffee workflow executed successfully:

The "kit":

Our host is in Maintenance Mode and our cup of coffee is ready!

So, if a coffee system ever arrives in the workplace that we can safely order coffee beverages from via web services, we'll be ready to workflow our
precious coffee into a repeatable task 🙂

Thanks VMware vRealize Automation!

Rob Shaw – 1/16/2015
 

 


Virtual machine “Edit Settings” are greyed out or not available for a virtual machine in the Virtual Infrastructure Client (VIC)

February 3, 2014

1) Putty to a host in the cluster (I always Putty to the host the target vm is on) and edit the vm's "edit" copy of the vmx file, that is, the vmx~ file with the tilde on it. Do not edit the vmx file directly.

2) Using the vi editor, change the value from "TRUE" to "FALSE" on the device placeholder for the "CD-ROM" device:

ide0:1.present = "TRUE"

2a) Type i to enter insert mode, then make the edit
2b) Press the Esc key to exit insert mode
2c) Type :wq to save the file and exit the vi editor

3) Restart the management agents on the ESXi host using Putty or iLO to the ESXi host's DCUI.

4) Check to see if the vm's "Edit Settings" are available in VIC (they probably won't be yet).

5) Disconnect the host.

6) Reconnect the host.

7) "Edit Settings" are available now. Change the CD-ROM to Client Device, pass through IDE mode.

8) Migrate vm or enter Maintenance Mode.
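The edit in step 2 is a one-line text change. The Python sketch below only illustrates that transformation on a string; it is not a replacement for the vi procedure above, and the function name disable_device and the sample text are made up for this example.

```python
import re


def disable_device(vmx_text, device="ide0:1"):
    """Return vmx_text with the '<device>.present' value flipped to "FALSE"."""
    pattern = re.compile(
        r'^([ \t]*' + re.escape(device) + r'\.present[ \t]*=[ \t]*)"TRUE"[ \t]*$',
        re.IGNORECASE | re.MULTILINE,
    )
    # Keep the key and spacing (group 1) and replace only the quoted value.
    return pattern.sub(r'\1"FALSE"', vmx_text)


sample = 'ide0:0.present = "TRUE"\nide0:1.present = "TRUE"\n'
print(disable_device(sample))
```

Only the line for the named device changes; other devices and already-disabled entries are left untouched.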


VMware HA Cluster Failure – Split Brain Interrogation

September 13, 2010

If one or more VMware ESX cluster nodes have suffered a hard crash or failure, you must reintroduce them into the cluster by following the steps below, one host at a time. This guide is helpful when multiple ESX hosts in an HA cluster have crashed due to a power outage, massive hardware failure, etc., and the HA service on some or all of the ESX nodes in the cluster is non-functional. Furthermore, virtual machines may have been displaced by the God-forbid-this-ever-happens-to-you “split-brain” scenario.

It may be useful using PowerShell to initially query the cluster for your HA Primaries. I use the VMware PowerCLI and run this simple script I call Get-HA-Primaries.ps1

Connect-VIServer YourVirtualCenterServerNameHere
((Get-View (Get-Cluster YourESXClusterNameHere).id).RetrieveDasAdvancedRuntimeInfo()).DasHostInfo.PrimaryHosts

This will output what the cluster currently knows about HA Primaries.

1)      At the root of the cluster, prevent automatic VMotion migrations by setting VMware DRS to “Manual”. This ensures that migrations do not start until all nodes are correctly configured and are back in the cluster. In Virtual Center, right-click the root of the cluster and choose “Edit Settings”, click on “VMware DRS”, set it to “Manual” and click OK.

2)      Power on the ESX host if it is off and watch it from the console to make sure it boots properly.

3)      Next, log into the SIM page of the host (if applicable) as root to validate that the hardware is not displaying any obvious problems.

4) In Virtual Center, verify that the ESX host is back in the cluster. If the host shows disconnected or has any HA errors, do steps 5 through 8 in their exact order.

5)      Restart the Virtual Center Server service – “VMware VirtualCenter Server”

6) Run the following commands from the problematic ESX host’s console (KVM, local console or Putty) as root or with sudo.

        6a) service vmware-vpxa restart

        6b) service mgmt-vmware restart

        6c) service xinetd restart

7) Verify that the VMware core services are running on the host server by typing:

         ps -ef | grep hostd

The output should look similar to the following, which shows that hostd is running:

root      1887     1  0 Oct31 ?        00:00:01 cmahostd -p 15 -s OK
root      2713     1  0 Oct31 ?        00:00:00 /bin/sh /usr/bin/vmware-watchdog -s hostd -u 60 -q 5 -c /usr/sbin/hostd-support /usr/sbin/vmware-hostd -u
root      2724  2713  0 Oct31 ?        00:11:41 /usr/lib/vmware/hostd/vmware-hostd /etc/vmware/hostd/config.xml -u
root     21263 12546  0 11:34 pts/0    00:00:00 grep hostd

End of host commands
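Reading the ps output by eye is easy to get wrong, because the grep itself, the watchdog wrapper and unrelated agents such as cmahostd all match "hostd". As an illustration, the Python sketch below checks captured ps -ef output for the real vmware-hostd process. The function name hostd_running is hypothetical, and the check assumes the eight-column ps -ef layout shown above.

```python
# Output captured from "ps -ef | grep hostd", copied from the example above.
SAMPLE_PS = """\
root      1887     1  0 Oct31 ?        00:00:01 cmahostd -p 15 -s OK
root      2713     1  0 Oct31 ?        00:00:00 /bin/sh /usr/bin/vmware-watchdog -s hostd -u 60 -q 5 -c /usr/sbin/hostd-support /usr/sbin/vmware-hostd -u
root      2724  2713  0 Oct31 ?        00:11:41 /usr/lib/vmware/hostd/vmware-hostd /etc/vmware/hostd/config.xml -u
root     21263 12546  0 11:34 pts/0    00:00:00 grep hostd
"""


def hostd_running(ps_output):
    """True if a process whose executable is vmware-hostd is listed."""
    for line in ps_output.splitlines():
        # Columns: UID PID PPID C STIME TTY TIME CMD (CMD keeps its spaces).
        fields = line.split(None, 7)
        if len(fields) < 8:
            continue
        executable = fields[7].split()[0]
        if executable.endswith("/vmware-hostd") or executable == "vmware-hostd":
            return True
    return False


print(hostd_running(SAMPLE_PS))
```

Matching on the executable rather than on the whole line is what excludes the grep, watchdog and cmahostd entries.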

8) Reconfigure HA within Virtual Center by right-clicking the ESX host and selecting “Reconfigure for HA”. If any HA or connection errors persist, try disconnecting and reconnecting the host. These are both right-click operations on the host from within Virtual Center. You may be asked to re-authenticate the host to Virtual Center. Simply provide the root password for the host if you are prompted by this wizard.

If the host cannot be reconnected after following these steps, call either the VMware lead or VMware support at 1-877-4VM-Ware.

If the host becomes connected and operational, you may have VM guest registration issues.

Several scenarios may require you to remove the virtual machines from inventory and re-add them. If multiple hosts crash simultaneously, you will most likely have HA issues that create a known state called “split-brain”, in which virtual machines are split around the cluster because of the SAN locking mechanism used by the ESX host servers. More than one host then “thinks” it has the same virtual machine registered to it. Several hosts can also hold locks on a guest’s vswap file at the same time.

The virtual machine(s) will not boot until the lock is freed, and you must release the lock manually on each host that holds outdated vswap file location info. This is time consuming. The following commands show where the lock is located: the lock information includes the MAC address of the owning host (always that of either vmnic0 or vmnic1), which identifies the host with the invalid data.

vmkfstools -D /vmfs/volumes/sanvolumename/vmname/swapfile

tail -f /var/log/vmkernel

Once you identify the host, reboot it to flush the memory and locks to force the release of bad, outdated vm inventory data. Be sure to migrate all of the guests off and put the host into maintenance mode prior to rebooting it.

If the MAC address indicates that the guest is locked by the very host it is attempting to boot from, simply delete the vswap file and let the guest re-create it upon booting. You can tell that the booting host is the owner when the command output shows all zeroes in the hex field where the MAC address would otherwise be. The vswap file is in the virtual machine’s folder in /vmfs/volumes/sanvolumename/vmname.
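To make the MAC check concrete, here is a small Python sketch that reads the owner field from the lock information. It rests on an assumption you should verify on your own ESX version: that the owner ID is a dash-separated value whose last group of 12 hex digits is the MAC address of the locking host. The owner strings used below are invented for illustration and were not taken from a real log.

```python
def lock_owner_mac(owner):
    """Return the MAC from the last group of an owner ID, or None if zeroed.

    Assumes the last dash-separated group holds 12 hex digits (the MAC).
    """
    last_group = owner.strip().split("-")[-1].lower()
    if len(last_group) != 12 or any(c not in "0123456789abcdef" for c in last_group):
        raise ValueError("unexpected owner format: %r" % owner)
    if set(last_group) == {"0"}:
        return None  # all zeroes: no other host's MAC is recorded
    return ":".join(last_group[i:i + 2] for i in range(0, 12, 2))


# Hypothetical owner values, for illustration only.
print(lock_owner_mac("4655cd8b-3c4a5e2c-2bab-001c23f0f5d2"))
print(lock_owner_mac("00000000-00000000-0000-000000000000"))
```

Compare the returned MAC with the vmnic0 and vmnic1 addresses of each host to find the one holding the stale lock.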

To view vm registration on a host, view /etc/vmware/hostd/vmInventory.xml

This is the esx host’s local database file for vm inventory.

You can also list the registered VMs by running vmware-cmd -l from the / directory.
 

Good luck.


How To – Commit VMware snapshots

September 13, 2010

 
Snapshot activities are far more consistent and reliable when using the ESX host's Service Console in lieu of the vCenter GUI.
 
VMware recommends having free space equal to the snapshots and base disk size before committing snapshots.
 
If you do not have enough free space on the source LUN, migrate to another disk that has enough free space and consolidate the snapshots into a new virtual disk file (VMDK).

Virtual machines with snapshots, ironically, cannot be migrated with Storage vMotion. The virtual machine must be powered off and its files migrated manually in a cold (powered-off) state.

For more information on consolidating disk files, see Consolidating snapshots (1007849).
http://kb.vmware.com/kb/1007849
 
To commit snapshots to a base disk from the command-line:
 
1. Find the path to the VMX file of the virtual machine either from the Virtual Infrastructure Client or by running the following command:
 
sudo vmware-cmd -l
 
/vmfs/volumes/VM_DATA_C000_01/SomeVirtualMachine1.vmx
/vmfs/volumes/VM_DATA_C000_01/SomeVirtualMachine2.vmx
/vmfs/volumes/VM_DATA_C000_01/SomeVirtualMachine3.vmx
/vmfs/volumes/VM_DATA_C000_01/SomeVirtualMachine4.vmx
/vmfs/volumes/VM_DATA_C000_01/SomeVirtualMachine5.vmx
/vmfs/volumes/VM_DATA_C000_01/SomeVirtualMachine6.vmx

2. Determine if the virtual machine has snapshots:
 
sudo vmware-cmd /vmfs/volumes/VM_DATA_C000_01/SomeVirtualMachine1.vmx hassnapshot
 
The output will look like one of the following:
hassnapshot() =
or
hassnapshot() = 1
If the result is not equal to one (1), there are no snapshots for the virtual machine and there is no reason to proceed further.

3. Remove (or commit) the snapshot by running the following command:
 
sudo vmware-cmd /vmfs/volumes/VM_DATA_C000_01/SomeVirtualMachine1.vmx removesnapshots
 
removesnapshots() = 1
 
If the result is one (1), the snapshots have been successfully committed. If the result is something other than one (1), file a Support Request with VMware Support and note this KB Article ID in the problem description.
Note: The above procedure deletes all snapshots on the virtual machine and commits the changes in the delta disks to the base disk. The base disk then contains all changes to the data.
This process can take over an hour to complete. It all depends on the number of snapshot deltas and the size of the disks to be committed.
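If you script these checks, the "name() = value" lines printed by vmware-cmd are easy to parse. The Python sketch below only interprets captured output lines in the two formats shown above; it does not call vmware-cmd, and the helper names are made up for this example.

```python
def parse_vmware_cmd_result(line):
    """Split 'name() = value' into (name, value); value is '' when absent."""
    name, sep, value = line.partition("=")
    name = name.strip()
    if not sep or not name.endswith("()"):
        raise ValueError("unexpected vmware-cmd output: %r" % line)
    return name[:-2], value.strip()


def has_snapshot(line):
    """True only for the 'hassnapshot() = 1' form described in step 2."""
    name, value = parse_vmware_cmd_result(line)
    return name == "hassnapshot" and value == "1"


print(has_snapshot("hassnapshot() = 1"))
print(has_snapshot("hassnapshot() ="))
```

Treating anything other than an explicit 1 as "no snapshots" mirrors the rule in step 2.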
