Workaround for Cause 1:
This problem is fixed in VMS version 02.00.06.00, which prevents the memory leak that causes this issue.
If the system is not at this version, a reboot of the VMS Server will resolve the issue.
NOTE: If the system being worked on has multiple VMS Servers and they show the same uptime, reboot all of them. Failing to reboot the other
VMS Servers will lead to additional incidents that rebooting every VMS Server with the same uptime would have avoided.
1. Check the uptime on the VMS Servers by running the following command as root on the SWS
vmscmd -a uptime
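Illustrative output, assuming vmscmd relays each server's standard uptime output (the hostnames vms1 and vms2 are placeholders); servers reporting the same uptime should all be rebooted:
vms1:  14:02:11 up 187 days,  3:12,  0 users,  load average: 0.04, 0.06, 0.05
vms2:  14:02:11 up 187 days,  3:10,  0 users,  load average: 0.10, 0.07, 0.05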
2. Create a maintenance window
3. Try connecting to the VMS Server
How to connect to or reboot a VMS Server:
- Can you ping any IP on the VMS Server? (a quick check is sketched after this list)
- If Yes then go to step 4
- If No then go to step 5
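A minimal reachability check from the SWS (replace <VMS-Server-IP> with any IP on the VMS Server):
ping -c 3 <VMS-Server-IP>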
4. Try connecting to the VMS Server using the following procedure:
- Connect to the VMS Server using KCS004526 as root and run the following command
ipmitool chassis power reset
- If you can't connect then go to step 5
NOTE: It can take up to 45 minutes to reboot a VMS Server that has more than 4 drives because fsck must run on all drives. If 45 minutes have passed and the
VMS Server has not recovered, a CSR must be dispatched to pull the power plugs and bring the VMS Server back up.
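If you need to confirm the chassis power state before or after the reset in step 4, ipmitool can report it when run the same way as the reset command:
ipmitool chassis power status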
5. Use the procedure in KAP1B2EF2 to reboot the VMS Server remotely.
6. If steps 1-5 fail, then schedule an onsite visit to do the following
- Pull the power plugs on the VMS Server that’s hung
- Wait about 20 seconds
- Plug the power plugs back in and power up the VMS Server
- Look for errors during power up
7. Open SMweb again after the VMS Server has rebooted
8. If you see any nodes showing "Chassis No Contact", and you have the latest gsctools package installed, then run the racreset script as root on the SWS
9. If you don't have a gsctools package that includes the racreset script, then either download and install the latest version of gsctools or use KAP2BAE4A
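To verify whether the installed gsctools package provides the racreset script, a quick check (assuming an RPM-based SWS; adjust for your package manager):
rpm -q gsctools
which racreset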
Solution for Cause 2:
1. Connect to the VMS Server as root. Refer to KCS004526 if assistance is needed.
2. Make sure the vmname= entry in the CMIC Config xml file matches the VM names in the output of the vm-list command (a quick comparison is sketched after the example below)
3. Refer to knowledge article KAP315BAB2 for how to change a vmname in the CMIC Config xml file
In this example, the SWS was migrated from XEN to KVM, but the vmname was not changed from sws1 to sws1-kvm in the CMIC Config xml
Example from VMS vm-list
Name       ID   Mem     VCPUs   Type   State
cmic95     2    10240   7       HVM    Running
sws1-kvm   3    10240   7       HVM    Running
sws1            10240   7       XEN    Not under KVM domain management
From CMIC Config xml file
<Chassis idnum="62" vmname="sws1" vmoerole="SWS"> <== in this case sws1 should have been changed to sws1-kvm
Solution for Cause 3:
1. Connect to SMweb and take note of the chassis positions of the VMS Servers
2. Make sure the entries in the CMIC Config xml for the hosted VMs have the correct chassis position for the VMS Server hosting that VM (a quick check is sketched after the example below)
If you look at the /datapart/cmic/trace/mepluginhost.MEPlugin_VMS_R1000GZ-1.log on the SOV or Master CMIC, you will see which hosted VM it's complaining about.
MEPlugin_VMS_R1000GZ_22.214.171.124:CVMSCIMStateUpdateThread::ValidateVMConfig(): VM cmic1 (KVM) (oerole ) is unmanaged -- bad config?
In the example below, SMweb was showing "Self Managed, Unmanaged VM(s) detected" due to the wrong chassis position 11 for the VMS Server. It should have been 12.
In this case you would simply change the <ChassisID> from 11 to 12 to resolve this problem.
<Chassis idnum="63" type="cmic" vmname="cmic1" vmoerole="CMIC">
<ChassisID>11</ChassisID> <============= Should have been 12
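A quick way to check the recorded chassis position for a hosted VM (the config path is again illustrative, and this assumes the <ChassisID> element directly follows the <Chassis> line, as in the excerpt above):
grep -A1 'vmname="cmic1"' /path/to/cmic_config.xml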