Troubleshooting
Problem
Solving LPM (Live Partition Mobility) issues that involve the full stack (PowerVC, HMC, VIOS, AIX, Linux, and IBM i)
Symptom
Varies based on customer configuration and levels.
Cause
Varies based on customer configuration and levels.
Environment
PowerVC, HMC, VIOS, AIX, Linux, and IBM i
Diagnosing The Problem
Data Collection:
Note: The links below require Internet Explorer to work.
Resolving The Problem
Need to review various logs based on problem description.
First Created: May 04, 2019 - rajpat@us.ibm.com
Last updated - May 17, 2019 - rajpat@us.ibm.com
Last updated - June 06, 2019 - rajpat@us.ibm.com
Last updated - August 19, 2022 - rajpat@us.ibm.com
=================================================
*** NOTE: Contains dump, perf, and devscan ( AIX, Linux, IBM i client )
*** NOTE: Highlight only the points or items from this section that apply, for the customer to capture from this complete list.
*** If using dual Hardware Management Consoles, provide data from both HMCs
High level problem description and data collection required.
1) Problem description.
2) VIO
3) Client ( AIX,Linux,IBM i )
4) HMC pedbg ( If using dual HMC data from both HMC )
5) RSCT
6) System Firmware Resource and Platform Dump
7) Fabric switch ( For NPIV configurations )
8) Devscan ( For NPIV configurations )
9) Additional data from HMC for NPIV config.
10) Checking network performance using the open source IPERF tool (the executable can also be downloaded from this link)
11) In case of hang or SRC 2005
12) Requirements for PowerVC
13) FTP data to IBM. For Blue Diamond, follow the BD steps.
=============== Start ===================
1) Problem description
a) Make sure the date and time on the HMC, VIOS, etc. are all in sync.
b) Is this using single or dual HMC ?
c) Are you using GUI or CLI ?
d) Are these single or concurrent LPMs?
e) Are you using the LPM toolkit ?
f) Is this a test or production ?
g) Is this a new configuration ?
h) When was this last working ?
i) What changes if any were made to system firmware, vio level, hmc level, client lpar, network, switches etc ?
j) ** Provide the complete date and time of the error and any screenshots. **
k) If the LPM is hung or slow, how long has it been running, and what SRC code is shown on the HMC for the source and destination frames ?
l) Is this using PowerVC ? If so, provide the PowerVC logs ( see item 12 for PowerVC ).
2) VIO snaps from both VIOS on the source and both VIOS on the target. Rename the files to indicate source and target.
$ snap ( run as padmin; creates the file /home/padmin/snap.pax.Z )
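The four snaps can then be pulled off and tagged in one pass. A minimal sketch, assuming the placeholder VIOS host names and case number below (none of which come from this document); the loop only prints the scp commands to run:

```shell
#!/bin/sh
# Sketch: tag each VIOS snap with the case number and its role so the
# four archives stay distinguishable. Host names and the case number
# are made-up examples.
case_num="TS001234567"
for vio in vio1_src vio2_src vio1_tgt vio2_tgt; do
  echo "scp padmin@${vio}:/home/padmin/snap.pax.Z ./${case_num}.${vio}.snap.pax.Z"
done
```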
3a) AIX client snap
# snap -r ( clear old snap )
# snap -ac ( data in /tmp/ibmsupt/snap.pax.Z)
3b) Linux snap ( sosreport requires root permissions to run )
# sosreport ( data in /var/tmp )
3c) IBM i Client: collect the IBM i MustGather (QMGTOOLS), which the customer should first update, and then run the IBM i SYSSNAP
- LPM Precheck For IBM i clients
- MustGather: How To Obtain and Install QMGTOOLS
- QMGTOOLS: System Snapshot (SYSSNAP)
4) HMC pedbg ( If using dual HMC, collect from both HMCs )
- IBM HMC Classic View: Collecting PEDBG from the HMC
- IBM HMC Enhanced View: Collecting PEDBG from the HMC
In short, the CLI command:
# pedbg -c -q 4
- Say YES when prompted to collect ctsnap.
( Note: if there are RSCT problems, ctsnap can hang )
- The file created is
"HSClogsXXzz2007zzzzzz.zip", placed in the /dump directory.
- Copy file to the common directory for FTP to IBM.
# scp /dump/HSClogsXXzz2007zzzzzz.zip {user}@{remote_host}:.
(NOTE: the ":." means keep the same name on the remote host;
do not forget the :. at the end of the line )
- Rename the file to include the case number
# mv HSClogsXXzz2007zzzzzz.zip {case_number}.pedbg.zip
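The copy-and-rename step can be sketched as below; the case number and remote host are placeholders, and the script only prints the commands rather than running them:

```shell
#!/bin/sh
# Sketch only: print the scp and mv commands for the pedbg archive.
# The case number and remote host are made-up examples.
case_no="TS007654321"
dump="HSClogsXXzz2007zzzzzz.zip"
echo "scp /dump/${dump} user@remote_host:."
echo "mv ${dump} ${case_no}.pedbg.zip"
```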
5) RSCT
- From the VIO Servers within oem_setup_env ( Included in the snap with VIO 2.2.2.1 and later, but the following may still be needed for RMC / RSCT )
$ oem_setup_env
# ctsnap -x runrpttr - This will create => /tmp/ctsupt/ctsnap*.tar.gz
- From AIX lpar:
# ctsnap -x runrpttr - This will create => /tmp/ctsupt/ctsnap*.tar.gz
- From Linux lpar:
# ctsnap -x runrpttr - This will create => /tmp/ctsupt/ctsnap*.tar.gz
- From HMC ( Requires pesh password ) - pedbg option 7
# ctsnap -x runrpttr
6) System Firmware non-disruptive resource and platform dump FOR BOTH source and target managed systems:
If using dual HMC, provide from both HMC
a) Access the HMC restricted shell. To get {managed_system}:
# lssyscfg -r sys -F name
b) startdump -t resource -m {source_managed_system} -r "system"
c) startdump -t resource -m {target_managed_system} -r "system"
d) This creates a file in /dump/SYSDUMP.SRLNMBR.DUMPNMBR.TIMESTAMP....
e) Please rename the file to indicate whether it is for the source_managed_system or the target_managed_system
- How to Initiate a Resource dump from the HMC - Classical
- How to Initiate a Resource dump from the HMC - Enhanced GUI
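Steps b) and c) can be sketched in one loop; the managed system names below are made-up examples to be replaced with the real names from `lssyscfg -r sys -F name`, and the loop only prints the startdump commands:

```shell
#!/bin/sh
# Sketch of step 6 for both frames. System names are placeholders.
src_sys="Server-8286-42A-SN_SRC"
tgt_sys="Server-8286-42A-SN_TGT"
for sys in "$src_sys" "$tgt_sys"; do
  echo "startdump -t resource -m ${sys} -r \"system\""
done
```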
7) Fabric switch ( For NPIV configurations )
Switch Logs for Cisco & Brocade:
- Switch logs ("show tech-support details" for Cisco)
- 'show tech detail' if more than 1 switch.
- 'supportshow' for Brocade, collected via the CLI using either
HyperTerm or PuTTY to capture the output.
- 'supportsave', as it has additional debug information.
- How are RSCN ( Registered State Change Notification )
events sent when a zoning change is made on the switch ?
Switch Logs for McData:
- "data collection" from the switch management console, EFCM.
- any other related to ones described above under Cisco / Brocade.
Switch Logs using iSCSI TOE:
- igroup show
- lun show -m
** For FCOE Types also include: **
- "show tech-support fc"
8) Devscan ( For NPIV configurations )
From the HMC, get the Live Partition Mobility WWPNs of the client LPAR.
# lssyscfg -r sys -F name ( To get {managed_system} )
# lssyscfg -r prof -m {source_managed_system} -F name,virtual_fc_adapters
From the target VIO1 that currently has the inactive WWPN
$ oem_setup_env
# script /tmp/devscan_vio1_inactive_wwpn_src.log
# devscan -t f -n [wwpn_inactive_lowercase] ( -t and f may be optional )
# devscan -t f -n [wwpn_inactive_lowercase] --dev=fcxx ( -t and f may be optional )
# exit
From the target VIO2 that currently has the inactive WWPN
$ oem_setup_env
# script /tmp/devscan_vio2_inactive_wwpn_src.log
# devscan -t f -n [wwpn_inactive_lowercase] ( -t and f may be optional )
# devscan -t f -n [wwpn_inactive_lowercase] --dev=fcxx ( -t and f may be optional )
# exit
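devscan expects the WWPN in lowercase with no separators. A small helper sketch for normalizing a WWPN copied from the HMC; the WWPN value is a made-up example:

```shell
#!/bin/sh
# Strip colons and lowercase a WWPN for use with devscan -n.
# The WWPN below is an example value, not from this document.
wwpn="C0:50:76:07:D5:BC:00:16"
wwpn_lc=$(echo "$wwpn" | tr -d ':' | tr 'A-Z' 'a-z')
echo "$wwpn_lc"
# -> c0507607d5bc0016
```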
From AIX and Linux Client LPAR:
Run devscan on the moving LPAR, and compare the output with the one you get on the Virtual I/O Server:
# devscan --dev=fscsi0 --concise | awk -F '|' '{print $2}' | sort -n | uniq
Complete Devscan Client to Destination VIOs Checks - Step by step ( AIX & Linux )
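The client-versus-VIOS comparison above can be sketched as follows. The two sample logs stand in for real `devscan --concise` captures (the file names are placeholders); any LUN the client sees that the target VIOS does not is a problem:

```shell
#!/bin/sh
# Sketch: compare the LUN column of the client's devscan output with
# the target VIOS's. The printf lines below create sample data only;
# replace the files with the real captures from the steps above.
printf 'a|1|x\na|2|x\na|3|x\n' > client_devscan.log   # sample data
printf 'a|1|x\na|2|x\n'        > vios_devscan.log     # sample data
awk -F '|' '{print $2}' client_devscan.log | sort -n | uniq > client_luns.txt
awk -F '|' '{print $2}' vios_devscan.log  | sort -n | uniq > vios_luns.txt
# Lines printed here are LUNs the client sees but the target VIOS does not:
comm -23 client_luns.txt vios_luns.txt
```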
9) Additional data from the HMC for NPIV config, for AIX and Linux clients. If using dual HMC, collect data from both HMCs
To get {managed_system}:
# lssyscfg -r sys -F name
To get lsnportlogin from HMC:
# lsnportlogin -m {managed_system} --filter lpar_names={name_of_client_lpar}
To get a complete listing from the HMC:
# lssyscfg -r sys -F name ( to get {managed_system} )
# lshmc -v
# lshmc -V
# lshwres -r virtualio -rsubtype scsi -m {managed_system} -level lpar
# lshwres -r virtualio -m {managed_system} --rsubtype fc --level sys
# lshwres -r virtualio -m {managed_system} --rsubtype fc --level lpar
# lshwres -r virtualio --rsubtype fc --level lpar -m [system_name] -F lpar_name,lpar_id,slot_num,adapter_type,state,is_required,remote_lpar_id,remote_lpar_name,remote_slot_num,wwpns,topology > CASE_NUM.fc.before.topology.out
10a) Performance related: IPERF, VMSTAT, and LPARSTAT. Enable stats on the HMC ( the menu selection to get to this may vary depending on HMC level )
- HMC enabling_data_collection
- From the HMC, activate Performance Information Collection: right-click the specific LPAR - Properties - Hardware - Processors - Allow performance information collection
10b) VMSTAT and LPARSTAT ( Collect during the problem; run the matching command on each VIOS )
# vmstat -It 5 200 | tee vmstat_vio1_src.log &
# vmstat -It 5 200 | tee vmstat_vio2_src.log &
# vmstat -It 5 200 | tee vmstat_vio1_target.log &
# vmstat -It 5 200 | tee vmstat_vio2_target.log &
# lparstat -ht 5 200 > lparstat_vio1_src.log &
# lparstat -ht 5 200 > lparstat_vio2_src.log &
# lparstat -ht 5 200 > lparstat_vio1_target.log &
# lparstat -ht 5 200 > lparstat_vio2_target.log &
10c) IPERF ( Basic info shown below; for full details refer to iperf_Instructions.txt attached )
- I) Start the iperf test at the other end as a server ( Target MSP ):
./iperf -s -P 4 --> Starts the server
- II) At the problem VIOS, start iperf as a client ( Source MSP ):
./iperf -c {server_address} -P 4 -t 10 -w 240k
- III) Start the iptrace at both ends as well as at the switch. On the AIX VIOS:
# startsrc -s iptrace -a " -a /big_directory/iptrc.bin "
Wait for 20 seconds.
REPEAT THE ABOVE IN BOTH DIRECTIONS BY SWAPPING THE SERVER AND CLIENT, AND CAPTURE ALL STDOUT INCLUDING THE SUMMARY AT THE END. THIS MAY NEED TO BE RUN AT VARIOUS TIMES IN CASE OF IRREGULAR / RANDOM NETWORK BEHAVIOR.
Repeat the above several times to make sure network throughput is good.
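The bidirectional iperf pattern above can be sketched as one loop; the MSP host names are placeholders, and the loop only prints which command to run on which side:

```shell
#!/bin/sh
# Sketch of the bidirectional iperf test in step 10c.
# Host names are made-up examples.
src_msp="msp_source"
tgt_msp="msp_target"
for server in "$tgt_msp" "$src_msp"; do
  if [ "$server" = "$tgt_msp" ]; then client=$src_msp; else client=$tgt_msp; fi
  echo "on ${server}: ./iperf -s -P 4"
  echo "on ${client}: ./iperf -c ${server} -P 4 -t 10 -w 240k"
done
```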
13) FTP the file to IBM:
ftp testcase.software.ibm.com,
login: anonymous,
passwd: your email address,
ftp> cd /toibm/aix
ftp> bin
ftp> put {case_number}.pax.gz
ftp> quit
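The same upload can be scripted non-interactively, as a sketch; the case file name and email address below are placeholders:

```shell
#!/bin/sh
# Sketch: write the ftp command sequence from step 13 to a batch file.
# File name and email address are made-up examples.
cat > upload.ftp <<'EOF'
open testcase.software.ibm.com
user anonymous your.email@example.com
cd /toibm/aix
bin
put TS001234567.pax.gz
quit
EOF
# Then run:  ftp -n < upload.ftp   ( requires outbound FTP access )
```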
For Blue Diamond: Registration Link: Blue Diamond Registration
====================== End ================
Document Location
Worldwide
Document Information
Modified date:
19 August 2022
UID
ibm10887093