IBM Support

Complete Guide To Must Gather LPM Data Collection on PowerVC, VIO, AIX, Linux and IBM i

Troubleshooting


Problem

Solving LPM issues where the problem can lie anywhere in the stack (PowerVC, HMC, VIOS, AIX, Linux, and IBM i).

Symptom

Varies based on customer configuration and levels.

Cause

Varies based on customer configuration and levels.

Environment

PowerVC, HMC, VIOS, AIX, Linux, and IBM i

Resolving The Problem

Need to review various logs based on problem description.
First Created: May  04, 2019 - rajpat@us.ibm.com
Last updated - May  17, 2019 - rajpat@us.ibm.com
Last updated - June 06, 2019 - rajpat@us.ibm.com
Last updated - August 19, 2022 - rajpat@us.ibm.com
=================================================

*** NOTE: Contains dump, perf, and devscan data ( AIX, Linux, and IBM i clients ).
*** NOTE: Highlight only the points or items from these sections that the customer needs to capture from this complete list.
*** If using dual Hardware Management Consoles, provide data from both Hardware Management Consoles.

High level problem description and data collection required.
1)  Problem description.
2)  VIO 
3)  Client ( AIX,Linux,IBM i )
4)  HMC pedbg ( If using dual HMC data from both HMC )
5)  RSCT
6)  System Firmware Resource and Platform Dump
7)  Fabric switch ( For NPIV configurations )
8)  Devscan ( For NPIV configurations )
9)  Additional data from HMC for NPIV config.
10) Checking network performance using open-source IPERF (the executable can also be downloaded from this link)
11) In case of hang or SRC 2005 
12) Requirements for PowerVC 
13) FTP data. For Blue Diamond, follow the BD steps.

===============  Start ===================
1) Problem description

   a) Make sure the date and time on the HMC, VIOS, and client LPARs are all in sync.
   b) Is this using single or dual HMC ?
   c) Are you using GUI or CLI ?
   d) Are these single or concurrent LPMs?
   e) Are you using the LPM toolkit?
   f) Is this a test or production ?
   g) Is this a new configuration ?
   h) When was this last working ?
   i) What changes if any were made to system firmware, vio level, hmc level, client lpar, network, switches etc ?
   j) ** Provide complete date and time of error and any screen shots. **
   k) How long has the LPM been running if hung or slow, and what is the SRC code on the HMC for the source and destination frames?
   l) Is this using PowerVC? If so, provide the PowerVC logs ( see item 12 for PowerVC ).

2) VIO snaps from both VIOS on the source and both VIOS on the target. Rename each snap to indicate source or target ( an example follows the command below ).
      $ snap ( run from padmin. Creates file /home/padmin/snap.pax.Z)
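      For example ( the case number and file naming below are hypothetical ), each snap can be renamed before upload so the four files can be told apart:
      $ snap
      $ oem_setup_env
      # mv /home/padmin/snap.pax.Z /home/padmin/TS001234567.vio1_source.snap.pax.Z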

3a) AIX client snap
       # snap -r ( clear old snap )
       # snap -ac  ( data in /tmp/ibmsupt/snap.pax.Z)

3b) Linux snap ( sosreport requires root permissions to run ) 
       # sosreport ( data in /var/tmp )

3c) IBM i client: update the IBM i must gather tool (QMGTOOLS), and then run the IBM i SYSSNAP.
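       As a sketch ( the host name, date, and case number are hypothetical ), the Linux sosreport archive from 3b can likewise be renamed after collection so the client and case are identifiable:
       # sosreport                         ( writes an archive under /var/tmp )
       # cd /var/tmp
       # mv sosreport-lpar01-2022-08-19.tar.xz TS001234567.lpar01.sosreport.tar.xz
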
4) HMC pedbg:
   In short, the CLI command:
        # pedbg -c -q 4
          - Say YES when prompted to collect ctsnap
                ( note: if there are RSCT problems, ctsnap can hang ).
          - The file created is
                "HSClogsXXzz2007zzzzzz.zip", created in the /dump directory.
          - Copy the file to the common directory for FTP to IBM:
                # scp /dump/HSClogsXXzz2007zzzzzz.zip {user}@{remote_host}:.
                  ( NOTE: the ":." means keep the same name on the remote host;
                          do not forget the ":." at the end of the line. )

          - Rename the file to include the case / PMR number:
                # mv HSClogsXXzz2007zzzzzz.zip {case_number}.pedbg.zip
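          As a worked example of the copy and rename ( the remote user, host, and case number are hypothetical ):
                # scp /dump/HSClogsXXzz2007zzzzzz.zip ftpuser@192.0.2.10:.
                # mv HSClogsXXzz2007zzzzzz.zip TS001234567.pedbg.zip   ( run on the host that now holds the copy )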
5) RSCT

   - From the VIO servers, within oem_setup_env ( included in snap with VIO 2.2.2.1, but the commands below may still be needed for RMC / RSCT ):
        $ oem_setup_env
        # ctsnap -x runrpttr - This will create => /tmp/ctsupt/ctsnap*.tar.gz

   - From AIX lpar:
        # ctsnap -x runrpttr - This will create => /tmp/ctsupt/ctsnap*.tar.gz

   - From Linux lpar:
        # ctsnap -x runrpttr - This will create => /tmp/ctsupt/ctsnap*.tar.gz

   - From HMC ( Requires pesh password ) - pedbg option 7
        # ctsnap -x runrpttr
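
   If RMC connectivity itself is suspect, a quick status check before collecting ctsnap can help. This is a sketch assuming these standard RSCT commands are present at your level ( verify the path on your system ); run them on the VIOS ( inside oem_setup_env ) or on the AIX / Linux client:
        # /usr/sbin/rsct/bin/rmcdomainstatus -s ctrmc    ( shows the RMC management domain / HMC connections )
        # lsrsrc IBM.MCP                                 ( lists the HMC(s) acting as management control points )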
 
6) System firmware non-disruptive resource and platform dump for BOTH the source and target managed systems:

   If using dual HMCs, provide the dump from both HMCs.
   a) Access the HMC restricted shell. To get {managed_system}:
      # lssyscfg -r sys -F name
   b) startdump -t resource -m {source_managed_system} -r "system"
   c) startdump -t resource -m {target_managed_system} -r "system"
   d) This creates a file in /dump/SYSDUMP.SRLNMBR.DUMPNMBR.TIMESTAMP....
   e) Please rename the file to indicate whether it is for the source_managed_system or the target_managed_system.
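      As a worked instance ( the managed system names below are hypothetical ):
      # lssyscfg -r sys -F name
        Server-9009-42A-SN1234ABC          <- source managed system
        Server-9009-42A-SN5678DEF          <- target managed system
      # startdump -t resource -m Server-9009-42A-SN1234ABC -r "system"
      # startdump -t resource -m Server-9009-42A-SN5678DEF -r "system"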
 
 
7) Fabric switch ( For NPIV configurations )
     Switch Logs for Cisco & Brocade:
      - Switch logs ("show tech-support details" for Cisco)
       - 'show tech detail' if more than one switch.
      - 'supportshow' collected via CLI using either
         HyperTerm or Putty to collect the output for Brocade.
      - 'supportsave'  as that has additional debug information.
      - How are RSCN ( Registered State Change Notification )
        events sent when a zoning change is done on the switch?

   Switch Logs for McData:
      - "data collection" from the switch management console, EFCM.
      - any other data related to the items described above under Cisco / Brocade.

   Switch Logs using iSCSI TOE:
      - igroup show
      - lun show -m

    ** For FCOE Types also include: **
      - "show tech-support fc"

8) Devscan ( For NPIV configurations )
   From the HMC, get the Live Partition Mobility WWPNs of the client LPAR.
   # lssyscfg -r sys -F name ( to get {managed_system} )
   # lssyscfg -r prof -m {source_managed_system} -F name,virtual_fc_adapters

   From the target VIO1 that currently has the inactive WWPN:
   $ oem_setup_env
   # script /tmp/devscan_vio1_inactive_wwpn_src.log
   # devscan -t f -n [wwpn_inactive_lowercase]            ( -t and f may be optional )
   # devscan -t f -n [wwpn_inactive_lowercase] --dev=fcxx ( -t and f may be optional )
   # exit

   From the target VIO2 that currently has the inactive WWPN:
   $ oem_setup_env
   # script /tmp/devscan_vio2_inactive_wwpn_src.log
   # devscan -t f -n [wwpn_inactive_lowercase]             ( -t and f may be optional )
   # devscan -t f -n [wwpn_inactive_lowercase]  --dev=fcxx ( -t and f may be optional )
   # exit

   From the AIX or Linux client LPAR:
   Run devscan on the moving LPAR, and compare the output with the one you get on the Virtual I/O Server:
   # devscan --dev=fscsi0 --concise | awk -F '|' '{print $2}' | sort -n | uniq
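   A minimal sketch of that comparison ( the output file names are hypothetical ): save the LUN list from the client and from the target VIOS, then diff the two files:
   # devscan --dev=fscsi0 --concise | awk -F '|' '{print $2}' | sort -n | uniq > /tmp/client_luns.txt
     ( repeat on the target VIOS, saving to /tmp/vios_luns.txt, then )
   # diff /tmp/client_luns.txt /tmp/vios_luns.txt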
  Complete Devscan Client to Destination VIOs Checks - Step by step ( AIX & Linux ):
9) Additional data from the HMC for NPIV configurations with AIX and Linux clients. If using dual HMCs, provide data from both HMCs.
    To get {managed_system}:
    # lssyscfg -r sys -F name

    To get lsnportlogin from HMC:
    # lsnportlogin -m {managed_system} --filter lpar_names={name_of_client_lpar}

    To get a complete listing from the HMC:
    # lssyscfg -r sys -F name ( to get {managed_system} )
    # lshmc -v
    # lshmc -V
    # lshwres -r virtualio --rsubtype scsi -m {managed_system} --level lpar
    # lshwres -r virtualio -m {managed_system} --rsubtype fc --level sys
    # lshwres -r virtualio -m {managed_system} --rsubtype fc --level lpar
    # lshwres -r virtualio --rsubtype fc --level lpar -m {managed_system} -F lpar_name,lpar_id,slot_num,adapter_type,state,is_required,remote_lpar_id,remote_lpar_name,remote_slot_num,wwpns,topology > CASE_NUM.fc.before.topology.out
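    As a worked instance ( the managed system, LPAR, and case names are hypothetical ), each listing can be redirected to a file named for the case so source and target output stay separate:
    # lsnportlogin -m Server-9009-42A-SN1234ABC --filter lpar_names=aixlpar01 > TS001234567.lsnportlogin.source.out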
10a) Performance related: IPERF, VMSTAT, and LPARSTAT
     Enable stats on the HMC ( the menu selection to get here may vary depending on HMC level ):
       - HMC enabling_data_collection
       - From the HMC, activate Performance Information Collection:
         right-click the specific LPAR - Properties - Hardware - Processors - Allow performance information collection

10b) VMSTAT and LPARSTAT ( collect during the problem )
       vmstat -It 5 200 | tee vmstat_vio1_src.log &
       vmstat -It 5 200 | tee vmstat_vio2_src.log &
       vmstat -It 5 200 | tee vmstat_vio1_target.log &
       vmstat -It 5 200 | tee vmstat_vio2_target.log &
       lparstat -ht 5 200 > lparstat_vio1_src.log &
       lparstat -ht 5 200 > lparstat_vio2_src.log &
       lparstat -ht 5 200 > lparstat_vio1_target.log &
       lparstat -ht 5 200 > lparstat_vio2_target.log &

10c) IPERF ( basic info shown below; for full details refer to the attached iperf_Instructions.txt )
       I)   Start the iperf test at the other end as a server ( target MSP ):
              ./iperf -s -P 4                              ( starts the server )
       II)  At the problem VIOS, start iperf as a client ( source MSP ):
              ./iperf -c {server_ip} -P 4 -t 10 -w 240k
       III) Start the iptrace at both ends as well as at the switch. On the AIX VIOS:
              # startsrc -s iptrace -a " -a /big_directory/iptrc.bin "
            Wait for 20 seconds.
       Repeat the above in both directions by swapping the server and client, and capture all stdout including the summary at the end.
       This may need to be run at various times in case of irregular / random network behavior.
       Repeat the above several times to make sure network throughput is good.
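       As a worked sketch of one full bidirectional pass ( the MSP IP addresses are hypothetical ):
              On the target MSP VIOS:   ./iperf -s -P 4
              On the source MSP VIOS:   ./iperf -c 192.0.2.20 -P 4 -t 10 -w 240k
              Then swap roles:
              On the source MSP VIOS:   ./iperf -s -P 4
              On the target MSP VIOS:   ./iperf -c 192.0.2.10 -P 4 -t 10 -w 240k
       After the 20-second iptrace window, stop the trace so the binary file can be collected:
              # stopsrc -s iptrace                ( then collect /big_directory/iptrc.bin )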
11) For SRC 2005 client hangs and dumps, refer to the links below.

12) LPM with PowerVC: PowerVC techdoc
13) FTP the file to IBM:
      ftp  testcase.software.ibm.com,
      login:   anonymous,
      passwd:  your email address,
      ftp>  cd /toibm/aix
      ftp>  bin
      ftp>  put  (case number.pax.gz)
      ftp>  quit
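
      Before starting the ftp session above, the collected files can be bundled into a single archive; a minimal sketch ( the directory and case names are hypothetical ):
      # cd /tmp/lpm_data
      # pax -wf /tmp/TS001234567.pax .
      # gzip /tmp/TS001234567.pax          ( produces /tmp/TS001234567.pax.gz for the ftp put )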

      For Blue Diamond: Registration Link:   Blue Diamond Registration 
======================   End ================

Document Location

Worldwide

[{"Business Unit":{"code":"BU058","label":"IBM Infrastructure w\/TPS"},"Product":{"code":"SSPHKW","label":"PowerVM Virtual I\/O Server"},"Component":"PowerVM","Platform":[{"code":"PF002","label":"AIX"}],"Version":"2.2.6.0 and 3.1 and higher","Edition":"","Line of Business":{"code":"LOB57","label":"Power"}}]

Document Information

Modified date:
19 August 2022

UID

ibm10887093