Use this procedure when servicing a Linux® partition or a server that has Linux as its only operating system.
(D005)
These procedures define the steps to take when servicing a Linux partition or a server that has Linux as its only operating system.
Before continuing with this procedure, it is recommended that you review the additional software available to enhance your Linux solutions. This software is available from the Linux on POWER® Web site: http://techsupport.services.ibm.com/server/lopdiags
| Number of digits in reference code | Reference code | Name or code type |
|---|---|---|
| Any | Contains # (pound sign) | Menu goal |
| Any | Contains - (hyphen) | Service request number (SRN) |
| 5 | Does not contain # or - | SRN |
| 8 | Does not contain # or - | Service reference code (SRC) |
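The rules in the table above can be applied mechanically. The following is a minimal sketch, not an IBM-supplied tool: `classify_code` is a hypothetical helper name, and it assumes that "number of digits" means the total character count of the code.

```shell
# Classify a reference code using the rules in the table above.
# classify_code is a hypothetical helper, not part of any IBM package.
classify_code() {
    code=$1
    case $code in
        *'#'*) echo "Menu goal" ;;   # any length, contains a pound sign
        *-*)   echo "SRN" ;;         # any length, contains a hyphen
        *)
            # No # or -: decide by length (assumed to be the character count).
            case ${#code} in
                5) echo "SRN" ;;
                8) echo "SRC" ;;
                *) echo "unknown" ;;
            esac
            ;;
    esac
}

classify_code 651-880     # prints SRN
classify_code B1004699    # prints SRC
```

Codes that match none of the four rows are reported as `unknown` rather than guessed.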
Look at the service action event log in SFP for errors. Focus on those errors with a timestamp near the time at which the error occurred. Follow the steps indicated in the error log entry to resolve the problem. If the problem is not resolved, continue with step 3.
Is Linux usable in any partition with Linux installed?
ls -l /var/log/platform
Does the /var/log/platform file exist?
cat /var/log/messages |grep RTAS |more
Linux run-time RTAS error messages are logged in the messages file under /var/log. The following is an example of the Linux system RTAS error log messages.
Aug 27 18:13:41 rasler kernel: RTAS: -------- event-scan begin --------
Aug 27 18:13:41 rasler kernel: RTAS: Location Code: U0.1-P1-C1
Aug 27 18:13:41 rasler kernel: RTAS: WARNING: (FULLY RECOVERED) type: INTERN_DEV_FAIL
Aug 27 18:13:41 rasler kernel: RTAS: initiator: UNKNOWN target: UNKNOWN
Aug 27 18:13:41 rasler kernel: RTAS: Status: predictive new
Aug 27 18:13:41 rasler kernel: RTAS: Date/Time: 20020827 18134000
Aug 27 18:13:41 rasler kernel: RTAS: CPU Failure
Aug 27 18:13:41 rasler kernel: RTAS: CPU id: 0
Aug 27 18:13:41 rasler kernel: RTAS: Failing element: 0x0000
Aug 27 18:13:41 rasler kernel: RTAS: A reboot of the system may correct the problem
Aug 27 18:13:41 rasler kernel: RTAS: -------- event-scan end ----------
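When the log holds many events, it can help to list only the location codes that the event-scan messages report. The following is a sketch under stated assumptions: `rtas_location_codes` is a hypothetical helper name, and the messages are assumed to use the `RTAS: Location Code:` wording shown in the example above.

```shell
# List each distinct location code reported in RTAS event-scan messages.
# rtas_location_codes is a hypothetical helper; the default path is the
# messages file named in this procedure.
rtas_location_codes() {
    logfile=${1:-/var/log/messages}
    # Keep only the text that follows "RTAS: Location Code:" on matching lines.
    sed -n 's/.*RTAS: Location Code: *//p' "$logfile" | sort -u
}

# Example (run as root on the partition being serviced):
#   rtas_location_codes /var/log/messages
```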
cat /var/log/platform |grep diagela |more
Linux run-time diagela error messages are logged in the platform file under /var/log.
The following is an example of the Linux system error log diagela messages.
Aug 13 09:38:45 larry diagela: 08/13/2003 09:38:44
Aug 13 09:38:45 larry diagela: Automatic Error Log Analysis has detected a problem.
Aug 13 09:38:45 larry diagela:
Aug 13 09:38:45 larry diagela: The Service Request Number(s)/Probable Cause(s)
Aug 13 09:38:45 larry diagela: (causes are listed in descending order of probability):
Aug 13 09:38:45 larry diagela:
Aug 13 09:38:45 larry diagela: 651-880: The CEC or SPCN reported an error. Report the SRN and the following reference and physical location codes to your service provider.
Aug 13 09:38:45 larry diagela: Location: n/a FRU: n/a Ref-Code: B1004699
Aug 13 09:38:45 larry diagela:
Aug 13 09:38:45 larry diagela: Analysis of Error log sequence number: 3
Aug 29 07:13:04 larry diagela: 08/29/2003 07:13:04
Aug 29 07:13:04 larry diagela: Automatic Error Log Analysis has detected a problem.
Aug 29 07:13:04 larry diagela:
Aug 29 07:13:04 larry diagela: The Service Request Number(s)/Probable Cause(s)
Aug 29 07:13:04 larry diagela: (causes are listed in descending order of probability):
Aug 29 07:13:04 larry diagela:
Aug 29 07:13:04 larry diagela: 651-880: The CEC or SPCN reported an error. Report the SRN and the following reference and physical location codes to your service provider.
Aug 29 07:13:04 larry diagela: Location: U0.1-F4 FRU: 09P5866 Ref-Code: 10117661
Aug 29 07:13:04 larry diagela:
Aug 29 07:13:04 larry diagela: Analysis of /var/log/platform sequence number: 24
Sep 4 06:00:55 larry diagela: 09/04/2003 06:00:55
Sep 4 06:00:55 larry diagela: Automatic Error Log Analysis reports the following:
Sep 4 06:00:55 larry diagela:
Sep 4 06:00:55 larry diagela: 651204 ANALYZING SYSTEM ERROR LOG
Sep 4 06:00:55 larry diagela: A loss of redundancy on input power was detected.
Sep 4 06:00:55 larry diagela:
Sep 4 06:00:55 larry diagela: Check for the following:
Sep 4 06:00:55 larry diagela: 1. Loose or disconnected power source connections.
Sep 4 06:00:55 larry diagela: 2. Loss of the power source.
Sep 4 06:00:55 larry diagela: 3. For multiple enclosure systems, loose or
Sep 4 06:00:55 larry diagela: disconnected power and/or signal connections
Sep 4 06:00:55 larry diagela: between enclosures.
Sep 4 06:00:55 larry diagela:
Sep 4 06:00:55 larry diagela: Supporting data:
Sep 4 06:00:55 larry diagela: Ref. Code: 10111520
Sep 4 06:00:55 larry diagela: Location Codes: P1 P2
Sep 4 06:00:55 larry diagela:
Sep 4 06:00:55 larry diagela: Analysis of /var/log/platform sequence number: 13
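The reference codes are the part of the diagela output that is usually reported to the service provider, so it can be convenient to pull them out of the log. The following is a minimal sketch: `diagela_ref_codes` is a hypothetical helper name, and it assumes the codes are eight hexadecimal characters labeled either `Ref-Code:` or `Ref. Code:`, as in the example messages above.

```shell
# List each distinct reference code that diagela reported in a log file.
# diagela_ref_codes is a hypothetical helper; the default path is the
# platform file named in this procedure.
diagela_ref_codes() {
    logfile=${1:-/var/log/platform}
    grep 'diagela:' "$logfile" |
        # Match "Ref-Code: XXXXXXXX" and "Ref. Code: XXXXXXXX" (assumed formats).
        sed -n 's/.*Ref[-.] \{0,1\}Code: *\([0-9A-Fa-f]\{8\}\).*/\1/p' |
        sort -u
}

# Example (run as root on the partition being serviced):
#   diagela_ref_codes /var/log/platform
```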
cat /var/log/platform |grep RTAS |more
Linux RTAS error messages are logged in the platform file under /var/log. The following is an example of RTAS messages in the Linux system error log.
Aug 27 12:16:33 larry kernel: RTAS: 15 -------- RTAS event begin --------
Aug 27 12:16:33 larry kernel: RTAS 0: 04440040 000003f8 96008508 19155800
Aug 27 12:16:33 larry kernel: RTAS 1: 20030827 00000001 20000000 00000000
Aug 27 12:16:33 larry kernel: RTAS 2: 00000000 00000000 00000000 00000000
Aug 27 12:16:33 larry kernel: RTAS 3: 49424d00 55302e31 2d463400 00503034
Aug 27 12:16:33 larry kernel: RTAS 4: 10117661 04a0005d 10110000 00000000
Aug 27 12:16:33 larry kernel: RTAS 5: 00007701 000000e0 00000003 000000e3
Aug 27 12:16:33 larry kernel: RTAS 6: 00000000 01000000 00000000 31303131
Aug 27 12:16:33 larry kernel: RTAS 7: 37363631 20202020 20202020 55302e31
Aug 27 12:16:33 larry kernel: RTAS 8: 2d463420 20202020 20202020 03705a39
Aug 27 12:16:33 larry kernel: RTAS 9: 00000000 00000000 00000000 00000000
Aug 27 12:16:33 larry kernel: RTAS 10: 00000000 00000000 00000000 00000000
Aug 27 12:16:33 larry kernel: RTAS 11: 00000000 00000000 00000000 00000000
Aug 27 12:16:33 larry kernel: RTAS 12: 00000000 00000000 00000000 00000000
Aug 27 12:16:33 larry kernel: RTAS 13: 00000000 00000000 00000000 00000000
Aug 27 12:16:33 larry kernel: RTAS 14: 00000000 00000000 00000000 00000000
Aug 27 12:16:33 larry kernel: RTAS 15: 00000000 00000000 00000000 00000000
Aug 27 12:16:33 larry kernel: RTAS 16: 00000000 00000000 00000000 00000000
Aug 27 12:16:33 larry kernel: RTAS 17: 00000000 00000000 00000000 00000000
Aug 27 12:16:33 larry kernel: RTAS 18: 00000000 00000000 00000000 00000000
Aug 27 12:16:33 larry kernel: RTAS 19: 00000000 00000000 00000000 00000000
Aug 27 12:16:33 larry kernel: RTAS 20: 00000000 00000000 00000000 00000000
Aug 27 12:16:33 larry kernel: RTAS 21: 00000000 00000000 00000000 00000000
Aug 27 12:16:33 larry kernel: RTAS 22: 00000000 00000000 00000000 00000000
Aug 27 12:16:33 larry kernel: RTAS 23: 00000000 00000000 00000000 00000000
Aug 27 12:16:33 larry kernel: RTAS 24: 00000000 00000000 00000000 00000000
Aug 27 12:16:33 larry kernel: RTAS 25: 00000000 00000000 00000000 00000000
Aug 27 12:16:33 larry kernel: RTAS 26: 00000000 00000000 00000000 00000000
Aug 27 12:16:33 larry kernel: RTAS 27: 00000000 00000000 00000000 00000000
Aug 27 12:16:33 larry kernel: RTAS 28: 00000000 00000000 00000000 00000000
Aug 27 12:16:33 larry kernel: RTAS 29: 00000000 00000000 00000000 00000000
Aug 27 12:16:33 larry kernel: RTAS 30: 00000000 00000000 00000000 00000000
Aug 27 12:16:33 larry kernel: RTAS 31: 00000000 00000000 00000000 00000000
Aug 27 12:16:33 larry kernel: RTAS 32: 00000000 00000000 00000000 00000000
Aug 27 12:16:33 larry kernel: RTAS 33: 00000000 00000000 00000000 00000000
Aug 27 12:16:33 larry kernel: RTAS 34: 00000000 00000000 00000000 00000000
Aug 27 12:16:33 larry kernel: RTAS 35: 00000000 00000000 00000000 00000000
Aug 27 12:16:33 larry kernel: RTAS 36: 00000000 00000000 00000000 00000000
Aug 27 12:16:33 larry kernel: RTAS 37: 00000000 00000000 00000000 00000000
Aug 27 12:16:33 larry kernel: RTAS 38: 00000000 00000000 00000000 00000000
Aug 27 12:16:33 larry kernel: RTAS 39: 00000000 00000000 00000000 00000000
Aug 27 12:16:33 larry kernel: RTAS 40: 00000000 00000000 00000000 00000000
Aug 27 12:16:33 larry kernel: RTAS 41: 00000000 00000000 00000000 00000000
Aug 27 12:16:33 larry kernel: RTAS 42: 00000000 00000000 00000000 00000000
Aug 27 12:16:33 larry kernel: RTAS 43: 00000000 00000000 00000000 00000000
Aug 27 12:16:33 larry kernel: RTAS 44: 00000000 00000000 00000000 00000000
Aug 27 12:16:33 larry kernel: RTAS 45: 00000000 00000000 00000000 00000000
Aug 27 12:16:33 larry kernel: RTAS 46: 00000000 00000000 00000000 00000000
Aug 27 12:16:33 larry kernel: RTAS 47: 00000000 00000000 00000000 00000000
Aug 27 12:16:33 larry kernel: RTAS 48: 00000000 00000000 00000000 00000000
Aug 27 12:16:33 larry kernel: RTAS 49: 00000000 00000000 00000000 00000000
Aug 27 12:16:33 larry kernel: RTAS 50: 00000000 00000000 00000000 00000000
Aug 27 12:16:33 larry kernel: RTAS 51: 00000000 00000000 00000000 00000000
Aug 27 12:16:33 larry kernel: RTAS 52: 00000000 00000000 00000000 00000000
Aug 27 12:16:33 larry kernel: RTAS 53: 00000000 00000000 00000000 00000000
Aug 27 12:16:33 larry kernel: RTAS 54: 00000000 00000000 00000000 00000000
Aug 27 12:16:33 larry kernel: RTAS 55: 00000000 00000000 00000000 00000000
Aug 27 12:16:33 larry kernel: RTAS 56: 00000000 00000000 00000000 00000000
Aug 27 12:16:33 larry kernel: RTAS 57: 00000000 00000000 00000000 00000000
Aug 27 12:16:33 larry kernel: RTAS 58: 00000000 00000000 00000000 00000000
Aug 27 12:16:33 larry kernel: RTAS 59: 00000000 00000000 00000000 00000000
Aug 27 12:16:33 larry kernel: RTAS 60: 00000000 00000000 00000000 00000000
Aug 27 12:16:33 larry kernel: RTAS 61: 00000000 00000000 00000000 00000000
Aug 27 12:16:33 larry kernel: RTAS 62: 00000000 00000000 00000000 00000000
Aug 27 12:16:33 larry kernel: RTAS 63: 00000000 00000000 00000000 00020000
Aug 27 12:16:33 larry kernel: RTAS: 15 -------- RTAS event end ----------
Reference codes and location codes may appear as RTAS messages. The extended data is also provided in the form of an RTAS message. The extended data contains other reference code words that help in isolating the correct FRUs. The start of the extended data is marked, for example, by the line:
Aug 27 12:16:33 larry kernel: RTAS: 15 -------- RTAS event begin --------
The end of the extended data is marked by the line:
Aug 27 12:16:33 larry kernel: RTAS: 15 -------- RTAS event end ----------
with the same sequence number (15 in this example). Word 13 and word 19 are found in the RTAS messages. For example, to find word 13, first find the reference code, 10117661, in the left column of words of the extended data. In this example, the reference code is to the right of "RTAS 4:". This is also word 11. To get word 13, 10110000, count the words left to right, beginning at word 11: 10117661 is word 11, 04a0005d is word 12, and 10110000 is word 13.
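The word counting described above can be sketched as a small shell function. This is an illustration under stated assumptions, not an IBM-supplied tool: `rtas_word` is a hypothetical helper name, the data lines are assumed to have the `RTAS n: word word word word` layout shown in the example, and the log passed in is assumed to contain the extended data of a single event.

```shell
# Print extended-data word N, treating the reference code as word 11.
# rtas_word is a hypothetical helper.
# Usage: rtas_word REFCODE N [LOGFILE]
rtas_word() {
    refcode=$1
    wanted=$2
    logfile=${3:-/var/log/platform}
    awk -v ref="$refcode" -v n="$wanted" '
        {
            # Data lines contain "RTAS <number>:" followed by the data words.
            for (i = 1; i < NF; i++) {
                if ($i == "RTAS" && $(i + 1) ~ /^[0-9]+:$/) {
                    for (j = i + 2; j <= NF; j++) words[++count] = $j
                    break
                }
            }
        }
        END {
            for (k = 1; k <= count; k++) {
                if (words[k] == ref) {
                    # The reference code is word 11, so word N is N - 11 places on.
                    target = k + n - 11
                    if (target >= 1 && target <= count) print words[target]
                    exit
                }
            }
        }
    ' "$logfile"
}

# Example (run as root on the partition being serviced):
#   rtas_word 10117661 13 /var/log/platform
```

With the example data above, asking for word 13 relative to reference code 10117661 gives 10110000. If the log holds several events, extract the lines between the matching begin and end markers first.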
If the system is configured with more than one logical partition with Linux installed, repeat step 5 and step 6 for all logical partitions that have Linux installed.
Examine the Linux boot (IPL) log by logging in to the system as the root user and entering the following command:
cat /var/log/boot.msg |grep RTAS |more
Linux boot (IPL) error messages are logged into the boot.msg file under /var/log. The following is an example of the Linux boot error log.
RTAS daemon started
RTAS: -------- event-scan begin --------
RTAS: Location Code: U0.1-F3
RTAS: WARNING: (FULLY RECOVERED) type: SENSOR
RTAS: initiator: UNKNOWN target: UNKNOWN
RTAS: Status: bypassed new
RTAS: Date/Time: 20020830 14404000
RTAS: Environment and Power Warning
RTAS: EPOW Sensor Value: 0x00000001
RTAS: EPOW caused by fan failure
RTAS: -------- event-scan end ----------
Examine the extended data in both logs.
The following is an example of the Linux extended data.
<3>RTAS daemon started
<3>RTAS: -------- event-scan begin --------
<3>RTAS: Location Code: U0.1-P1-C2
<4>RTAS: Log Debug: 04 4b2726fb04a00011702c0014000000000000000000000000f1800001001801d3ffffffff010000000000000042343138 20202020383030343236464238454134303030303 030303030303030
<4>RTAS: Log Debug: D2 5046413405020d0a000001000271400100000033434d502044415441000001000000000000010000f180000153595 320444154410000000000000000200216271501050920021627150105092002063715010509535243204441544170 2c001400000000000000020018820201d382000000000000000000000000000000000000000000000000000000000 0000000000000000000000000000000000000000000000000000000000000000280048ea400000000000000000000 000000000000000000004350542044415441702cff08000000001c000000702cf0080000000080000000702cf10070 2cf200702c000400000800702c01040bf2002e702c02040c1fffbf702c0300702c1000702c11040bf2002e702c12040 c1fffbf702c1300702ca000702ca108000000000000a03c702ca208000000000000effc702cb000702cb10800000000 0000a03c702cb208000000000000effc702cc000702cc108000000000000a03c702cc208000000000000effc702c3 000702c31080000000000000003702c3208000000000000007b702c8000702c81080000000020e27a39702c820800 000000fffeffff702cd000702cd1080000000010004010702cd208000000007777f3ffffffffffffffffffffff ffffffffffffffffffffffffffffffffffffffffffff
<3>RTAS: WARNING: (FULLY RECOVERED) type: INTERN_DEV_FAIL
<3>RTAS: initiator: UNKNOWN target: UNKNOWN
<3>RTAS: Status: unrecoverable new
<3>RTAS: Date/Time: 20020905 15372200
<3>RTAS: CPU Failure
<3>RTAS: Internal error (not cache)
<3>RTAS: CPU id: 0
<3>RTAS: Failing element: 0x0000
<3>RTAS: -------- event-scan end ---------
If the system is configured with more than one logical partition with Linux installed, repeat step 9 and step 10 for all logical partitions that have Linux installed.
You need a personal computer (and cable, part number 62H4857) capable of connecting to system port 1 on the system unit. (The Linux login prompt cannot be seen on a personal computer connected to system port 1.) If the ASMI functions are not otherwise available, use the following procedure:
You may also compare this list of resources that were found to a prior version of the device tree as follows:
cd /var/lib/lsvpd/
lscfg -vpd db-2003-03-31-12:26:31
This displays the device tree created on 03/31/2003 at 12:26:31.
The diff command offers a way to compare the output from a current lscfg command to the output from an older lscfg command. If the file names for the current and old device trees are current.out and old.out, respectively, type: diff old.out current.out. Any lines that exist in the old file but not in the current file are listed and preceded by a less-than symbol (<). Any lines that exist in the current file but not in the old file are listed and preceded by a greater-than symbol (>). Lines that are the same in both files are not listed; for example, files that are identical produce no output from the diff command. If the location or description of a resource changes, both the old line, preceded by <, and the new line, preceded by >, are output.
If the system is configured with more than one logical partition with Linux installed, repeat step 15 and step 16 for all logical partitions that have Linux installed.
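The comparison described above can be wrapped in a small function that separates removed resources from added ones. This is a minimal sketch: `compare_device_trees` is a hypothetical helper name, and old.out and current.out are assumed to have been saved beforehand by redirecting lscfg output to a file.

```shell
# Show which device-tree lines disappeared and which are new.
# compare_device_trees is a hypothetical wrapper around diff.
# Usage: compare_device_trees OLD_FILE CURRENT_FILE
compare_device_trees() {
    old=$1
    current=$2
    if diff "$old" "$current" > /dev/null; then
        # Identical files produce no diff output at all.
        echo "No differences found"
    else
        echo "In old only:"
        diff "$old" "$current" | grep '^<' || true
        echo "In current only:"
        diff "$old" "$current" | grep '^>' || true
    fi
    return 0
}

# Example:
#   compare_device_trees old.out current.out
```

A resource whose location or description changed appears in both groups, once with its old text and once with its new text.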
If you did not previously answer Yes to step 17, go to step 18.