IBM Support

PureData System for Operational Analytics known issues

Release Notes


Abstract

This document lists important known issues for the product PureData System for Operational Analytics. It consolidates the same or similar content from the fix pack specific known issues documents and is intended to be used for all known issues from PDOA versions V1.0.0.6/V1.1.0.2 and later.

Content

Terms and conventions that are used in the table:
  • General: This issue occurs when the system is running on the impacted versions. The impacted version refers to the version that is running on the system.
  • Fixpack: This issue occurs when applying a fixpack. The impacted version refers to the version that is being applied during the fixpack application.
  • I_V#.#.#.#: In the search bar use I_V and the version of PDOA to see the Known Issues related to a specific version. The search bar applies to all columns in the table.
  • Use a combination of the column sort and search lookup to find known issues related to the symptoms experienced.
PureData System for Operational Analytics Known Issues
Reference Number Type Impacted Versions Symptom Resolution
KI002423
Unable to filter system console events using the time filter
General
I_V1.0.0.0
Unable to filter system console events using the time filter
In the system console Events pane, when you select a time interval value of Last 24 Hours or Last Hour there are no events displayed.
Workaround:
-----------
1. Select the time interval value of All to display all events.
2. If the events are not sorted in descending order, click the Updated on field in the table to sort the events in descending order. The most recent events are displayed first in the table.
Fixed:
-----------
Addressed in the V1.0.0.2-V1.0.0.5 fixpacks and all V1.1 systems.
V1.0.0.6/V1.1.0.2 Fixpacks remove the appliance console from the appliance.
KI003332
Mozilla Firefox 20.0 is an unsupported browser
General
I_V1.0.0.0
I_V1.0.0.1
I_V1.0.0.2
I_V1.0.0.3
I_V1.0.0.4
I_V1.0.0.5
I_V1.1.0.0
I_V1.1.0.1
Mozilla Firefox 20.0 is an unsupported browser
The system console GUI fails to load the login page or provide any corrective instructions when viewed through the latest version of the Mozilla Firefox browser (20.0) released on April 2, 2013.
Workaround:
-----------
Install and enable the Firefox ESR release from the following page: http://www.mozilla.org/en-US/firefox/organizations/all.html
Alternatively, use another supported browser such as Internet Explorer V8 or V9.  
Fixed:
-----------
F_V1.0.0.6/F_V1.1.0.2 Fixpacks remove the appliance console from the appliance.
KI002637
A resume option has changed for the miupdate command
Fixpack
I_V1.0.0.1
I_V1.0.0.2
I_V1.0.0.3
I_V1.0.0.4
I_V1.0.0.5
I_V1.1.0.1
A resume option has changed for the miupdate command
 
The -e alternative to the --resume option for the miupdate command has been changed to -resume.
Workaround:
-----------
Do not use the -e option to resume the update process. Use the -resume option instead. For example, to resume the update process after the system console suspends the update to reboot one or more nodes in the system, the following miupdate command options can be used:

miupdate --resume | -resume [management | prepare | apply | commit]
Fixed:
-----------
V1.0.0.6/V1.1.0.2 Fixpacks remove the appliance console from the appliance where miupdate is no longer used for fixpack application.
KI002702
During fix pack registration the console might hang while trying to SSH to localhost (127.0.0.1)
Fixpack
I_V1.0.0.1
I_V1.0.0.2
I_V1.0.0.3
I_V1.0.0.4
I_V1.0.0.5
I_V1.1.0.1
During fix pack registration the console might hang while trying to SSH to localhost (127.0.0.1)
During fix pack registration, there is at least one command to SSH to localhost (127.0.0.1) to perform some tasks. However, SSH does not recognize the localhost host key under the IP address 127.0.0.1, and waits for you to enter Yes to accept the host key.
Workaround:
-----------
Before you install the fix pack, enter the following command as root on the management host to accept the host key:
ssh 127.0.0.1 ls
When prompted to accept the new host key, enter Yes.
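
As an alternative to answering the prompt interactively, the localhost host key can be pre-populated in the root user's known_hosts file. This is a hedged sketch only, not part of the documented procedure, and assumes the OpenSSH ssh-keyscan utility is available on the management host:

  # Run as root on the management host.
  mkdir -p ~/.ssh
  ssh-keyscan 127.0.0.1 >> ~/.ssh/known_hosts
  # Confirm that ssh no longer prompts for the host key:
  ssh 127.0.0.1 ls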
Fixed:
-----------
V1.0.0.6/V1.1.0.2 Fixpacks remove the appliance console from the appliance which no longer uses this fixpack registration mechanism.
KI002696
The system console GUI becomes inaccessible during the management phase or after the management phase of a fix pack installation completes
Fixpack
I_V1.0.0.1
I_V1.0.0.2
I_V1.0.0.3
I_V1.0.0.4
I_V1.0.0.5
I_V1.1.0.1
The system console GUI becomes inaccessible during the management phase or after the management phase of a fix pack installation completes
During the management phase of a fix pack installation, if you attempt to log in to the system console GUI or to use it for tasks other than installing the fix pack, the system console GUI becomes inaccessible at the URL https://management_host_name, where management_host_name represents the host name or IP address of the management host.

The system console GUI might also become inaccessible after the management phase completes.
Workaround:
-----------
Restart the system console GUI:
1. Log in to the management host as the root user.
2. Run the following command:
miresolve -restart
Fixed:
-----------
V1.0.0.6/V1.1.0.2 Fixpacks remove the appliance console from the appliance.
KI003581
During the apply phase of a fix pack installation the progress bar incorrectly indicates the stage is 100% complete
Fixpack I_V1.0.0.1
During the apply phase of a fix pack installation the progress bar incorrectly indicates the stage is 100% complete
 
During the apply phase of a fix pack installation, the progress bar in the system console GUI might indicate that the stage is 100% complete but the Apply Fix Pack window indicates that stage 4 (Apply to non-management hosts) is in a Running state.
Workaround:
-----------
1. Ignore the percentage complete value displayed in the progress bar.
2. Verify that the apply stage has completed successfully. The Apply Fix Pack window displays a Completed state for stage 4 (Apply to non-management hosts) when the apply stage has completed successfully.
Fixed:
-----------
V1.0.0.2: Fixed in V1.0 FP2.
V1.0.0.6/V1.1.0.2 Fixpacks remove the appliance console from the appliance.
KI007645
After the commit phase of a fix pack installation the progress bar incorrectly indicates the stage is less than 100% complete
Fixpack
I_V1.0.0.1
I_V1.0.0.2
I_V1.0.0.3
I_V1.0.0.4
I_V1.0.0.5
I_V1.1.0.1
After the commit phase of a fix pack installation the progress bar incorrectly indicates the stage is less than 100% complete
After the commit phase of a fix pack installation completes successfully, the progress bar in the system console GUI might indicate that the stage is less than 100% complete but the Apply Fix Pack window indicates that stage 5 (Commit) is in a Completed state.
Workaround:
-----------
1. Ignore the percentage complete value displayed in the progress bar.
2. Verify that the commit stage has completed successfully. The Apply Fix Pack window displays a Completed state for stage 5 (Commit) to indicate the commit stage has completed successfully.
Fixed:
-----------
V1.0.0.6/V1.1.0.2 Fixpacks remove the appliance console from the appliance.
KI004047
Paging space is set incorrectly on the management node
General I_V1.0
Paging space is set incorrectly on the management node
Paging space on the management node is set to the incorrect value of 48 GB. The paging space needs to be set to a value of 128 GB.
Workaround:
-----------
Use the mgmt_ps script to set the value of the paging space to 128 GB on the management host.

Note: You do not need to change the value of the paging space on the standby management host. The paging space is set to the correct value of 128 GB on the standby management host.

1. Obtain the mgmt_ps.zip file from IBM Support.

2. Log in to the management host as the root user.

3. Copy the mgmt_ps.zip file to a temporary directory on the management host.

4. Navigate to the temporary directory and extract the mgmt_ps script.

5. Grant execute permission on the script:
  • chmod +x mgmt_ps

6. Issue the following command to run the script and increase the paging space to 128 GB:
  • ./mgmt_ps
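
To confirm that the paging space was increased, it can be checked with the standard AIX lsps command; a brief hedged sketch (output formatting can vary by AIX level):

  # Run as root on the management host after mgmt_ps completes.
  lsps -a    # lists each paging device and its size
  lsps -s    # summary; total paging space should show 128 GB (131072MB)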
Fixed:
-----------
Affected customers can only apply the workaround. There will be no automated fix as part of any appliance fixpack.
KI003962
The fix pack installation hangs when only a directory is specified as the location of the fix pack file
Fixpack
I_V1.0.0.1
I_V1.0.0.2
I_V1.0.0.3
I_V1.0.0.4
I_V1.0.0.5
I_V1.1.0.1
The fix pack installation hangs when only a directory is specified as the location of the fix pack file
 
When specifying the location of the fix pack file on the management host, if you specify only a directory and do not include the fix pack file name, the fix pack installation hangs. No warning message or error message is displayed.
Workaround:
-----------
Refresh the system console and specify the full path and the file name of the fix pack file in the Add Fix Pack window.
Fixed:
-----------
V1.0.0.6/V1.1.0.2 Fixpacks remove the appliance console from the appliance.
KI004076
The miinfo -d -c command incorrectly indicates that the versions of some components are higher than expected
General I_V1.0.0.2
The miinfo -d -c command incorrectly indicates that the versions of some components are higher than expected
 
When you run the miinfo -d -c command, the command incorrectly indicates that the versions of the following software components are higher than expected:
  • WebSphere Application Server
  • Optim Performance Manager
  • InfoSphere Warehouse
  • Optim Query Workload Tuner

Workaround:
-----------
Ignore the warning about the higher than expected versions of these software components. The version returned by the command is correct and is the expected version.
Fixed:
-----------
V1.0.0.3
V1.0.0.6/V1.1.0.2 has fundamental changes that impact this issue.
  • miinfo was part of the appliance console which has been removed. appl_ls_sw can be used instead for this purpose.
  • WebSphere Application Server was part of Warehouse Tools which is removed as part of these fixpacks.
  • InfoSphere Warehouse components are part of Warehouse Tools which is removed as part of these fixpacks.
  • Updates to the catalog for later fixpacks should correctly show the levels for Optim Performance Manager and OQWT.
KI004088
The system console cannot access Systems Director after the miauth command is used to change the password for the restuser user
General
I_V1.0.0.0
I_V1.0.0.1
I_V1.0.0.2
I_V1.0.0.3
I_V1.0.0.4
I_V1.1.0.0
The system console cannot access Systems Director after the miauth command is used to change the password for the restuser user
If you do not stop the system console before you use the miauth command to change the password for the restuser user, as documented in the password change procedure Changing the passwords for system console component users, the system console cannot access Systems Director.
Workaround:
-----------
1. Log in to the management host as root.

2. Stop the system console:
  • mistop

3. Unlock Systems Director:
  • smcli chuser -u restuser -m unlock

4. Start the system console:
  • mistart

IMPORTANT: For future password changes for the restuser user, use the password change procedure that is documented in Changing the passwords for system console component users.
Fixed:
-----------
V1.0.0.5/V1.1.0.1 has fundamental changes that impact this issue.
  • IBM Systems Director is disabled and no longer used as part of the Appliance Console. The smcli command is no longer used and the restuser user is no longer managed.
V1.0.0.6/V1.1.0.2 has fundamental changes that impact this issue.
  • The appliance console is removed. mistop and mistart are no longer valid commands.
KI004070
The system console becomes unresponsive after the fix pack file is uploaded
Fixpack
I_V1.0.0.2
The system console becomes unresponsive after the fix pack file is uploaded
 
Approximately one hour after you upload the fix pack file to the system, the Add Fix Pack window in the system console remains unresponsive and greyed out.
Workaround:
-----------
1. Verify that the system console is unresponsive by attempting to open the Welcome page in the system console.
  • If the Welcome page loads, refresh the Add Fix Pack file window and continue with the fix pack installation.
  • If the Welcome page does not load, continue to step 2.

2. Determine if there are any system console modules that are stopped. Issue the following command as root on the management host:
  • mistatus

3. If there are modules that are stopped, issue the following command to restart the modules:
  • mistart

Wait approximately 10 minutes for the system console modules to restart, and then continue with the fix pack installation.
Fixed:
-----------
This issue is fixed in V1.0.0.3 and higher and V1.1.0.1.
V1.0.0.6/V1.1.0.2 has fundamental changes that impact this issue.
  • The appliance console is removed.
KI004082
The miupdate -resume current command hangs after it fails or after it finishes successfully
Fixpack
I_V1.0.0.2
The miupdate -resume current command hangs after it fails or after it finishes successfully
The miupdate -resume current command hangs when resuming the fix pack installation after the M_Failed_Apply_Nonimpact substage.
Workaround:
-----------
1. Log in as root on the management host and issue the following command:
  • ps -eaf | grep resume

In the output returned by the command, identify the process IDs for the miupdate process and the MIUpdateCLI process. In the following sample output, the miupdate process ID is 11599914 and the MIUpdateCLI process ID is 12976198.

root 11599914 1 0 12:25:11 - 0:00 /bin/sh /opt/IBM/mi/bin/miupdate -resume current -n
root 12976198 11599914 0 12:25:11 - 0:02 /usr/java6_64/bin/java -classpath .:/opt/IBM/mi/lib/*:/opt/IBM/mi/lib/log4j/* -Dlog4j.configuration=file:/opt/IBM/mi/configuration/log4j.properties -Djavax.net.ssl.trustStore=/opt/ibm/director/lwi/security/keystore/ibmjsse2.jks -DUSERDIR=/usr/IBM/applmgmt/isas.server com.ibm.isas.cli.command.application.MIUpdateCLI -resume current -n
root 21037306 35848408 0 12:39:58 pts/8 0:00 grep resume                
2. For each process ID, issue the following command to terminate the hanging process:
  • kill -9 <pid>
where <pid> represents the process ID of the miupdate process or the process ID of the MIUpdateCLI process.

3. Click the Resume button in the system console.
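
For steps 1 and 2, a minimal scripted sketch that finds and terminates the two hanging processes is shown here. This is an illustration only; review the ps output and confirm the PIDs before killing anything:

  # Run as root on the management host.
  ps -eaf | grep -E 'miupdate -resume|MIUpdateCLI' | grep -v grep
  # Terminate each matching process (the PID is the second column of ps -eaf output):
  for pid in $(ps -eaf | grep -E 'miupdate -resume|MIUpdateCLI' | grep -v grep | awk '{print $2}'); do
      kill -9 "$pid"
  done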
Fixed:
-----------
This issue is fixed in V1.0.0.3 and higher and V1.1.0.1.
V1.0.0.6/V1.1.0.2 has fundamental changes that impact this issue.
  • The appliance console is removed and fixpacks are no longer managed by the console or mi* commands.
KI004991
Error resuming the fix pack installation after the M_Applied_Nonimpact substage
Fixpack I_V1.0.0.2
Error resuming the fix pack installation after the M_Applied_Nonimpact substage
The fix pack installation fails in the Apply to management hosts stage. After you address the issue and click Resume current phase in the system console, the system console does not respond and the fix pack installation does not continue.
Workaround:
-----------
1. Log in to the management host as the root user.

2. Verify that the fix pack installation is in the M_Applied_Nonimpact substage:
  • appl_ls_cat

3. Resume the fix pack installation:
  • miupdate -u management
Fixed:
-----------
This issue is fixed in V1.0.0.3 and higher and V1.1.0.1.
V1.0.0.6/V1.1.0.2 has fundamental changes that impact this issue.
  • The appliance console is removed and fixpacks are no longer managed by the console or mi* commands.
KI004991
Error resuming the fix pack installation after the M_Failed_Apply_Nonimpact substage
Fixpack I_V1.0.0.2
Error resuming the fix pack installation after the M_Failed_Apply_Nonimpact substage
The fix pack installation fails in the Apply to management hosts stage. After you address the issue and click Resume current phase in the system console, the system console does not respond and the fix pack installation does not continue.
Workaround:
-----------
1. Log in to the management host as the root user.

2. Verify that the fix pack installation is in the M_Failed_Apply_Nonimpact substage:
  • appl_ls_cat

3. Resume the fix pack installation:
  • miupdate -u management
Fixed:
-----------
This issue is fixed in V1.0.0.3 and higher and V1.1.0.1.
V1.0.0.6/V1.1.0.2 has fundamental changes that impact this issue.
  • The appliance console is removed and fixpacks are no longer managed by the console or mi* commands.
KI004080
The system console displays incorrect status in the Apply Fix Pack window after the fix pack installation fails in the Apply to non-management hosts stage
Fixpack I_V1.0.0.2
The system console displays incorrect status in the Apply Fix Pack window after the fix pack installation fails in the Apply to non-management hosts stage
If the Apply to non-management hosts stage of the fix pack installation fails, the fix pack installation stops but the system console incorrectly displays the status of the stage as Waiting in the Apply Fix Pack window. The status of the stage is Failed to apply without impact to non-management hosts and is displayed correctly in the Fix Pack Detail panel. The fix pack installation cannot be resumed by clicking the Start current stage button.
Workaround:
-----------
1. Log in to the management host as root.

2. Review the fix pack installation log /BCU_share/aixapply/pflayer/pl_update.log and correct the problem identified in the log.

3. Click the Resume button in the system console to resume the fix pack installation.
Fixed:
-----------
This issue is fixed in V1.0.0.3 and higher and V1.1.0.1.
V1.0.0.6/V1.1.0.2 has fundamental changes that impact this issue.
  • The appliance console is removed and fixpacks are no longer managed by the console or mi* commands.
KI004080
Error resuming the fix pack installation from the status Failed to apply without impact to non-management hosts
Fixpack I_V1.0.0.2
Error resuming the fix pack installation from the status Failed to apply without impact to non-management hosts
In the Apply to non-management hosts stage of the fix pack installation, when you click the Resume button in the system console to resume the installation from the status Failed to apply without impact to non-management hosts, the fix pack installation stops before the end of the Apply to non-management hosts stage. The status of the fix pack installation in the Fix Pack Details panel is Applied without impact to non-management hosts.
Workaround:
-----------
1. Log in to the management host as root.

2. Verify that the fix pack installation is in the Applied without impact to non-management hosts status:
  • appl_ls_cat

3. Issue the following command:
  • ps -eaf | grep resume

In the output returned by the command, identify the process IDs for the miupdate process and the MIUpdateCLI process. In the following sample output, the miupdate process ID is 11599914 and the MIUpdateCLI process ID is 12976198.

root 11599914 1 0 12:25:11 - 0:00 /bin/sh /opt/IBM/mi/bin/miupdate -resume current -n
root 12976198 11599914 0 12:25:11 - 0:02 /usr/java6_64/bin/java -classpath .:/opt/IBM/mi/lib/*:/opt/IBM/mi/lib/log4j/* -Dlog4j.configuration=file:/opt/IBM/mi/configuration/log4j.properties -Djavax.net.ssl.trustStore=/opt/ibm/director/lwi/security/keystore/ibmjsse2.jks -DUSERDIR=/usr/IBM/applmgmt/isas.server com.ibm.isas.cli.command.application.MIUpdateCLI -resume current -n
root 21037306 35848408 0 12:39:58 pts/8 0:00 grep resume                
4. For each process ID, issue the following command to terminate the hanging process:
kill -9 <pid>
where <pid> represents the process ID of the miupdate process or the process ID of the MIUpdateCLI process.

5. Issue the following command to resume the fix pack installation:
  • miupdate -u apply
Fixed:
-----------
This issue is fixed in V1.0.0.3 and higher and V1.1.0.1.
V1.0.0.6/V1.1.0.2 has fundamental changes that impact this issue.
  • The appliance console is removed and fixpacks are no longer managed by the console or mi* commands.
KI003892
The system console GUI is not accessible after the fix pack is installed
Fixpack
I_V1.0.0.1
I_V1.0.0.2
I_V1.0.0.3
I_V1.0.0.4
I_V1.0.0.5
I_V1.1.0.1
The system console GUI is not accessible after the fix pack is installed
After the fix pack installation completes you might not be able to log in to the system console GUI because the profile files on the management host were not correctly updated.
Workaround:
-----------
During a fix pack installation, profiles are automatically backed up on the management host.

You can restore the profiles using the following steps:
1. Log in to the management host as the root user.
2. Identify the most recent backups of the /.profile, /etc/profile, and /etc/security/profile files. The backup file names include the date the backup was created.
3. Issue the following commands:
  • cp /.profile-yyyy-mm-dd /.profile
    cp /etc/profile-yyyy-mm-dd /etc/profile
    cp /etc/security/profile-yyyy-mm-dd /etc/security/profile

    where yyyy-mm-dd is the date of the most recent backup of the profile.
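
A hedged sketch of the same restore that picks the most recent dated backup of each profile automatically, assuming the backups follow the name-yyyy-mm-dd pattern shown above (review the selected files before copying):

  # Run as root on the management host.
  for f in /.profile /etc/profile /etc/security/profile; do
      latest=$(ls -t ${f}-????-??-?? 2>/dev/null | head -1)
      # Copy the newest dated backup over the live profile, if one exists.
      [ -n "$latest" ] && cp "$latest" "$f"
  done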
Fixed:
-----------
The only fix is to apply the workaround if encountered.
KI004754
Fix pack installation stops with user validation failure at preview stage
Fixpack
I_V1.0.0.1
I_V1.0.0.2
I_V1.0.0.3
I_V1.0.0.4
I_V1.0.0.5
I_V1.1.0.1
Fix pack installation stops with user validation failure at preview stage
 
If the bcuaix core warehouse instance owner password was previously changed by using a method other than the miauth command method, a user validation failure occurs when you attempt to run the preview stage of the fix pack installation procedure. This failure stops the preview stage from running.
Workaround:
-----------
Change the password for the bcuaix core warehouse instance owner by using the following supported miauth command method:

1. Log in to the management host as the root user and issue the following command:
  • miauth -pw -p os -u bcuaix

2. Run the preview stage of the fix pack installation procedure.
Fixed:
-----------
V1.0.0.6/V1.1.0.2 has fundamental changes that impact this issue.
  • These fixpacks have significantly changed how the fixpack is applied and no longer use the appliance console based fixpack stages.
KI004971
Resuming a failed preview stage results in a fix pack installation state of Waiting instead of Resuming
Fixpack
I_V1.0.0.1
I_V1.0.0.2
I_V1.0.0.3
Resuming a failed preview stage results in a fix pack installation state of Waiting instead of Resuming
If the preview stage fails, the system console Fix Pack panel shows an error message. After fixing the environment and resuming the preview stage of the fix pack installation process, the status of the preview stage in the system console is shown as Waiting instead of Running.
Workaround:
-----------
If the preview stage of the installation process is in the Waiting state, start the preview stage again by clicking the Resume button in the system console.
Fixed:
-----------
V1.0.0.4
V1.0.0.6/V1.1.0.2 has fundamental changes that impact this issue.
  • These fixpacks have significantly changed how the fixpack is applied and no longer use the appliance console based fixpack stages.
KI004907
Fix pack installation fails due to an HMC connection issue during the preview stage
Fixpack
I_V1.0.0.1
I_V1.0.0.2
I_V1.0.0.3
I_V1.0.0.4
Fix pack installation fails due to an HMC connection issue during the preview stage
The fix pack installation fails during the preview stage because the HMC cannot connect to an endpoint. The following error message is displayed:

ENV_VALIDATION::PREVIEW::FAILED::CEC connectivity validation with HMC failed
Workaround:
-----------
Identify and replace the invalid IP addresses stored in the HMC.

1. Log in to the HMC as the hscroot user and run the following command:
  • lssysconn -r all

2. Identify the invalid IP addresses. An invalid IP address is in a Connecting state and shows a connection error code.

In the following example output, the IP address 172.17.255.1 is invalid.

resource_type=sys,type_model_serial_num=8231-E2C*101F9ER,sp=unavailable,sp_phys_loc=unavailable,ipaddr=172.17.255.1,alt_ipaddr=unavailable,state=Connecting,connection_error_code=Connecting 0000-0000-00000000
resource_type=sys,type_model_serial_num=8231-E2C*101F9ER,sp=primary,sp_phys_loc=U78AB.001.WZSGRHY-P1,ipaddr=172.17.254.254,alt_ipaddr=unavailable,state=Connected
resource_type=sys,type_model_serial_num=8231-E2C*101F9FR,sp=primary,sp_phys_loc=U78AB.001.WZSGRJN-P1,ipaddr=172.17.254.255,alt_ipaddr=unavailable,state=Connected

3. For each invalid IP address, run the following command as the hscroot user on the HMC to remove it:
  • rmsysconn --ip invalid_IP_address -o remove

where invalid_IP_address is the invalid IP address you identified in step 2.
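
To narrow the lssysconn output down to the entries that need to be removed, the Connecting state can be filtered for; a brief hedged example, run as the hscroot user on the HMC:

  # List only the connection entries that are stuck in the Connecting state.
  lssysconn -r all | grep "state=Connecting"
  # The ipaddr value in each matching entry is a candidate for:
  #   rmsysconn --ip invalid_IP_address -o remove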
Fixed:
-----------
V1.0.0.5/V1.1.0.1: The HMC firmware in this validated stack addresses this issue.
V1.0.0.6/V1.1.0.2 has fundamental changes that impact this issue.
  • These fixpacks have significantly changed how the fixpack is applied and no longer use the appliance console based fixpack stages.
KI005059
HMC fails to restart during management stage firmware upgrade
Fixpack
I_V1.0.0.1
I_V1.0.0.2
I_V1.0.0.3
HMC fails to restart during management stage firmware upgrade
During the management stage firmware upgrade of the HMC, the HMC fails to restart and results in a failure to upgrade the HMC firmware.
A symptom of the issue can be seen in the /BCU_share/aixappl/pflayer/log/pl_update.log file, which shows the following example log entries:

[22 May 2014 07:56:03,249] <28311760 UPDT PREP TRACE host01> For message id::631
[22 May 2014 07:56:03,251] <28311760 UPDT PREP ERROR host01> Error on nodes (172.23.1.245).
[22 May 2014 07:56:03,268] <28311760 UPDT PREP INFO host01> STEP_END::3::HMC_UPD::FAILED

When the HMC fails to restart, the HMC local console displays the following error message after the HMC is manually restarted:

Critical Error 1901
A critical error has prevented normal HMC startup. Please reboot the HMC and try again. If the problem persists, contact your support personnel. 1901: HMC Startup aborted due to a malfunction of a required module.
Workaround:
-----------
After encountering the HMC restart failure, complete the following steps to resume the firmware upgrade process:

1. Verify that the HMC is offline by running the ping command. If the HMC is confirmed to be offline, manually start the HMC.

2. After the HMC comes up, wait 10 minutes to be certain that all of the HMC services are started.

3. Resume the management stage update process by running the following command on the management host as root:

  • miupdate -resume
Fixed:
-----------
No permanent fix.
V1.0.0.6/V1.1.0.2 has fundamental changes that impact this issue.
  • These fixpacks have significantly changed how the fixpack is applied and no longer use the appliance console based fixpack stages. mi* commands are no longer applicable in these fixpacks and higher.
KI005062
V7000 storage drive firmware upgrade fails
Fixpack
I_V1.0.0.1
I_V1.0.0.2
I_V1.0.0.3
V7000 storage drive firmware upgrade fails
The V7000 storage drive firmware upgrade process fails because the utilitydriveupgrade command of the drive upgrade utility is not installed successfully, even though the installation package reports a successful installation. Investigate the pl_update.trace log to determine whether the utilitydriveupgrade command has failed.

Sample information from the PL log in /BCU_share/aixappl/pflayer/log/pl_update.trace:

[20 May 2014 03:06:54,797] <40566876 CTRL  TRACE  stgkf301> command: ssh admin@172.23.3.201 LANG=en_US svcservicetask applysoftware -file  IBM_INSTALL_driveUpgrade_130610
[20 May 2014 03:06:54,798] <40566876 CTRL  TRACE  stgkf301>  CMMVC6227I The package installed successfully.
[20 May 2014 03:06:54,798] <40566876 CTRL  TRACE  stgkf301> Rc = 1
[20 May 2014 03:06:54,799] <40566876 CTRL  DEBUG  stgkf301> Error String: CMMVC6227I The package installed successfully.
[20 May 2014 03:06:54,799] <40566876 CTRL  DEBUG  stgkf301> command succeeded - drive upgrade utility installed successfully
[20 May 2014 03:06:54,800] <40566876 CTRL  DEBUG  stgkf301> verifying installation of utilitydriveupgrade command
[20 May 2014 03:06:55,014] <40566876 CTRL  TRACE  stgkf301> command: ssh admin@172.23.3.201 utilitydriveupgrade
[20 May 2014 03:06:55,015] <40566876 CTRL  TRACE  stgkf301>  rbash: utilitydriveupgrade: command not found
[20 May 2014 03:06:55,015] <40566876 CTRL  TRACE  stgkf301> Rc = 127
[20 May 2014 03:06:55,016] <40566876 CTRL  DEBUG  stgkf301> utilitydriveupgrade did not get installed properly
[20 May 2014 03:06:55,016] <40566876 CTRL  TRACE  stgkf301> workaround: driveupgradeutility is having some issues, applying workaround by installing softwareupgrade test utility and retrying
[20 May 2014 03:06:55,017] <40566876 CTRL  DEBUG  stgkf301> apply: retrying install of drive upgrade utility
[20 May 2014 03:06:55,038] <40566876 CTRL  DEBUG  stgkf301> apply: Uploading update file /BCU_share/bwr2/firmware/storage/2076/image/imports/testupdate/IBM2076_INSTALL_upgradetest_11.15
[20 May 2014 03:06:55,235] <40566876 CTRL  TRACE  stgkf301> command: LANG=en_US scp /BCU_share/bwr2/firmware/storage/2076/image/imports/testupdate/IBM2076_INSTALL_upgradetest_11.15 admin@172.23.3.201:/home/admin/upgrade
[20 May 2014 03:06:55,236] <40566876 CTRL  TRACE  stgkf301> Rc = 0
[20 May 2014 03:06:55,237] <40566876 CTRL  DEBUG  stgkf301> apply: uploaded test update file IBM2076_INSTALL_upgradetest_11.15 to storwize
[20 May 2014 03:06:55,638] <40566876 CTRL  TRACE  stgkf301> command: ssh admin@172.23.3.201 LANG=en_US svctask applysoftware -file  IBM2076_INSTALL_upgradetest_11.15
[20 May 2014 03:06:55,638] <40566876 CTRL  TRACE  stgkf301>  CMMVC6227I The package installed successfully.
[20 May 2014 03:06:55,639] <40566876 CTRL  TRACE  stgkf301> Rc = 1
[20 May 2014 03:06:55,639] <40566876 CTRL  DEBUG  stgkf301> Error String: CMMVC6227I The package installed successfully.
[20 May 2014 03:06:55,640] <40566876 CTRL  DEBUG  stgkf301> apply: upgrade test utility installed successfully
[20 May 2014 03:06:55,641] <40566876 CTRL  DEBUG  stgkf301> Successfully applied workaround, retrying utilitydriveupgrade installation
[20 May 2014 03:06:55,702] <40566876 CTRL  TRACE  stgkf301> { Entering Ctrl::Updates::Storwize::get_imagefile_name (Called from /opt/ibm/aixappl/pflayer/lib/Ctrl/Updates/Storwize.pm line 1104)
[20 May 2014 03:06:55,703] <40566876 CTRL  TRACE  stgkf301> Args:["/BCU_share/bwr2/firmware/storage/2076/image/imports","driveutility"]
[20 May 2014 03:06:55,755] <40566876 CTRL  TRACE  stgkf301> command: ls /BCU_share/bwr2/firmware/storage/2076/image/imports/driveutility
[20 May 2014 03:06:55,755] <40566876 CTRL  TRACE  stgkf301>  IBM_INSTALL_driveUpgrade_130610
[20 May 2014 03:06:55,756] <40566876 CTRL  TRACE  stgkf301> Rc = 0
[20 May 2014 03:06:55,756] <40566876 CTRL  DEBUG  stgkf301> prepare: type driveutility image file: IBM_INSTALL_driveUpgrade_130610
[20 May 2014 03:06:55,757] <40566876 CTRL  TRACE  stgkf301> Return: 0
[20 May 2014 03:06:55,757] <40566876 CTRL  TRACE  stgkf301> Exiting Ctrl::Updates::Storwize::get_imagefile_name }
[20 May 2014 03:06:55,758] <40566876 CTRL  DEBUG  stgkf301> prepare: 172.23.3.201: installing drive upgrade utility
[20 May 2014 03:06:56,060] <40566876 CTRL  TRACE  stgkf301> command: ssh admin@172.23.3.201 LANG=en_US svcservicetask applysoftware -file  IBM_INSTALL_driveUpgrade_130610
[20 May 2014 03:06:56,061] <40566876 CTRL  TRACE  stgkf301>  CMMVC5993E The specified upgrade package does not exist.
[20 May 2014 03:06:56,061] <40566876 CTRL  TRACE  stgkf301> Rc = 1
[20 May 2014 03:06:56,062] <40566876 CTRL  DEBUG  stgkf301> Error String: CMMVC5993E The specified upgrade package does not exist.
[20 May 2014 03:06:56,063] <40566876 CTRL  DEBUG  stgkf301> drive upgrade utility failed to install:172.23.3.201
[20 May 2014 03:06:56,063] <40566876 CTRL  DEBUG  stgkf301> utilitydriveupgrade installation failed even after re-installing software test utility
[20 May 2014 03:06:56,064] <40566876 CTRL  DEBUG  stgkf301> Unable to fix problems in utilitydriveupgrade, drive update failed
Workaround:
-----------
Resume the firmware upgrade procedure by running the following command from the management host as root:
  • miupdate -resume


Note: If the problem persists, contact IBM Support with the following logs at hand:

/BCU_share/aixappl/pflayer/log -> All of the files within this directory

/log/pfmgt.trace

ssh admin@<storwize_ip> svc_snap
Fixed:
-----------
No permanent fix.
V1.0.0.6/V1.1.0.2 has fundamental changes that impact this issue.
  • These fixpacks have significantly changed how the fixpack is applied and no longer use the appliance console based fixpack stages. mi* commands are no longer applicable in these fixpacks and higher.
KI005064
V7000 storage drive goes offline after firmware upgrade succeeds
Fixpack
I_V1.0.0.1
I_V1.0.0.2
I_V1.0.0.3
V7000 storage drive goes offline after firmware upgrade succeeds
 
The V7000 storage drive might go offline after a successful firmware upgrade. Investigate the pl_update.trace log to determine if a drive has gone offline.

Sample information from the PL log in /BCU_share/aixappl/pflayer/log/pl_update.trace:

[14 May 2014 07:34:36,828] <30867468 CTRL  TRACE  stgkf201> command: ssh admin@172.23.2.206 LANG=en_US utilitydriveupgrade -drivemodel ALL -filename  IBM2076_DRIVE_20130314
[14 May 2014 07:34:36,829] <30867468 CTRL  TRACE  stgkf201>  Upgrading drive id 47, ( 1 / 48 )
[14 May 2014 07:34:36,829] <30867468 CTRL  TRACE  stgkf201>  Upgrading drive id 23, ( 2 / 48 )
[14 May 2014 07:34:36,829] <30867468 CTRL  TRACE  stgkf201>  Upgrading drive id 35, ( 3 / 48 )
[14 May 2014 07:34:36,830] <30867468 CTRL  TRACE  stgkf201>  Upgrading drive id 11, ( 4 / 48 )
[14 May 2014 07:34:36,830] <30867468 CTRL  TRACE  stgkf201>  Upgrading drive id 4, ( 5 / 48 )
[14 May 2014 07:34:36,830] <30867468 CTRL  TRACE  stgkf201>  ERROR: Drive 4 is no longer online after being upgraded.
[14 May 2014 07:34:36,831] <30867468 CTRL  TRACE  stgkf201>     The current drive status is offline
[14 May 2014 07:34:36,831] <30867468 CTRL  TRACE  stgkf201> Rc = 1
[14 May 2014 07:34:36,832] <30867468 CTRL  DEBUG  stgkf201> Extracted msg from NLS: apply: 172.23.2.206 ssh admin@172.23.2.206 LANG=en_US utilitydriveupgrade -drivemodel ALL -filename  IBM2076_DRIVE_20130314 command failed.
Workaround:
-----------
1. To bring an offline drive back online, first check its status and error sequence number by running the following command on the management host as root:

  • ssh admin@<storwize_ip> svcinfo lsdrive <drive id>

    where <storwize_ip> represents the IP address of the V7000 storage controller, and <drive id> represents the ID number of the drive that is offline in the pl_update.trace log.

    The following example output is a result of running the previous command:

    IBM_2076:Cluster_172.23.2.206:admin>svcinfo lsdrive 4
    id 4
    status offline
    error_sequence_number 547
    use spare
    UID 5000cca0222d216c
    tech_type sas_hdd
    capacity 837.9GB
    block_size 512
    vendor_id IBM-207x
    product_id HUC109090CSS60
    FRU_part_number 00Y2684
    FRU_identity 11S49Y7449YXXXKPVTUE7F
    RPM 10000
    firmware_level J2E6
    FPGA_level
    mdisk_id
    mdisk_name
    member_id
    enclosure_id 1
    slot_id 8
    node_id
    node_name
    quorum_id
    port_1_status excluded
    port_2_status excluded


    Note that 547 is the error_sequence_number in the output from the previous command.

2. Fix the error sequence number by running the following command on the management host as root:

  • svctask cheventlog -fix <error_sequence_number>

    where <error_sequence_number> represents the error sequence number that you noted in the previous step.

3. Check the current status of the drive by running the following command on the management host as root:

  • ssh admin@<storwize_ip> svcinfo lsdrive <drive id>

    where <storwize_ip> represents the IP address of the V7000 storage controller, and <drive id> represents the ID number of the drive that was offline.

    The following example output is a result of running the previous command:

    IBM_2076:Cluster_172.23.2.206:admin>svcinfo lsdrive 4
    id 4
    status online
    error_sequence_number
    use member
    UID 5000cca0222d216c
    tech_type sas_hdd
    capacity 837.9GB
    block_size 512
    vendor_id IBM-207x
    product_id HUC109090CSS60
    FRU_part_number 00Y2684
    FRU_identity 11S49Y7449YXXXKPVTUE7F
    RPM 10000
    firmware_level J2E6
    FPGA_level
    mdisk_id 0
    mdisk_name ARRAY3
    member_id 7
    enclosure_id 1
    slot_id 8
    node_id
    node_name
    quorum_id
    port_1_status online
    port_2_status online


    Note that the error sequence number has been fixed and the status of the drive is online.

4. Resume the firmware upgrade procedure by running the following command from the management host as root:

  • miupdate -resume


Note: If the problem persists, contact IBM Support with the following logs at hand:

/BCU_share/aixappl/pflayer/log -> All of the files within this directory

/log/pfmgt.trace

ssh admin@<storwize_ip> svc_snap
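
For steps 1 through 3 above, a minimal sketch that reads the error sequence number for an offline drive and clears it is shown here. It is an illustration only; the storwize_ip and drive id values come from the pl_update.trace entries, and the cheventlog fix is issued over ssh to the V7000, consistent with the other commands in this entry:

  # Run as root on the management host. Example values from the log above.
  STORWIZE_IP=172.23.2.206
  DRIVE_ID=4
  # Step 1: read the error sequence number for the offline drive.
  SEQ=$(ssh admin@$STORWIZE_IP svcinfo lsdrive $DRIVE_ID | awk '/error_sequence_number/ {print $2}')
  # Step 2: clear the event on the V7000.
  ssh admin@$STORWIZE_IP svctask cheventlog -fix $SEQ
  # Step 3: confirm that the drive status is online again.
  ssh admin@$STORWIZE_IP svcinfo lsdrive $DRIVE_ID | grep "^status"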
Fixed:
-----------
No permanent fix.
V1.0.0.6/V1.1.0.2 has fundamental changes that impact this issue.
  • These fixpacks have significantly changed how the fixpack is applied and no longer use the appliance console based fixpack stages. mi* commands are no longer applicable in these fixpacks and higher.
KI004969
Network switch port configuration causes loss of connection to corporate network during reboot after firmware upgrade
Fixpack I_V1.0.0.3
Network switch port configuration causes loss of connection to corporate network during reboot after firmware upgrade
The 1 Gbps network switches might lose their connections to the corporate network because ports are configured with bpdu-guard, usually on ports 43-48. After the firmware upgrade and reboot as part of the Fix Pack 1.0.0.3 installation, the network switch ports might go into errdisable mode and be disabled if bridge protocol data units (BPDU) frames are detected.
Workaround:
-----------
The following steps should only be performed on V1.0 Appliances and not on V1.1 Appliances.
To prevent this issue from occurring, disable bpdu-guard on the uplink ports of both of the 1 Gbps network switches.

1. Log in to the 1 Gbps network switches as the admin user.

2. Verify that the 1 Gbps network switches have bpdu-guard enabled on their ports.

  • a. Select the iscli command line interface mode at the following prompt:
    • Select Command Line Interface mode (ibmnos-cli/iscli): iscli

    b. To see the port configurations, run the following commands:
    • switch48e1# enable
      switch48e1# show run

    The following truncated example output is a result of running the previous commands:
    ..
    interface port 43
            pvid 5
            bpdu-guard
            exit
    !
    interface port 44
            pvid 5
            bpdu-guard
            exit
    !
    interface port 45
            pvid 5
            bpdu-guard
            exit
    !
    interface port 46
            pvid 5
            bpdu-guard
            exit
    !
    interface port 47       
             pvid 5
            bpdu-guard
            exit
    !
    interface port 48
            pvid 5
            bpdu-guard
            exit
    !
    ..

3. If bpdu-guard is enabled on the ports of the 1 Gbps network switches, disable the setting by running the following commands on each switch:
  • switch48e1# enable
    switch48e1# conf t
    switch48e1# int port 43-48
    switch48e1# no bpdu-guard
    switch48e1#
    switch48e1# write mem
Fixed:
-----------
V1.0.0.4
V1.1.0.0 Fixed as part of appliance deployment and changes to networking setup.
KI005272
Paging devices are missing on all hosts except management host or paging space is not set to auto on management host
General
I_V1.0.0.3
I_V1.0.0.4
I_V1.0.0.5
Paging devices are missing on all hosts except management host or paging space is not set to auto on management host
These issues might occur as a result of the fix pack installation. The paging00 device is missing when a check is done by running the lsps -a command on a host. The result of this missing paging device is that the paging space is cut in half from the standard 128 GB.

An indication that your system is experiencing the missing paging device issue is shown in the output that results from running the following command on any host as root:

$ dsh -n $ALL "lsps -a" | dshbak -c

In the following example output, the paging00 device is missing on all of the hosts except the management host (host01):

HOSTS -------------------------------------------------------------------------
host01
-------------------------------------------------------------------------------
Page Space      Physical Volume   Volume Group    Size %Used Active  Auto  Type Chksum
paging00        hdisk0            rootvg       65536MB     3   yes   yes    lv     0
hd6             hdisk0            rootvg       65536MB     3   yes   yes    lv     0

HOSTS -------------------------------------------------------------------------
host02, host03, host04, host05, host06
-------------------------------------------------------------------------------
Page Space      Physical Volume   Volume Group    Size %Used Active  Auto  Type Chksum
hd6             hdisk0            rootvg       64512MB     1   yes   yes    lv     0

A related issue is that the paging space on the management host might not be set to auto.
Workaround:
-----------

To restore the missing paging device on a host, run the following commands as root on each host that is missing the paging00 device:

  • /usr/sbin/mkps -a -n -s 64 rootvg hdisk0

    /usr/sbin/mklvcopy -k paging00 2

    /usr/sbin/swapon /dev/paging00

To set the paging space on the management host paging00 device to auto, run the following command on the management host as root:

  • chps -a y paging00
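
After restoring the paging devices, the paging configuration on every host can be rechecked with the same dsh pattern used above; a brief hedged sketch:

  # Run as root. Every host should now list both hd6 and paging00,
  # and the summary should show the full 128 GB of paging space.
  dsh -n $ALL "lsps -a" | dshbak -c
  dsh -n $ALL "lsps -s" | dshbak -c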
Fixed:
-----------
The only fix is through the workaround.
KI005121
Ethernet network switch firmware upgrade fails during apply stage of fix pack installation
Fixpack I_V1.0.0.3
Ethernet network switch firmware upgrade fails during apply stage of fix pack installation
 
During the apply stage of the fix pack installation process to upgrade the firmware of the Ethernet network switches, a comparison with the expected firmware image version might fail. If this happens, the firmware for the switch images has not been successfully uploaded.

You can confirm that this issue has occurred by checking the /var/aixappl/pflayer/log/pl_update.trace log. Run the following command to examine the log file:

tail -f /var/aixappl/pflayer/log/pl_update.trace

Look for the failure messages for the affected switch, shown in the Compare of firmware failed and NET FW update failed entries, in the following example log output:

[05 Jun 2014 19:48:16,849] <26542124 UPDT APPI DEBUG  host01mgmt> Command Status:BNT:net3:172.23.1.251:1:Compare of firmware failed for switch after copy.
[05 Jun 2014 19:48:16,850] <26542124 UPDT APPI DEBUG  host01mgmt>  BNT:net1:172.23.1.253:0:Compare of firmware success for switch after copy.
[05 Jun 2014 19:48:16,850] <26542124 UPDT APPI DEBUG  host01mgmt>  BNT:net0:172.23.1.254:0:Compare of firmware success for switch after copy.
[05 Jun 2014 19:48:16,850] <26542124 UPDT APPI DEBUG  host01mgmt>  BNT:net2:172.23.1.252:0:Compare of firmware success for switch after copy.
[05 Jun 2014 19:48:16,850] <26542124 UPDT APPI DEBUG  host01mgmt>  , Command Status->1
[05 Jun 2014 19:48:16,856] <26542124 UPDT  ERROR  host01mgmt> TASK_END::13::1 of 1::NetFWUPD::172.23.1.251::::RC=1::The NET FW update failed on the node 172.23.1.251.
[05 Jun 2014 19:48:16,858] <26542124 UPDT  INFO   host01mgmt> TASK_END::13::1 of 1::NetFWUPD::172.23.1.253::::RC=0::The NET FW update is successful on the node 172.23.1.253.
[05 Jun 2014 19:48:16,860] <26542124 UPDT  INFO   host01mgmt> TASK_END::13::1 of 1::NetFWUPD::172.23.1.254::::RC=0::The NET FW update is successful on the node 172.23.1.254.
[05 Jun 2014 19:48:16,862] <26542124 UPDT  INFO   host01mgmt> TASK_END::13::1 of 1::NetFWUPD::172.23.1.252::::RC=0::The NET FW update is successful on the node 172.23.1.252.
[05 Jun 2014 19:48:17,011] <26542124 UPDT APPI DEBUG  host01mgmt> Executing query Logical_name=Infrastructure AND Solution_version=3.0.3.1, to update status of Product
[05 Jun 2014 19:48:17,108] <26542124 UPDT APPI DEBUG  host01mgmt> Executing query Sub_module_type=Infrastructure AND Solution_version=3.0.3.1, to update status of sub module
[05 Jun 2014 19:48:17,537] <26542124 UPDT APPI DEBUG  host01mgmt> Executing query Logical_name=Infrastructure AND Solution_version=3.0.3.1, to update status of Product
[05 Jun 2014 19:48:17,602] <26542124 UPDT APPI DEBUG  host01mgmt> Executing query Sub_module_type=Infrastructure AND Solution_version=3.0.3.1, to update status of sub module
[05 Jun 2014 19:48:17,931] <26542124 UPDT APPI DEBUG  host01mgmt> Executing query Logical_name=Infrastructure AND Solution_version=3.0.3.1, to update status of Product
[05 Jun 2014 19:48:17,998] <26542124 UPDT APPI DEBUG  host01mgmt> Executing query Sub_module_type=Infrastructure AND Solution_version=3.0.3.1, to update status of sub module
[05 Jun 2014 19:48:18,385] <26542124 UPDT APPI DEBUG  host01mgmt> Executing query Logical_name=Infrastructure AND Solution_version=3.0.3.1, to update status of Product
[05 Jun 2014 19:48:18,454] <26542124 UPDT APPI DEBUG  host01mgmt> Executing query Sub_module_type=Infrastructure AND Solution_version=3.0.3.1, to update status of sub module
[05 Jun 2014 19:48:18,600] <26542124 UPDT APPI ERROR  host01mgmt> Error on nodes (172.23.1.251).
[05 Jun 2014 19:48:18,632] <26542124 UPDT APPI INFO   host01mgmt> STEP_END::13::NetFW_UPD::FAILED
[05 Jun 2014 19:48:18,642] <26542124 UPDT APPI DEBUG  host01mgmt> Error occured in apply for product netfw1
[05 Jun 2014 19:48:18,719] <26542124 UPDT APPI DEBUG  host01mgmt> Executing query Logical_name=netfw1 AND Solution_version=3.0.3.1, to update status of Product
[05 Jun 2014 19:48:18,864] <26542124 UPDT APPI ERROR  host01mgmt> Apply (impact) phase for solution has failed.
[05 Jun 2014 19:48:18,935] <26542124 UPDT APPI DEBUG  host01mgmt> Executing query Logical_name=bwr3 AND Solution_version=3.0.3.1, to update status of Solution
[05 Jun 2014 19:48:19,088] <26542124 UPDT APPI INFO   host01mgmt> PHASE_END APPLY IMPACT
[05 Jun 2014 19:48:19,090] <26542124 UPDT APPI ERROR  host01mgmt> The apply phase for the release 'bwr3' failed.
Workaround:
-----------
Resume the firmware upgrade procedure by running the following command on the management host as root:

  • miupdate -resume
Fixed:
-----------
V1.0.0.4
V1.1.0.1
V1.0.0.6/V1.1.0.2 has fundamental changes that impact this issue.
  • These fixpacks have significantly changed how the fixpack is applied and no longer use the appliance console based fixpack stages. mi* commands are no longer applicable in these fixpacks and higher.
KI004393
Tuning database queries returns an error message that the database is not configured for tuning
General
I_V1.0
I_V1.1
Tuning database queries returns an error message that the database is not configured for tuning
 
After you have tuned some queries successfully, you might see an error message that states the database is not configured for tuning when you attempt to tune additional queries.

This problem can occur when you tune DDL statements using the web console, for example by selecting "Tune All with This Web Console" from the Execution Summary tab in the SQL dashboard. When DDL statements are tuned, they are also executed under the user ID that deployed the database, and some of the tables that are created under the schema of that user ID can cause problems with query tuning.
Workaround:
-----------
If the error occurs, connect to the database and look for tables that use the schema of the user ID that was used to deploy the database.

For those objects, delete the following tables, which can cause the error:

ADVISE_*
EXPLAIN_*
OBJECT_METRICS

You can also remove the following tables, which are not needed:

QT_*
OPT_PROFILE

To prevent the error from happening again, ensure that you do not tune DDL statements. You can use dashboard filters in the database performance monitor web console to remove DDL statements from the grid so that you do not tune them as part of a workload.
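
A hedged sketch of locating and dropping those tables with the DB2 command line is shown below. The database name BCUDB and the schema name DEPLOY_USER are placeholders for your database and for the user ID that deployed it; review the generated statements before running them:

  # Run as the core warehouse instance owner.
  db2 connect to BCUDB
  # Generate DROP TABLE statements for the problem tables under the deploy user's schema.
  db2 -x "SELECT 'DROP TABLE ' || RTRIM(TABSCHEMA) || '.' || RTRIM(TABNAME) || ';' FROM SYSCAT.TABLES WHERE TABSCHEMA = 'DEPLOY_USER' AND (TABNAME LIKE 'ADVISE%' OR TABNAME LIKE 'EXPLAIN%' OR TABNAME = 'OBJECT_METRICS')" > drop_tuning_tables.sql
  # Review drop_tuning_tables.sql, then execute it:
  db2 -tvf drop_tuning_tables.sql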
Fixed:
-----------
Only fix is the workaround when encountered.
KI004721
The dirinst1 IBM Systems Director instance owner user password cannot include certain characters
General
I_V1.0.0.0
I_V1.0.0.1
I_V1.0.0.2
I_V1.0.0.3
I_V1.0.0.4
I_V1.1.0.0
The dirinst1 IBM Systems Director instance owner user password cannot include certain characters
When changing the password for the dirinst1 IBM Systems Director instance owner user, certain characters contained within the password cause the password change procedure to fail.
Workaround:
-----------
Do not include any of the following characters in the password for the dirinst1 IBM Systems Director instance owner user:

! % ˆ & ( ) | " ' ? , < > * $ @ = + [ ] \ / ; : . { } ` ˜ # - ´ ¨

In addition, do not begin the password for the dirinst1 IBM Systems Director instance owner user with any of the following characters:

_ 0 1 2 3 4 5 6 7 8 9
Fixed:
-----------
V1.0.0.5/V1.1.0.1 has fundamental changes that impact this issue.
  • IBM Systems Director is disabled and no longer used as part of the Appliance Console. The dirinst1 user is no longer required or used as part of the appliance.
KI005291
Logging out from storage web console results in error message
General
I_V1.0.0.0
I_V1.0.0.1
I_V1.0.0.2
I_V1.0.0.3
I_V1.0.0.4
I_V1.0.0.5
I_V1.1.0.0
I_V1.1.0.1
Logging out from storage web console results in error message
 
This issue occurs when you access the storage web console from the system console by clicking System > Service Level Access, selecting one of the links in the IBM Storwize V7000 section, and logging out by clicking the Log out link on the storage web console. When you log out of the storage web console, a 404 File not found error is displayed.
Workaround:
-----------
When this issue occurs, click the Back button of your browser to return to the IBM Storwize V7000 login page.
Fixed:
-----------
Apply workaround if encountered.
V1.0.0.6/V1.1.0.2 has fundamental change that impact this issue.
  • The appliance console is removed which removes the availability of this Service Level Access feature.
KI004678
All selected events are not deleted from Events page of system console
General
I_V1.0.0.0
I_V1.0.0.1
I_V1.0.0.2
I_V1.0.0.3
I_V1.0.0.4
I_V1.0.0.5
I_V1.1.0.0
I_V1.1.0.1
All selected events are not deleted from Events page of system console
When you select a number of events to delete from the Events page of the system console by using the Delete selected events button (the red cross in the upper-right corner of the events table, above the Actions column), the Delete the number selected events that are currently showing option deletes only 1 or 2 of the selected events, not all of them.
Workaround:
-----------
If you want to delete a small number of events, you can use the Delete button located in the Action column for each individual event.

If you want to delete a large number of events, you can filter events by choosing event text and attribute settings to list events and then delete the list of filtered events. It is important to note that this procedure deletes all of the events that match the current filter, even if the listed events are not individually selected for deletion. To delete a large number of filtered events from the Events page of the system console, do the following steps:

1. Navigate to the Events page by clicking System > Events found either in the System menu, or by expanding the Welcome page Working with the system section and clicking the link in Review system events.

2. Set the event filters located across the top of the events table to specifically list the events that you want to delete. Refresh the events list to see the list of filtered events.

3. To make the Delete selected events button visible above the Action column of the events table, select at least one of the filtered events.

4. Click the Delete selected events button.

5. To delete all of the events in the filtered list, select the Delete all events that match the current filter option. Click OK.
 
Fixed:
-----------
No fix, if encountered apply the workaround.
V1.0.0.6/V1.1.0.2 has fundamental change that impact this issue.
  • The appliance console is removed which removes this feature.
KI004952
Status of powered-off Ethernet network switches is not correctly shown in system console
General
I_V1.0.0.0
I_V1.0.0.1
I_V1.0.0.2
I_V1.0.0.3
I_V1.0.0.4
I_V1.0.0.5
I_V1.1.0.0
I_V1.1.0.1
Status of powered-off Ethernet network switches is not correctly shown in system console
On the system console Hardware > Network Devices page, it is not possible to monitor the actual status of the Ethernet network switches because they are always shown as Available and Powered On.
Workaround:
-----------
To reliably verify the availability of Ethernet network switches by using the system console, do the following steps:

1. Navigate to the Events page by clicking System > Events found either in the System menu, or by expanding the Welcome page Working with the system section and clicking the link in Review system events.

2. Search for the keyword 'switch' on the Events page and filter the events with severity Critical or Informational.

The following example output, showing only three relevant columns, is the result of a filtered search:

Event text                            Type              Severity

Eth_network_switch_name is offline.   Network switch    Critical

Eth_network_switch_name is online.    Network switch    Informational
Fixed:
-----------
No fix, if encountered apply the workaround.
V1.0.0.6/V1.1.0.2 has fundamental change that impact this issue.
  • The appliance console is removed which removes this feature.
KI004434
An error message about password synchronization is displayed when using the miauth -pw command
General
I_V1.0.0.0
I_V1.0.0.1
I_V1.0.0.2
An error message about password synchronization is displayed when using the miauth -pw command
When attempting to use the miauth -pw command to reset a password for a node or hardware device, the following output is displayed in the system console:

Changing the password for user 'username' on resource 'device'.
The password change on the 'device' resource failed.
The script failed with the following message:
-------------
spawn /usr/bin/passwd username
Changing password for "username"
username's Old password:
3004-604 Your entry does not match the old password.

where username is the name of the user for the password that you are attempting to change, and device is the name of the node or hardware device.

The error message is displayed because a command other than miauth was previously used to change the node or hardware device password. Using a method other than the miauth command method results in password synchronization issues.
Workaround:
-----------
1. Run the following command:

miauth -u user_name -p device_type -pw -oldpw

where user_name is the user name of the node or device, and device_type is the type of node or hardware device. The device_type options are os, hmc, net, san, and storage.

2. When you are prompted for a new password, enter the password that you want to use for the node or hardware device you have specified.

3. When you are prompted to enter the old password, enter the password that was originally provided when it was changed with a command other than miauth.

After the password change is complete, a message is displayed indicating that the operation was successful.
Fixed:
-----------
V1.0.0.3
V1.0.0.6/V1.1.0.2 has fundamental change that impact this issue.
  • The appliance console is removed and miauth is no longer available for managing passwords. The appl_conf command will be used and documented in a pllayer document at a later date.
KI004752
Warehouse tools administration console hangs after several HA failures
General I_V1.0.0.0
I_V1.0.0.1
I_V1.0.0.2
I_V1.0.0.3
Warehouse tools administration console hangs after several HA failures
After several HA failures, IBM Tivoli System Automation for Multiplatforms (Tivoli SA MP) might not be able to restart the warehouse tools administration console because the warehouse tools application server (WASAPP) is stuck in an Online state.
Workaround:
-----------
Kill the warehouse tools application server profile process by completing the following steps:

1. Log in to the management node as the root user.

2. Delete the warehouse tools application server profile pid file by running the following command:
  • rm /usr/IBM/dwe/appserver_001/appServer_10/profiles/AppSrv01/logs/server1/server1.pid

Deletion of the server pid file causes an inconsistency that eventually kills the process. With the process killed, the Stuck Online status on the application server is cleared and the monitor returns an Offline status. Tivoli SA MP can then restart the warehouse tools administration console.
Fixed:
-----------
V1.0.0.4+, V1.1.0.1: Issue is fixed.
V1.0.0.6/V1.1.0.2: Warehouse Tools is removed from the appliance.
KI005210
Database partitions moved during a manual fail over that times out can result in a corrupted db2nodes.cfg file when resources are restarted on the source node
General
I_V1.0.0.0
I_V1.0.0.1
I_V1.0.0.2
I_V1.0.0.3
I_V1.0.0.4
I_V1.1.0.0
Database partitions moved during a manual fail over that times out can result in a corrupted db2nodes.cfg file when resources are restarted on the source node
This issue can occur when doing a manual fail over or during a fail over when the source is still able to run resources. If IBM Tivoli System Automation for Multiplatforms is not able to start all of the database partitions on the target node during a fail over, it marks the target as Failed offline and moves the resources back onto the source node. However, if two or more database partitions were successfully started during the fail over to the target node, but the start of the other database partitions ultimately timed out, then the db2nodes.cfg file can become corrupted during the move back to the source node.

The corruption of the db2nodes.cfg file is due to the fact that the database partitions are started serially on the target node during a fail over, but after an unsuccessful fail over of all of the database partitions, the database partitions are started in parallel when moved back to the source node with DB2 expecting serial starts.

The symptoms of this issue are an unsuccessful fail over after running the hafailover command and a db2nodes.cfg file that is missing one or more database partition entries.
Workaround:
-----------
1. To restore the cluster, stop any DB2 resources by running the following command as root on the administration host:

  • hastopdb2

2. Manually add the missing database partition entries to the db2nodes.cfg file (an example of the file format follows these steps).

3. Restart the DB2 resources by running the following command as root on the administration host:

  • hastartdb2
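
For reference, a db2nodes.cfg file in a partitioned DB2 environment contains one line per database partition in the form partition_number hostname logical_port [netname]. The entries below are placeholder values only; use the partition numbers and host names that belong to your system:

  • 0 host01 0
    1 host02 0
    2 host02 1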
Fixed:
-----------
Apply HA Tools 2.0.0.4 or higher. Updates to HA Tools reduce the chance of db2nodes.cfg corruption.
V1.0.0.5/V1.1.0.1 includes HA Tools 2.0.5.0.
KI004994
Running hafailover command results in message that resources failed to start
General
I_V1.0
I_V1.1
Running hafailover command results in message that resources failed to start
On occasion, running the hafailover command can result in the display of the following message:

Failed to start all resources. Check state with hals

This message can be erroneous because the failover process sometimes takes longer to complete than the configured timeout period; in that case, not all of the resources have started because the failover is still underway.
Workaround:
-----------
Periodically check the status of the failover by running the following command on the management host as root:

hals

If the resources are in a Pending online state, wait a few minutes before repeating the previous command. If all of the resources are Online and accessible after a reasonable period of time, no further action is required.

If all of the resources are not Online and accessible after a reasonable period of time, contact IBM Support for assistance.
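
If you prefer to poll the status automatically while waiting, one simple approach is a loop such as the following sketch (the 120-second interval is an arbitrary choice; stop the loop with Ctrl-C once the failover completes):

  • while true; do hals; sleep 120; done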
Fixed:
-----------
No fix. See workaround above.
KI004767
Upgrade of database performance monitor to version 5.3 results in warning that performance monitoring is not fully enabled
Fixpack I_V1.0.0.3
Upgrade of database performance monitor to version 5.3 results in warning that performance monitoring is not fully enabled
 
Some, rather than all, of the tables with an IBMPDQ schema are dropped from the opmdb performance monitor database during the database performance monitor upgrade process from version 5.2 to version 5.3. As a result, the following warning message is displayed in the database performance monitor user interface:

Performance monitoring is not fully enabled
Workaround:
-----------
Drop the remaining tables with an IBMPDQ schema in the opmdb performance monitor database by completing the following steps:

1. Bring the database performance monitor user interface offline by running the following command on the management host as the root user:
  • chrg -o offline -s "Name = 'db2_perf-rg'"

2. Before proceeding to the next step, verify that the DPM COMPONENT has gone from a Pending to an Offline OPSTATE by running the following command:
  • hals

    The following output is an example of running the previous command:

    MANAGEMENT DOMAIN
    +============+=========+=========+=========+=================+=================+=============+
    | COMPONENT  | PRIMARY | STANDBY | CURRENT | OPSTATE         | HA STATUS       | RG REQUESTS |
    +============+=========+=========+=========+=================+=================+=============+
    | WASAPP     | host01  | host03  | host01  | Online          | Normal          | -           |
    | DB2APP     | host01  | host03  | host01  | Online          | Normal          | -           |
    | DPM        | host01  | host03  | host01  | Offline         | Normal          | -           |
    | DB2DPM     | host01  | host03  | host01  | Online          | Normal          | -           |
    +============+=========+=========+=========+=================+=================+=============+

3. Connect to the opmdb performance monitor database by running the following command on the management host as the db2opm user:
  • db2 connect to opmdb

4. Drop the IBMPDQ schema and all of the objects contained within it by running the following command:
  • db2 "CALL SYSPROC.ADMIN_DROP_SCHEMA('IBMPDQ', NULL, 'ERRORSCHEMA', 'ERRORTABLE')"

5. Stop and restart the database performance monitor by running the following commands on the management host as the root user:
  • hastopdpm
    hastartdpm

After the IBMPDQ schema is dropped, stopping and restarting the database performance monitor in step 5 results in a re-creation of the IBMPDQ schema and all of its objects, and the monitored database configuration is synchronized with the repository server.

Note: A result of stopping the database performance monitor in step 5 is a short period of uncollected database performance monitoring data.
Fixed:
-----------
Only the workaround is available.
KI005415
Node power up might result in inoperative syslogd process
General
I_V1.0
I_V1.1
Node power up might result in inoperative syslogd process
Powering up a node or nodes after a power shut down might result in the system log daemon (syslogd) not automatically starting on the host or hosts of the system. An inoperative syslogd results in the system log not being updated and error messages shown after running some commands.

To verify that syslogd is inoperative on any of the hosts, run the following command on any host as root:

dsh -n ${ALL} 'lssrc -s syslogd | grep inoperative' | dshbak -c

The following example output is a result of running the previous command and shows on which hosts the syslogd is inoperative:

HOSTS -------------------------------------------------------------------------
host02, host03, host04, host05, host06, host08
-------------------------------------------------------------------------------
 syslogd ras inoperative  3:23:38 PM 

To verify that an active syslogd is not logging events in the /var/log/syslog.out log file on each of the hosts, check for a zero file size. Run the following command on any host as root:

dsh -n ${ALL} 'v=/var/log/syslog.out;([ -f ${v} ] && wc -l ${v}) || (touch ${v} && wc -l ${v})'

The following example output is a result of running the previous command and shows that syslogd is not logging events on host04 because the system log does not contain any entries, as indicated by its zero file size:

host01:    28735 /var/log/syslog.out
host03:    18750 /var/log/syslog.out
host08:     1059 /var/log/syslog.out
host06:    16870 /var/log/syslog.out
host02:     3971 /var/log/syslog.out
host05:    24312 /var/log/syslog.out
host07:    18656 /var/log/syslog.out
host04:    0 /var/log/syslog.out
Workaround:
-----------
If the system log daemon is inoperative or not logging events on any of the hosts, complete the following steps:

1. Verify that the /var/log/syslog.out log file exists, and if the log file does not exist, create it. Both of these tasks can be completed by running the following command on any host as root:

  • dsh -n ${ALL} 'v=/var/log/syslog.out;[ ! -f ${v} ] && echo "Node does not have a valid $v file." && touch ${v}' | dshbak -c

2. Start the syslogd process by running the following command on any host as root:

  • dsh -n ${ALL} 'refresh -s syslogd' | dshbak -c

3. Verify that an active system log daemon is logging events in the /var/log/syslog.out log file on each of the hosts by checking for a non-zero file size. Run the following command on any host as root:

  • dsh -n ${ALL} 'v=/var/log/syslog.out;([ -f ${v} ] && wc -l ${v}) || (touch ${v} && wc -l ${v})'

4. If zero file size /var/log/syslog.out log files or inoperative system log daemons persist after completing the previous steps, contact IBM Support.
Fixed:
-----------
None. See workaround.
KI006086
Hardware management console (HMC) updates will disable call-home functionality if the pre-fix-pack-3 HMC firmware level is older than V7.7.3
Fixpack I_V1.0.0.3
Hardware management console (HMC) updates will disable call-home functionality if the pre-fix-pack-3 HMC firmware level is older than V7.7.3
After installing PDOA fix pack 3, you may find that the call-home settings on the primary and secondary HMC have been reset. This occurs if the HMC was at any firmware level older than V7.7.3 prior to installing PDOA fix pack 3.
Workaround:
-----------
After completing the fix pack installation, check that the HMC call-home settings have not been reset by using the following steps:

As the hscroot user on the primary HMC, confirm that the file isas_config.xml is found by running:

ls -l /opt/hsc/data/ISASconfig
Repeat this check on the secondary HMC.

Note: If any of these checks fail, contact IBM Support.

Fixed:
-----------
See workaround.
KI004960
AIX reboot operation fails during fix pack installation process despite successful server reboot
Fixpack
I_V1.0.0.1
I_V1.0.0.2
I_V1.0.0.3
I_V1.0.0.4
AIX reboot operation fails during fix pack installation process despite successful server reboot
Older versions of CAS have an unstable Communication State with IBM Systems Director. This unstable Communication State results in the appl_ls_hw command sometimes showing the server with a Communication State that is Off even though the server is reachable at times.

During the fix pack installation when the server reboots after the AIX update, there is a chance that the Communication State of the server is Off, which causes the AIX reboot operation to fail.

When a server fails to reboot the AIX OS, an error message is displayed in the system console. You can also check the pl_update.log file for a message similar to the following example message:

[18 Mar 2014 12:06:27,464] <29819052 UPDT APPI DEBUG  host01mgmt> AIX update Reboot Error on node server5
[18 Mar 2014 12:06:27,514] <29819052 UPDT  INFO   host01mgmt> TASK_END::1::3 of 3::AIXUPD_REBOOT::172.23.1.17::::RC=1::The AIX reboot operation failed on 172.23.1.17.
Workaround:
-----------
1. Log in to the host and verify that the server was actually restarted by running the following command:

  • uptime

    The uptime command shows how long the server has been running.

2. Identify the logical name of the servers where the AIX reboot operation failed by running the following command from the management node as root:

  • appl_ls_hw -M

    The following example output is a result of running the previous command:

    NAME       HOSTNAME      IP             MODULE        STATUS      DESCRIPTION 
    server0    host03mgt     172.21.1.14    Data          Stopping    IBM Power 730 Express Operating System
    server1    host02mgt     172.21.1.15    Foundation    Online      IBM Power 740 Express Operating System
    server2    host04mgt     172.21.1.16    Data          Online      IBM Power 730 Express Operating System
    server3    host01mgt     172.21.1.17    Foundation    Online      IBM Power 740 Express Operating System
    server4    host02mgt     172.21.1.18    Foundation    Online      IBM Power 740 Express Operating System
    server5    host06mgt     172.21.1.19    Data          Online      IBM Power 730 Express Operating System
    server6    host01mgmt    172.21.1.10    Foundation    Online      IBM Power 740 Express Operating System

    In the previous example output, servers that failed to reboot their AIX OS show a Stopping status. In this example output, server0 is in the Stopping state.

3. To bring online the server that failed to reboot the AIX OS, run the following command from the management node as root:

  • appl_start -l <Server_logical_name>

    The following example output is a result of running the previous command to bring server0 online:

    (0) root @ stgkf201: 7.1.0.0: /
    $ appl_start -l server0
    Checking the status for the resource 'server0'.
    The resource 'server0' is online. Skipping the power on operation.
    The SSHD daemon started successfully on '172.21.1.14'.
    Retrieving and updating the status in the database for 'server0'.
    Retrieving and updating the child node(s) status in the database for 'server0'.
    NAME        HOSTNAME         IP            MODULE     STATUS         DESCRIPTION
    server0     host03corp       9.3.22.27     Data       Online         IBM Power 730 Express Operating System
    (0) root @ stgkf201: 7.1.0.0: /
    $

4. After the server shows an Online status, the fix pack installation process can be resumed by running the following command:

  • miupdate -resume
Fixed:
-----------
V1.0.0.5/V1.1.0.1 have fundamental changes that impact this issue.
  • IBM Systems Director is no longer enabled.
KI004991
System console shows inconsistent status of management or apply stage of fix pack installation
Fixpack I_V1.0.0.3
System console shows inconsistent status of management or apply stage of fix pack installation
After resuming from a failed stage, the fix pack installation hangs. In the Fix Pack panel of the system console, the level 3.0.3.1 status indicates that the current stage was completed by displaying the status as Applied to management hosts or Applied to non-management hosts. However, clicking the View Progress button in the Fix Pack Details panel of the system console shows that the management or apply stage is in the Running state.
Workaround:
-----------
1. Log in as the root user on the management host and run the following command to determine if any miupdate -resume or MIUpdateCLI -resume processes are still running:

  • ps -eaf | grep resume

    In the output returned by the command, identify the process IDs for the miupdate process and the MIUpdateCLI process. In the following sample output, the miupdate process ID is 11599914 and the MIUpdateCLI process ID is 12976198.

    root 11599914 1 0 12:25:11 - 0:00 /bin/sh /opt/IBM/mi/bin/miupdate -resume current -n
    root 12976198 11599914 0 12:25:11 - 0:02 /usr/java6_64/bin/java -classpath .:/opt/IBM/mi/lib/*:/opt/IBM/mi/lib/log4j/* -Dlog4j.configuration=file:/opt/IBM/mi/configuration/log4j.properties -Djavax.net.ssl.trustStore=/opt/ibm/director/lwi/security/keystore/ibmjsse2.jks -DUSERDIR=/usr/IBM/applmgmt/isas.server com.ibm.isas.cli.command.application.MIUpdateCLI -resume current -n
    root 21037306 35848408 0 12:39:58 pts/8 0:00 grep resume                

2. Terminate any running miupdate -resume or MIUpdateCLI -resume processes by running the following command:

  • kill -9 <pid>

    where <pid> represents the process ID of the running miupdate or MIUpdateCLI process.

3. Click the View Progress button in the Fix Pack Details panel of the system console to check if the status of the stage has changed.
  • If the status changed to Completed, start the next stage of the fix pack installation process by clicking the OK button in the system console.
  • If the status is still Running, log in to the management node as the root user and run the following example command to start the next stage of the fix pack installation:

  • miupdate -u <next_stage>

    Examples:

    If you are currently in the management stage, run the following command to start the prepare stage:

    miupdate -u prepare

    If you are currently in the apply stage, run the following command to start the commit stage:

    miupdate -u commit
Fixed:
----------
KI005115
Fix pack installation cannot be resumed in the apply stage after a problem with the SSD drawer
Fixpack I_V1.0.0.1
I_V1.0.0.2
I_V1.0.0.3
I_V1.0.0.4
I_V1.0.0.5
Fix pack installation cannot be resumed in the apply stage after a problem with the SSD drawer
The fix pack installation fails in the apply stage due to a problem with the SSD drawer.  You will see error messages in the /BCU_share/aixappl/pflayer/log/pl_update.log file that are similar to the following output:

  • Running command -- LANG=en_US /usr/sbin/lsmcode -c -r -d ses1 -- on 172.23.1.13
    command output-> ??????
    return code -> 1
    A query of the firmware level of ses1 172.23.1.13 failed.
    Command Output->A query of the firmware level of ses1 172.23.1.13 failed.
    Return code->1

After you address the problem with the SSD drawer, the device name of the SSD drawer might change when you reboot the node. If the device name is changed, you will not be able to resume the fix pack installation. 
Workaround:
-----------
Address the problem with the SSD drawer and, if necessary, change the device name of the SSD drawer back to the original name before you resume the fix pack installation.

1.  Shut down the node connected to the SSD drawer:

  • shutdown -F

2. Power off the BlueHawk SSD adapter and then power it back on.  

3. Start the node.  You can do this manually or through the HMC.

4.  Run the configuration manager: 

  • cfgmgr

5. Verify if the name of the SSD drawer is reset to the original name.  The original name is contained in the error message.

  • /usr/sbin/lsmcode -c -r -d <original_device_name>

    where <original_device_name> represents the original name of the SSD drawer in the error message.

6.  If the name of the SSD drawer was not reset to the original name, reset it.

  • a.  Issue the following command:

    • rendev -l <current_device_name> -n <original_device_name>

      where <current_device_name> represents the changed name that was assigned to the SSD drawer after the reboot.

    b. Verify that the name of the SSD drawer has been changed to the original name: 

    • /usr/sbin/lsmcode -c -r -d <original_device_name>

7.  Resume the fix pack installation:

  • miupdate -resume
Fixed:
----------
See Workaround.
KI004080
Error resuming the fix pack installation after the M_Applied_Nonimpact substage
Fixpack
I_V1.0.0.1
I_V1.0.0.2
I_V1.0.0.3
I_V1.0.0.4
I_V1.0.0.5
I_V1.1.0.1
Error resuming the fix pack installation after the M_Applied_Nonimpact substage
The fix pack installation fails in the Apply to management hosts stage. After you address the issue and click Resume current phase in the system console, the system console does not respond and the fix pack installation does not continue.
Workaround:
-----------
1. Log in to the management host as the root user.

2. Verify that the fix pack installation is in the M_Applied_Nonimpact substage:
  • appl_ls_cat

3. Resume the fix pack installation:
  • miupdate -u management
Fixed:
----------
See Workaround.
KI004080
Error resuming the fix pack installation after the M_Failed_Apply_Nonimpact substage
Fixpack
I_V1.0.0.1
I_V1.0.0.2
I_V1.0.0.3
I_V1.0.0.4
I_V1.0.0.5
I_V1.1.0.1
Error resuming the fix pack installation after the M_Failed_Apply_Nonimpact substage
The fix pack installation fails in the Apply to management hosts stage. After you address the issue and click Resume current phase in the system console, the system console does not respond and the fix pack installation does not continue.
Workaround:
-----------
1. Log in to the management host as the root user.

2. Verify that the fix pack installation is in the M_Failed_Apply_Nonimpact substage:
  • appl_ls_cat

3. Resume the fix pack installation:
  • miupdate -u management
Fixed:
----------
See Workaround
KI004080
Fix pack installation stops at M_Prepared state after completion of management prepare substage
Fixpack
I_V1.0.0.1
I_V1.0.0.2
I_V1.0.0.3
I_V1.0.0.4
I_V1.0.0.5
I_V1.1.0.1
Fix pack installation stops at M_Prepared state after completion of management prepare substage
If a failure occurs during the early management substage that validates the environments of the management and the standby management hosts, the status of the fix pack installation hangs at the Prepared management hosts state, as shown on the left side of the Fix Packs panel in the system console; you can also use the appl_ls_cat command to view states. You can verify that miupdate processes are not running by issuing the ps -ef | grep -i miupdate command. The Previewed state, which precedes the Prepared management hosts state, signals the successful completion of the Preview stage of the fix pack installation procedure.

After the management environments are fixed and the management stage of the fix pack installation procedure is resumed, the installation stops in the M_Prepared state, which means the fix pack installation procedure successfully completed the management prepare substage, but does not continue further. Attempts to normally restart the fix pack installation process do not work.
Workaround:
-----------
To restart the fix pack installation procedure at the management stage, run the following commands in sequence from the management host as root:

miupdate -u management
miupdate -u prepare
miupdate -resume current
miupdate -u management
Fixed:
----------
See Workaround.
KI005346
Fix pack installation fails in the apply stage with GPFS start error
Fixpack
I_V1.0.0.1
I_V1.0.0.2
I_V1.0.0.3
Fix pack installation fails in the apply stage with GPFS start error
Fix pack installation fails with a GPFS error after updating the V7000 storage during the apply stage. GPFS on the management host displays an arbitrating state during multiple start attempts and finally ends with a start error, as shown in the following example entries from the pl_update.log:

[11 Jun 2014 17:37:47,787] <9699536 GPFS  DEBUG  plsapd01> Executing command on 172.23.1.1-> ssh root@172.23.1.1 /usr/lpp/mmfs/bin/mmgetstate -Y
[11 Jun 2014 17:37:49,274] <9699536 GPFS  DEBUG  plsapd01> Return code-> 0
[11 Jun 2014 17:37:49,275] <9699536 GPFS  DEBUG  plsapd01>  Output-> , mmgetstate::HEADER:version:reserved:reserved:nodeName:nodeNumber:state:quorum:nodesUp:totalNodes:remarks:cnfsState
[11 Jun 2014 17:37:49,275] <9699536 GPFS  DEBUG  plsapd01>  mmgetstate::0:1:::plsapd01:1:arbitrating:1*:0:8::(undefined):
[11 Jun 2014 17:37:49,276] <9699536 GPFS  DEBUG  plsapd01> GPFS state on node 172.23.1.1 : arbitrating
[11 Jun 2014 17:37:49,277] <9699536 GPFS  DEBUG  plsapd01> GPFS is not yet started completly on 172.23.1.1. waiting for 30 sec more
[11 Jun 2014 17:38:16,007] <23199746 GPFS  DEBUG  plsapd01> Timeout:GPFS is not active on node 172.23.1.3
[11 Jun 2014 17:38:19,277] <9699536 GPFS  DEBUG  plsapd01> Timeout:GPFS is not active on node 172.23.1.1
[11 Jun 2014 17:38:19,300] <21954626 STRT  ERROR  plsapd01> Could not start the product 'GPFS' on 'server7 server2'
[11 Jun 2014 17:38:19,331] <20840644 UPDT APPI WARN   plsapd01> Failed to start solution.
[11 Jun 2014 17:38:19,381] <20840644 UPDT APPI DEBUG  plsapd01> Executing query Logical_name=bwr1 AND Solution_version=3.0.3.1, to update status of Solution
[11 Jun 2014 17:38:19,489] <20840644 UPDT APPI INFO   plsapd01> PHASE_END APPLY IMPACT
[11 Jun 2014 17:38:19,490] <20840644 UPDT APPI DEBUG  plsapd01> Apply phase done for release 'bwr1' successfully
[11 Jun 2014 17:38:19,491] <20840644 UPDT RESU INFO   plsapd01> PHASE_END RESUME
[11 Jun 2014 17:38:19,493] <20840644 UPDT RESU ERROR  plsapd01> The resume phase for the release 'bwr1' failed.
Workaround:
-----------
Fix the GPFS error and then continue the fix pack installation by completing the following steps.

1. Run the following commands on the management host as the root user:
  • appl_start -g -s Admin
    appl_start -q -s Management 
    appl_start

2. Check the fix pack installation status by running the following command:
  • appl_ls_cat
    • If the fix pack installation status is in the Applied state, proceed to the Commit phase.
    • If the fix pack installation status is not in the Applied state, resume the fix pack installation by running the following command:

    • miupdate -resume
Fixed:
----------
V1.0.0.4
V1.1.0.0
  • The versions above introduced a GPFS design change that separates the management hosts into their own GPFS cluster.
KI005022
Message display is not updated during management stage after fix pack installation is resumed
Fixpack
I_V1.0.0.1
I_V1.0.0.2
I_V1.0.0.3
I_V1.0.0.4
I_V1.0.0.5
I_V1.1.0.1
Message display is not updated during management stage after fix pack installation is resumed
The message display does not update when resuming the fix pack installation after an error during the management stage. The system console continues to show the following error message:

Error: Error on nodes (172.23.1.10).. Refer to the platform layer log file for details. To ...

For example, when the management host reboots, the warning message that the management host is rebooting is not shown. This warning message is normally shown when the management host reboots during the management stage of a successful fix pack installation process that does not need to be resumed after an error or a failure.
Workaround:
-----------
When the state of the Apply to management hosts stage changes to Suspended, it means that the management host is rebooting. The user must hover the mouse over the error message to see the reboot warning details. The user must then follow the instructions in the User response section of the reboot warning message to restart the system console.
Fixed:
----------
V1.0.0.6/V1.1.0.2 include design changes to the fixpack process such that this is not an issue.
KI004283
Ethernet switches might lose their configurations during firmware upgrade
Fixpack
I_V1.0.0.3
I_V1.0.0.4
I_V1.0.0.5
I_V1.0.0.6
I_V1.1.0.1
I_V1.1.0.2
Ethernet switches might lose their configurations during firmware upgrade
During fix pack installation, the Ethernet switches might lose their configurations when the switch firmware is upgraded in two stages, first to an interim level, then to the target level.
Workaround:
-----------
Contact IBM Support and reference this technote. Do not proceed without working with a support engineer.
As a precautionary measure, automated backups of the switch configurations are taken before the preview phase and during the apply phase of the fix pack installation in V1.0.0.3 and above. The backup files are located on the management host in the /BCU_share/net_switch_backup folder.

Restore the Ethernet switch configurations from their configuration backup files by completing the following steps:

1. Copy the configuration files to a folder on a USB drive or on a local computer after the preview phase. As an example, kf3 is the USB drive folder name used here when the usbcopy command is run at a later step.

Explanation: Make a copy of the switch configuration backup files to a USB drive or a local computer after the preview stage because the switch usually loses its configuration during the apply phase. If the switch loses its configuration, it is impossible to remotely access the management host to copy the configuration backup files after the apply phase.

2. Connect a serial cable to the serial port on the switch and start a hyperterminal session.

3. Enter the following settings for the hyperterminal:
  • Bits per second: 9600
    Data bits: 8
    Parity: None
    Stop bits: 1
    Flow controls: None

    Click OK.

4. Press Enter in the hyperterminal session.

5. Enter the following password:
  • admin

6. At the prompt, run the following command:
  • > enable
  • The following message is displayed:
    Enable privilege granted.

7. Run the following command:
  • # config t

    The following message is displayed:
    Enter configuration commands, one per line. End with Ctrl /Z.

8. Install the USB drive with both the OS and boot image firmware loaded.

9. Run the following commands:
  • > enable
    # config t
    # clear flash-config
    Enter Y to continue.

10. Run the following command:
  • # reload
    Enter Y to continue.

11. Log in to switch and enter the following password:
  • admin

12. Enter the following commands:
  • > enable
    # config t

Now run the usbcopy command by using the following general format:
  • usbcopy fromusb USB_folder_name/Backup_file_name active

For top 1G switch
  • # usbcopy fromusb kf3/172.23.3.11_backup.txt active

For bottom 1G switch
  • # usbcopy fromusb kf3/172.23.3.12_backup.txt active

For top 10G switch
  • # usbcopy fromusb kf3/172.23.3.21_backup.txt active

For bottom 10G switch
  • # usbcopy fromusb kf3/172.23.3.22_backup.txt active

13. Enter the following command:
  • # dir

Look for the file that contains the loaded configuration information by checking the time stamp. The file with the latest time stamp is the one most recently loaded.
  • Conf1 is an active configuration.
    Conf2 is a backup configuration.
  • If the configuration is active (if conf1 has latest time stamp), run the following commands:
  • # copy active-config running-config
    # copy running-config startup-config

    Enter Y to continue.
  • If the configuration is backup (if conf2 has latest time stamp), run the following commands:
  • # copy backup-config running-config
    # copy running-config startup-config

    Enter Y to continue.

14. Reload the switch configurations by running the following command:
  • # reload
  • Enter Y to continue.
Fixed:
----------
NA
KI005412
SAN switch status and monitoring information does not get updated after fix pack installation
Fixpack I_V1.0.0.3
SAN switch status and monitoring information does not get updated after fix pack installation
This issue can occur for either of the following reasons:

  • Adding a new node on which fix pack 3 was installed.
  • Customer installs fix pack 3 themselves.

The fix pack updates the console code and introduces a bug that prevents the SAN switch status in the Hardware > Network Devices panel from getting updated monitoring information. For example, if the SAN switch temperature changes, this does not get registered in the console. Another sign of this problem is seen in an add-node scenario. When adding a new node on which fix pack 3 was installed, all of the listed network devices, including the SAN switches, are shown as a Network Switch type. When you click on any switch listed in the Hardware > Network Devices panel to view the details, the Model field is not set.
Workaround:
-----------
To fix this issue, a console code patch must be performed after the installation of fix pack 3.

1. Download the patched query_switches.groovy file from IBM Support.
2. With the patched file, replace the broken query_switches.groovy file in the following directory:

  • /usr/IBM/applmgmt/isas.async/private/expanded/ibm/isas.hardware-1.0.0.3/app/scripts/switches

3. After replacing the broken query_switches.groovy file with the patched file, verify that the patched file has the following permissions:

  • -rw-r--r--

    If the permissions on the file are not as previously shown, run the following command:

    chmod 644 /usr/IBM/applmgmt/isas.async/private/expanded/ibm/isas.hardware-1.0.0.3/app/scripts/switches/query_switches.groovy
Fixed:
----------
V1.0.0.4
V1.0.0.6/V1.1.0.2 Fixpacks remove the appliance console from the appliance.
KI004996
V7000 storage controller firmware upgrade hangs in upgrading state
Fixpack
I_V1.0.0.3
V7000 storage controller firmware upgrade hangs in upgrading state
While upgrading the V7000 storage controller firmware from level 6.4.0.3 to level 6.4.1.6, the upgrade process hangs in the upgrading state. The following example message is shown:

apply: 172.23.1.237: broke out of timed wait after 36 iterations of maximum 36. update status is <upgrading>

If the previous example message was not seen, you can verify the upgrade status by running the following command:

svcinfo lssoftwareupgradestatus -nohdr

The following example output is a result of running the previous command:

upgrading
Workaround:
-----------
Contact IBM Support before proceeding with the workaround. Only a qualified SSR or CE should manipulate the hardware, and they can be engaged only through IBM Support; this minimizes the risk of losing the storage configuration or, in the worst case, data.
1. Reseat the canister of the offline storage controller that is stuck in the upgrading state. Usually only one of the two storage canisters is stuck. The stuck canister is the one that does not respond to ping commands.

  • a. After reseating the canister, wait 15-30 minutes to give it time to rejoin the cluster in an online state. You can run the following command to verify that the canister (nodex) is now in an online state before proceeding to the next step:

    • svcinfo lsnodecanister

      Example output:

      id name  UPS_serial_number WWNN             status IO_group_id IO_group_name config_node UPS_unique_id    hardware iscsi_name                            iscsi_alias panel_name enclosure_id canister_id enclosure_serial_number
      1  node1                   5005076802006DBC online 0           io_grp0       no          5005076802006DBC 100      iqn.1986-03.com.ibm:2145.v70001.node1             01-1       1            1           78N17BH
      2  node2                   5005076802006DBD online 0           io_grp0       yes         5005076802006DBD 100      iqn.1986-03.com.ibm:2145.v70001.node2             01-2       1            2           78N17BH

    b. Verify the upgrade status by running the following command:

    • svcinfo lssoftwareupgradestatus

      Example output:

      status
      upgrading
 
    • If the upgrade status is upgrading, then let the upgrade process continue until the upgrade status is inactive, which indicates the upgrade was successful.
    • If the upgrade status is stalled, then continue to step 2.

2.  The upgrade status changes to the stalled state. Verify by running the following command as the superuser on the V7000 that is having the issue:

  • svcinfo lssoftwareupgradestatus -nohdr

    Example output:

    stalled

3. To abort the firmware upgrade process and revert the firmware back to its previous level, run the following command as the superuser on the V7000 that is having the issue:

  • svcservicetask applysoftware -abort

4. To verify that the firmware upgrade process is downgrading the firmware back to its previous level, run the following command as the superuser on the V7000 that is having the issue:

  • svcinfo lssoftwareupgradestatus -nohdr
      
  • Example output:

    downgrading

5. The storage controller becomes inaccessible and does not respond to pings for a few minutes. This is expected. Wait until it becomes accessible before continuing to the next step.

6. After logging in to the cluster, the firmware is back at the 6.4.0.3 level. Verify that the firmware upgrade process is in an inactive state by running the following command as the superuser on the V7000 that is having the issue:

  • svcinfo lssoftwareupgradestatus -nohdr
     
  • Example output:

    inactive

7.  Copy the firmware upgrade image to the /home/admin/upgrade Storwize directory by running the following example command on the management host as root:

  • scp <image_dir>/<image_name> superuser@<v7k_ip>:/home/admin/upgrade

    Example:

    scp /BCU_share/bwr3/firmware/storage/2076/image/imports/controller/IBM2076_INSTALL_6.4.1.6 superuser@172.23.1.185:/home/admin/upgrade

8. To resume the V7000 storage controller firmware upgrade, run the following command on the management host as root:

  • miupdate -resume

    Note: You can also resume the firmware upgrade by using the system console. With the fix pack installation in a failed state, press the OK button with the following option selected: The error condition has been addressed. Resume this stage.
Fixed:
----------
V1.0.0.4 includes later V7000 firmware that reduces the risk of similar issues.
V1.0.0.6/V1.1.0.2 Fixpacks remove the appliance console from the appliance, so mi* commands and console GUI actions are no longer possible.
KI005345
V7000 storage drive firmware upgrade fails
Fixpack I_V1.0.0.3
V7000 storage drive firmware upgrade fails
Firmware upgrades do not succeed for V7000 storage drives that have a product ID that is not part of the upgrade bundle, or that have a firmware level higher than the upgrade target level.
Workaround:
-----------
V1.0.0.3 Fixpacks ONLY!
After the preview stage (stage 1) completes and before you start the management stage (stage 2), install a manual fix for the V7000s.

1. Download, in binary mode, the Storwize.pm_Ctrl file and the Storwize.pm_UPD file from IBM Support.
  • Important: If the files are not downloaded in binary mode, carriage-return (^M) characters might be appended to the lines of each file, which can cause code compilation to fail. Check each downloaded file for ^M characters and remove them if found (a command sketch follows these steps). Alternatively, you can use a checksum (cksum command) to confirm that the downloaded files on the AIX machine were not altered.

2. Copy the files to the /BCU_share directory on the management host.

3. Determine the logical name assigned to the fix pack by running the following command:

  • appl_ls_cat

    The name is listed in the row that contains the version 3.0.3.1 (FP3). In the sample output, the logical name is bwr2.

    Sample output:

    NAME VERSION STATUS    DESCRIPTION
    bwr0 3.0.0.0 Committed Initial images for IBM PureData System for Operational Analytics
    bwr1 3.0.1.0 Committed Updates for IBM_PureData_System_for_Operational_Analytics
    bwr2 3.0.3.1 Applied   Updates for IBM_PureData_System_for_Operational_Analytics

4. Copy files to the appropriate locations by running the following commands as root on the management host:

  • cd /BCU_share

    cp $PL_ROOT/lib/Ctrl/Updates/Storwize.pm $PL_ROOT/lib/Ctrl/Updates/Storwize.pm_org

    cp /BCU_share/<name>/code/ISAS/Update/Foundation/Infrastructure/Storwize/Storwize.pm /BCU_share/<name>/code/ISAS/Update/Foundation/Infrastructure/Storwize/Storwize.pm_org

    cp Storwize.pm_UPD /BCU_share/<name>/code/ISAS/Update/Foundation/Infrastructure/Storwize/Storwize.pm

    cp Storwize.pm_Ctrl $PL_ROOT/lib/Ctrl/Updates/Storwize.pm

    Note: In the previous commands, replace <name> with the logical name that you identified in step 3.
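
The following commands are one way to check for and strip carriage-return (^M) characters from a downloaded file, as mentioned in step 1. This is only a sketch; it assumes the file was copied to /BCU_share as in step 2, and the .clean file name is an arbitrary choice:

  • # a nonzero count suggests carriage returns (^M) are present
    od -c /BCU_share/Storwize.pm_Ctrl | grep -c '\\r'
    # write a cleaned copy with the carriage returns removed
    tr -d '\r' < /BCU_share/Storwize.pm_Ctrl > /BCU_share/Storwize.pm_Ctrl.clean

Repeat the check for Storwize.pm_UPD, and replace the original file with its cleaned copy before continuing.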
Fixed:
----------
V1.0.0.4
KI005222
appl_stop command successfully stops and reboots HMC, but falsely reports return code 1
General I_V1.0.0.3
appl_stop command successfully stops and reboots HMC, but falsely reports return code 1
The appl_stop command successfully stops and reboots the Hardware Management Console (HMC), but a failure return code of 1 is falsely reported. The appl_stop command calls the hmcshutdown command, which successfully calls the ForceShutdown script. The erroneous return code is caused by the script not completing before the hmcshutdown command exits.

The following list contains the commands that are affected by this false status reporting issue:

appl_stop -l logical_name_of_hmc
appl_stop -r resource_type_of_hmc
appl_stop -l logical_name_of_hmc -R
appl_stop -r resource_type_of_hmc -R

Example output of the appl_stop command to stop the HMC:

appl_stop -l hmc0

Checking the status for the resource 'hmc0'.
Stopping the resource, 'hmc0'. This may take long time.
The SSHD daemon started successfully on '172.23.4.241'.
The power off operation failed for the resource hmc0.
Error stopping the resource, hmc0.

The HMC status shows that it is Stopping and eventually does stop. Example shows only the HMC part of the output:

appl_ls_hw

NAME   HOSTNAME   IP         MODULE     STATUS       DESCRIPTION
hmc0              9.3.2.16              Stopping     IBM Hardware Management Console
hmc1              9.3.2.17              Online       IBM Hardware Management Console
Workaround:
-----------
1. Ignore the false return code 1 that is reported after running the appl_stop command.

2. Run the appl_start command to restart the HMC after it was stopped or rebooted.

3. Ping the HMC after completion of the restart or reboot to verify that it is online.
Fixed:
----------
V1.0.0.4
KI004775
System console fails to refresh properly and prompts for a new login after prolonged idle time, a stopped isas.server module, and an isas.console.system in a stopped or unknown state
General
I_V1.0.0.0
I_V1.0.0.1
I_V1.0.0.2
I_V1.0.0.3
I_V1.0.0.4
I_V1.0.0.5
I_V1.1.0.0
I_V1.1.0.1
System console fails to refresh properly and prompts for a new login after prolonged idle time, a stopped isas.server module, and an isas.console.system in a stopped or unknown state
After a prolonged idle time while logged in to the system console, no contents are displayed and you might see the error message Sorry, an error occurred when you refresh the system console display or click on the UI. You might also be prompted to log in to the system console again, but are unable to log in when you provide the correct user ID and password.

The system console fails to display the correct content because the isas.server module is stopped and the isas.console.system is stopped or in an unknown state. You might also see the login prompt due to the GUI timing out, but because the isas.server module is down, the user ID and password cannot be authenticated and the Invalid user name and password error message is displayed.
Workaround:
-----------
1. Verify that the isas.server module is stopped and the isas.console.system is stopped or in an unknown state by running the following command:
  • mistatus -v

2. Start the isas.server module by running the following command:
  • mistart
Fixed:
----------
V1.0.0.6/V1.1.0.2 Fixpacks remove the appliance console from the appliance.
KI004810
SAS RAID adapter in SSD enclosure is in a Degraded state without symptoms
General I_V1.0.0.0
I_V1.0.0.1
I_V1.0.0.2
I_V1.0.0.3
SAS RAID adapter in SSD enclosure is in a Degraded state without symptoms
When the SAS RAID adapter in an SSD enclosure encounters an error that the adapter does not understand, the adapter assumes a Degraded state and disables the adapter cache as a precaution.

Note: There is no significant impact with the degraded adapter state and a disabled adapter cache in a RAID 10 array. However, customer perception is impacted when they see a Degraded array state without an explanation or a hint about the nature of the problem.


Diagnostic prerequisite:

Verify the names of your SSD RAID adapters on the system. On the administration and standby administration hosts, the adapters are sissas1 and sissas2. On the data and standby hosts, the adapters are sissas2 and sissas3.

Run the following command on the administration host and a data host:

lsdev|grep "RAID SAS"

The following output is a result of running the previous command on the administration host:

(130) root @ paradise02: 7.1.0.0: /
$ lsdev|grep "RAID SAS"
sissas1        Available 06-00       PCIe2 3.1GB Cache RAID SAS Enclosure 6Gb x8
sissas2        Available 07-00       PCIe2 3.1GB Cache RAID SAS Enclosure 6Gb x8

The following output is a result of running the previous command on a data host:

(130) root @ paradise05: 7.1.0.0: /
$ lsdev|grep "RAID SAS"
sissas2        Available 09-00       PCIe2 3.1GB Cache RAID SAS Enclosure 6Gb x8
sissas3        Available 0A-00       PCIe2 3.1GB Cache RAID SAS Enclosure 6Gb x8
Detection of the problem:

There is no indication in the errpt that there is a problem. However, you can obtain an indication that there is a problem by either of the following two methods:

1. Run the following command to check for a Degraded state of the RAID 10 Array on each of the SSD RAID adapters:
  • sissasraidmgr -L -l sissas2 -j 3

The following output is a result of running the previous command:

0) root @ paradise05: 7.1.0.0: /var/adm/ras
$ sissasraidmgr -L -l sissas2
------------------------------------------------------------------------
Name      Resource  State       Description              Size
------------------------------------------------------------------------
sissas2   FEFFFFFF  Primary     PCIe2 3.1GB Cache RAID SAS Enclosure 6Gb x8
sissas3   FEFFFFFF  HA Linked   Remote adapter SN  002AN00C

hdisk2    FC0000FF  Degraded    RAID 10 Array (O/O)     1163GB
 pdisk2   000401FF  Active      SSD Array Member       387.9GB
 pdisk1   000400FF  Active      SSD Array Member       387.9GB
 pdisk5   000404FF  Active      SSD Array Member       387.9GB
 pdisk3   000402FF  Active      SSD Array Member       387.9GB
 pdisk4   000403FF  Active      SSD Array Member       387.9GB
 pdisk0   000007FF  Active      SSD Array Member       387.9GB
 
2. Run the diag command on the sissasX adapters, which returns an output and logs an entry in errpt:
  • diag -d sissas2 -v

The following output is a result of running the previous command:

A PROBLEM WAS DETECTED ON Thu Jan 30 15:18:32 CST 2014  801014

The Service Request Number(s)/Probable Cause(s)
(causes are listed in descending order of probability):

  2D24-8150: Controller failure.
           Error log information:
                 Date: Thu Jan 30 15:18:31 CST 2014
                 Sequence number: 364
                 Label: SAS_ERR2  
  sissas2        FRU: 00E7703            
                 PCIe2 3.1GB Cache RAID SAS Enclosure 6Gb x8
                 U78AB.001.WZSGRJ9-P1-C1-T1-L1-T1


errpt:

---------------------------------------------------------------------------
LABEL:          SAS_ERR2
IDENTIFIER:     CCC89167

Date/Time:       Thu Jan 30 15:20:28 CST 2014
Sequence Number: 366
Machine Id:      00F72AA74C00
Node Id:         paradise05
Class:           H
Type:            TEMP
WPAR:            Global
Resource Name:   sissas3
Resource Class:  adapter
Resource Type:   14105303
Location:        U78AB.001.WZSGRJ9-P1-C8-T1-L1-T1

VPD:
      PCIe2 3.1GB Cache RAID SAS Enclosure 6Gb x8 :
        Part Number.................00E7705
        FRU Number..................00E7703
        Serial Number...............YP11BG2AN00C
        Manufacture ID..............01BG
        EC Level....................1
        ROM Level.(alterable).......015000ab
        Customer Card ID Number.....57C3
        Product Specific.(Z1).......5
        Product Specific.(Z2).......2D24
        Feature Code/Marketing ID...EDR1-001
        Machine/Cabinet Serial No...G2AL001
        FRU Label...................P1-C2-T3

Description
ADAPTER ERROR

        Recommended Actions
        PERFORM PROBLEM DETERMINATION PROCEDURES

Detail Data

ADDITIONAL HEX DATA
0001 0800 1910 00F0 0444 8200 0101 0000 0150 00AB 0000 00FF 57C3 8150 0000 0001
FEFF FFFF FFFF FFFF 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 88F9
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
E210 00A0 FF00 0000 07A0 0001 1A1C EC5F 0000 0000 0000 0FE8 0444 8200 0150 00AB
FFFF FFFF 1521 2204 0000 0000 0000 0000 0000 0000 0000 0000 FEFF FFFF FFFF FFFF
0000 0000 0000 88F9 0000 0000 0000 0000 0000 0000 0000 0000 0000 0001 0000 0000
Workaround:
-----------
Contact IBM Support before proceeding with any workaround.
 
Fixed:
----------
V1.0.0.4 includes firmware updates that address many issues that lead to this scenario.
KI005061
IBM Systems Director storage control update fails during inventory collection
Fixpack
I_V1.0.0.1
I_V1.0.0.2
I_V1.0.0.3
I_V1.0.0.4
IBM Systems Director storage control update fails during inventory collection
 
The fix pack update fails during the management stage while updating the IBM Systems Director storage control component.

There is a known issue with IBM Systems Director where the ssh server drops the ssh connection during an update due to a high workload, many ciphers, or slow responses from the target, any of which can cause a timeout and result in an inventory collection failure.

Investigation of the PL trace file finds that the error is due to a connection issue. The following message is found in the PL trace file:

Inventory collection failed for system "sysNode". Verify the connection to the system and collect inventory.

The following example entries were found in the PL trace file:

[22 May 2014 12:48:21,886] <11403344 UPDT APPI TRACE stgkf301>
[22 May 2014 12:48:21,887] <11403344 UPDT APPI TRACE stgkf301> ATKUPD764I Update "com.ibm.director.storage.storagecontrol.mgr.AIX_4.2.4.fp1-build-00009" was installed on system "sysNode" successfully.
[22 May 2014 12:48:21,887] <11403344 UPDT APPI TRACE stgkf301>
[22 May 2014 12:48:21,887] <11403344 UPDT APPI TRACE stgkf301> ATKUPD795I You must manually restart the IBM Systems Director management server after this install completes for the updates to take effect.
[22 May 2014 12:48:21,887] <11403344 UPDT APPI TRACE stgkf301>
[22 May 2014 12:48:21,887] <11403344 UPDT APPI TRACE stgkf301> ATKUPD739I Collecting inventory on system "sysNode".
[22 May 2014 12:48:21,887] <11403344 UPDT APPI TRACE stgkf301>
[22 May 2014 12:48:21,888] <11403344 UPDT APPI TRACE stgkf301> ATKUPD711I Still collecting inventory for system "sysNode".
[22 May 2014 12:48:21,888] <11403344 UPDT APPI TRACE stgkf301>
[22 May 2014 12:48:21,888] <11403344 UPDT APPI TRACE stgkf301> ATKUPD706E Inventory collection failed for system "sysNode". Verify the connection to the system and collect inventory.
[22 May 2014 12:48:21,888] <11403344 UPDT APPI TRACE stgkf301>
[22 May 2014 12:48:21,888] <11403344 UPDT APPI TRACE stgkf301> ATKUPD572I Running compliance on system "sysNode".
[22 May 2014 12:48:21,889] <11403344 UPDT APPI TRACE stgkf301>
[22 May 2014 12:48:21,889] <11403344 UPDT APPI TRACE stgkf301> ATKUPD734E An error was encountered during the "Install Updates" task. Search above for previous related errors, fix each error, and then retry the operation. 
Workaround:
-----------
If the cause of the update failure was a connection issue due to high workloads or delays in the connection, the problem can be resolved by resuming the fix pack installation. Run the following command on the management host as root:

  • miupdate -resume

If the previous command fails to resume the fix pack installation, contact IBM Support.
Fixed:
----------
V1.0.0.5/V1.1.0.1, where IBM Systems Director is disabled.
KI005116
IBM Systems Director managed end point goes offline after update and results in Storage Control update failure
Fixpack
I_V1.0.0.1
I_V1.0.0.2
I_V1.0.0.3
I_V1.0.0.4
IBM Systems Director managed end point goes offline after update and results in Storage Control update failure
After updating the IBM Systems Director (ISD), the processes might become locked and run into a resource shortage. The ISD local CAS agent fails to start and the server communication state is stuck at 3 (expected state value: 2), leading to a Storage Control (SC) update failure.

To confirm that this issue has occurred, check for the corresponding messages in the /var/aixappl/pflayer/log/pl_update.trace file by running the following command on the management host as root:

tail -f /var/aixappl/pflayer/log/pl_update.trace

The following example output is a result of running the previous command:

[03 Jun 2014 23:45:54,341] <13631536 UPDT APPI TRACE  hostname>
[03 Jun 2014 23:45:54,341] <13631536 UPDT APPI TRACE  hostname>  ATKUPD275E The managed system "sysNode" is either offline or locked. Ensure that the system is available and attempt to perform the operation again.

[03 Jun 2014 23:45:54,341] <13631536 UPDT APPI TRACE  hostname>
[03 Jun 2014 23:45:54,341] <13631536 UPDT APPI TRACE  hostname>  ATKUPD287E The install needed updates task has completed with errors. Read the above messages for details on the error.
[03 Jun 2014 23:45:54,341] <13631536 UPDT APPI TRACE  hostname>
[03 Jun 2014 23:45:54,342] <13631536 UPDT APPI TRACE  hostname>  ATKUSC307E Command installneeded completed with errors. For more information, see the job log for job InstallNeededTask.
[03 Jun 2014 23:45:54,342] <13631536 UPDT APPI TRACE  hostname>
[03 Jun 2014 23:45:54,342] <13631536 UPDT APPI TRACE  hostname> command /opt/ibm/director/bin/smcli installneeded -v -F /BCU_share/bwr3/software/DirectorServer/imports/SCupdates4241 returned status =  32
Workaround:
-----------
1. Reboot the management host by running the following command on the management host as root:

  • reboot

2. After the management host comes up, verify that the ISD is Active, that the ISD version is 6.3.3.1, and that the local CAS agent is running by running the following commands on the management host as root:

  • smstatus

    smcli lsver

    /opt/ibm/director/agent/runtime/agent/bin/endpoint.sh status

    If the status of the local CAS agent is Stopped, contact the support team.

3. Collect server inventory by running the following command on the management host as root:

  • smcli collectinv -p "All Inventory" -i <IP_mgmt_server>

    where <IP_mgmt_server> represents the IP address of the management server.

4. Install the SC update by running the following command on the management host as root:

  • smcli installneeded -v -F <path_of_SC_update_package>

    The following example output is a result of running the previous command:

    smcli installneeded -v -F /BCU_share/bwr3/software/DirectorServer/imports/SCupdates4241

    ATKUPD489I Collecting inventory for one or more systems.
    ATKUSC206I Generating SDDs for path: "/BCU_share/bwr3/software/DirectorServer/imports/SCupdates4241".
    ATKUPD293I Update "com.ibm.director.storage.storagecontrol.member.AIX_4.2.4.fp1-build-00009" was successfully imported to the library.
    ATKUPD293I Update "com.ibm.director.storage.storagecontrol.mgr.AIX_4.2.4.fp1-build-00009" was successfully imported to the library.
    ATKUPD573I Running compliance for all new updates that were found.
    ATKUPD286I The import updates task has completed successfully.
    ATKUPD725I The update install task has started.
    ATKUPD487I The download task has finished successfully.
    ATKUPD629I Installation staging will be performed to 1 systems.
    ATKUPD632I The Installation Staging task is starting to process system "sysNode".
    ATKUPD633I The Installation Staging task has finished processing system "sysNode".
    ATKUPD630I The update installation staging has completed.
    ATKUPD760I Start processing update "com.ibm.director.storage.storagecontrol.mgr.AIX_4.2.4.fp1-build-00009" and system "sysNode".
    ATKUPD770I Still processing update "com.ibm.director.storage.storagecontrol.mgr.AIX_4.2.4.fp1-build-00009" and system "sysNode".
    ATKUPD770I Still processing update "com.ibm.director.storage.storagecontrol.mgr.AIX_4.2.4.fp1-build-00009" and system "sysNode".
    ATKUPD770I Still processing update "com.ibm.director.storage.storagecontrol.mgr.AIX_4.2.4.fp1-build-00009" and system "sysNode".
    ATKUPD770I Still processing update "com.ibm.director.storage.storagecontrol.mgr.AIX_4.2.4.fp1-build-00009" and system "sysNode".
    ATKUPD770I Still processing update "com.ibm.director.storage.storagecontrol.mgr.AIX_4.2.4.fp1-build-00009" and system "sysNode".
    ATKUPD770I Still processing update "com.ibm.director.storage.storagecontrol.mgr.AIX_4.2.4.fp1-build-00009" and system "sysNode".
    ATKUPD770I Still processing update "com.ibm.director.storage.storagecontrol.mgr.AIX_4.2.4.fp1-build-00009" and system "sysNode".
    ATKUPD764I Update "com.ibm.director.storage.storagecontrol.mgr.AIX_4.2.4.fp1-build-00009" was installed on system "sysNode" successfully.
    ATKUPD795I You must manually restart the IBM Systems Director management server after this install completes for the updates to take effect.
    ATKUPD739I Collecting inventory on system "sysNode".
    ATKUPD572I Running compliance on system "sysNode".
    ATKUPD727I The update install task has finished successfully.
    ATKUPD288I The install needed updates task has completed successfully.

5. After the completion of the SC update installation, stop and restart the ISD by running the following commands on the management host as root:

  • smstop

    smstart

    Verify that the ISD server is Active before proceeding to the next step by running the following command on the management host as root:

    smstatus -r

6. Verify that the ISD version was updated to 6.3.3.1 by running the following command on the management host as root:

  • smcli lsver

7. Verify that the SC version was updated to 4.2.4.1 by running the following command on the management host as root:

  • cat /opt/ibm/director/StorageControl/SCVersion.properties | grep StorageControlVersion

    The following example output is a result of running the previous command:

    StorageControlVersionWithBuildPadded=4.2.4.fp1-build-00010-20140421-iFix
    StorageControlVersion=4.2.4.1

8. Resume the fix pack installation.
Fixed:
----------
V1.0.0.5/V1.1.0.1 IBM Systems Director is disabled as part of these fixpacks.
KI005982
The "+" icon in system console Fix Packs panel is not displayed on systems with a language other than English system locale setting
General
I_V1.0.0.3
The "+" icon in system console Fix Packs panel is not displayed on systems with a language other than English system locale setting
If your system locale is set to a language other than English, you will find that the "+" icon is missing when you attempt to add a new fix pack file to the management host on your 1.0.0.3 (Fix Pack 3) system by using the Fix Packs panel of the system console. Not being able to add and register the new fix pack prevents you from installing the new fix pack by using the system console.
Workaround:
-----------
To restore the missing "+" icon in the Fix Packs panel of the system console so that you can install the fix pack by using the system console, complete the following steps:

1. Set the system locale to English.
2. Restart the system console.
3. Install the fix pack by using the system console.
4. Commit the fix pack installation.
5. Set the system locale to the desired language.
Fixed:
----------
V1.0.0.4 Console is updated with a fix for the issue.
V1.0.0.6/V1.1.0.2 Console is removed.
KI006755
Single sign-on from the system console to the Database Performance Monitoring (DPM) Console and/or the Warehouse Admin Console fails with Error 500
General I_V1.0.0.4
Single sign-on from the system console to the Database Performance Monitoring (DPM) Console and/or the Warehouse Admin Console fails with Error 500
After installing PDOA fix pack 4 you may find that single sign-on from the console to the DPM and/or Warehouse Admin console(s) fails with an Error 500 message.
Workaround:
-----------
You must download the Unrestricted JDK JCE policy files and patch them into the /usr/java6_64/jre/lib/security directory on the management host. The exact download locations and steps are listed below:

1. Download the Unrestricted JDK JCE policy files from https://www-01.ibm.com/marketing/iwm/iwm/web/preLogin.do?source=jcesdk . Select the option "Java 5.0 SR16, Java 6 SR13, Java 6 SR5 (J9 VM2.6), Java 7 SR4, Java 8 GA, and all later releases " for download.

2. Unzip the unrestrictedpolicyfiles.zip file, which gives you two JAR files: local_policy.jar and US_export_policy.jar.

3. On the management node, as root, create a backup directory /usr/java6_64/jre/lib/security/backup and move the existing local_policy.jar and US_export_policy.jar files to this directory (see the command sketch after these steps).

4. Place the new JARs of the same name in the management node's /usr/java6_64/jre/lib/security/ directory.

5. Run this command on the management node as root:
miresolve -restart

These five steps fix the problem with single sign-on and restart the console. You can now log in normally to the DPM and Warehouse Admin consoles.
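
The following is a minimal command sketch of steps 2 through 4, assuming the downloaded unrestrictedpolicyfiles.zip file is in /tmp on the management node (adjust the paths to match where you saved the download):

cd /tmp
unzip unrestrictedpolicyfiles.zip                  # extracts local_policy.jar and US_export_policy.jar (step 2)
# back up the existing policy files (step 3)
mkdir -p /usr/java6_64/jre/lib/security/backup
mv /usr/java6_64/jre/lib/security/local_policy.jar /usr/java6_64/jre/lib/security/US_export_policy.jar /usr/java6_64/jre/lib/security/backup/
# copy in the new unrestricted policy files of the same name (step 4)
cp /tmp/local_policy.jar /tmp/US_export_policy.jar /usr/java6_64/jre/lib/security/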
Fixed:
----------
V1.0.0.5/V1.1.0.1 Updates to the console remove its dependency on the java installed at the system level.
V1.0.0.6/V1.1.0.2 The console and Warehouse Tools are removed. With the console removed, the SSO feature to DPM is no longer supported.
KI004026
A limited number of SSH sessions can be connected to an Ethernet network switch at the same time
General
I_V1.0.0.0
I_V1.0.0.1
I_V1.0.0.2
I_V1.0.0.3
I_V1.0.0.4
I_V1.0.0.5
I_V1.1.0.0
I_V1.1.0.1
A limited number of SSH sessions can be connected to an Ethernet network switch at the same time
The maximum number of simultaneous SSH sessions that can be connected to an Ethernet network switch is four.

The following message is displayed after attempting to create a fifth SSH session from the system console:
java.net.ConnectException: A remote host refused an attempted connect operation.

The following example message is displayed after attempting to create a fifth SSH session from a terminal application on a remote computer:
Server unexpectedly closed network connection
Workaround:
-----------
Do not attempt to connect more than the maximum number of four simultaneous SSH sessions to an Ethernet network switch.

Close one or more of the four existing SSH sessions to be able to connect one or more new SSH sessions to a maximum of four.
Fixed:
----------
N/A. This is a limitation imposed by the network switch firmware. V1.0.0.6/V1.1.0.2 significantly reduce the number of SSH sessions to the switches with the removal of the console.
KI004712
Storage web console links break after logging in as a superuser
General
I_V1.0.0.0
I_V1.0.0.1
I_V1.0.0.2
I_V1.0.0.3
I_V1.0.0.4
I_V1.0.0.5
I_V1.1.0.0
I_V1.1.0.1
Storage web console links break after logging in as a superuser
The port number of any link that is selected from a storage web console page is lost and results in a 404 File not found error. This issue occurs when you access the storage web console from the system console by clicking System > Service Level Access, selecting one of the links in the IBM Storwize V7000 section, and after logging in as the superuser for the first time.
Workaround:
-----------
When this issue occurs, do the following steps:

1. Click the Back button of your browser to return to the IBM Storwize V7000 login page. Save the port number that you find in the URL in the address bar of your browser.

2. Click the Forward button of your browser to return to the error page. Insert the missing port number in the URL that you find in the address bar of your browser. Load this web page by pressing the Enter key or clicking the browser Load button.
Fixed:
----------
Apply workaround if encountered.
V1.0.0.6/V1.1.0.2 The console has been removed, which removes the availability of this Service Level Access feature.
KI007646
Configuration of some workload management variables is not possible with Firefox browser and low screen resolution
General
I_V1.0
I_V1.1
Configuration of some workload management variables is not possible with Firefox browser and low screen resolution
When the screen resolution is lower than the 1280x1024 minimum browser resolution requirement for the Firefox browser, a scroll bar is not present to access the bottom portion of the database performance monitor system console panel. This issue prevents being able to configure some workload management variables.
Workaround:
-----------
To resolve this issue, you can take one of the following actions:
  • Use a system that can support the 1280x1024 screen resolution requirement.
  • Use a different browser, such as Chrome or Internet Explorer.
Fixed:
----------
N/A.
KI006085
Accesssys command might fail and cause a locked state after changing the network switch password
 
General
I_V1.0.0.0
I_V1.0.0.1
I_V1.0.0.2
I_V1.0.0.3
I_V1.0.0.4
I_V1.1.0.0
Accesssys command might fail and cause a locked state after changing the network switch password
 
After changing the password for a network switch, the process involves discovering the network switch and then running the accesssys command, which requests secure access to the switch by using the new password. On occasion, the accesssys command fails to get access to the switch and the switch goes into a locked state.

To confirm that this issue occurred, look for return code 66 in the pflayer log file. The following pflayer log file example shows the accesssys command failed:

[09 Jan 2014 12:07:20,367] <24248418 CTRL  DEBUG  host01> Executing command->/opt/ibm/director/bin/smcli discover -i 172.23.1.251 -t "Switch"
[09 Jan 2014 12:07:31,595] <24248418 CTRL  DEBUG  host01> Ret Code-> 0
[09 Jan 2014 12:07:31,596] <24248418 CTRL  DEBUG  host01>  Command output-> Discovery completion percentage 50%
[09 Jan 2014 12:07:31,596] <24248418 CTRL  DEBUG  host01>  Discovery completion percentage  100%
[09 Jan 2014 12:07:31,596] <24248418 CTRL  DEBUG  host01>  Discovery completed:
[09 Jan 2014 12:07:31,596] <24248418 CTRL  DEBUG  host01>  100%
[09 Jan 2014 12:07:31,597] <24248418 CTRL  DEBUG  host01> Waiting for 20 sec between discovery and accesssys
[09 Jan 2014 12:07:51,598] <24248418 CTRL  DEBUG  host01> Executing command-> /opt/ibm/director/bin/smcli lssys -i 172.23.1.251 -t "Switch"
[09 Jan 2014 12:07:52,011] <24248418 CTRL  DEBUG  host01> Ret Code-> 0
[09 Jan 2014 12:07:52,011] <24248418 CTRL  DEBUG  host01>  Command output-> fcm_switch2
[09 Jan 2014 12:07:52,328] <24248418 CTRL  DEBUG  host01> Executing command ->/opt/ibm/director/bin/smcli accesssys -u admin -p ***** -i 172.23.1.251
[09 Jan 2014 12:07:52,606] <24248418 CTRL  DEBUG  host01> Ret Code-> 66
[09 Jan 2014 12:07:52,606] <24248418 CTRL  DEBUG  host01>  Command output-> DNZCLI0727I : Waiting for request access to complete on... fcm_switch2
[09 Jan 2014 12:07:52,606] <24248418 CTRL  DEBUG  host01>  Result Value: DNZCLI0736I : The system is not available. : fcm_switch2
[09 Jan 2014 12:07:52,649] <24248418 CTRL  DEBUG  host01> For net switch accessys waiting 20 sec more in case of failure
[09 Jan 2014 12:08:12,650] <24248418 CTRL  DEBUG  host01> For net switch executing accessys again in case of failure
[09 Jan 2014 12:08:12,970] <24248418 CTRL  DEBUG  host01> Executing command ->/opt/ibm/director/bin/smcli accesssys -u admin -p ***** -i 172.23.1.251
[09 Jan 2014 12:08:13,209] <24248418 CTRL  DEBUG  host01> Ret Code-> 66
[09 Jan 2014 12:08:13,209] <24248418 CTRL  DEBUG  host01>  Command output-> DNZCLI0727I : Waiting for request access to complete on... fcm_switch2
[09 Jan 2014 12:08:13,209] <24248418 CTRL  DEBUG  host01>  Result Value: DNZCLI0736I : The system is not available. : fcm_switch2
[09 Jan 2014 12:08:13,219] <24248418 CTRL  ERROR  host01> Discovery failed for net3.
Workaround:
-----------
To unlock and access the network switch, run the following command:

smcli accesssys -i <switch_ip> -u admin -p <password>
Fixed:
----------
V1.0.0.5/V1.1.0.1 IBM System Director is disabled as part of these fixpacks.
KI006330
Paths for fix pack file location on the system are restricted for security reasons
Fixpack
I_V1.0.0.3
I_V1.0.0.4
I_V1.0.0.5
I_V1.1.0.1
Before you can install a fix pack, the fix pack files must be accessible on the management host. For security reasons, the valid management host paths from which the fix pack files can be accessed during the fix pack installation procedure have been restricted on your 1.0.0.3 (Fix Pack 3) system. You cannot install the fix pack if the fix pack files are not located in one of the valid management host paths.
Workaround:
-----------
You must locate the fix pack files on the management host in an absolute path that starts with one of the following file systems:
  • /backups
  • /BCU_share
  • /opmfs
  • /opt
  • /optimdatatools
  • /tmp
  • /usr
  • /var

The following examples are valid absolute paths on the management host:
  • /BCU_share/
  • /tmp/<dir_name1>/
  • /usr/<dir_name1>/<dir_name2>/
Fixed:
----------
V1.0.0.6/V1.1.0.2 The fixpack mechanism has changed and this statement is no longer applicable.
KI006280
SAN switch password change fails
 
General
I_V1.0.0.0
I_V1.0.0.1
I_V1.0.0.2
I_V1.0.0.3
I_V1.0.0.4
I_V1.1.0.0
Due to SAN switch listing limitations in IBM Systems Director (ISD), changing the SAN switch password fails with return code 20 displayed in the PL log file.

Example PL log file output:

[14 Apr 2015 06:49:34,225] <23003302 PASS DEBUG stgkf101> Executing command->/opt/ibm/director/bin/smcli lsver
[14 Apr 2015 06:49:34,575] <23003302 PASS DEBUG stgkf101> ISD version->6.3.5
[14 Apr 2015 06:49:34,577] <23003302 PASS DEBUG stgkf101> 6.3.5.0 is equal to 6.3.5.0.
[14 Apr 2015 06:49:34,578] <23003302 PASS DEBUG stgkf101> Executing command expect -f /opt/ibm/aixappl/pflayer/lib/create_snmp_profile.expect profile_172.23.1.31_2 172.23.1.31 Switch
[14 Apr 2015 06:49:35,467] <23003302 PASS DEBUG stgkf101> From prcoess 33423556 :STDOUT
[14 Apr 2015 06:49:35,467] <23003302 PASS DEBUG stgkf101> spawn smcli mkbasicdiscprofile -name profile_172.23.1.31_2 -i 172.23.1.31 -res Switch -snmpversion 1
[14 Apr 2015 06:49:35,467] <23003302 PASS DEBUG stgkf101>
[14 Apr 2015 06:49:35,467] <23003302 PASS DEBUG stgkf101> Enter community strings as comma separated value
[14 Apr 2015 06:49:35,467] <23003302 PASS DEBUG stgkf101> profile_172.23.1.31_2
[14 Apr 2015 06:49:35,469] <23003302 PASS DEBUG stgkf101> From process 33423556: STDERR:
[14 Apr 2015 06:49:35,469] <23003302 PASS DEBUG stgkf101>
[14 Apr 2015 06:49:35,470] <23003302 PASS DEBUG stgkf101> Exit code: 0
[14 Apr 2015 06:49:35,532] <23003302 PASS DEBUG stgkf101> Executing command->/opt/ibm/director/bin/smcli discover -p profile_172.23.1.31_2
[14 Apr 2015 06:49:41,820] <23003302 PASS DEBUG stgkf101> Ret Code-> 0
[14 Apr 2015 06:49:41,820] <23003302 PASS DEBUG stgkf101> Command output-> profile_172.23.1.31_2 Profile Based Discovery completion percentage 100%
[14 Apr 2015 06:49:41,820] <23003302 PASS DEBUG stgkf101> profile_172.23.1.31_2 Profile Based Discovery completed:
[14 Apr 2015 06:49:41,821] <23003302 PASS DEBUG stgkf101> 100%
[14 Apr 2015 06:49:41,821] <23003302 PASS DEBUG stgkf101>
[14 Apr 2015 06:49:41,824] <23003302 PASS DEBUG stgkf101> Waiting for 20 sec between discovery and accesssys
[14 Apr 2015 06:50:01,824] <23003302 PASS DEBUG stgkf101> Executing command-> /opt/ibm/director/bin/smcli lssys -i 172.23.1.31 -t "Switch"
[14 Apr 2015 06:50:02,174] <23003302 PASS DEBUG stgkf101> Ret Code-> 20
[14 Apr 2015 06:50:02,175] <23003302 PASS DEBUG stgkf101> Command output->
[14 Apr 2015 06:50:02,177] <23003302 PASS ERROR stgkf101> Discovery failed for 172.23.1.31
[14 Apr 2015 06:50:02,178] <23003302 PASS DEBUG stgkf101> Notify DirectorServer failed for node 172.23.1.31.
[14 Apr 2015 06:50:02,180] <23003302 PASS ERROR stgkf101> Password change notification to Systems Director failed for nodes 172.23.1.31.
[14 Apr 2015 06:50:02,182] <23003302 PASS INFO stgkf101> The script was executed with following status:
[14 Apr 2015 06:50:02,183] <23003302 PASS INFO stgkf101> ----------------------------------------------------------
[14 Apr 2015 06:50:02,184] <23003302 PASS INFO stgkf101> SCHEMA::CHPW::LOGICAL_NAME::STATUS(PASS/FAIL)::DESCRIPTION
[14 Apr 2015 06:50:02,185] <23003302 PASS ERROR stgkf101> CHPW::san0::FAIL::notify_director_server Password change failed for resource(s) 172.23.1.31.

Example command line output:

/opt/ibm/aixappl/pflayer/bin/appl_conf chpw -l san0 -u admin -p <new_password>
spawn time /opt/ibm/aixappl/pflayer/bin/appl_conf chpw -l san0 -u admin -p
Enter the new password
Changing the password for user 'admin' on resource 'san0'.
The password change was successful for the user 'admin' on resource 'san0'.
Updating the password in the database for user admin.
The password was successfully updated in the database.
Notifying Systems Director of a password change for user admin.
Discovery failed for 172.23.1.31
Password change notification to Systems Director failed for nodes 172.23.1.31.
The script was executed with following status:
----------------------------------------------------------
SCHEMA::CHPW::LOGICAL_NAME::STATUS(PASS/FAIL)::DESCRIPTION
CHPW::san0::FAIL::notify_director_server Password change failed for resource(s) 172.23.1.31.
----------------------------------------------------------
Workaround:
-----------
To resolve the issue, complete the following steps:

1. Rediscover the SAN switch by running the following command:

  • /opt/ibm/aixappl/pflayer/bin/icmds/appl_ctrl_san discover -l <resource_logical_name>

2. Retry changing the SAN switch password by running the following command:

  • appl_conf chpw -l <resource_logical_name> -u <user_name> -p <new_password>
Fixed:
----------
V1.0.0.5/V1.1.0.1 IBM System Director is removed as part of these fixpacks.
KI006280
SAN switches are not listed by lssys command after password change
General
I_V1.0.0.0
I_V1.0.0.1
I_V1.0.0.2
I_V1.0.0.3
I_V1.0.0.4
I_V1.1.0.0
SAN switches are not listed by lssys command after password change
After the successful change of the SAN switch passwords, the lssys command does not return a list of the SAN switches.

Example command line output to change the password:

$ time appl_conf chpw -r san -u admin -p -o
Enter the new password
Enter the old  password
Changing the password for user 'admin' on resource 'san0'.
The password change was successful for the user 'admin' on resource 'san0'.
Updating the password in the database for user admin.
The password was successfully updated in the database.
Notifying Systems Director of a password change for user admin.
Password change notification to Systems Director is successful.
Changing the password for user 'admin' on resource 'san1'.
The password change was successful for the user 'admin' on resource 'san1'.
Updating the password in the database for user admin.
The password was successfully updated in the database.
Notifying Systems Director of a password change for user admin.
Password change notification to Systems Director is successful.
Changing the password for user 'admin' on resource 'san2'.
The password change was successful for the user 'admin' on resource 'san2'.
Updating the password in the database for user admin.
The password was successfully updated in the database.
Notifying Systems Director of a password change for user admin.
Password change notification to Systems Director is successful.
Changing the password for user 'admin' on resource 'san3'.
The password change was successful for the user 'admin' on resource 'san3'.
Updating the password in the database for user admin.
The password was successfully updated in the database.
Notifying Systems Director of a password change for user admin.
Password change notification to Systems Director is successful.
The script was executed with following status:
----------------------------------------------------------
SCHEMA::CHPW::LOGICAL_NAME::STATUS(PASS/FAIL)::DESCRIPTION
CHPW::san0::PASS::The password change for 172.23.3.31 was successful.
CHPW::san1::PASS::The password change for 172.23.3.32 was successful.
CHPW::san2::PASS::The password change for 172.23.3.33 was successful.
CHPW::san3::PASS::The password change for 172.23.3.34 was successful.
----------------------------------------------------------

Example lssys command output:

$ smcli lssys -l -i 172.23.3.32
DNZCLI0241E : (Run-time error) The system with IP address or host name 172.23.3.32 was not found.
Use the smcli lssys -A IPv4Address,HostName command to list IP addresses and host names for all systems.
Workaround:
-----------
To resolve the issue, complete the following steps:

1. Rediscover the SAN switch by running the following command:

  • /opt/ibm/aixappl/pflayer/bin/icmds/appl_ctrl_san discover -l <resource_logical_name>

2. Retry listing the SAN switch by running the following command:

  • smcli lssys -l -i <san_ip_address>
Fixed:
----------
V1.0.0.5/V1.1.0.1 IBM System Director is removed as part of these fixpacks.
KI006280
Ethernet network switch goes into partially locked state during password change
 
General
I_V1.0.0.0
I_V1.0.0.1
I_V1.0.0.2
I_V1.0.0.3
I_V1.0.0.4
I_V1.1.0.0
Ethernet network switch goes into partially locked state during password change
After changing the password for the Ethernet network switches, the switch status is listed as PartiallyLocked.

Example lssys command output before the password change:

$ smcli lssys -A "CommunicationState,AccessState"
 adminnode_1: Unlocked, 2
 DATA-STANDBY-SN101F54R: Unlocked, 2
 datanode_4: Unlocked, 2
 datanode_5: Unlocked, 2
 ETHERNET0-IBM*8205-E6D*101F52R: Unlocked, 2
 ETHERNET0-IBM*8205-E6D*101F53R: Unlocked, 2
 ETHERNET0-IBM*8231-E2D*101F0CR: Unlocked, 2
 ETHERNET0-IBM*8231-E2D*101F0DR: Unlocked, 2
 ETHERNET0-IBM*8231-E2D*101F54R: Unlocked, 2
 ETHERNET0-IBM*8231-E2D*101F55R: Unlocked, 2
 fcm_switch1: Unlocked, 2
 fcm_switch2: Unlocked, 2
 mgt_switch1: Unlocked, 2
 mgt_switch2: Unlocked, 2

Example output during the password change:

/opt/ibm/aixappl/pflayer/bin/appl_conf chpw -r net -u admin -p <new_password>
spawn time /opt/ibm/aixappl/pflayer/bin/appl_conf chpw -r net -u admin -p
Enter the new password
Changing the password for user 'admin' on resource 'net0'.
The password change was successful for the user 'admin' on resource 'net0'.
Updating the password in the database for user admin.
The password was successfully updated in the database.
Notifying Systems Director of a password change for user admin.
Discovery failed for 172.23.1.11
Password change notification to Systems Director failed for nodes 172.23.1.11.
Changing the password for user 'admin' on resource 'net1'.
The password change was successful for the user 'admin' on resource 'net1'.
Updating the password in the database for user admin.
The password was successfully updated in the database.
Notifying Systems Director of a password change for user admin.
Discovery failed for 172.23.1.12
Password change notification to Systems Director failed for nodes 172.23.1.12.
Changing the password for user 'admin' on resource 'net2'.
The password change was successful for the user 'admin' on resource 'net2'.
Updating the password in the database for user admin.
The password was successfully updated in the database.
Notifying Systems Director of a password change for user admin.
Discovery failed for 172.23.1.21
Password change notification to Systems Director failed for nodes 172.23.1.21.
Changing the password for user 'admin' on resource 'net3'.
The password change was successful for the user 'admin' on resource 'net3'.
Updating the password in the database for user admin.
The password was successfully updated in the database.
Notifying Systems Director of a password change for user admin.

Example lssys command output after the password change:

$  smcli lssys -A "CommunicationState,AccessState"
 adminnode_1: Unlocked, 2
 DATA-STANDBY-SN101F54R: Unlocked, 2
 datanode_4: Unlocked, 2
 datanode_5: Unlocked, 2
 ETHERNET0-IBM*8205-E6D*101F52R: Unlocked, 2
 ETHERNET0-IBM*8205-E6D*101F53R: Unlocked, 2
 ETHERNET0-IBM*8231-E2D*101F0CR: Unlocked, 2
 ETHERNET0-IBM*8231-E2D*101F0DR: Unlocked, 2
 ETHERNET0-IBM*8231-E2D*101F54R: Unlocked, 2
 ETHERNET0-IBM*8231-E2D*101F55R: Unlocked, 2
 fcm_switch1: PartiallyLocked, 2
 fcm_switch2: PartiallyLocked, 2
 mgt_switch1: PartiallyLocked, 2
 mgt_switch2: PartiallyLocked, 2
Workaround:
-----------
To resolve the issue, complete the following steps:

1. Remove the Ethernet network switch from the system by running the following command:

  • smcli rmsys -i <net_switch_ip_address>

2. Rediscover the Ethernet network switch on the system by running the following command:

  • /opt/ibm/aixappl/pflayer/bin/icmds/appl_ctrl_net discover -l <resource_logical_name>

3. Retry changing the Ethernet network switch password by running the following command:

  • appl_conf chpw -l <resource_logical_name> -u <user_name> -p <new_password>
Fixed:
----------
V1.0.0.5/V1.1.0.1 IBM System Director is removed as part of these fixpacks.
KI006204
Fix pack installation phases show only highest version of Ethernet network switch firmware
 
General I_V1.0.0.4
Fix pack installation phases show only highest version of Ethernet network switch firmware
 
During the preview, prepare, and apply phases of fix pack installation, both the log output and the appl_ls_cat command output show only the highest version of the firmware for the G8264 Ethernet network switch (7.9.12.0), even though the G8052 Ethernet network switch has a different firmware version (7.9.11.0).

Example log output after the preview phase:

Note: This example output can also be viewed in MI preview.

PRODUCT_UPDATES::SSDDrawerFW_PROD::SSDDrawerFW::Management,Mgmt_Standby,Admin,Admin_Standby,User,Data,Standby::67G5::67E5::SSDDrawerFW Update
PRODUCT_UPDATES::PFW_PROD::PFW::Management,Mgmt_Standby,Admin,Admin_Standby,User,Data,Standby::AL740_156::AL770_048::PFW Update
PRODUCT_UPDATES::PFW_PROD::PFW::Management,Mgmt_Standby,Admin,Admin_Standby,User,Data,Standby::AL770_098::AL770_048::PFW Update
PRODUCT_UPDATES::FCAdapterFW_PROD::FCAdapterFW::Management,Mgmt_Standby,Admin,Admin_Standby,User,Data,Standby::202307::0315050680,0315050680,0315050680,0315050680,::FCAdapterFW 5273 Update
PRODUCT_UPDATES::FCAdapterFW_PROD::FCAdapterFW::Management,Mgmt_Standby,Admin,Admin_Standby,User,Data,Standby::0320051000::0315050680,0315050680,0315050680,0315050680,::FCAdapterFW EN0Y Update
PRODUCT_UPDATES::NetAdapterFW_PROD::NetAdapterFW::Management,Mgmt_Standby,Admin,Admin_Standby,User,Data,Standby::0400401800007::0310303970033::NetAdapterFW 1648 Update
PRODUCT_UPDATES::GPFS_PROD::GPFS::Management,Mgmt_Standby,Admin,Admin_Standby,User,Data,Standby::4.1.0.6::3.4.0.14::GPFS
PRODUCT_UPDATES::CAS_PROD::CAS::Mgmt_Standby,Admin,Admin_Standby,User,Data,Standby::6.3.3.0::6.3.0.3::CAS Update
PRODUCT_UPDATES::StorageFW_PROD::StorageFW::Infrastructure::7.3.0.9::6.4.1.6::StorageFW Update
PRODUCT_UPDATES::SANFW_PROD::SANFW::Infrastructure::v7.2.1d::v7.0.2c::SANFW Update
PRODUCT_UPDATES::NetFW_PROD::NetFW::Infrastructure::7.9.12.0::7.7.3.0::NetFW Update
END::PRODUCT_UPDATES

Example appl_ls_cat command output after the preview phase:

NAME                     VERSION                       STATUS                   OPERATION      DESCRIPTION
netfw1                   7.9.12.0                      Previewed                manage         NetFW Update

However, after the commit phase, both firmware versions are displayed.

Example appl_ls_cat command output after the commit phase:

NAME                     VERSION                       STATUS                   OPERATION      DESCRIPTION
netfw2                   7.9.12.0                      Committed                manage         NetFW Update
netfw3                   7.9.11.0                      Committed                manage         NetFW Update
Workaround:
-----------
A resolution to this restriction is not available at this time.
Fixed:
----------
V1.0.0.5
KI007486
Flash storage upgrade gets stalled during apply phase
Fixpack
I_V1.1.0.1
I_V1.1.0.2
Flash storage upgrade gets stalled during apply phase
The flash storage upgrade gets stalled during the apply phase, which results in the apply failure.

Sample output of the failure from the PL log:

[06 Mar 2017 07:45:40,711] <4653948 CTRL  DEBUG  ibis01> apply: 172.23.1.182: waiting for upgrade to complete, iteration <1>: update_status=<upgrading 2>
[06 Mar 2017 07:50:41,284] <4653948 CTRL  DEBUG  ibis01> get_update_status: status is <upgrading 2
[06 Mar 2017 07:50:41,284] <4653948 CTRL  DEBUG  ibis01> >
[06 Mar 2017 07:50:41,286] <4653948 CTRL  DEBUG  ibis01> apply: 172.23.1.182: waiting for upgrade to complete, iteration <2>: update_status=<upgrading 2>
[06 Mar 2017 07:55:42,812] <4653948 CTRL  DEBUG  ibis01> get_update_status: status is <stalled 23
[06 Mar 2017 07:55:42,812] <4653948 CTRL  DEBUG  ibis01> >
[06 Mar 2017 07:55:42,813] <4653948 CTRL  DEBUG  ibis01> apply: 172.23.1.182: waiting for upgrade to complete, iteration <3>: update_status=<stalled 23>
[06 Mar 2017 08:00:42,814] <4653948 CTRL  DEBUG  ibis01> apply: 172.23.1.182: broke out of timed wait after 4 iterations of maximum 48. update status is <stalled 23>
[06 Mar 2017 08:00:42,815] <4653948 CTRL  DEBUG  ibis01> Extracted msg from NLS: apply: 172.23.1.182 Error: The update status of the end point is <stalled 23>.
[06 Mar 2017 08:00:42,815] <4653948 CTRL  DEBUG  ibis01> apply: 172.23.1.182: error: error state, update status is <stalled 23>
[06 Mar 2017 08:00:42,853] <7013026 CTRL  DEBUG  ibis01> apply: storage1: apply failed


Verify the status on the failed node by executing the lssoftwareupgradestatus command:

$ ssh admin@172.23.1.182
IBM_FlashSystem:ibisFlash_00:admin>lssoftwareupgradestatus
status  percent_complete
stalled 23
Workaround:
-----------
1. Abort the upgrade by using the applysoftware -abort command. Wait until the status becomes inactive.

IBM_FlashSystem:ibisFlash_00:admin>applysoftware -abort
IBM_FlashSystem:ibisFlash_00:admin>lssoftwareupgradestatus
status      percent_complete
downgrading 23
IBM_FlashSystem:ibisFlash_00:admin>lssoftwareupgradestatus
status      percent_complete
downgrading 23
...
IBM_FlashSystem:ibisFlash_00:admin>lssoftwareupgradestatus
status      percent_complete
downgrading 23
IBM_FlashSystem:ibisFlash_00:admin>lssoftwareupgradestatus
status      percent_complete
downgrading 23
IBM_FlashSystem:ibisFlash_00:admin>lssoftwareupgradestatus
status   percent_complete

inactive 0 


2. Verify whether there are any events for the internal error "Node warmstarted due to an internal error".
This is a known issue with flash storage. You can clear the event by using the cheventlog command.

IBM_FlashSystem:ibisFlash_00:superuser>lseventlog
sequence_number last_timestamp object_type object_id object_name  copy_id status  fixed event_id error_code description                          secondary_object_type secondary_object_id
103             170209221539   node        2         node2                message no    980349              Node added
114             170301053034   drive       0                              message no    988024              Flash module format complete
115             170301053034   drive       6                              message no    988024              Flash module format complete
...
129             170306074033   cluster               ibisFlash_00         message no    980506              Update prepared
130             170306075323   node        1         node1                alert   no    074002   2030       Internal error                       canister              1
131             170306075401   enclosure   1                              alert   no    085048   2060       Reconditioning of batteries required
...
135             170306075411   cluster               ibisFlash_00         message no    980509              Update stalled
136             170306075411   node        1         node1                alert   no    009100   2010       Update process failed
IBM_FlashSystem:ibisFlash_00:superuser>

Here, event 130 shows an internal error on the node with error code 2030.

Detailed listing of the event:
IBM_FlashSystem:ibisFlash_00:superuser>lseventlog 130
sequence_number 130
first_timestamp 170306075323
first_timestamp_epoch 1488815603
last_timestamp 170306075323
last_timestamp_epoch 1488815603
object_type node
object_id 1
object_name node1
copy_id
reporting_node_id
reporting_node_name
root_sequence_number
event_count 1
status alert
fixed no
auto_fixed no
notification_type error
event_id 074002

event_id_text Node warmstarted due to an internal error
error_code 2030
error_code_text Internal error
machine_type 9840AE2
serial_number 1351351
FRU None
fixed_timestamp
fixed_timestamp_epoch
callhome_type software
sense1 41 73 73 65 72 74 20 46 69 6C 65 20 2F 62 75 69
sense2 6C 64 2F 74 6D 73 2F 53 56 43 5F 4F 44 45 5F 52
sense3 32 2F 32 30 31 35 2D 30 33 2D 30 39 5F 31 32 2D
sense4 31 34 2D 30 34 2F 72 32 2F 73 72 63 2F 75 73 65
sense5 72 2F 64 72 76 2F 70 61 2F 70 6C 70 61 2E 63 20
sense6 4C 69 6E 65 20 31 35 39 33 00 00 00 00 00 00 00
sense7 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
sense8 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
secondary_object_type canister
secondary_object_id 1
IBM_FlashSystem:ibisFlash_00:superuser>


The event_id_text field shows "Node warmstarted due to an internal error".

Clear the event on the node by using the cheventlog -fix command.

IBM_FlashSystem:ibisFlash_00:admin>cheventlog -fix 130
IBM_FlashSystem:ibisFlash_00:admin>lseventlog
sequence_number last_timestamp object_type object_id object_name  copy_id status  fixed event_id error_code description                          secondary_object_type secondary_object_id
103             170209221539   node        2         node2                message no    980349              Node added
114             170301053034   drive       0                              message no    988024              Flash module format complete
115             170301053034   drive       6                              message no    988024              Flash module format complete
117             170301053039   drive       2                              message no    988024              Flash module format complete
118             170301053039   drive       7                              message no    988024              Flash module format complete
119             170301053039   drive       8                              message no    988024              Flash module format complete
120             170301053044   drive       1                              message no    988024              Flash module format complete
121             170301053044   drive       3                              message no    988024              Flash module format complete
122             170301053044   drive       4                              message no    988024              Flash module format complete
123             170301053044   drive       5                              message no    988024              Flash module format complete
124             170301053044   drive       9                              message no    988024              Flash module format complete
129             170306074033   cluster               ibisFlash_00         message no    980506              Update prepared
131             170306075401   enclosure   1                              alert   no    085048   2060       Reconditioning of batteries required
132             170306075401   enclosure   1                              alert   no    085048   2060       Reconditioning of batteries required
133             170306075406   enclosure   1                              message no    988030              External data link degraded          canister              2
134             170306075406   enclosure   1                              message no    988030              External data link degraded          canister              1
135             170306075411   cluster               ibisFlash_00         message no    980509              Update stalled
137             170306223519   cluster               ibisFlash_00         message no    980510              Update aborted
138             170306224949   node        2         node2                message no    980349              Node added
139             170306224949   cluster               ibisFlash_00         message no    980508              Update Failed
IBM_FlashSystem:ibisFlash_00:admin>
 

3. After the update has been aborted and the event has been cleared, resume the upgrade.
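
How the fix pack is resumed depends on the level; as a pointer only (not part of the flash storage commands above), console-based levels resume with the miupdate command, while later levels resume with the appl_install_sw command as described in KI007488:

miupdate -resume
or
appl_install_sw -l <bwr#> -resume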
Fixed:
----------
N/A
KI007488
Console does not allow resume after PFW update reboots management node
Fixpack
I_V1.0.0.5
I_V1.1.0.1
Console does not allow resume after PFW update reboots management node
When the non-management apply is running, the CEC reboots during the PFW update. This brings down the management node and stops the fix pack update.

Workaround:
-----------
To resume the update:

Normally, you would verify that the console is started by using the mistatus command (run as root on the management host) and then execute miupdate -resume:

(0) root @ ibis01: 7.1.0.0: /
$ mistatus
CDTFS000063I The system console is started.
(0) root @ ibis01: 7.1.0.0: /
$
 


However, in this case the 'miupdate -resume' command will not work. Instead:

Determine the 'bwr#' associated with the fixpack.  Use the appl_ls_cat command, run as root on the management host.

$ appl_ls_cat
NAME                     VERSION                       STATUS                   DESCRIPTION
bwr0                     3.0.3.0                       Committed                Initial images for IBM PureData System for Operational Analytics
bwr1                     4.0.5.0                       Applied                  Updates for IBM_PureData_System_for_Operational_Analytics


Substitute the 'bwr#' found above (in this example, bwr1 at version 4.0.5.0) and run the appl_install_sw command as root on the management host.

echo "appl_install_sw -l bwr1 -resume > /tmp/appl_install_sw_$(date +"%Y%m%d_%H%M%S").out 2>&1" | at now

This runs the fixpack application outside of the session (these can be long-running commands that are susceptible to terminal loss). Tail the /tmp/appl_install_sw_<date>.out file to view the progress.
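
For example (the timestamp in the output file name will match the one created by the at job above):

tail -f /tmp/appl_install_sw_<date>.out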
Fixed:
----------
V1.0.0.6/V1.1.0.2 The console has been removed in these fixpacks and the fixpack methodology has changed.
KI007489
Apply output shows failed even though there is no failure or error in the log
Fixpack
I_V1.0.0.5
I_V1.1.0.1
Apply output shows failed even though there is no failure or error in the log
The following output excerpt is from the mi command line or the console GUI:
=====================================================
    Log file:
    Infrastructure Infrastructure SAN switch firmware: SANFW apply started 1 of 1 task completed

    Log file:
    Infrastructure Infrastructure Network switch firmware: NetFW apply started 1 of 1 task completed

    Log file:
    Infrastructure Infrastructure Storage firmware: StorageFW apply started 3 of 3 task completed

    Log file:
    Infrastructure Infrastructure Storage firmware: StorageFW apply started 3 of 3 task completed

========================
"The operation failed during the apply stage. The resume phase for the release 'bwr1' failed..
Refer to the platform layer log file for details."
========================
Workaround:
-----------
Verify that the update was completed by running the appl_ls_cat command.

In resume scenarios during the management update, the status should be M_Applied:

(0) root @ ibis01: 7.1.0.0: /BCU_share/aixappl/pflayer/log
$ appl_ls_cat
NAME                     VERSION                       STATUS                   DESCRIPTION
bwr0                     4.0.4.2                       Committed                Updates for IBM_PureData_System_for_Operational_Analytics
bwr1                     4.0.5.0                       M_Applied                  Updates for IBM_PureData_System_for_Operational_Analytics_DB2105

(0) root @ ibis01: 7.1.0.0: /BCU_share/aixappl/pflayer/log


In resume scenarios during the non-management/core update, the status should be Applied:

(0) root @ ibis01: 7.1.0.0: /BCU_share/aixappl/pflayer/log
$ appl_ls_cat
NAME                     VERSION                       STATUS                   DESCRIPTION
bwr0                     4.0.4.2                       Committed                Updates for IBM_PureData_System_for_Operational_Analytics
bwr1                     4.0.5.0                       Applied                  Updates for IBM_PureData_System_for_Operational_Analytics_DB2105

(0) root @ ibis01: 7.1.0.0: /BCU_share/aixappl/pflayer/log


If the status shows either Applied or M_Applied, proceed to the next step and ignore the failure message.
Fixed:
----------
V1.0.0.6/V1.1.0.2 The console is removed as part of these fixpacks and the fixpack methodology has changed.
KI007423
ISW update failed during the MGMT update
Fixpack
I_V1.0.0.5
I_V1.1.0.1
ISW update failed during the MGMT update
During the PDOA V1.0 FP5 / V1.1 FP1 management apply phase, the fix pack may encounter an error. The pflayer log file may show the following error:

...
[10 Nov 2016 20:07:24,829] <8913318 UPDT  DEBUG  ibis01> Node: 172.23.1.1 Return: 256
[10 Nov 2016 20:07:24,844] <8913318 UPDT  ERROR  ibis01> TASK_END::10::5 of 6::ISW_APPLY::172.23.1.1:: ::RC=1::CDTFS000048E An error occurred while updating InfoSphere Data Warehouse.\n\nDetails:\nThe command \"/BCU_share/bwr1/software/ISW/isw/install.bin -DDS_HA_MODE=TRUE -i silent -f /BCU_share/update_105_tFFF.rsp -Dprofile=BCU_share/bwr1/software/ISW/PDS -Dlog=/tmp/isw_full.log\" failed with the error:\n\n\"\"\n\nUser Response:\nContact IBM Support for assistance.
Workaround:
-----------
There is a known issue with the ISW installer returning a status of 256 back to the caller of the install.bin command line even though the installation was a success. To verify:

1. Log in to the management node as root in an SSH session.


2. Look for one of the following directories:

/usr/IBM/dwe/appserver_001/iswapp_10.5/logs
or
/usr/IBM/dwe/appserver_001/iswapp_10/logs


3. Look for a file called 'ISWinstall_summary_<date>.log' with a recent date.


4. Run the following:

grep -i status logs/ISWinstall_summary_1701121953.log

This should return a large number of lines with 'Status: SUCCESSFUL'.

If this is the case, it is safe to resume the fix pack as the update was successful.
Fixed:
----------
V1.0.0.6/V1.1.0.2 The component that contains ISW is known as Warehouse Tools, and this component is removed in these fixpacks.
KI007457
OPM update failed due to a DBI connect issue in wait_for_start.pl
Fixpack
I_V1.0.0.5
I_V1.1.0.1
OPM update failed due to a DBI connect issue in wait_for_start.pl
During the management apply phase, the apply may fail with the following symptoms in the logs.

RC=1::Can't locate DBI.pm in @INC (@INC contains: /usr/opt/perl5/lib/5.10.1/aix-thread-multi /usr/opt/perl5/lib/5.10.1 /usr/opt/perl5/lib/site_perl/5.10.1/aix-thread-multi /usr/opt/perl5/lib/site_perl/5.10.1 /usr/opt/perl5/lib/site_perl .) at /BCU_share/bwr1/code/ISAS/Update/Common/OPM/scripts/wait_for_start.pl line 149.\n

or

DBI connect('OPMDB','',...) failed: [IBM][CLI Driver] SQL1031N  The database directory cannot be found on the indicated file system.  SQLSTATE=58031
at /BCU_share/bwr1/code/ISAS/Update/Common/OPM/scripts/wait_for_start.pl line 155

In addition, hals shows that the DPM components have failed over to the standby management host.
Workaround:
-----------
These messages indicate that during the management apply phase, the DB2 Performance Monitor (DPM) component failed over to the management standby node. There are some known issues with DPM on startup that can lead to failures. If the above symptoms are seen, the next steps are as follows (a command sketch appears after the list):

1. Use hals to determine if the DPM resources are indeed failed over.

2. Use lssam on the management host to determine if there are any failed states.

3. Use 'resetrsrc' on any DPM resources that are in a failed state.

4. Verify with lssam that the resources are no longer in a failed state.

5. Use 'hafailover <managementstandby> DPM' to move the DPM resources to the management host.

6. Verify that the DPM resources successfully moved to the management host.

7. Resume the fix pack.
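
The following is a minimal command sketch of steps 1 through 6, run as root on the management host. The resource name 'opm-rs' is only an illustrative placeholder; use the failed resource names reported by lssam on your system, and replace <managementstandby> with the host name of the management standby node:

hals                                                # step 1: check where the DPM resources are currently active
lssam | grep -i failed                              # step 2: list any resources in a failed state
resetrsrc -s 'Name == "opm-rs"' IBM.Application     # step 3: reset a failed DPM resource; repeat for each failed resource
lssam | grep -i failed                              # step 4: confirm that no resources remain in a failed state
hafailover <managementstandby> DPM                  # step 5: move the DPM resources back to the management host
hals                                                # step 6: verify that the DPM resources moved to the management host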
Fixed:
----------
V1.0.0.6/V1.1.0.2 The update mechanism has changed in these fixpacks.
KI007495
Storage update failed during apply phase because of drive update failure
Fixpack
I_V1.0.0.1
I_V1.0.0.2
I_V1.0.0.3
I_V1.0.0.4
I_V1.0.0.5
I_V1.1.0.1
Storage update failed during apply phase because of drive update failure
Storage update fails during the apply phase because of drive update failures; one or more storage drives might be in the offline state.

Console output during failure:
========================
The operation failed during the apply stage. Storage update failed on 172.23.1.186.
Refer to the platform layer log file for details.
========================

Sample output of the failure from the PL log:

[08 Mar 2017 04:13:43,315] <28704888 CTRL  DEBUG  finch01> Extracted msg from NLS: apply: 172.23.1.186 ssh admin@172.23.1.186 ssh admin@172.23.1.186 LANG=en_US svctask applydrivesoftware -file  IBM2076_DRIVE_20160923 -type firmware -drive 0:1:2:3:4:5:6:7:8:9:10:11:12:13:14:15:16:17:18:19:20:21:22:23:24:25:26:27:28:29:30:31:32:33:34:35:36:37:38:39:40:41:42:43:44:45:46:47 command failed.
[08 Mar 2017 04:13:43,315] <28704888 CTRL  DEBUG  finch01> apply: 172.23.1.186: error: ssh admin@172.23.1.186 ssh admin@​172.23.1.186 LANG=en_US svctask applydrivesoftware -file  IBM2076_DRIVE_20160923 -type firmware -drive 0:1:2:3:4:5:6:7:8:9:10:11:12:13:14:15:16:17:18:19:20:21:22:23:24:25:26:27:28:29:30:31:32:33:34:35:36:37:38:39:40:41:42:43:44:45:46:47 command failed , rc=127


By executing the lsdrive command, we can verify the drive statuses on the failed storage box. For example:

$ ssh superuser@172.23.1.186 "lsdrive"
id status  error_sequence_number use    tech_type capacity mdisk_id mdisk_name member_id enclosure_id slot_id node_id node_name auto_manage
0  online                        member sas_hdd   837.9GB  0        ARRAY3     11        2            1                         inactive  
1  online                        member sas_hdd   837.9GB  0        ARRAY3     10        1            1                         inactive  
2  online                        member sas_hdd   837.9GB  0        ARRAY3     9         2            2                         inactive  
3  online                        member sas_hdd   837.9GB  0        ARRAY3     8         1            10                        inactive  
4  online                        member sas_hdd   837.9GB  0        ARRAY3     7         1            2                         inactive  
5  offline 273                   failed sas_hdd   837.9GB                                2            10                        inactive  
6  online                        member sas_hdd   837.9GB  0        ARRAY3     5         2            9                         inactive  
7  online                        member sas_hdd   837.9GB  0        ARRAY3     4         1            9                         inactive  
8  online                        member sas_hdd   837.9GB  0        ARRAY3     3         2            11                        inactive  
9  online                        member sas_hdd   837.9GB  0        ARRAY3     2         2            8                         inactive  
10 online                        member sas_hdd   837.9GB  0        ARRAY3     1         1            8                         inactive  
11 online                        member sas_hdd   837.9GB  0        ARRAY3     0         1            11                        inactive  
12 online                        member sas_hdd   837.9GB  1        ARRAY4     11        2            7                         inactive  
13 offline 268                   spare  sas_hdd   837.9GB                                1            7                         inactive  
14 online                        member sas_hdd   837.9GB  1        ARRAY4     9         2            6                         inactive  
15 online                        member sas_hdd   837.9GB  1        ARRAY4     8         1            6                         inactive  
16 online                        member sas_hdd   837.9GB  1        ARRAY4     7         1            5                         inactive  
17 online                        member sas_hdd   837.9GB  1        ARRAY4     6         1            12                        inactive

Here we see that two drives (drive 5 and drive 13) are in the offline state, which is the reason for the failure.
Workaround:
-----------
Run the lsdrive command to see the list of failed drives, and then do the following (an example check appears after these steps):

1. Fix the drives.

2. Ensure the statuses of the drives are all online.

3. Resume the fix pack update.
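
For example, based on the lsdrive output shown above (and using the storage IP address from this example), the following command lists the column header plus any drives that are not online; before you resume, it should return only the header line:

ssh superuser@172.23.1.186 "lsdrive" | grep -v online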
Fixed:
----------
V1.0.0.6/V1.1.0.2 The fixpack mechanism has changed in these fixpacks; however, drive failures can lead to similar symptoms with similar workarounds.
KI007496
CECs are rebooted and /BCU_share is unmounted after a power firmware update
Fixpack
I_V1.0.0.5
I_V1.1.0.1
CECs are rebooted and /BCU_share is unmounted after a power firmware update
The upgrade of the CEC that hosts the management and admin nodes completes, the CEC is rebooted, and the run is halted because of this reboot.
When the CEC comes back online and the run is resumed, /BCU_share is mounted on the management and admin nodes.
Subsequently, the other CECs are upgraded and rebooted.
However, when the respective nodes come back up, /BCU_share is not mounted again, and the upgrade proceeds to the point where it tries to access /BCU_share and fails.

Symptoms:
1. The following output excerpt is from the log:
=====================================================
[16 Nov 2016 08:32:35,041] <2884066 ADPT ERROR ibis01> Failed to unpack the adapter firmware file /BCU_share/bwr1/firmware/fc_adapter/df1000f114100104/image/df1000f114100104.
203305.aix.rpm on 172.23.1.4.
=====================================================

2. The /BCU_share NFS mount shared from the management host is not mounted on all hosts.
Workaround:
-----------
After the failure is identified, simply resume. The resume code verifies that /BCU_share is mounted across the hosts.
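
If you want to confirm the mounts manually before resuming, a quick check (assuming the same dsh node group variable that is used elsewhere in this document) is:

dsh -n $ALL "mount | grep BCU_share" | dshbak -c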
Fixed:
----------
V1.0.0.6/V1.1.0.2 The fixpack mechanism has completely changed, rendering this known issue moot.
KI007479
miinfo compliance command shows compliance issue for some of the products
General
I_V1.0.0.5
I_V1.1.0.1
miinfo compliance command shows compliance issue for some of the products
Running the miinfo compliance command ('miinfo -d -c') shows that some levels are not correct.

You may see the following under some servers:

IBM Systems Director Common Agent           6.3.3.1                                                                     Higher
IBM InfoSphere Warehouse                    10.5.0.20151104_10.5.0.8..0                                                 Lower
IBM InfoSphere Optim Query Workload Tuner   The version of the product cannot be determined or it is not installed.     NA
Workaround:
-----------
1. IBM Systems Director Common Agent          6.3.3.1                                                                     Higher
The common agent should no longer be tracked by the compliance software. This is a defect in the compliance check program and will not impact the operation of the appliance.

2. IBM InfoSphere Warehouse                   10.5.0.20151104_10.5.0.8..0                                                 Lower
The InfoSphere Warehouse software compliance check uses 20151117 instead of 20151104. This is a defect in the compliance check code and will not impact the operation of the appliance.

3. IBM InfoSphere Optim Query Workload Tuner  The version of the product cannot be determined or it is not installed.     NA
This is normal for nodes that are currently running as standby hosts. The compliance checker has a limitation in that it cannot check the level when a core host is currently a designated standby host.
Fixed:
----------
V1.0.0.6/V1.1.0.2
     MI* commands are no longer used.
     Warehouse Tools is removed as part of the fixpack.
KI007503
FP5 cannot be directly applied to FP3 without some additional fixes and modifications.
Fixpack I_V1.0.0.5
FP5 cannot be directly applied to FP3 without some additional fixes and modifications.
The IBM PureData System for Operational Analytics V1.0 FP5 package cannot be directly applied to FP3 environments. There are three distinct issues with FP5 on FP3 environments.

1. It is possible to register the fixpack; however, after registration the console will no longer start. During the preview step, messages similar to the following will appear in the log.
  • 03 Apr 2017 03:54:54,222] <30474324 UPDT PREV INFO   paradise01> PRODUCT_UPDATES::OPM_PROD::OPM::Management::5.3.1.0.8440::5.3.0.0.7336::OPM
    [03 Apr 2017 03:54:54,481] <30474324 UPDT PREV DEBUG  paradise01> server_logical_names = server6
    [03 Apr 2017 03:55:01,876] <30474324 UPDT  DEBUG  paradise01> Stage getlevel failed:
    [03 Apr 2017 03:55:01,877] <30474324 UPDT  DEBUG  paradise01> Use of uninitialized value in split at /opt/ibm/aixappl/pflayer/lib/ISAS/PlatformLayer/TSA/Topology.pm line 138, <> line 10.
    [03 Apr 2017 03:55:01,878] <30474324 UPDT PREV ERROR  paradise01>  Use of uninitialized value in split at /opt/ibm/aixappl/pflayer/lib/ISAS/PlatformLayer/TSA/Topology.pm line 138, <> line 10.
    [03 Apr 2017 03:55:01,927] <30474324 UPDT PREV DEBUG  paradise01> Executing query Logical_name=bwr1 AND Solution_version=4.0.5.0, to update status of Solution
    [03 Apr 2017 03:55:02,056] <30474324 UPDT PREV INFO   paradise01> PHASE_END PREVIEW
    [03 Apr 2017 03:55:02,058] <30474324 UPDT PREV ERROR  paradise01> The preview phase for the release 'bwr1' failed.


2. It is possible to apply the fixpack via the command line; however, the fixpack will fail validation because the firmware levels on the V7000 are too low for FP5 to update.


3. It is possible to apply the fixpack via the command line; however, the fixpack will fail validation because the firmware levels on the SAN switches are too low for FP5 to update.
Workaround:
-----------
See the document 'How to apply the IBM PureData System for Operational Analytics V1.0 FP5 on a FP3 environment?' for more information about the FP3 to FP5 scenario.
Fixed:
----------
Only applies to V1.0.0.3 to V1.0.0.5 scenarios.
KI007523
Failed paths after the fixpack apply stage
Fixpack
I_V1.0.0.5
I_V1.1.0.1
Failed paths after the fixpack apply stage
The AIX hosts may have failed paths to the external storage. Run the following command as root on the management host:

dsh -n $ALL "lspath | grep hdisk | grep -v Enabled | wc -l" | dshbak -c

The command returns a nonzero count for a host if that host has failed paths to the storage.
Workaround:
-----------
There are two remedy options.

1. Reboot the host with the failed paths. This will effectively bounce the port. This may require an outage.

2. Follow the instructions below to bounce only the port. All access to the storage is fully redundant with multiple paths, which is why the system can start even with failed paths. This method avoids an outage and effectively bounces the port.

The following should be done one port at a time and should be performed either in an outage window or at a time when the system will have very low I/O activity.

a. Log in to each host and determine the ports with failed paths by using the 'lspath | grep hdisk | grep -v Enabled | while read stat disk dev rest;do echo "${dev}";done | sort | uniq' command. This command returns the unique set of ports connected to hdisk devices that are Missing or Failed.

Example output:
fscsi10

b. For each port, create a script and update the 'export id=1' line to match the number in the fscsi# id of the failed path. This script removes all paths to that port, sets the port to the Defined state, and then rediscovers the paths. This effectively bounces the port.

c. Change the id to match the fscsi<id> number. Run each script to remove the paths and to put the device in the Defined state, then use cfgmgr to reinitialize. This should create all of the new paths. Run these scripts one at a time, and then verify that the path no longer appears in the command shown in step a.

export id=1                                   # set to the number of the fscsi adapter with failed paths
# remove every path that currently exists on the fscsi${id} adapter
lspath -p fscsi${id} | while read st hd fs;do echo $hd;done | sort | uniq | while read disk;do rmpath -d -l ${disk} -p fscsi${id};done
# put the sfwcomm, fscsi, and fcs devices for this port into the Defined state
rmdev -l sfwcomm${id};rmdev -l fscsi${id};rmdev -l fcs${id}
# rediscover the devices and rebuild the paths
cfgmgr -s


d. After the commands run, verify that there are no more failed paths over the port and that the port has discovered the existing paths (see the example check below).

Bouncing the port in this way preserves any settings stored in ODM for the fcs and fscsi devices.
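
For example, to check a single port after the bounce (reusing the same ${id} variable as the script above), the following command should return no lines once all paths on that adapter are Enabled:

lspath -p fscsi${id} | grep -v Enabled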
Fixed:
----------
V1.0.0.6/V1.1.0.2 This fixpack uses a different mechanism, which has not been shown to have this issue.
KI007537
DB2 and/or ISW preview failures due to incorrect fixpack or incomplete DB2 10.5 upgrade.
Fixpack
I_V1.0.0.5
DB2 and/or ISW preview failures due to incorrect fixpack or incomplete DB2 10.5 upgrade.
I downloaded and registered the fixpack; however, I'm receiving preview errors related to the DB2 and/or InfoSphere Warehouse (ISW) levels.

There are two different fix central downloads for IBM PureData System for Operational Analytics V1.0 fixpack 5.

IBM PureData System for Operational Analytics Fix Pack 5 (for systems with DB2 Version 10.1)

and

IBM PureData System for Operational Analytics Fix Pack 5 (for systems with DB2 Version 10.5)

There are three scenarios where problems arise.

1. Customer has DB2 V10.1, downloads and registers the fixpack with DB2 10.5.
2. Customer has followed the instructions to uplift or upgrade DB2 to 10.5 by following the technote Upgrading an IBM PureData System for Operational Analytics Version 1.0 environment to DB2 10.5 and downloads the fixpack with DB2 10.1.
3. Customer has only partially followed the instructions to uplift or upgrade DB2 to 10.5 by following the technote Upgrading an IBM PureData System for Operational Analytics Version 1.0 environment to DB2 10.5 and downloads the fixpack with DB2 10.5, but encounters preview errors related to the ISW level being at the incorrect version level.
Workaround:
-----------
1. Customer has DB2 V10.1, downloads and registers the fixpack with DB2 10.5.
2. Customer has followed the instructions to uplift or upgrade DB2 to 10.5 by following the technote Upgrading an IBM PureData System for Operational Analytics Version 1.0 environment to DB2 10.5 and downloads the fixpack with DB2 10.1.

Contact IBM Support for help to de-register the incorrect fixpack.
Download the fixpack with the correct DB2 levels.
Follow the fixpack instructions as usual.


3. Customer has only partially followed the instructions to uplift or upgrade DB2 to 10.5 by following the technote Upgrading an IBM PureData System for Operational Analytics Version 1.0 environment to DB2 10.5 and downloads the fixpack with DB2 10.5, but encounters preview errors related to the ISW level being at the incorrect version level.

This scenario is most likely due to the technote not being fully followed. This can happen because of confusion about the relationship between InfoSphere Warehouse and DB2: most customers understand how to upgrade DB2, and it is easy to miss that the InfoSphere Warehouse product must be updated as well. The fixpack catalog does not at present support mixing DB2 10.5 with InfoSphere Warehouse 10.1. The customer needs to revisit the Upgrading an IBM PureData System for Operational Analytics Version 1.0 environment to DB2 10.5 technote to verify that all of the update steps were followed, that the InfoSphere Warehouse levels are at 10.5, and that the WebSphere Application Server levels are at 8.5.5.x as required by the technote.

Once the levels are updated per the technote the Fixpack can be resumed and the preview should no longer fail.
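As a quick sanity check before resuming, the installed DB2 copies and their levels on the core hosts can be listed. This is only a sketch; it assumes the BCUDB2ALL node group and the db2ls wrapper used elsewhere in this document, and the ISW and WebSphere Application Server levels must still be confirmed per the technote:

# List every registered DB2 copy and its level on each core host
dsh -n ${BCUDB2ALL} '/usr/local/bin/db2ls' | dshbak -c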
Fixed:
----------
V1.0.0.6/V1.1.0.2 this known issue is no longer applicable.
KI007538
The FixCentral download inadvertently includes XML files that are not part of the fixpack.
Fixpack
I_V1.0.0.5
I_V1.1.0.1
The FixCentral download inadvertently includes XML files that are not part of the fixpack.
I downloaded all of the files included in the FixCentral for the fixpack and there are extra XML files included. What are they for?

The XML files are of the following pattern:

*.fo.xml
*SG*.xml
Workaround:
-----------
These files were inadvertently included in the fixpack packages; they can either be skipped during download or deleted after download.
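For example, a minimal cleanup sketch (the download directory below is only a placeholder; substitute the directory the fixpack files were downloaded into):

cd /path/to/fixpack/download        # placeholder path
rm -i *.fo.xml *SG*.xml             # -i prompts before each file is removed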
Fixed:
----------
V1.0.0.6/V1.1.0.2
KI007549
Storage update fails during the apply phase with an 'update is already in progress' message. [ Added 2017-09-18 ]
Fixpack
I_V1.0.0.5
I_V1.1.0.1
Storage update fails during the apply phase with an 'update is already in progress' message. [ Added 2017-09-18 ]
During the fixpack apply phase the fixpack fails.

----------------------------------------------------------------------------------------
/BCU_share/applmgmt/pflayer/log/pl_update.log:
--> Not much in this log except for the failure message.
----------------------------------------------------------------------------------------

[16 Sep 2017 21:14:05,518] <8126944 UPDT APPI DEBUG mgmthost> STORAGE:storage0:172.23.1.181:1:Storage firmware update failed.


----------------------------------------------------------------------------------------
/BCU_share/applmgmt/pflayer/log/pl_update.trace:
--> This excerpt shows that an attempt to update the drive firmware fails. The critical message is this one: "CMMVC6055E The action failed as an update is in progress.\n"],<"PLLogger=HASH(0x2000f558)"

----------------------------------------------------------------------------------------
[16 Sep 2017 21:11:38,575] <5832914 CTRL TRACE mgmthost> sleep time is not configured, defaults will be applied
[16 Sep 2017 21:13:38,576] <5832914 CTRL DEBUG mgmthost> apply: 172.23.1.181: now installing drive updates...
[16 Sep 2017 21:13:38,577] <5832914 CTRL DEBUG mgmthost> drive_id: 0:1:2:3:4:6:7:8:9:10:11:13:14:15:16:17:18:19:20:21:22:23:24:25:26:27:29:30:31:32:33:34:35:36:37:38:39:40:41:42:43:44:45:46:47:48:49:50:51:52:53:54:55:56:58:59:60:61:62:64:65:66:67:68:69:70:71
[16 Sep 2017 21:13:38,577] <5832914 CTRL DEBUG mgmthost> Number of drive id's is less than 128
[16 Sep 2017 21:13:38,578] <5832914 CTRL DEBUG mgmthost> Drive update command execution cnt : 0.
[16 Sep 2017 21:13:43,150] <5832914 CTRL TRACE mgmthost> command: ssh admin@172.23.1.181 LANG=en_US svctask applydrivesoftware -file IBM2076_DRIVE_20160923 -type firmware -drive 0:1:2:3:4:6:7:8:9:10:11:13:14:15:16:17:18:19:20:21:22:23:24:25:26:27:29:30:31:32:33:34:35:36:37:38:39:40:41:42:43:44:45:46:47:48:49:50:51:52:53:54:55:56:58:59:60:61:62:64:65:66:67:68:69:70:71
[16 Sep 2017 21:13:43,151] <5832914 CTRL TRACE mgmthost> CMMVC6055E The action failed as an update is in progress.
[16 Sep 2017 21:13:43,151] <5832914 CTRL TRACE mgmthost> Rc = 1
[16 Sep 2017 21:13:43,152] <5832914 CTRL DEBUG mgmthost> Extracted msg from NLS: apply: 172.23.1.181 ssh admin@172.23.1.181 LANG=en_US svctask applydrivesoftware -file IBM2076_DRIVE_20160923 -type firmware -drive 0:1:2:3:4:6:7:8:9:10:11:13:14:15:16:17:18:19:20:21:22:23:24:25:26:27:29:30:31:32:33:34:35:36:37:38:39:40:41:42:43:44:45:46:47:48:49:50:51:52:53:54:55:56:58:59:60:61:62:64:65:66:67:68:69:70:71 command failed.
[16 Sep 2017 21:13:43,153] <5832914 CTRL DEBUG mgmthost> apply: 172.23.1.181: error: ssh admin@172.23.1.181 LANG=en_US svctask applydrivesoftware -file IBM2076_DRIVE_20160923 -type firmware -drive 0:1:2:3:4:6:7:8:9:10:11:13:14:15:16:17:18:19:20:21:22:23:24:25:26:27:29:30:31:32:33:34:35:36:37:38:39:40:41:42:43:44:45:46:47:48:49:50:51:52:53:54:55:56:58:59:60:61:62:64:65:66:67:68:69:70:71 command failed , rc=1
[16 Sep 2017 21:13:43,153] <5832914 CTRL TRACE mgmthost> { Entering Ctrl::Updates::Storage::search_token (Called from /opt/ibm/aixappl/pflayer/lib/Ctrl/Updates/Storage.pm line 1127)
[16 Sep 2017 21:13:43,154] <5832914 CTRL TRACE mgmthost> Args:[["CMMVC8325E","None of the specified drives needed to be upgraded or downgraded"],[],<"PLLogger=HASH(0x2000f558)">]
[16 Sep 2017 21:13:43,155] <5832914 CTRL DEBUG mgmthost> Not able to find CMMVC8325E None of the specified drives needed to be upgraded or downgraded in the output, an unexpected error occured
[16 Sep 2017 21:13:43,155] <5832914 CTRL TRACE mgmthost> Return: 0
[16 Sep 2017 21:13:43,156] <5832914 CTRL TRACE mgmthost> Exiting Ctrl::Updates::Storage::search_token }
[16 Sep 2017 21:13:43,156] <5832914 CTRL TRACE mgmthost> { Entering Ctrl::Updates::Storage::search_token (Called from /opt/ibm/aixappl/pflayer/lib/Ctrl/Updates/Storage.pm line 1128)
[16 Sep 2017 21:13:43,157] <5832914 CTRL TRACE mgmthost> Args:[["CMMVC8325E","None of the specified drives needed to be upgraded or downgraded"],["CMMVC6055E The action failed as an update is in progress.\n"],<"PLLogger=HASH(0x2000f558)">]
[16 Sep 2017 21:13:43,157] <5832914 CTRL DEBUG mgmthost> Not able to find CMMVC8325E None of the specified drives needed to be upgraded or downgraded in the output, an unexpected error occured
[16 Sep 2017 21:13:43,158] <5832914 CTRL TRACE mgmthost> Return: 0
[16 Sep 2017 21:13:43,158] <5832914 CTRL TRACE mgmthost> Exiting Ctrl::Updates::Storage::search_token }
[16 Sep 2017 21:13:43,158] <5832914 CTRL TRACE mgmthost> { Entering Ctrl::Updates::Storage::search_token (Called from /opt/ibm/aixappl/pflayer/lib/Ctrl/Updates/Storage.pm line 1138)
[16 Sep 2017 21:13:43,159] <5832914 CTRL TRACE mgmthost> Args:[["CMMVC6546E","The current drive status is degraded"],[],<"PLLogger=HASH(0x2000f558)">]
[16 Sep 2017 21:13:43,159] <5832914 CTRL DEBUG mgmthost> Not able to find CMMVC6546E The current drive status is degraded in the output, an unexpected error occured
[16 Sep 2017 21:13:43,160] <5832914 CTRL TRACE mgmthost> Return: 0
[16 Sep 2017 21:13:43,160] <5832914 CTRL TRACE mgmthost> Exiting Ctrl::Updates::Storage::search_token }
[16 Sep 2017 21:13:43,160] <5832914 CTRL TRACE mgmthost> { Entering Ctrl::Updates::Storage::search_token (Called from /opt/ibm/aixappl/pflayer/lib/Ctrl/Updates/Storage.pm line 1139)
[16 Sep 2017 21:13:43,161] <5832914 CTRL TRACE mgmthost> Args:[["CMMVC6546E","The current drive status is degraded"],["CMMVC6055E The action failed as an update is in progress.\n"],<"PLLogger=HASH(0x2000f558)">]
[16 Sep 2017 21:13:43,161] <5832914 CTRL DEBUG mgmthost> Not able to find CMMVC6546E The current drive status is degraded in the output, an unexpected error occured
[16 Sep 2017 21:13:43,162] <5832914 CTRL TRACE mgmthost> Return: 0
[16 Sep 2017 21:13:43,162] <5832914 CTRL TRACE mgmthost> Exiting Ctrl::Updates::Storage::search_token }
[16 Sep 2017 21:13:43,163] <5832914 CTRL DEBUG mgmthost> Function search_token failed, exiting the loop.
[16 Sep 2017 21:13:43,163] <5832914 CTRL DEBUG mgmthost> Drive update got failed on 172.23.1.181 storage.
[16 Sep 2017 21:13:43,203] <6029570 CTRL DEBUG mgmthost> apply: storage0: apply failed
[16 Sep 2017 21:13:43,204] <6029570 CTRL TRACE mgmthost> For message id::1021
[16 Sep 2017 21:13:58,212] <6029570 CTRL TRACE mgmthost> { Entering Ctrl::Query::Status::read_status_n_details (Called from /opt/ibm/aixappl/pflayer/lib/Ctrl/Updates/Storage.pm line 1210)
[16 Sep 2017 21:13:58,213] <6029570 CTRL TRACE mgmthost> Args:["172.23.1.181 storage0 storage 0 NA"]
[16 Sep 2017 21:13:58,213] <6029570 CTRL TRACE mgmthost> { Entering Ctrl::Util::util_details (Called from /opt/ibm/aixappl/pflayer/lib/Ctrl/Query/Status.pm line 45)
[16 Sep 2017 21:13:58,214] <6029570 CTRL TRACE mgmthost> Args:["172.23.1.181","storage",0]
....
16 Sep 2017 21:14:05,520] <8126944 UPDT APPI DEBUG mgmthost> Command Status:Details:
[16 Sep 2017 21:14:05,520] <8126944 UPDT APPI DEBUG mgmthost> Status: Online
[16 Sep 2017 21:14:05,520] <8126944 UPDT APPI DEBUG mgmthost> AccessState: Unlocked
[16 Sep 2017 21:14:05,520] <8126944 UPDT APPI DEBUG mgmthost> Model: 124
[16 Sep 2017 21:14:05,520] <8126944 UPDT APPI DEBUG mgmthost> IPv4Address: ["172.23.1.181"]
[16 Sep 2017 21:14:05,520] <8126944 UPDT APPI DEBUG mgmthost> Manufacturer: IBM
...
[16 Sep 2017 21:14:05,520] <8126944 UPDT APPI DEBUG mgmthost> FWBuild: 115.54.1610251759000
[16 Sep 2017 21:14:05,520] <8126944 UPDT APPI DEBUG mgmthost> PLLogicalName: storage0
[16 Sep 2017 21:14:05,520] <8126944 UPDT APPI DEBUG mgmthost> MachineType: 2076
[16 Sep 2017 21:14:05,520] <8126944 UPDT APPI DEBUG mgmthost> HostName: V7_00_1
[16 Sep 2017 21:14:05,521] <8126944 UPDT APPI DEBUG mgmthost> FWLevel: 7.5.0.11
[16 Sep 2017 21:14:05,521] <8126944 UPDT APPI DEBUG mgmthost> Description: IBM Storwize V7000 Storage
[16 Sep 2017 21:14:05,521] <8126944 UPDT APPI DEBUG mgmthost> STORAGE:storage0:172.23.1.181:1:Storage firmware update failed.
[16 Sep 2017 21:14:05,521] <8126944 UPDT APPI DEBUG mgmthost> , Command Status->1
[16 Sep 2017 21:14:05,522] <8126944 UPDT ERROR mgmthost> TASK_END::13::1 of 1::StorageUPD::172.23.1.181::::RC=1::Storage update failed on 172.23.1.181
[16 Sep 2017 21:14:05,523] <8126944 UPDT APPI TRACE mgmthost> Return: 0
...
[16 Sep 2017 21:14:05,831] <8126944 UPDT APPI ERROR mgmthost> Error on nodes (172.23.1.181).
[16 Sep 2017 21:14:05,848] <8126944 UPDT APPI INFO mgmthost> STEP_END::13::StorageFW_UPD::FAILED
...
[16 Sep 2017 21:14:06,220] <8126944 UPDT APPI TRACE mgmthost> Exiting /opt/ibm/aixappl/pflayer/lib/ManageCatalogStatus.pm => ODM::change_record }
[16 Sep 2017 21:14:06,224] <8126944 UPDT APPI TRACE mgmthost> Exiting ManageCatalogStatus::update_status }
[16 Sep 2017 21:14:06,225] <8126944 UPDT APPI INFO mgmthost> PHASE_END APPLY IMPACT
[16 Sep 2017 21:14:06,225] <8126944 UPDT APPI TRACE mgmthost> For message id::640
[16 Sep 2017 21:14:06,227] <8126944 UPDT APPI ERROR mgmthost> The apply phase for the release 'bwr5' failed.
[16 Sep 2017 21:14:06,228] <8126944 UPDT RESU INFO mgmthost> PHASE_END RESUME
[16 Sep 2017 21:14:06,228] <8126944 UPDT RESU TRACE mgmthost> For message id::640
[16 Sep 2017 21:14:06,229] <8126944 UPDT RESU ERROR mgmthost> The resume phase for the release 'bwr5' failed.



----------------------------------------------------------------------------------------
lseventlog message
--> This message shows the last event with the 2050 error code.
----------------------------------------------------------------------------------------
131 170916183340 cluster V7_00_1 alert no 009198 2050 System update completion required


----------------------------------------------------------------------------------------
V7000 lsupdate status
--> Indicates a status of system_completion_required
----------------------------------------------------------------------------------------

ssh -n superuser@172.23.1.181 'lsupdate'
status system_completion_required
event_sequence_number 131
progress
estimated_completion_time
suggested_action complete
system_new_code_level
system_forced no
system_next_node_status none
system_next_node_time
system_next_node_id
system_next_node_name




This is a documented issue when upgrading V7000 firmware from 7.3.x to 7.4.x as indicated in the 7.4.0 release notes.

https://public.dhe.ibm.com/storage/san/sanvc/release_notes/740_releasenotes.html
Workaround:
-----------
In the PureData System for Operational Analytics Console, use the Service Level Access page to find the link to the Management Interface for each of the V7000s.

Navigate to the Events page, which should show an alert. Select this alert and follow the fix procedures to initiate the second phase.

Do this for each of the V7000s that has this issue. This step will take approximately 40 mins per enclosure and can be run in parallel.


After the second phase is completed, you should see the message 'System update completion finished' from lseventlog.

lseventlog -fixed yes

130 170916183340 cluster V7_00_1 message no 980507 Update completed
131 170916183340 cluster V7_00_1 alert yes 009198 2050 System update completion required
132 170917022703 cluster V7_00_1 message no 980511 System update completion started
133 170917022713 node 3 node2 message no 980513 Node restarted for system update completion
134 170917022713 io_grp 0 io_grp0 message no 981102 SAS discovery occurred, configuration changes pending
135 170917022729 io_grp 0 io_grp0 message no 981103 SAS discovery occurred, configuration changes complete
136 170917022818 node 3 node2 message no 980349 Node added
137 170917022818 io_grp 0 io_grp0 message no 981102 SAS discovery occurred, configuration changes pending
138 170917022828 io_grp 0 io_grp0 message no 981103 SAS discovery occurred, configuration changes complete
139 170917025828 node 1 node1 message no 980513 Node restarted for system update completion
140 170917025828 io_grp 0 io_grp0 message no 981102 SAS discovery occurred, configuration changes pending
141 170917025828 io_grp 0 io_grp0 message no 981103 SAS discovery occurred, configuration changes complete
142 170917025939 node 1 node1 message no 980349 Node added
143 170917025941 io_grp 0 io_grp0 message no 981102 SAS discovery occurred, configuration changes pending
144 170917025941 cluster V7_00_1 message no 980512 System update completion finished
145 170917025946 io_grp 0 io_grp0 message no 981103 SAS discovery occurred, configuration changes complete


lsupdate on that host should show 'status' = success.


ssh -n superuser@172.23.1.181 'lsupdate'
status success
event_sequence_number
progress
estimated_completion_time
suggested_action start
system_new_code_level
system_forced no
system_next_node_status none
system_next_node_time
system_next_node_id
system_next_node_name



Once this is completed for all V7000s, resume the apply phase.
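A quick way to confirm the state of every enclosure before resuming is to check lsupdate on each one. This is only a sketch; it assumes the SAN_FRAME entries in xcluster.cfg identify the V7000s, as in other procedures in this document (on V1.1 systems with FlashSystem 900 enclosures, restrict the pattern to the odd-numbered frames):

grep 'SAN_FRAME[0-9][0-9]*_IP' /pschome/config/xcluster.cfg | while read a b c d;do echo "*** ${c} ***";ssh -n superuser@${c} 'lsupdate' | grep '^status';done

Every enclosure should report 'status success' before the apply phase is resumed.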
Fixed:
----------
V1.0.0.6/V1.1.0.2
KI007421
HMC fw update fails in getupgfiles step. [ Added 2017-10-07 ]
Fixpack
I_V1.0.0.5
I_V1.1.0.1
HMC fw update fails in getupgfiles step. [ Added 2017-10-07 ]
The fixpack fails with the following message in the pl_update.log file.

[07 Oct 2017 14:51:11,167] <3933062 CTRL  DEBUG  host01> iso file validation failed/not-applicable
[07 Oct 2017 14:51:11,168] <3933062 CTRL  DEBUG  host01>  Updates failed
[07 Oct 2017 14:51:11,188] <3474144 UPDT  ERROR  host01> TASK_END::2::1 of 1::HMCUPD::172.23.1.246::::RC=1::Update failed for HMC
[07 Oct 2017 14:51:11,285] <3474144 UPDT APPI DEBUG  host01> Executing query Logical_name=Management AND Solution_version=4.0.5.0, to update status of Product
[07 Oct 2017 14:51:11,340] <3474144 UPDT APPI DEBUG  host01> Executing query Sub_module_type=Management AND Solution_version=4.0.5.0, to update status of sub module
[07 Oct 2017 14:51:11,570] <3474144 UPDT APPI DEBUG  host01> Executing query Logical_name=Management AND Solution_version=4.0.5.0, to update status of Product
[07 Oct 2017 14:51:11,616] <3474144 UPDT APPI DEBUG  host01> Executing query Sub_module_type=Management AND Solution_version=4.0.5.0, to update status of sub module
[07 Oct 2017 14:51:11,735] <3474144 UPDT APPI ERROR  host01> Error on nodes (172.23.1.246 172.23.1.245).
[07 Oct 2017 14:51:11,752] <3474144 UPDT APPI INFO   host01> STEP_END::2::HMC_UPD::FAILED
[07 Oct 2017 14:51:11,756] <3474144 UPDT APPI DEBUG  host01> Error occured in apply for product hmc1
[07 Oct 2017 14:51:11,806] <3474144 UPDT APPI DEBUG  host01> Executing query Logical_name=hmc1 AND Solution_version=4.0.5.0, to update status of Product
[07 Oct 2017 14:51:11,925] <3474144 UPDT APPI ERROR  host01> Apply (impact) phase for management module has failed.
[07 Oct 2017 14:51:12,011] <3474144 UPDT APPI DEBUG  host01> Executing query Logical_name=bwr1 AND Solution_version=4.0.5.0, to update status of Solution
[07 Oct 2017 14:51:12,129] <3474144 UPDT APPI INFO   host01> PHASE_END APPLY IMPACT
[07 Oct 2017 14:51:12,131] <3474144 UPDT APPI ERROR  host01> The apply phase for the release 'bwr1' failed.
[07 Oct 2017 14:51:12,132] <3474144 UPDT RESU INFO   host01> PHASE_END RESUME
[07 Oct 2017 14:51:12,134] <3474144 UPDT RESU ERROR  host01> The resume phase for the release 'bwr1' failed.



Looking earlier in the log we see the following message:


07 Oct 2017 14:51:07,582] <3998084 CTRL  DEBUG  host01> Last login: Sat Oct  7 14:41:30 2017 from 172.23.1.1^M
[07 Oct 2017 14:51:07,582] <3998084 CTRL  DEBUG  host01> ^[[?1034hhscroot@pddrmd7hmc1:~> getupgfiles -h 172.23.1.1 -u root -d /BCU_share/bwr1/firmware/hmc/CR6/image/imports/HMC_Recovery_V8R830_5 -s
[07 Oct 2017 14:51:07,582] <3998084 CTRL  DEBUG  host01> Enter the current password for user root:
[07 Oct 2017 14:51:07,582] <3998084 CTRL  DEBUG  host01>
[07 Oct 2017 14:51:07,582] <3998084 CTRL  DEBUG  host01> The file transfer did not complete sucessfully.
[07 Oct 2017 14:51:07,582] <3998084 CTRL  DEBUG  host01> Verify the remote directory exists, all required files needed for upgrade are there,
[07 Oct 2017 14:51:07,583] <3998084 CTRL  DEBUG  host01> you have read access to both the directory and the files, and then try the operation again.
[07 Oct 2017 14:51:07,583] <3998084 CTRL  DEBUG  host01> hscroot@pddrmd7hmc1:~> echo $?
[07 Oct 2017 14:51:07,583] <3998084 CTRL  DEBUG  host01> 1
[07 Oct 2017 14:51:07,583] <3998084 CTRL  DEBUG  host01> hscroot@pddrmd7hmc1:~>
[07 Oct 2017 14:51:07,585] <3998084 CTRL  DEBUG  host01> From process 4522578: STDERR:
[07 Oct 2017 14:51:07,585] <3998084 CTRL  DEBUG  host01>
[07 Oct 2017 14:51:07,586] <3998084 CTRL  DEBUG  host01> Exit code: 1
[07 Oct 2017 14:51:07,587] <3998084 CTRL  DEBUG  host01> Command return code -> 1
[07 Oct 2017 14:51:07,588] <3998084 CTRL  DEBUG  host01> getupgfiles command failed
[07 Oct 2017 14:51:07,626] <3998084 CTRL  DEBUG  host01> Failed to upgrade release
Workaround:
-----------
Check the log file to see whether the update completed on the second or other HMC in the environment. If the update completed successfully there, the most likely reason for the failure is that the known_hosts file for the root user on the failing HMC contains an incorrect ssh host key for the management host. This should be a rare occurrence, but it can happen if the ssh host keys on the management host change over time and, during troubleshooting or a deployment step, an ssh session was initiated from the root user on the HMC to the management host.

Resolving this issue requires PDOA support to open a secondary request with the HMC support team. The HMC support team will lead the customer through obtaining pesh access on Power 7 and Power 8 HMCs.

Access to pesh requires access to the hscpe user and the root user. In PDOA environments the hscpe user is removed before the system is turned over, but it may have been created during troubleshooting steps. Therefore it may be necessary to create the hscpe user, or to change its password if the user already exists from a previous troubleshooting step. The same is true for the root user: if the root password is not known, it will be necessary to modify it. Both the hscpe and root passwords can be modified from the hscroot user using the chhmcuser command.

Once a pesh session is established and the customer is able to access the root account, it is possible to confirm that this is indeed the problem by running the following as the root user:

bash-4.1# ssh root@172.23.1.1
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@    WARNING: REMOTE HOST IDENTIFICATION HAS CHANGED!     @
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
IT IS POSSIBLE THAT SOMEONE IS DOING SOMETHING NASTY!
Someone could be eavesdropping on you right now (man-in-the-middle attack)!
It is also possible that a host key has just been changed.
The fingerprint for the RSA key sent by the remote host is



To fix the issue, run the following as the root user on the HMC:

Note: if your management host internal network IP address is different from 172.23.1.1, substitute that IP address in the ssh-keygen command.

ssh-keygen -R 172.23.1.1

This command removes the entry from the /root/.ssh/known_hosts file on the HMC. It must be run as the root user on the HMC.
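To confirm the fix, a quick test from the same root session on the HMC (a sketch; 172.23.1.1 is the management host internal address used in this example):

ssh root@172.23.1.1 'hostname'      # should now prompt to accept the new host key (and then for a password) instead of printing the host identification warning

Once the key issue is resolved, the fixpack can be resumed and the getupgfiles step should succeed.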
Fixed:
----------
N/A.
KI007553
"Could not start the product 'GPFS' on" during apply phase.[ Added 2017-11-22 ]
Fixpack
I_V1.0.0.1
I_V1.0.0.2
I_V1.0.0.3
I_V1.0.0.4
I_V1.0.0.5
I_V1.1.0.1
"Could not start the product 'GPFS' on" during apply phase.[ Added 2017-11-22 ]
When the fixpack attempts to restart GPFS on a host, it may fail to start GPFS causing the fixpack process to fail.
Workaround:
-----------
This happens due to a limitation in the pflayer code that determines whether all of the GPFS filesystem mount points are mounted before allowing the fixpack process to proceed to the next step. This code depends on a very specific naming convention for NSDs and their associated GPFS filesystems, as well as a one-to-one mapping of NSDs to filesystems. If a filesystem and its NSD do not follow either of these conventions, the GPFS startup code cannot determine when all filesystems are mounted. Customers who have added GPFS filesystems that do not follow these two conventions need to contact IBM for possible remediation options.

Here is the test.

Run the following commands on the hosts that the pl_update.log file identifies as unable to start GPFS. These commands can be run before starting the fixpack process.

/usr/lpp/mmfs/bin/mmlsfs all -d 2> /dev/null | grep "\-d" | awk '{ sub(/nsd/, "", $2);print $2}'|sort

mount | grep " mmfs " | awk '{ sub(/\/dev\//,"",$1);print $1}' | sort


The expectation is that the output is exactly the same.
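A one-shot comparison sketch using the same two commands shown above (empty diff output means the NSD names and the mounted GPFS filesystems match):

/usr/lpp/mmfs/bin/mmlsfs all -d 2> /dev/null | grep "\-d" | awk '{ sub(/nsd/, "", $2);print $2}' | sort > /tmp/gpfs_nsds
mount | grep " mmfs " | awk '{ sub(/\/dev\//,"",$1);print $1}' | sort > /tmp/gpfs_mounts
diff /tmp/gpfs_nsds /tmp/gpfs_mounts && echo "OK: NSD names and mounted filesystems match"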
Fixed:
----------
V1.0.0.6/V1.1.0.2 The fixpack mechanism has changed. However, this symptom can still occur when running appl_start commands, which continue to rely on the rules above to work correctly. The impact is different because the fixpack no longer relies entirely on appl_start and appl_stop.
KI007499
Drive update required for product ID ST900MM0006 [ Added 2017-11-22 ]
Fixpack
I_V1.0.0.5
Drive update required for product ID ST900MM0006 [ Added 2017-11-22 ]
Before starting the apply phases of the fixpack, it is necessary to apply an update to the V7000 drives. These steps can be applied while the system is online. See the linked V7000 tech note for more information.

Product ID ST900MM0006 need to update drives with firmware level B56S before running the "applydrivesoftware" command.

Data Integrity Issue when Drive Detects Unreadable Data
http://www-01.ibm.com/support/docview.wss?rs=591&uid=ssg1S1005289
Workaround:
-----------
1. In an ssh session, log in as the root user on the management host.

2. Determine the IP addresses of all of the V7000 enclosures in the environment.

The SAN_FRAME entries in the xcluster.cfg file are V7000 enclosures.

$ grep 'SAN_FRAME[0-9][0-9]*_IP' /pschome/config/xcluster.cfg
SAN_FRAME1_IP    =      172.23.1.181
SAN_FRAME2_IP    =      172.23.1.182
SAN_FRAME3_IP    =      172.23.1.183
SAN_FRAME4_IP    =      172.23.1.184
SAN_FRAME5_IP    =      172.23.1.185
SAN_FRAME6_IP    =      172.23.1.186
SAN_FRAME7_IP    =      172.23.1.187

or

Use the following command to query the console for the storage enclosures.

$ appl_ls_hw -r storage -A M_IP_address,Description
"172.23.1.181","IBM Storwize V7000 Storage"
"172.23.1.182","IBM Storwize V7000 Storage"
"172.23.1.183","IBM Storwize V7000 Storage"
"172.23.1.184","IBM Storwize V7000 Storage"
"172.23.1.185","IBM Storwize V7000 Storage"
"172.23.1.186","IBM Storwize V7000 Storage"
"172.23.1.187","IBM Storwize V7000 Storage"


In the above examples there are seven V7000 storage enclosures and the IP addresses are 172.23.1.181 to 172.23.1.187.

3. Determine if your system has the impacted drive. The following command reports, for each enclosure, the number of 900 GB drives with product ID ST900MM0006.

$ grep 'SAN_FRAME[0-9]*[0-9]_IP' /pschome/config/xcluster.cfg | while read a b c d;do echo "*** ${c} ***";ssh -n superuser@${c} 'lsdrive -nohdr| while read id rest;do lsdrive $id;done' | grep -c "product_id ST900MM0006";done

*** 172.23.1.181 ***
0
*** 172.23.1.182 ***
5
*** 172.23.1.183 ***
23
*** 172.23.1.184 ***
0
*** 172.23.1.185 ***
0
*** 172.23.1.186 ***
0
*** 172.23.1.187 ***
0

4. The PureData System for Operational Analytics V1.0 FP5 image includes the necessary files to perform the drive update. These files were unpacked as part of the fixpack registration. Determine the location of the fixpack on the management host.

$ appl_ls_cat
NAME                     VERSION                       STATUS                   DESCRIPTION
bwr0                     4.0.4.0                       Committed                Updates for IBM_PureData_System_for_Operational_Analytics
bwr1                     4.0.5.0                       Committed                Updates for IBM_PureData_System_for_Operational_Analytics_DB2105


In the above command the fixpack files are part of the id 'bwr1'. This means the files were unpacked on the management host in /BCU_share/bwr1.

5. Determine the fix path by changing the <BWR> variable to the identifier determined in step 4 in the path /BCU_share/<BWR>/firmware/storage/2076/image/imports/drives. From the above example, the id was 'bwr1' so the path is "/BCU_share/bwr1/firmware/storage/2076/image/imports/drives".

6. Verify the fix file exists and also the cksum of the file.

$ ls -la /BCU_share/bwr1/firmware/storage/2076/image/imports/drives
total 162728
drwxr-xr-x    2 26976    19768           256 Jan 18 08:53 .
drwxr-xr-x    5 26976    19768           256 Jan 18 08:53 ..
-rw-r--r--    1 26976    19768      83313381 Jan 18 08:53 IBM2076_DRIVE_20160923


$ cksum /BCU_share/bwr1/firmware/storage/2076/image/imports/drives/IBM2076_DRIVE_20160923
3281318949 83313381 /BCU_share/bwr1/firmware/storage/2076/image/imports/drives/IBM2076_DRIVE_20160923


7. Perform the following for each IP address identified in step 2. The example below uses 172.23.1.183. All V7000s can be updated concurrently.

a. Copy the image to storwize location /home/admin/upgrade

scp /BCU_share/bwr1/firmware/storage/2076/image/imports/drives/IBM2076_DRIVE_20160923 admin@172.23.1.183:/home/admin/upgrade
IBM2076_DRIVE_20160923                  100%   79MB  39.7MB/s   00:02

b. Update the drive using the command below:
ssh admin@172.23.1.183 "applydrivesoftware -file IBM2076_DRIVE_20160923 -all"

8. Monitor the status of the drive upgrade using the lsdriveupgradeprogress command. The following command reports the progress on all of the V7000s. Repeat this command until there is no longer any output, which indicates the updates have finished.

$ grep 'SAN_FRAME[0-9]*[0-9]_IP' /pschome/config/xcluster.cfg | while read a b c d;do echo "*** ${c} ***";ssh -n superuser@${c} lsdriveupgradeprogress;done
*** 172.23.1.181 ***
*** 172.23.1.182 ***
*** 172.23.1.183 ***
*** 172.23.1.184 ***
*** 172.23.1.185 ***
*** 172.23.1.186 ***
*** 172.23.1.187 ***
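Rather than rerunning the command manually, a simple polling sketch can be used (an illustration only; it assumes, as in step 8, that empty lsdriveupgradeprogress output means all drive updates have finished):

while true; do
    out=$(grep 'SAN_FRAME[0-9]*[0-9]_IP' /pschome/config/xcluster.cfg | while read a b c d;do ssh -n superuser@${c} lsdriveupgradeprogress;done)
    if [ -z "$out" ]; then
        echo "Drive updates complete."
        break
    fi
    echo "$out"
    sleep 60      # check again in one minute
done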
Fixed:
----------
NA. This KI is limited to V1.0.0.5.
KI007566
HA Tools Version 2.0.5.0 hareset fails with "syntax error at line 854 :`else' unexpected" error. [ Added 2018-01-23 ]
General
I_V1.0.0.5
I_V1.0.0.6
I_V1.0.0.7
I_V1.1.0.1
I_V1.1.0.2
I_V1.1.0.3
HA Tools Version 2.0.5.0 hareset fails with "syntax error at line 854 :`else' unexpected" error. [ Added 2018-01-23 ]
When attempting to back up or restore the core TSA domains, the hareset command fails with an error similar to the following.

/usr/IBM/analytics/ha_tools/hareset: syntax error at line 854 :`else' unexpected

This is due to an errant edit in the changes that were incorporated into the HA tools in the March fixpacks (V1.0.0.5/V1.1.0.1) as part of HA Tools version 2.0.5.0.
Workaround:
-----------
To fix in the field:

Log in to the management host as root and back up the file:

cp /usr/IBM/analytics/ha_tools/hareset /usr/IBM/analytics/ha_tools/hareset.bak

Using the vi editor, modify the file /usr/IBM/analytics/ha_tools/hareset.

Find line 850 in this file.

Modify 'if' to say 'fi'.

Save the file.

diff the file:

$ diff hareset.bak hareset
850c850
< if
---
> fi


Copy this new hareset file to the rest of the hosts (see the sketch below).
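A minimal copy sketch (assuming, as elsewhere in this document, that ALL holds the comma-separated list of hosts used with dsh; adjust to your host list):

for h in $(echo ${ALL} | tr ',' ' '); do
    scp -p /usr/IBM/analytics/ha_tools/hareset ${h}:/usr/IBM/analytics/ha_tools/hareset
done

After copying, rerun the diff on each host to confirm that line 850 now reads 'fi'.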
Fixed:
----------
V1.0. Contact IBM Support.
V1.1: Fixed in HA Tools 2.0.8.0 which is available by technote or as part of PDOA V1.1 FP4. See IBM PureData System for Operational Analytics High Availability toolkit component 2.0.8.0 update.
KI007610
On FP3->FP5 The TSA upgrade does not include the appropriate TSA license. [ Added 9/5/2018 ]
Fixpack I_V1.0.0.5
On FP3->FP5 The TSA upgrade does not include the appropriate TSA license. [ Added 9/5/2018 ]

This issue only affects customers who apply PDOA V1.0.0.5 (FP5) to V1.0.0.3 (FP3).

There have been two symptoms that have appeared in the field.

The first symptom occurs after trying to run a command to change rsct / TSA policies.

(mkrsrc-api) 2621-309 Command not allowed as daemon does not have a valid license.
mkequ: 2622-009 An unexpected RMC error occurred.The RMC return code was 1.

The second symptom can occur when trying to update TSA when there is no license. The following error can show up when running installSAM:

prereqSAM: All prerequisites for the ITSAMP installation are met on operating system:  AIX 7100-05
installSAM: Cannot upgrade because no valid license was found.
installSAM: No installation was performed.
installSAM: For details, refer to the 'Error:' entries in the log file:  /tmp/installSAM.2.log

Workaround:
-----------

1. Verify that the license is not applied by running the following command from root on the management host.

$ dsh -n ${ALL} 'samlicm -s' | dshbak -c
HOSTS -------------------------------------------------------------------------
host01, host02, host03, host04, host05, host06, host07
-------------------------------------------------------------------------------
Product license is invalid and needs to be upgraded.

2. If you plan to update to PDOA V1.0.0.6, note that V1.0.0.6 includes instructions on how to remedy this issue. When PDOA V1.0.0.6 is available, download the fixpack from FixCentral, follow the instructions to unpack the fixpack, and see the Appendix that describes how to apply the license as part of the TSA update. If you do not plan to apply FP6, contact IBM Support to obtain the sam41.lic file and proceed to step 3.

3. Create the directory /stage/FP3_FP5/TSA.

mkdir -p /stage/FP3_FP5/TSA

4. Copy the sam41.lic file to the /stage/FP3_FP5/TSA directory.

5. Verify that /stage is mounted on all hosts in the domain (see the mount check sketch after step 7).

6. Run the following command to apply the license to all domains. This does not require restart.

dsh -n $ALL "samlicm -i /stage/FP3_FP5/TSA/sam41.lic "

7. Verify the license was applied successfully. The output should be similar to the output below once the TSA copies are licensed.

$ dsh -n $ALL "samlicm -s " | dshbak -c
HOSTS -------------------------------------------------------------------------
host01, host02, host03, host04
-------------------------------------------------------------------------------
Product: IBM Tivoli System Automation for Multiplatforms 4.1.0.0
Creation date: Fri Aug 16 00:00:01 MST 2013
Expiration date: Thu Dec 31 00:00:01 MST 2037
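For step 5, a quick mount check across the hosts (a sketch; it reuses the dsh/dshbak pattern from step 1):

dsh -n ${ALL} 'mount | grep /stage' | dshbak -c

Each host should report an entry for /stage before the license is applied in step 6.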

Fixed:
----------
NA. Only applies to V1.0.0.3->V1.0.0.5 scenarios.
KI007570
Multiple DB2 Copies installed on the core hosts can confuse the fixpack. [ Added 2018-10-03 ]
Fixpack
I_V1.0.0.1
I_V1.0.0.2
I_V1.0.0.3
I_V1.0.0.4
I_V1.0.0.5
I_V1.1.0.1
Multiple DB2 Copies installed on the core hosts can confuse the fixpack. [ Added 2018-10-03 ]

The PDOA appliance is designed as follows:

1 DB2 9.7 copy on the management host to support IBM System Director.

1 DB2 10.1 or 10.5 DB2 copy on the management and management standby hosts supporting Warehouse Tools and DPM.

1 DB2 10.1, 10.5, 11.1 copy on all core hosts supporting the core database.

This assumption is built into the PDOA Console and can impact the following:

--> compliance checks comparing what is on the system to the validated stack

--> fixpack application (preview, prepare, apply, commit phases).

The most likely scenario is that a customer who is very familiar with DB2 installs additional copies as part of a fixpack or special build installation. This is supported by DB2, but if the previous copy is left on the system it can cause various issues with the console, with the most severe issues occurring during the fixpack application.

This issue has minimal impact for customers on V1.0.0.5 or V1.1.0.1 because the non-cumulative V1.0.0.6 (FP6) / V1.1.0.2 (FP2) fixpacks have changed significantly and no longer have this restriction, and the compliance check for DB2 in the platform layer is not a critical function.

Workaround:
-----------

Remove any extra DB2 copies from the environment on all hosts before running the fixpack preview. This will prevent fixpack failures due to multiple DB2 copies.

If a problem related to multiple DB2 copies is encountered during the appliance fixpack, seek guidance from IBM Support, as the next steps depend on the failure and on where in the fixpack application process the failure occurred.
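To see how many DB2 copies are installed on each core host before the preview, the copies can be listed with db2ls. This is a sketch, assuming the BCUDB2ALL node group used elsewhere in this document; more than one installation path per core host indicates extra copies to review:

dsh -n ${BCUDB2ALL} '/usr/local/bin/db2ls -c | grep -v "#" | cut -d: -f 1' | dshbak -c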

Fixed:
----------
V1.0.0.6/V1.1.0.2 The fixpack application mechanism has been modified and no longer requires just one DB2 copy to be installed.
KI007499
Drive update required for product ID ST1200MM0007
Fixpack I_V1.1.0.1
Drive update required for product ID ST1200MM0007
Before starting the apply phases of the fixpack, it is necessary to apply an update to the V7000 drives. These steps can be applied while the system is online. See the linked V7000 tech note for more information.

Product ID ST1200MM0007 need to update drives with firmware level B57D before running the "applydrivesoftware" command.

Data Integrity Issue when Drive Detects Unreadable Data
http://www-01.ibm.com/support/docview.wss?rs=591&uid=ssg1S1005289
Workaround:
-----------
1. In an ssh session, log in as the root user on the management host.

2. Determine the IP addresses of all of the V7000 enclosures in the environment.

The odd numbered SAN_FRAME entries in the xcluster.cfg file are V7000 enclosures.

$ grep 'SAN_FRAME[0-9]*[13579]_IP' /pschome/config/xcluster.cfg
SAN_FRAME1_IP    =      172.23.1.181
SAN_FRAME3_IP    =      172.23.1.183

or

Use the following command to query the console for the storage enclosures.

$ appl_ls_hw -r storage -A M_IP_address,Description
"172.23.1.181","IBM Storwize V7000 FAB-1 Storage"
"172.23.1.182","IBM FlashSystem 900 Storage"
"172.23.1.183","IBM Storwize V7000 FAB-1 Storage"
"172.23.1.184","IBM FlashSystem 900 Storage"


In the above examples there are two V7000 storage enclosures and the IP addresses are 172.23.1.181 and 172.23.1.183.

3. Determine if your system has the impacted drive. The following command reports, for each enclosure, the number of 1.2 TB drives with product ID ST1200MM0007.

$ grep 'SAN_FRAME[0-9]*[13579]_IP' /pschome/config/xcluster.cfg | while read a b c d;do echo "*** ${c} ***";ssh -n superuser@${c} 'lsdrive -nohdr| while read id rest;do lsdrive $id;done' | grep -c "product_id ST1200MM0007";done
*** 172.23.1.181 ***
27
*** 172.23.1.183 ***
35


4. The PureData System for Operational Analytics V1.1 FP1 image includes the necessary files to perform the drive update. These files were unpacked as part of the fixpack registration. Determine the location of the fixpack on the management host.

$ appl_ls_cat
NAME                     VERSION                       STATUS                   DESCRIPTION
bwr0                     4.0.4.0                       Committed                Updates for IBM_PureData_System_for_Operational_Analytics
bwr1                     4.0.5.0                       Committed                Updates for IBM_PureData_System_for_Operational_Analytics_DB2105


In the above command the fixpack files are part of the id 'bwr1'. This means the files were unpacked on the management host in /BCU_share/bwr1.

5. Determine the fix path by changing the <BWR> variable to the identifier determined in step 4 in the path /BCU_share/<BWR>/firmware/storage/2076/image/imports/drives. From the above example, the id was 'bwr1' so the path is "/BCU_share/bwr1/firmware/storage/2076/image/imports/drives".

6. Verify the fix file exists and also the cksum of the file.

$ ls -la /BCU_share/bwr1/firmware/storage/2076/image/imports/drives
total 162728
drwxr-xr-x    2 26976    19768           256 Jan 18 08:53 .
drwxr-xr-x    5 26976    19768           256 Jan 18 08:53 ..
-rw-r--r--    1 26976    19768      83313381 Jan 18 08:53 IBM2076_DRIVE_20160923


$ cksum /BCU_share/bwr1/firmware/storage/2076/image/imports/drives/IBM2076_DRIVE_20160923
3281318949 83313381 /BCU_share/bwr1/firmware/storage/2076/image/imports/drives/IBM2076_DRIVE_20160923


7. Perform the following for each IP address identified in step 2. The example below uses 172.23.1.183. All V7000s can be updated concurrently.

a. Copy the image to storwize location /home/admin/upgrade

scp /BCU_share/bwr1/firmware/storage/2076/image/imports/drives/IBM2076_DRIVE_20160923 admin@172.23.1.183:/home/admin/upgrade
IBM2076_DRIVE_20160923                  100%   79MB  39.7MB/s   00:02

b. Update the drive using the command below:
ssh admin@172.23.1.183 "applydrivesoftware -file IBM2076_DRIVE_20160923 -all"

8. Monitor the status of the drive upgrade using the lsdriveupgradeprogress command. The following command reports the progress on all of the V7000s. Repeat this command until there is no longer any output, which indicates the updates have finished.

$ grep 'SAN_FRAME[0-9]*[13579]_IP' /pschome/config/xcluster.cfg | while read a b c d;do echo "*** ${c} ***";ssh -n superuser@${c} lsdriveupgradeprogress;done
*** 172.23.1.181 ***
*** 172.23.1.183 ***
Fixed:
----------
NA. This known issue is only applicable to this fixpack level.
KI007470
Preview failed for flash storage
Fixpack I_V1.1.0.1
Preview failed for flash storage
Symptoms:
1. Fix pack apply fails

2. The miupdate log shows the following:
=====================================================
[20 Jan 2017 04:09:59,225] <6291746 CTRL TRACE reverseflash01> CMMVC5994E Error in verifying the signature of the update package.
and
20 Jan 2017 04:09:59,230] <6291746 CTRL ERROR reverseflash01> STORAGE:storage1:172.23.1.182:1:Error: The update operation on the system cannot be performed.
[20 Jan 2017 04:09:59,230] <6291746 CTRL ERROR reverseflash01> STORAGE:storage3:172.23.1.184:1:Error: The update operation on the system cannot be performed.
=====================================================

This is a known issue with the Flash900 firmware as listed in the following URL:  https://www-01.ibm.com/support/docview.wss?uid=ssg1S1009254
Workaround:
-----------
If the issue cannot be resolved, contact IBM support
Fixed:
----------
NA
KI007523
Apply failed on one flash storage on reverseflash
Fixpack I_V1.1.0.1
Apply failed on one flash storage on reverseflash
The upgrade cannot proceed because of hardware errors.

LOG excerpts:
============

[02 Feb 2017 13:43:34,301] <3801392 UPDT ERROR reverseflash01> TASK_END::14::1 of 1::StorageUPD::172.23.1.182::::RC=1::Storage update failed on 172.23.1.182
[02 Feb 2017 13:43:34,302] <3801392 UPDT INFO reverseflash01> TASK_END::14::1 of 1::StorageUPD::172.23.1.184::::RC=0::Storage update succeeded on 172.23.1.184
[02 Feb 2017 13:43:34,497] <3801392 UPDT APPI DEBUG reverseflash01> Executing query Logical_name=Infrastructure AND Solution_version=4.0.5.0, to update status of Product
[02 Feb 2017 13:43:34,561] <3801392 UPDT APPI DEBUG reverseflash01> Executing query Sub_module_type=Infrastructure AND Solution_version=4.0.5.0, to update status of sub module
[02 Feb 2017 13:43:34,771] <3801392 UPDT APPI DEBUG reverseflash01> Executing query Logical_name=Infrastructure AND Solution_version=4.0.5.0, to update status of Product
[02 Feb 2017 13:43:34,818] <3801392 UPDT APPI DEBUG reverseflash01> Executing query Sub_module_type=Infrastructure AND Solution_version=4.0.5.0, to update status of sub module
[02 Feb 2017 13:43:34,925] <3801392 UPDT APPI ERROR reverseflash01> Error on nodes (172.23.1.182).
[02 Feb 2017 13:43:34,944] <3801392 UPDT APPI INFO reverseflash01> STEP_END::14::StorageFW_UPD::FAILED
[02 Feb 2017 13:43:34,950] <3801392 UPDT APPI DEBUG reverseflash01> Error occured in apply for product storagefw3
[02 Feb 2017 13:43:35,013] <3801392 UPDT APPI DEBUG reverseflash01> Executing query Logical_name=storagefw3 AND Solution_version=4.0.5.0, to update status of Product
[02 Feb 2017 13:43:35,154] <3801392 UPDT APPI ERROR reverseflash01> Apply (impact) phase for solution has failed.
[02 Feb 2017 13:43:35,224] <3801392 UPDT APPI DEBUG reverseflash01> Executing query Logical_name=bwr1 AND Solution_version=4.0.5.0, to update status of Solution
[02 Feb 2017 13:43:35,349] <3801392 UPDT APPI INFO reverseflash01> PHASE_END APPLY IMPACT
[02 Feb 2017 13:43:35,351] <3801392 UPDT APPI ERROR reverseflash01> The apply phase for the release 'bwr1' failed.
[02 Feb 2017 13:43:35,352] <3801392 UPDT RESU INFO reverseflash01> PHASE_END RESUME
[02 Feb 2017 13:43:35,353] <3801392 UPDT RESU ERROR reverseflash01> The resume phase for the release 'bwr1' failed.


[02 Feb 2017 11:42:10,533] <5112002 CTRL DEBUG reverseflash01> Extracted msg from NLS: apply: 172.23.1.182 ssh admin@172.23.1.182 LANG=en_US svctask applysoftware -file IBM9840_INSTALL_1.4.5.0 c
ommand failed.
[02 Feb 2017 11:42:10,534] <5112002 CTRL DEBUG reverseflash01> apply: 172.23.1.182: error: svctask applysoftware failed, rc=1
[02 Feb 2017 11:42:10,568] <2097734 CTRL DEBUG reverseflash01> apply: storage1: apply failed
[02 Feb 2017 11:46:48,787] <4259928 CTRL DEBUG reverseflash01> get_update_status: status is <upgrading 2
[02 Feb 2017 11:46:48,787] <4259928 CTRL DEBUG reverseflash01> >
Workaround:
-----------
If the issue cannot be resolved, contact IBM support
Fixed:
----------
NA
KI007539
The FixCentral download for V1.1 FP1 includes V1.0.0.5 filenames. This is confusing.
Fixpack I_V1.1.0.1
The FixCentral download for V1.1 FP1 includes V1.0.0.5 filenames. This is confusing.
When downloading the files for V1.1 FP1, I noticed that the filenames say 1.0.0.5. Did I download the right files? Is there a mistake in FixCentral?
Workaround:
-----------
There is no mistake in FixCentral. The fixpack files for V1.0.0.5 (DB2 10.5) and V1.1.0.1 are exactly the same.

The file listing for V1.1.0.1 is:

1.0.0.5-IM-PureData_System_for_OpAnalytics_DB2105-fp005.fo.xml
1.0.0.5-IM-PureData_System_for_OpAnalytics_DB2105-fp005.readme
1.0.0.5-IM-PureData_System_for_OpAnalytics_DB2105-fp005.tar_part_001
1.0.0.5-IM-PureData_System_for_OpAnalytics_DB2105-fp005.tar_part_002
1.0.0.5-IM-PureData_System_for_OpAnalytics_DB2105-fp005.tar_part_003
1.0.0.5-IM-PureData_System_for_OpAnalytics_DB2105-fp005.tar_part_004
1.0.0.5-IM-PureData_System_for_OpAnalytics_DB2105-fp005.tar_part_005
1.0.0.5-IM-PureData_System_for_OpAnalytics_DB2105-fp005.tar_part_006
1.0.0.5-IM-PureData_System_for_OpAnalytics_DB2105-fp005.tar_part_007
1.0.0.5-IM-PureData_System_for_OpAnalytics_DB2105-fp005.tar_part_008
1.0.0.5-IM-PureData_System_for_OpAnalytics_DB2105-fp005.tar_part_009
1.0.0.5-IM-PureData_System_for_OpAnalytics_DB2105-fp005.tar_part_010
1.0.0.5-IM-PureData_System_for_OpAnalytics_DB2105-fp005.tar_part_011
1.0.0.5-IM-PureData_System_for_OpAnalytics_DB2105-fp005.tar_part_012
1.0.0.5-IM-PureData_System_for_OpAnalytics_DB2105-fp005.tar_part_013
1.0.0.5-IM-PureData_System_for_OpAnalytics_DB2105-fp005.tar_part_014
1.0.0.5-IM-PureData_System_for_OpAnalytics_DB2105-fp005.tar_part_015
1.0.0.5-IM-PureData_System_for_OpAnalytics_DB2105-fp005.tar_part_016
1.0.0.5-IM-PureData_System_for_OpAnalytics_DB2105-fp005.tar_part_017
1.0.0.5-IM-PureData_System_for_OpAnalytics_DB2105-fp005.tar_part_018
1.0.0.5-IM-PureData_System_for_OpAnalytics_DB2105-fp005.tar_part_019
1.0.0.5-IM-PureData_System_for_OpAnalytics_DB2105-fp005.tar_part_020
1.0.0.5-IM-PureData_System_for_OpAnalytics_DB2105-fp005.tar_part_021
1.0.0.5-IM-PureData_System_for_OpAnalytics_DB2105-fp005.tar_part_022
1.0.0.5-IM-PureData_System_for_OpAnalytics_DB2105-fp005_SG_Single_1489238245139.xml
1.0.0.5-IM-PureData_System_for_OpAnalytics_DB2105-fp005_SG_Single_1489373378095.xml
1.1.0.1-IM-PureData_System_for_OpAnalytics_DB2105-fp005.fo.xml
1.1.0.1-IM-PureData_System_for_OpAnalytics_DB2105-fp005_SG_Single_1489273702786.xml
1.1.0.1-IM-PureData_System_for_OpAnalytics_DB2105-fp005_SG_Single_1489374195057.xml

As per the previous known issue the XML files were inadvertently included in the fixpack and are not needed.
Fixed:
----------
NA
KI007560
A new drive type, AL14SEB120N, in the V7000 gen2 will cause a preview failure. [ Added 2017-12-22]
Fixpack I_V1.1.0.1
A new drive type, AL14SEB120N, in the V7000 gen2 will cause a preview failure. [ Added 2017-12-22]
The above drive type was added as a possible drive for V7000 gen2 enclosures after the PDOA Fixpack was released and is not recognized by the V7000 test application shipped in that fixpack. This can lead to errors similar to the following during the preview stage.

[20 Dec 2017 09:34:43,469] <12386730 CTRL  TRACE  host01>  This tool has detected that the cluster contains
[20 Dec 2017 09:34:43,470] <12386730 CTRL  TRACE  host01>  one or more internal disks that are not known drive types.
[20 Dec 2017 09:34:43,470] <12386730 CTRL  TRACE  host01>  Please retry with the latest version of svcupgradetest. If this
[20 Dec 2017 09:34:43,470] <12386730 CTRL  TRACE  host01>  error is still being reported, please contact your support representative.
[20 Dec 2017 09:34:43,470] <12386730 CTRL  TRACE  host01>  +----------------------+-------------+
[20 Dec 2017 09:34:43,470] <12386730 CTRL  TRACE  host01>  | Reported model       | Drive count |
[20 Dec 2017 09:34:43,471] <12386730 CTRL  TRACE  host01>  +----------------------+-------------+
[20 Dec 2017 09:34:43,471] <12386730 CTRL  TRACE  host01>  | AL14SEB120N          | 32          |
[20 Dec 2017 09:34:43,471] <12386730 CTRL  TRACE  host01>  +----------------------+-------------+
[20 Dec 2017 09:34:43,471] <12386730 CTRL  TRACE  host01>  To see the list of entries in any of the above tables
[20 Dec 2017 09:34:43,471] <12386730 CTRL  TRACE  host01>  re-run the tool with a -d parameter added on the end.
[20 Dec 2017 09:34:43,472] <12386730 CTRL  TRACE  host01>

This will cause the preview step to fail.


No PDOA environments were shipped with this drive type prior to December, 2017. However, it is possible that a drive replacement could introduce this drive type to an existing PDOA V1.1 environment leading to this scenario.
Workaround:
-----------
The only resolution to this issue is to update the firmware of all V7000 enclosures that contain this drive type to a level at or above the level shipped with the fixpack.

The V7000 storage enclosures support concurrent firmware updates; however, updates should only be performed when there is a light workload. The V7000 enclosures in a PDOA V1.1 environment provide management storage with generally minimal I/O requirements, as well as second-tier or backup storage for the database partitions.

It is recommended to quiesce the system prior to applying the update. For planning purposes, the update should take approximately 3 hours to perform.


Step 1: Identify all V7000 storage enclosures. The number of enclosures will vary depending on the size of the environment. This is run as root on the management node.
=====================================

$: grep 'SAN_FRAME[0-9]*[13579]_IP' /pschome/config/xcluster.cfg | while read a b c d;do echo "*** ${c} ***";ssh -n superuser@${c} 'lsdrive -nohdr| while read id rest;do lsdrive $id;done' | grep -c "product_id AL14SEB120N";done

*** 172.23.1.181 ***
39
*** 172.23.1.183 ***
32
*** 172.23.1.185 ***
46
*** 172.23.1.187 ***
41


Step 2: Identify the pflayer storage# id's associated with the above enclosures.
=====================================

$ appl_ls_hw -r storage -A Logical_name,Description,M_IP_address | sort
"storage0","IBM Storwize V7000 FAB-1 Storage","172.23.1.181"
"storage1","IBM FlashSystem 900 Storage","172.23.1.182"
"storage2","IBM Storwize V7000 FAB-1 Storage","172.23.1.183"
"storage3","IBM FlashSystem 900 Storage","172.23.1.184"
"storage4","IBM Storwize V7000 FAB-1 Storage","172.23.1.185"
"storage5","IBM FlashSystem 900 Storage","172.23.1.186"
"storage6","IBM Storwize V7000 FAB-1 Storage","172.23.1.186"
"storage7","IBM FlashSystem 900 Storage","172.23.1.187"


Step 3: Run the pflayer validation check replacing the 'storage#,storage#,' with the storage identified above.
=====================================

$PL_ROOT/bin/icmds/appl_ctrl_storage update -validate -t 7.5.0.11 -l "storage0,storage2,storage4,storage6" -f /BCU_share/bwr1/firmware/storage/2076/image

STORAGE:storage2:172.23.1.183:1:Error: The update operation on the system cannot be performed.
STORAGE:storage6:172.23.1.187:1:Error: The update operation on the system cannot be performed.
STORAGE:storage4:172.23.1.185:1:Error: The update operation on the system cannot be performed.
STORAGE:storage0:172.23.1.181:1:Error: The update operation on the system cannot be performed.

Step 4: Run the pflayer prepare step by replacing the 'storage#,storage#,' with the storage identified above.
=====================================

$PL_ROOT/bin/icmds/appl_ctrl_storage update -prepare -l "storage0,storage2,storage4,storage6" -f /BCU_share/bwr1/firmware/storage/2076/image
STORAGE:storage2:172.23.1.183:0:
STORAGE:storage6:172.23.1.187:0:
STORAGE:storage4:172.23.1.185:0:
STORAGE:storage0:172.23.1.181:0:

Step 5: Run the pflayer update step by replacing the 'storage#,storage#,' with the storage identified above.
=====================================
nohup $PL_ROOT/bin/icmds/appl_ctrl_storage update -install -t 7.5.0.11 -l "storage0,storage2,storage4,storage6" -f /BCU_share/bwr1/firmware/storage/2076/image


Step 6: Rerun the preview step as part of the Fixpack application.
=====================================
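Before rerunning the preview, the firmware level on each V7000 can optionally be confirmed. This is only a sketch; it assumes the Storwize 'lssystem' output includes a code_level field:

grep 'SAN_FRAME[0-9]*[13579]_IP' /pschome/config/xcluster.cfg | while read a b c d;do echo "*** ${c} ***";ssh -n superuser@${c} 'lssystem' | grep code_level;done

Each V7000 should report a code_level at or above 7.5.0.11 (the level used in the examples above).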
Fixed:
----------
NA.
KI007583
Trial DB2 10.5 License Discovered after fixpack apply phase.
Fixpack
I_V1.0.0.5
I_V1.1.0.1
Trial DB2 10.5 License Discovered after fixpack apply phase.
Due to a change in the mechanism for updating DB2 10.5 in V1.0.0.5 and V1.1.0.1, if DB2 10.5 is updated by V1.0.0.5 or V1.1.0.1 the appropriate license file is not applied to the new DB2 copies on the core hosts. This only impacts the core hosts and not the management hosts. This does not impact DB2 11.1 or DB2 10.1.
The following command will show the current licenses installed for all Known DB2 copies on the core nodes:
$ dsh -n ${BCUDB2ALL} '/usr/local/bin/db2ls -c | grep -v "#" | cut -d: -f 1 | while read f;do ${f}/adm/db2licm -l;done' | dshbak -c
HOSTS -------------------------------------------------------------------------
kf5hostname02, kf5hostname05, kf5hostname06, kf5hostname07
-------------------------------------------------------------------------------
Product name:                     "DB2 Advanced Enterprise Server Edition"
License type:                     "Trial"
Expiry date:                      "06/04/2018"
Product identifier:               "db2aese"
Version information:              "10.5"
Product name:                     "DB2 Enterprise Server Edition"
License type:                     "Trial"
Expiry date:                      "06/04/2018"
Product identifier:               "db2ese"
Version information:              "10.5"

HOSTS -------------------------------------------------------------------------
kf5hostname04
-------------------------------------------------------------------------------
Product name:                     "DB2 Advanced Enterprise Server Edition"
License type:                     "Trial"
Expiry date:                      "06/09/2018"
Product identifier:               "db2aese"
Version information:              "10.5"
Product name:                     "DB2 Enterprise Server Edition"
License type:                     "Trial"
Expiry date:                      "06/09/2018"
Product identifier:               "db2ese"
Version information:              "10.5"
Workaround:
-----------
Ensure that /BCU_share is mounted on all core hosts.
Login to the management host as the root user.
$ dsh -n ${BCUDB2ALL} 'mount | grep /BCU_share'
### If not mounted, then mount /BCU_share
### This command assumes that the management host's internal IP address is 172.23.1.1.
$ dsh -n ${BCUDB2ALL} 'mount 172.23.1.1:/BCU_share /BCU_share'

# Verify:
$ dsh -n ${BCUDB2ALL} 'mount | grep /BCU_share'
kf5hostname05: 172.23.1.1 /BCU_share       /BCU_share       nfs3   Mar 14 17:53
kf5hostname02: 172.23.1.1 /BCU_share       /BCU_share       nfs3   Mar 14 17:53
kf5hostname07: 172.23.1.1 /BCU_share       /BCU_share       nfs3   Mar 14 17:53
kf5hostname06: 172.23.1.1 /BCU_share       /BCU_share       nfs3   Mar 14 17:53
kf5hostname04: 172.23.1.1 /BCU_share       /BCU_share       nfs3   Mar 14 17:53

# Find the installation directory for the fixpack.
$ appl_ls_cat
NAME                     VERSION                       STATUS                   DESCRIPTION
bwr0                     4.0.4.2                       Committed                Updates for IBM_PureData_System_for_Operational_Analytics
bwr1                     4.0.5.0                       Committed                Updates for IBM_PureData_System_for_Operational_Analytics_DB2105

--> The DB2  license file used in PDOA environments can be found in the following location. Replace 'bwr1' with the appropriate name from the appl_ls_cat command.
/BCU_share/bwr1/software/ISW/PDS/warehouse/db2aese_c.lic
Verify that the license file is for the DB2 Copy in question, for example, DB2 10.5.
$ cat /BCU_share/bwr1/software/ISW/PDS/warehouse/db2aese_c.lic | grep ProductVersion
ProductVersion=10.5

# Verify that all host can see the file.
dsh -n ${BCUDB2ALL} 'cksum /BCU_share/bwr1/software/ISW/PDS/warehouse/db2aese_c.lic' | dshbak -c
HOSTS -------------------------------------------------------------------------
kf5hostname02, kf5hostname04, kf5hostname05, kf5hostname06, kf5hostname07
-------------------------------------------------------------------------------
1513072379 915 /BCU_share/bwr1/software/ISW/PDS/warehouse/db2aese_c.lic

# Apply the license file. This is done as root for all hosts.
Find the DB2 installation path on each host:
dsh -n ${BCUDB2ALL} '/usr/local/bin/db2ls -c | grep -v "#" | cut -d: -f 1'
kf5hostname02: /usr/IBM/dwe/db2/V10.5.0.8..0
kf5hostname04: /usr/IBM/dwe/db2/V10.5.0.8..0
kf5hostname07: /usr/IBM/dwe/db2/V10.5.0.8..0
kf5hostname06: /usr/IBM/dwe/db2/V10.5.0.8..0
kf5hostname05: /usr/IBM/dwe/db2/V10.5.0.8..0

dsh -n ${BCUDB2ALL} '/usr/IBM/dwe/db2/V10.5.0.8..0/adm/db2licm -a /BCU_share/bwr1/software/ISW/PDS/warehouse/db2aese_c.lic'
kf5hostname02:
kf5hostname02: LIC1402I  License added successfully.
kf5hostname02:
kf5hostname02:
kf5hostname02: LIC1426I  This product is now licensed for use as outlined in your License Agreement.  USE OF THE PRODUCT CONSTITUTES ACCEPTANCE OF THE TERMS OF THE IBM LICENSE AGREEMENT, LOCATED IN THE FOLLOWING DIRECTORY: "/usr/IBM/dwe/db2/V10.5.0.8..0/license/en_US.iso88591"
kf5hostname04:
kf5hostname04: LIC1402I  License added successfully.
kf5hostname04:
kf5hostname04:
kf5hostname04: LIC1426I  This product is now licensed for use as outlined in your License Agreement.  USE OF THE PRODUCT CONSTITUTES ACCEPTANCE OF THE TERMS OF THE IBM LICENSE AGREEMENT, LOCATED IN THE FOLLOWING DIRECTORY: "/usr/IBM/dwe/db2/V10.5.0.8..0/license/en_US.iso88591"
kf5hostname07:
kf5hostname07: LIC1402I  License added successfully.
kf5hostname07:
kf5hostname07:
kf5hostname07: LIC1426I  This product is now licensed for use as outlined in your License Agreement.  USE OF THE PRODUCT CONSTITUTES ACCEPTANCE OF THE TERMS OF THE IBM LICENSE AGREEMENT, LOCATED IN THE FOLLOWING DIRECTORY: "/usr/IBM/dwe/db2/V10.5.0.8..0/license/en_US.iso88591"
kf5hostname06:
kf5hostname06: LIC1402I  License added successfully.
kf5hostname06:
kf5hostname06:
kf5hostname06: LIC1426I  This product is now licensed for use as outlined in your License Agreement.  USE OF THE PRODUCT CONSTITUTES ACCEPTANCE OF THE TERMS OF THE IBM LICENSE AGREEMENT, LOCATED IN THE FOLLOWING DIRECTORY: "/usr/IBM/dwe/db2/V10.5.0.8..0/license/en_US.iso88591"
kf5hostname05:
kf5hostname05: LIC1402I  License added successfully.
kf5hostname05:
kf5hostname05:
kf5hostname05: LIC1426I  This product is now licensed for use as outlined in your License Agreement.  USE OF THE PRODUCT CONSTITUTES ACCEPTANCE OF THE TERMS OF THE IBM LICENSE AGREEMENT, LOCATED IN THE FOLLOWING DIRECTORY: "/usr/IBM/dwe/db2/V10.5.0.8..0/license/en_US.iso88591"

# Verify the license is applied.
dsh -n ${BCUDB2ALL} '/usr/IBM/dwe/db2/V10.5.0.8..0/adm/db2licm -l' | dshbak -c
HOSTS -------------------------------------------------------------------------
kf5hostname02, kf5hostname04, kf5hostname05, kf5hostname06, kf5hostname07
-------------------------------------------------------------------------------
Product name:                     "DB2 Advanced Enterprise Server Edition"
License type:                     "CPU Option"
Expiry date:                      "Permanent"
Product identifier:               "db2aese"
Version information:              "10.5"
Enforcement policy:               "Soft Stop"
Fixed:
----------
V1.0.0.6/V1.1.0.2
KI007647
IBM PureData System for Operational Analytics environments may be vulnerable to Flash900 HIPER involving a crash or data corruption.
General I_V1.1.0.0
IBM PureData System for Operational Analytics environments may be vulnerable to Flash900 HIPER involving a crash or data corruption.
IBM PureData System for Operational Analytics environments built and shipped in 2015 included Flash900 firmware levels that contain a serious HIPER. All customers should verify their Flash900 firmware levels and plan to apply firmware level 1.4.5.0 as soon as possible.
See technote http://www.ibm.com/support/docview.wss?uid=swg22005436 for more information.
Workaround:
-----------
See technote http://www.ibm.com/support/docview.wss?uid=swg22005436 for more information.
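A quick way to report the running code level on each storage enclosure is sketched below. This is a sketch only: it assumes the same xcluster.cfg enclosure loop and superuser key-based access shown later in this document, and that lssystem (the standard FlashSystem/Storwize CLI command) reports a code_level field. Use the technote above as the authoritative procedure.
  # Sketch: show the running firmware (code) level on each storage enclosure.
  $ grep "SAN_FRAME[0-9]*[0-9]_IP" /pschome/config/xcluster.cfg | while read frame eq ip rest;do echo "*** ${ip} ***";ssh -n superuser@${ip} "lssystem | grep code_level";done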
Fixed:
----------
V1.1.0.1
KI007447
IBM PureData System for Operational Analytics environments contain an incorrect crontab entry for root.
General
I_V1.0
I_V1.1
IBM PureData System for Operational Analytics environments contain an incorrect crontab entry for root.
IBM PureData System for Operational Analytics V1.0 and V1.1 environments contain the following crontab entry on all AIX hosts which, while harmless, is unnecessary: "* * * * /usr/bin/stcron -parmfile /etc/stcron >/dev/null 2>&1"
Workaround:
-----------
See technote https://www-01.ibm.com/support/docview.wss?uid=swg21996578 for more information.
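As a quick check before following the technote, the following sketch lists any stcron entries in root's crontab across the hosts. It assumes the ${ALL} node group convention used elsewhere in this document and that dsh runs the command as root on each host.
  # Sketch: report any stcron entries in root's crontab on all AIX hosts.
  $ dsh -n ${ALL} 'crontab -l | grep stcron' | dshbak -c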
Fixed:
----------
NA
KI007366
Excessive events and alerts in an IBM PureData System for Operational Analytics environment.
General
I_V1.0
I_V1.1
Excessive events and alerts in an IBM PureData System for Operational Analytics environment.
Too many events, SNMP traps, and active status entries appear in the IBM PureData System for Operational Analytics console as well as in the IBM Systems Director console. In addition, too many Active Status records are generated in IBM Systems Director, which leads to increased Systems Director start times.
Workaround:
-----------
See technote http://www.ibm.com/support/docview.wss?uid=swg21994243  for more information.
Fixed:
----------
NA
KI005724
Inconsistent or insufficient system dump device definitions on AIX hosts in the appliance.
General
I_V1.0
I_V1.1
Inconsistent or insufficient system dump device definitions on AIX hosts in the appliance
After initial deployment, and after AIX updates during appliance fixpack applications, the following issues may be seen on the AIX hosts in the appliance. This symptom should only be remedied before a fixpack is applied or after the fixpack is committed, when rootvg is mirrored.

1. Insufficient dump device size.

The following errpt snippet may be encountered on one or more of the AIX hosts in the appliance.
  LABEL:           DMPCHK_TOOSMALL
  IDENTIFIER:      E87EF1BE

  Date/Time:       Wed Mar  6 15:00:00 CST 2019
  Sequence Number: 1890
  Machine Id:      00FAC8F34C00
  Node Id:         ap29pdimdb02
  Class:           O
  Type:            PEND
  WPAR:            Global
  Resource Name:   dumpcheck

  Description
  The largest dump device is too small.

  Probable Causes
  Neither dump device is large enough to accommodate a system dump at this time.

  Recommended Actions
  Increase the size of one or both dump devices.

2. Inconsistent dump device specifications.

a. hd7 is not defined. This occurs most often for V1.0 customers with Power 7 / IOC systems that have applied FP3.

  # Returns blank on V1.0
  $ dsh -n ${ALL} 'lsvg -l rootvg | grep "^hd7"' | sort

or

  $ dsh -n ${ALL} 'lslv -l hd7' 2>&1 | dshbak -c
  HOSTS -------------------------------------------------------------------------
  stgkf201, stgkf202, stgkf203, stgkf204, stgkf205, stgkf206, stgkf208
  -------------------------------------------------------------------------------
  0516-306 lslv: Unable to find hd7 in the Device
          Configuration Database.

# The following three symptoms are likely to happen at the same time. These symptoms occur after AIX is updated as part of V1.0.0.4, V1.0.0.5, V1.0.0.6, V1.1.0.1, V1.1.0.2 or through a support related activity.


b. hd7 is mirrored.
c. hd7 or lg_dumplv is incorrectly sized.
d. hd7 is inactive

  ### The following shows hd7 is inactive (closed/syncd), is mirrored (column 5 shows 2 copies) and incorrectly sized (column 3 shows 3 PPs).
  $ dsh -n ${ALL} 'lsvg -l rootvg | grep "^hd7"' | sort
  kf5hostname01: hd7                 sysdump    3       6       2    closed/syncd  N/A
  kf5hostname02: hd7                 sysdump    3       6       2    closed/syncd  N/A
  kf5hostname03: hd7                 sysdump    3       6       2    closed/syncd  N/A
  kf5hostname04: hd7                 sysdump    3       6       2    closed/syncd  N/A
  kf5hostname05: hd7                 sysdump    3       6       2    closed/syncd  N/A
  kf5hostname06: hd7                 sysdump    3       6       2    closed/syncd  N/A
  kf5hostname07: hd7                 sysdump    3       6       2    closed/syncd  N/A

  ### The following shows hd7 is not defined as the secondary dump device.
  $ dsh -n ${ALL} 'sysdumpdev -l | grep "^secondary"' | dshbak -c
  HOSTS -------------------------------------------------------------------------
  kf5hostname01, kf5hostname02, kf5hostname03, kf5hostname04, kf5hostname05, kf5hostname06, kf5hostname07
  -------------------------------------------------------------------------------
  secondary            /dev/sysdumpnull

  ### The following shows hd7 is mirrored on both hdisk0 and hdisk1.
  $ dsh -n ${ALL} 'lslv -l hd7' | dshbak -c
  HOSTS -------------------------------------------------------------------------
  kf5hostname01, kf5hostname02, kf5hostname03, kf5hostname06, kf5hostname07
  -------------------------------------------------------------------------------
  hd7:N/A
  PV                COPIES        IN BAND       DISTRIBUTION
  hdisk0            003:000:000   100%          000:003:000:000:000
  hdisk1            003:000:000   100%          000:003:000:000:000

  HOSTS -------------------------------------------------------------------------
  kf5hostname04, kf5hostname05
  -------------------------------------------------------------------------------
  hd7:N/A
  PV                COPIES        IN BAND       DISTRIBUTION
  hdisk1            003:000:000   100%          000:003:000:000:000
  hdisk0            003:000:000   100%          000:003:000:000:000
Workaround:
-----------
The fix is a manual procedure performed by a user with root authority.
Here are the goals, assuming rootvg is composed of hdisk0 and hdisk1. Over time it is possible, although rare, that rootvg will use other hdisk numbers.
lg_dumplv:
  • Assigned to hdisk0 only. While rootvg is a mirrored vg, lg_dumplv is set up as an LV with 1 copy.
  • Minimum size is 7 PPs. Some customers may need to increase the size if they continue to see DMPCHK_TOOSMALL errors in errpt.
  • lg_dumplv should be defined as the Primary Dump Device.
hd7:
  • Assigned to hdisk1 only. While rootvg is a mirrored vg, hd7 is set up as an LV with 1 copy.
  • Same size as lg_dumplv.
  • hd7 should be defined as the Secondary Dump Device.

ISSUE 2A: If hd7 is not created.

1. Determine the device which contains lg_dumplv. The following command shows that lg_dumplv resides on hdisk0.

  $ lslv -l lg_dumplv
  lg_dumplv:N/A
  PV                COPIES        IN BAND       DISTRIBUTION
  hdisk0            007:000:000   85%           000:006:001:000:000


2. Determine the hdisks assigned to rootvg. The following shows hdisk0 and hdisk1.

  $ lsvg -p rootvg
  rootvg:
  PV_NAME           PV STATE          TOTAL PPs   FREE PPs    FREE DISTRIBUTION
  hdisk0            active            558         373         111..00..39..111..112
  hdisk1            active            558         380         111..06..40..111..112


3. Create hd7 on the opposite device of lg_dumplv. The above commands show this is hdisk1.

  $ mklv -y hd7 -t sysdump rootvg 7 hdisk1
  hd7


4. Verify it is created and only exists on hdisk1.

  $ lslv -l hd7
  hd7:N/A
  PV                COPIES        IN BAND       DISTRIBUTION
  hdisk1            007:000:000   100%          000:007:000:000:000

5. Verify it is closed/syncd. Not used.

  $ lsvg -l rootvg | grep hd7
  hd7                 sysdump    7       7       1    closed/syncd  N/A

ISSUE 2B: hd7 is mirrored.

1. Determine the device which contains lg_dumplv. The following command shows that lg_dumplv resides on hdisk0.

  $ lslv -l lg_dumplv
  lg_dumplv:N/A
  PV                COPIES        IN BAND       DISTRIBUTION
  hdisk0            007:000:000   85%           000:006:001:000:000


2. Determine the devices which contain hd7. The following shows that hd7 resides on hdisk0 and hdisk1.

  $ lslv -l hd7
  hd7:N/A
  PV                COPIES        IN BAND       DISTRIBUTION
  hdisk0            003:000:000   100%          000:003:000:000:000
  hdisk1            003:000:000   100%          000:003:000:000:000


3. Remove the hd7 copy that resides on the same disk as lg_dumplv. The goal is for them to be on opposite disks.

  $ rmlvcopy hd7 1 hdisk0

4. Verify that hd7 is assigned to the correct disk.

  $ lslv -l hd7
  hd7:N/A
  PV                COPIES        IN BAND       DISTRIBUTION
  hdisk1            003:000:000   100%          000:003:000:000:000

ISSUE 2C: hd7 or lg_dumplv is incorrectly sized.

1. Compare the sysdumpdev -e output to 7, 8, and 9 GiB to determine which size is sufficient for dump space.

The output below shows that 7 GiB is sufficient for the dump on all hosts.
  $ dsh -n ${ALL} 'printf "$(expr 7 \* 1024 \* 1024 \* 1024)";sysdumpdev -e | cut -d: -f2' | sort
  kf5hostname01: 7516192768 4168509030
  kf5hostname02: 7516192768 6432468499
  kf5hostname03: 7516192768 4011411373
  kf5hostname04: 7516192768 6454614425
  kf5hostname05: 7516192768 7023487876
  kf5hostname06: 7516192768 7300311940
  kf5hostname07: 7516192768 7224877382

  $ dsh -n ${ALL} 'printf "$(expr 8 \* 1024 \* 1024 \* 1024)";sysdumpdev -e | cut -d: -f2' | sort
  kf5hostname01: 8589934592 4168509030
  kf5hostname02: 8589934592 6432468499
  kf5hostname03: 8589934592 4011411373
  kf5hostname04: 8589934592 6454614425
  kf5hostname05: 8589934592 7023487876
  kf5hostname06: 8589934592 7300311940
  kf5hostname07: 8589934592 7224646696

  $ dsh -n ${ALL} 'printf "$(expr 9 \* 1024 \* 1024 \* 1024)";sysdumpdev -e | cut -d: -f2' | sort
  kf5hostname01: 9663676416 4168509030
  kf5hostname02: 9663676416 6432468499
  kf5hostname03: 9663676416 4011411373
  kf5hostname04: 9663676416 6454614425
  kf5hostname05: 9663676416 7023487876
  kf5hostname06: 9663676416 7300311940
  kf5hostname07: 9663676416 7224877382

2. Check the # of LPs assigned to hd7 and lg_dumplv

  $ for lv in hd7 lg_dumplv;do echo " *** ${lv} *** ";lslv ${lv} | grep LPs;done
   *** hd7 ***
  MAX LPs:            512                    PP SIZE:        1024 megabyte(s)
  LPs:                3                      PPs:            3
   *** lg_dumplv ***
  MAX LPs:            512                    PP SIZE:        1024 megabyte(s)
  LPs:                7                      PPs:            7

In this case the goal is to have both LPs set to 7. PPs will also be 7 as there is only 1 LV copy.
We need to extend hd7 by 4 LPs.

3. Use the extendlv command to increase the number of LPs on all devices to equal the proper amount. In this example, extend hd7 by 4 LPs.

  $ extendlv hd7 4

4. Verify the number of PPs matches the goals.

  $ for lv in hd7 lg_dumplv;do echo " *** ${lv} *** ";lslv ${lv} | grep LPs;done
   *** hd7 ***
  MAX LPs:            512                    PP SIZE:        1024 megabyte(s)
  LPs:                7                      PPs:            7
   *** lg_dumplv ***
  MAX LPs:            512                    PP SIZE:        1024 megabyte(s)
  LPs:                7                      PPs:            7


5. Verify that the two dump devices are not mirrored and that each exists on only one hdisk.

  $ for lv in hd7 lg_dumplv;do echo " *** ${lv} *** ";lslv -l ${lv};done
   *** hd7 ***
  hd7:N/A
  PV                COPIES        IN BAND       DISTRIBUTION
  hdisk1            007:000:000   100%          000:007:000:000:000
   *** lg_dumplv ***
  lg_dumplv:N/A
  PV                COPIES        IN BAND       DISTRIBUTION
  hdisk0            007:000:000   100%          000:007:000:000:000


ISSUE 2D: hd7 is inactive.

1. Verify the dump settings. The following shows lg_dumplv is defined as the primary and nothing as the secondary.

  $ sysdumpdev -l
  primary              /dev/lg_dumplv
  secondary            /dev/sysdumpnull
  copy directory       /var/adm/ras
  forced copy flag     TRUE
  always allow dump    FALSE
  dump compression     ON
  type of dump         fw-assisted
  full memory dump     disallow

2. Verify that hd7 exists, is correctly sized and not mirrored and then modify the secondary device to point to hd7.

  $ sysdumpdev -Ps /dev/hd7
  primary              /dev/lg_dumplv
  secondary            /dev/hd7
  copy directory       /var/adm/ras
  forced copy flag     TRUE
  always allow dump    FALSE
  dump compression     ON
  type of dump         fw-assisted
  full memory dump     disallow

3. Verify that the dump devices are active.

  $ lsvg -l rootvg | grep sysdump
  lg_dumplv           sysdump    7       7       1    open/syncd    N/A
  hd7                 sysdump    7       7       1    open/syncd    N/A
Fixed:
----------
N/A. Requires workaround.
KI007649
V7000 Canister 1 failed to reboot while V7000 upgrade is in progress
Fixpack
I_V1.0.0.5
I_V1.1.0.1
V7000 Canister 1 failed to reboot while V7000 upgrade is in progress
This is similar to KI006996.
The fixpack stage will time out after 4 hours with a failure.
The command "svcupdate lsupdate" will show 50% complete.
Workaround:
-----------
Step 1: Check the upgrade status at a finer granularity than the fixpack status displays. To save time, start this step 90 minutes after the storage update has started; otherwise wait until the apply phase has failed (approximately 4 hours). Repeat this step at 10 minute intervals and look closely at the status and progress fields. If after 30 minutes the progress indicator has not moved, proceed to the next step. This command is run as the root user on the management host. The root user on the management host uses key-based authentication to access all storage enclosures without needing a password.
  $ grep "SAN_FRAME[0-9]*[0-9]_IP" /pschome/config/xcluster.cfg | while read frame eq ip rest;do echo "*** ${ip} ***";ssh -n superuser@${ip} "lsupdate";done    
  *** 172.23.1.204 ***  status success  event_sequence_number  progress  estimated_completion_time  suggested_action start  system_new_code_level  system_forced no  system_next_node_status none  system_next_node_time  system_next_node_id  system_next_node_name  *** 172.23.1.205 ***  status success  event_sequence_number  progress  estimated_completion_time  suggested_action start  system_new_code_level  system_forced no  system_next_node_status none  system_next_node_time  system_next_node_id  system_next_node_name  *** 172.23.1.206 ***  status success  event_sequence_number  progress  estimated_completion_time  suggested_action start  system_new_code_level  system_forced no  system_next_node_status none  system_next_node_time  system_next_node_id  system_next_node_name  *** 172.23.1.207 ***  status success  event_sequence_number  progress  estimated_completion_time  suggested_action start  system_new_code_level  system_forced no  system_next_node_status none  system_next_node_time  system_next_node_id  system_next_node_name  *** 172.23.1.208 ***  status success  event_sequence_number  progress  estimated_completion_time  suggested_action start  system_new_code_level  system_forced no  system_next_node_status none  system_next_node_time  system_next_node_id  system_next_node_name  *** 172.23.1.209 ***  status success  event_sequence_number  progress  estimated_completion_time  suggested_action start  system_new_code_level  system_forced no  system_next_node_status none  system_next_node_time  system_next_node_id  system_next_node_name

This command shows the update status for all of the storage enclosures and works for V7000 and Flash900 enclosures. Note that in older V7000 firmware levels the command was lssoftwareupgradestatus; that command still works but provides less information in a different format.
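For enclosures still running older V7000 firmware, a sketch of the same loop using the older command is shown below (assumptions: the same xcluster.cfg layout and superuser key-based access as in Step 1).
  # Sketch: query the older upgrade-status command on each storage enclosure.
  $ grep "SAN_FRAME[0-9]*[0-9]_IP" /pschome/config/xcluster.cfg | while read frame eq ip rest;do echo "*** ${ip} ***";ssh -n superuser@${ip} "lssoftwareupgradestatus";done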

Step 2: As root on the management host, run the following command. This is a preliminary check for this symptom before opening a service ticket. All enclosures should show two Active nodes as shown below. For this symptom, the lsservicenodes command will show just one node for the affected storage enclosure. This quick finding can be provided initially to IBM Support.

  $ grep "SAN_FRAME[0-9]*[0-9]_IP" /pschome/config/xcluster.cfg | while read frame eq ip rest;do echo "*** ${ip} ***";ssh -n superuser@${ip} "sainfo lsservicenodes";done  *** 172.23.1.204 ***  panel_name cluster_id       cluster_name node_id node_name relation node_status error_data  02-2       0000010020A00EE0 V7_00        1       node1     local    Active  02-1       0000010020A00EE0 V7_00        2       node2     partner  Active  *** 172.23.1.205 ***  panel_name cluster_id       cluster_name node_id node_name relation node_status error_data  01-1       0000020062AA266E Flash_00     1       node1     local    Active  01-2       0000020062AA266E Flash_00     2       node2     partner  Active  *** 172.23.1.206 ***  panel_name cluster_id       cluster_name node_id node_name relation node_status error_data  02-1       0000010020A00EF8 V7_01        2       node2     local    Active  02-2       0000010020A00EF8 V7_01        1       node1     partner  Active  *** 172.23.1.207 ***  panel_name cluster_id       cluster_name node_id node_name relation node_status error_data  01-2       0000020062EA268E Flash_01     2       node2     local    Active  01-1       0000020062EA268E Flash_01     1       node1     partner  Active  *** 172.23.1.208 ***  panel_name cluster_id       cluster_name node_id node_name relation node_status error_data  02-2       0000010020800E4E V7_02        1       node1     local    Active  02-1       0000010020800E4E V7_02        2       node2     partner  Active  *** 172.23.1.209 ***  panel_name cluster_id       cluster_name node_id node_name relation node_status error_data  01-2       0000020062AA26AA Flash_02     2       node2     local    Active  01-1       0000020062AA26AA Flash_02     1       node1     partner  Active

Step 3: Open a ticket with IBM Support. The most common high level scenario is as follows for this symptom:

i. IBM Support will ask for a support package or svc_snap on the problematic enclosure. This will be used to verify system details and look for some root cause for failures.

ii. IBM Support will send a CE or SSR onsite with a replacement canister based on the appropriate part as determined using the svc_snap.

iii. The CE or SSR will work with IBM Support to reseat the canister. This step can take 30 minutes for each reseat attempt which is usually done twice before a replacement is attempted.

iv. If the reseat brings the canister up then IBM Support will work with the customer to complete or backout the update. If the reseat fails then the canister will be replaced. 

Step 4. Once the storage is at a steady state (either completely updated or rolled back) the fixpack stage can be resumed.

Fixed:
----------
N/A. V7000 and Flash900 firmware updates improve with each update and can reduce the possibility of encountering these issues.
KI007650
DPM upgrade failed because OPMDB runs out of transaction log space.
Fixpack
I_V1.0.0.5
I_V1.1.0.1
DPM upgrade failed because OPMDB runs out of transaction log space.
Workaround:
-----------
Update the DPM database setting LOGSECOND to 200. The database must be started in order to change the setting, and it must be restarted for the change to take effect. While this does not impact all customers, this step can be done for all customers prior to running the management update phase or after the management phase has failed.
Note: This procedure will stop DPM. If this is done prior to the management phase failure, verify that DPM can be stopped for a short time. Do not attempt to do this while the fixpack is actively running without guidance from IBM Support. The following steps are tedious and complicated. Read through the steps and, if there is any concern about following them, contact IBM Support. If any step does not match the expected output, also contact IBM Support.
Step 1: Log in as root to the management host.
Step 2: Verify the status of DPM. DPM should be in the same state once the LOGSECOND update is complete. Follow the instructions next to the status that matches.
a) Domain is online and DPM is online. Follow the substeps when the status matches.
  $ hals -mgmt
  MANAGEMENT DOMAIN
  +============+===============+===============+===============+=================+=================+=============+
  | COMPONENT  | PRIMARY       | STANDBY       | CURRENT       | OPSTATE         | HA STATUS       | RG REQUESTS |
  +============+===============+===============+===============+=================+=================+=============+
  | WASAPP     | kf5hostname01 | N/A           | N/A           | Offline         | Offline         | -           |
  | DB2APP     | kf5hostname01 | N/A           | N/A           | Offline         | Offline         | -           |
  | DPM        | kf5hostname01 | kf5hostname03 | kf5hostname01 | Online          | Normal          | -           |
  | DB2DPM     | kf5hostname01 | kf5hostname03 | kf5hostname01 | Online          | Normal          | -           |
  +============+===============+===============+===============+=================+=================+=============+
  • SubStep a.1: Run the following as root on the management host to verify the current setting.  Replace the ssh target with the hostname in the CURRENT column for the DPM resources.
  $ ssh kf5hostname01 'su - db2opm -c "db2 get db cfg for opmdb | grep LOGSECOND"'   Number of secondary log files               (LOGSECOND) = 120    

  • Substep a.2: Update LOGSECOND to 200 using the following commands as root on the management host. Notice the requirement to reactivate the database: even though the setting reports as 200, the database must be restarted. The second check command shows 120 for the current setting and 200 for the new setting.
  $ ssh kf5hostname01 'su - db2opm -c "db2 update db cfg for opmdb using LOGSECOND 200"'  DB20000I  The UPDATE DATABASE CONFIGURATION command completed successfully.  SQL1363W  One or more of the parameters submitted for immediate modification  were not changed dynamically. For these configuration parameters, the database  must be shutdown and reactivated before the configuration parameter changes  become effective.    
  $ ssh kf5hostname01 'su - db2opm -c "db2 connect to opmdb;db2 get db cfg for opmdb show detail| grep LOGSECOND"'       Database Connection Information     Database server        = DB2/AIX64 10.5.10   SQL authorization ID   = DB2OPM   Local database alias   = OPMDB     Number of secondary log files               (LOGSECOND) = 120                        200    
  • Substep a.3: Restart DPM using hastopdpm and hastartdpm as root on the management host.
  $ hastopdpm;hastartdpm  Stopping DPM and DB2 instance............................Resources offline  MANAGEMENT DOMAIN  +============+===============+===============+===============+=================+=================+=============+  | COMPONENT  | PRIMARY       | STANDBY       | CURRENT       | OPSTATE         | HA STATUS       | RG REQUESTS |  +============+===============+===============+===============+=================+=================+=============+  | WASAPP     | kf5hostname01 | N/A           | N/A           | Offline         | Offline         | -           |  | DB2APP     | kf5hostname01 | N/A           | N/A           | Offline         | Offline         | -           |  | DPM        | kf5hostname01 | N/A           | N/A           | Offline         | Offline         | -           |  | DB2DPM     | kf5hostname01 | N/A           | N/A           | Offline         | Offline         | -           |  +============+===============+===============+===============+=================+=================+=============+    Starting DPM and DB2 instance..............................Resources online  MANAGEMENT DOMAIN  +============+===============+===============+===============+=================+=================+=============+  | COMPONENT  | PRIMARY       | STANDBY       | CURRENT       | OPSTATE         | HA STATUS       | RG REQUESTS |  +============+===============+===============+===============+=================+=================+=============+  | WASAPP     | kf5hostname01 | N/A           | N/A           | Offline         | Offline         | -           |  | DB2APP     | kf5hostname01 | N/A           | N/A           | Offline         | Offline         | -           |  | DPM        | kf5hostname01 | kf5hostname03 | kf5hostname01 | Online          | Normal          | -           |  | DB2DPM     | kf5hostname01 | kf5hostname03 | kf5hostname01 | Online          | Normal          | -           |  +============+===============+===============+===============+=================+=================+=============+    
  • Substep a.4: Verify the setting and proceed.
  $ ssh kf5hostname01 'su - db2opm -c "db2 connect to opmdb;db2 get db cfg for opmdb show detail| grep LOGSECOND"'       Database Connection Information     Database server        = DB2/AIX64 10.5.10   SQL authorization ID   = DB2OPM   Local database alias   = OPMDB     Number of secondary log files               (LOGSECOND) = 200                        200    
b) Domain is offline.
  $ hals -mgmt
  none are available... returning
  • Substep b.1: Verify the current location of the DPM instance.
  $ cat ~db2opm/sqllib/db2nodes.cfg
  0 kf5hostname01 0 kf5hostname01
  • Substep b.2: Check whether the database is running with db2pd. Run this command as root, replacing the ssh hostname with the hostname listed in the db2nodes.cfg file.
If Active proceed to b.4.
  $ ssh kf5hostname01 'su - db2opm -c "db2pd -"'    Database Member 0 -- Active -- Up 0 days 00:06:37 -- Date 2019-03-14-19.32.27.159573
If not active, proceed to b.3.
$ ssh kf5hostname01 'su - db2opm -c "db2pd -"'
Unable to attach to database manager on member 0.
Please ensure the following are true:
   - db2start has been run for the member.
   - db2pd is being run on the same physical machine as the member.
   - DB2NODE environment variable setting is correct for the member
     or db2pd -mem setting is correct for the member.
  • Substep b.3:  Start the database.
  $ ssh kf5hostname01 'su - db2opm -c "db2start"'  03/14/2019 19:36:30     0   0   SQL1063N  DB2START processing was successful.  SQL1063N  DB2START processing was successful.    
  • Substep b.4: Check the current setting.
  $ ssh kf5hostname01 'su - db2opm -c "db2 get db cfg for opmdb | grep LOGSECOND"'    Number of secondary log files (LOGSECOND) = 120
  • Substep b.5: Update the setting and verify
  $ ssh kf5hostname01 'su - db2opm -c "db2 update db cfg for opmdb using LOGSECOND 200"'    DB20000I The UPDATE DATABASE CONFIGURATION command completed successfully.   SQL1363W One or more of the parameters submitted for immediate modification   were not changed dynamically. For these configuration parameters, the database   must be shutdown and reactivated before the configuration parameter changes   become effective.      $ ssh kf5hostname01 'su - db2opm -c "db2 connect to opmdb;db2 get db cfg for opmdb show detail| grep LOGSECOND"'       Database Connection Information     Database server        = DB2/AIX64 10.5.10   SQL authorization ID   = DB2OPM   Local database alias   = OPMDB     Number of secondary log files               (LOGSECOND) = 120                        200
  • Substep b.6: Restart the database.
  $ ssh kf5hostname01 'su - db2opm -c "db2stop"'  03/14/2019 19:43:16     0   0   SQL1064N  DB2STOP processing was successful.  SQL1064N  DB2STOP processing was successful.    (0) root @ kf5hostname01: 7.1.0.0: /  $ ssh kf5hostname01 'su - db2opm -c "db2start"'  03/14/2019 19:43:26     0   0   SQL1063N  DB2START processing was successful.  SQL1063N  DB2START processing was successful.
  • Substep b.7: Verify the setting.
  $ ssh kf5hostname01 'su - db2opm -c "db2 connect to opmdb;db2 get db cfg for opmdb show detail| grep LOGSECOND"'       Database Connection Information     Database server        = DB2/AIX64 10.5.10   SQL authorization ID   = DB2OPM   Local database alias   = OPMDB     Number of secondary log files               (LOGSECOND) = 200                        200    

  • Substep b.8: If the database was shown as stopped in b.2, then stop the database.
  $ ssh kf5hostname01 'su - db2opm -c "db2stop"'  03/14/2019 19:47:28     0   0   SQL1064N  DB2STOP processing was successful.  SQL1064N  DB2STOP processing was successful.
  • Substep b.9: Resume the fixpack.
Fixed:
----------
N/A.
KI007655
SSH Bad protocol 2 host key algorithms '+ssh-dss'.  [ Added 2019-09-12 ]
Fixpack
I_V1.0.0.6
I_V1.1.0.2
SSH Bad protocol 2 host key algorithms '+ssh-dss'.
In the V1.0.0.6 and V1.1.0.2 Readme Version 337 document, the section "STAGE 1 - Prerequisites" has an element d. that discusses updates to the /etc/ssh/sshd_config settings.
At issue are the settings for HostKeyAlgorithms and PubkeyAcceptedKeyTypes, which for AIX 7.1 TL5 PDOA environments should include +ssh-dss. When updating from PDOA V1.0.0.5 IF01 or PDOA V1.1.0.1 IF01, the edits do not hurt ssh or sshd. In AIX 7.1 TL3 levels of PDOA, such as V1.0.0.5 and V1.1.0.1 and earlier, this setting is invalid and will result in failures for the ssh client and the sshd server.
The readme document incorrectly includes these modifications in the prerequisites section; they should only be made after the AIX levels are updated.
The challenges related to updating SSH which happens as part of the AIX 7.1 TL3 to AIX 7.1 TL5 update are documented in the following technote:
https://www.ibm.com/support/pages/ibm-aix-various-ssh-problems-after-upgrading-openssh-7x
Customers who have applied V1.0.0.5 IF01 or V1.1.0.1 IF01 will have experienced these SSH issues.
Workaround:
-----------
PermitRootLogin must still be set to 'yes' explicitly as part of the pre-requisite steps for the update. This is required for the pflayer to function correctly during the fixpack.
Except for the PermitRootLogin step, undo the HostKeyAlgorithms/PubkeyAcceptedKeyTypes edits and wait until after STAGE 6 for the management hosts and STAGE 7 for the core hosts before updating the /etc/ssh/ssh_config and /etc/ssh/sshd_config files.
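A quick way to confirm the current state of these settings across the hosts is sketched below. This assumes the ${ALL} node group convention used elsewhere in this document; adjust the node group to match the hosts in question.
  # Sketch: show PermitRootLogin and any HostKeyAlgorithms/PubkeyAcceptedKeyTypes lines in sshd_config on all hosts.
  $ dsh -n ${ALL} 'grep -E "^(PermitRootLogin|HostKeyAlgorithms|PubkeyAcceptedKeyTypes)" /etc/ssh/sshd_config' | dshbak -c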
 
Fixed:
----------
FP7_FP3 will not be impacted.
If there are future updates to the FP6_FP2 documentation this will be fixed at that time.
KI007641
0516-1734 chvg: Warning, savebase failed.  Please manually run 'savebase' before rebooting.  [ Added 2019-09-12 ]
Fixpack
I_V1.0.0.6
I_V1.1.0.2
0516-1734 chvg: Warning, savebase failed.  Please manually run 'savebase' before rebooting.
As part of Stage 8 Phase 6, "Remirror the rootvg volume group after the update is successful", step e:
$ mirrorvg rootvg hdisk1
0516-1734 mklvcopy: Warning, savebase failed.  Please manually run 'savebase' before rebooting.
0516-1734 mklvcopy: Warning, savebase failed.  Please manually run 'savebase' before rebooting.
0516-1734 mklvcopy: Warning, savebase failed.  Please manually run 'savebase' before rebooting.
0516-1734 mklvcopy: Warning, savebase failed.  Please manually run 'savebase' before rebooting.
0516-1734 mklvcopy: Warning, savebase failed.  Please manually run 'savebase' before rebooting.
0516-1734 mklvcopy: Warning, savebase failed.  Please manually run 'savebase' before rebooting.
0516-1734 mklvcopy: Warning, savebase failed.  Please manually run 'savebase' before rebooting.
0516-1734 mklvcopy: Warning, savebase failed.  Please manually run 'savebase' before rebooting.
0516-1734 mklvcopy: Warning, savebase failed.  Please manually run 'savebase' before rebooting.
0516-1734 mklvcopy: Warning, savebase failed.  Please manually run 'savebase' before rebooting.
0516-1734 mklvcopy: Warning, savebase failed.  Please manually run 'savebase' before rebooting.
0516-1734 mklvcopy: Warning, savebase failed.  Please manually run 'savebase' before rebooting.
0516-1734 mklvcopy: Warning, savebase failed.  Please manually run 'savebase' before rebooting.
0516-1804 chvg: The quorum change takes effect immediately.
0516-1734 chvg: Warning, savebase failed.  Please manually run 'savebase' before rebooting.



0516-1126 mirrorvg: rootvg successfully mirrored, user should perform
        bosboot of system to initialize boot records.  Then, user must modify
        bootlist to include:  hdisk0 hdisk1.
0516-1734 mirrorvg: Warning, savebase failed.  Please manually run 'savebase' before rebooting.
 
Workaround:
-----------
Run the savebase command after the bosboot -a and bootlist commands have completed.
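A sketch of the sequence on an affected host, run as root, is shown below. The hdisk0/hdisk1 names and boot order are assumptions; match them to the disks reported in the mirrorvg output above.
  # Sketch: rebuild boot records on both rootvg disks, set the boot list, then save the ODM base.
  $ bosboot -ad /dev/hdisk0
  $ bosboot -ad /dev/hdisk1
  $ bootlist -m normal hdisk0 hdisk1
  $ savebase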
Fixed:
----------
This appears to be a very rare error and no root cause or fix has been identified.
KI007660
appl_ctrl_net_adapter commands do not return any output. [ Added 2019-09-12 ]
Fixpack
I_V1.0.0.6
I_V1.1.0.2
appl_ctrl_net_adapter commands do not return any output.
From Appendix J: Network Adapter Firmware Update
The following commands may not return any output.
$ $PL_ROOT/bin/icmds/appl_ctrl_net_adapter update -validate -l net_adapter8,net_adapter9,net_adapter0,net_adapter1 -f /BCU_share/FP6_FP2/firmware/net_adapter/a21910071410d103/image
$  $PL_ROOT/bin/icmds/appl_ctrl_net_adapter update -install -l net_adapter8,net_adapter9,net_adapter0,net_adapter1 -f /BCU_share/FP6_FP2/firmware/net_adapter/a21910071410d103/image
During testing, this occurred when the appliance filesystem /BCU_share was not NFS-mounted from the management node on the impacted nodes.
Workaround:
-----------
Run the verify commands in the step immediately after, in this case step 5: appl_ctrl_net_adapter query -l <adapter>
And/Or:
Verify the update by reviewing the log in $VAR_PL_ROOT/log/platform_layer.log on the management host.
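For example, a quick check of that log is sketched below; the exact message text can vary, and the failed and successful examples that follow show representative lines.
  # Sketch: pull the adapter update status lines out of the platform layer log.
  $ grep "FCAdapterFW-apply" $VAR_PL_ROOT/log/platform_layer.log | grep "Update status"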
Failed Examples: In this example /BCU_share was purposely unmounted to illustrate the issue and demonstrate failure messages.
  [16 May 2019 06:41:17,243] <8716478 CTRL  DEBUG  stgkf201> FCAdapterFW-apply: net_adapter9 Output: Update status: Failed to unpack the adapter firmware file /BCU_share/FP6_FP2/firmware/net_adapter/  a21910071410d103/image/a21910071410d103.0400401800009.aix.rpm on 172.23.2.63.  [16 May 2019 06:41:17,245] <8716478 CTRL  DEBUG  stgkf201> FCAdapterFW-apply: writing to STATUSFILE: net_adapter9==1==Update status: Failed to unpack the adapter firmware file /BCU_share/FP6_FP2/fi  rmware/net_adapter/a21910071410d103/image/a21910071410d103.0400401800009.aix.rpm on 172.23.2.63.  [16 May 2019 06:41:17,279] <10289372 CTRL  DEBUG  stgkf201> FCAdapterFW-apply: Status of server: server4, pid: 8716478 is 1  [16 May 2019 06:41:17,281] <10289372 CTRL  DEBUG  stgkf201> FCAdapterFW-apply: Convert status info file to ret hash  [16 May 2019 06:41:17,282] <10289372 CTRL  DEBUG  stgkf201> FCAdapterFW-apply: Processing line: net_adapter0==1==Update status: Failed to unpack the adapter firmware file /BCU_share/FP6_FP2/firmwar  e/net_adapter/a21910071410d103/image/a21910071410d103.0400401800009.aix.rpm on 172.23.2.65.  [16 May 2019 06:41:17,282] <10289372 CTRL  DEBUG  stgkf201>  of file /tmp/status_adapters.txt  [16 May 2019 06:41:17,283] <10289372 CTRL  DEBUG  stgkf201> FCAdapterFW-apply: Processing line: net_adapter8==1==Update status: Failed to unpack the adapter firmware file /BCU_share/FP6_FP2/firmwar  e/net_adapter/a21910071410d103/image/a21910071410d103.0400401800009.aix.rpm on 172.23.2.63.  [16 May 2019 06:41:17,283] <10289372 CTRL  DEBUG  stgkf201>  of file /tmp/status_adapters.txt  [16 May 2019 06:41:17,284] <10289372 CTRL  DEBUG  stgkf201> FCAdapterFW-apply: Processing line: net_adapter1==1==Update status: Failed to unpack the adapter firmware file /BCU_share/FP6_FP2/firmwar  e/net_adapter/a21910071410d103/image/a21910071410d103.0400401800009.aix.rpm on 172.23.2.65.  [16 May 2019 06:41:17,284] <10289372 CTRL  DEBUG  stgkf201>  of file /tmp/status_adapters.txt  [16 May 2019 06:41:17,284] <10289372 CTRL  DEBUG  stgkf201> FCAdapterFW-apply: Processing line: net_adapter9==1==Update status: Failed to unpack the adapter firmware file /BCU_share/FP6_FP2/firmwar  e/net_adapter/a21910071410d103/image/a21910071410d103.0400401800009.aix.rpm on 172.23.2.63.  [16 May 2019 06:41:17,284] <10289372 CTRL  DEBUG  stgkf201>  of file /tmp/status_adapters.txt  [16 May 2019 06:41:17,285] <10289372 CTRL  DEBUG  stgkf201> FCAdapterFW-apply: Deleting statusfile /tmp/status_adapters.txt  [16 May 2019 06:41:17,303] <10289372 CTRL  DEBUG  stgkf201> FCAdapterFW-apply: Contents of ret hash$VAR1 = {  [16 May 2019 06:41:17,303] <10289372 CTRL  DEBUG  stgkf201>           'net_adapter9' => {  [16 May 2019 06:41:17,304] <10289372 CTRL  DEBUG  stgkf201>                               'success' => '1',  [16 May 2019 06:41:17,304] <10289372 CTRL  DEBUG  stgkf201>                               'message' => 'Update status: Failed to unpack the adapter firmware file /BCU_share/FP6_FP2/firmware/net_ada  pter/a21910071410d103/image/a21910071410d103.0400401800009.aix.rpm on 172.23.2.63.  
[16 May 2019 06:41:17,304] <10289372 CTRL  DEBUG  stgkf201> '  [16 May 2019 06:41:17,304] <10289372 CTRL  DEBUG  stgkf201>                             },  [16 May 2019 06:41:17,304] <10289372 CTRL  DEBUG  stgkf201>           'net_adapter0' => {  [16 May 2019 06:41:17,304] <10289372 CTRL  DEBUG  stgkf201>                               'success' => '1',  [16 May 2019 06:41:17,304] <10289372 CTRL  DEBUG  stgkf201>                               'message' => 'Update status: Failed to unpack the adapter firmware file /BCU_share/FP6_FP2/firmware/net_ada  pter/a21910071410d103/image/a21910071410d103.0400401800009.aix.rpm on 172.23.2.65.  [16 May 2019 06:41:17,304] <10289372 CTRL  DEBUG  stgkf201> '  [16 May 2019 06:41:17,304] <10289372 CTRL  DEBUG  stgkf201>                             },  [16 May 2019 06:41:17,304] <10289372 CTRL  DEBUG  stgkf201>           'net_adapter1' => {  [16 May 2019 06:41:17,304] <10289372 CTRL  DEBUG  stgkf201>                               'success' => '1',  [16 May 2019 06:41:17,304] <10289372 CTRL  DEBUG  stgkf201>                               'message' => 'Update status: Failed to unpack the adapter firmware file /BCU_share/FP6_FP2/firmware/net_ada  pter/a21910071410d103/image/a21910071410d103.0400401800009.aix.rpm on 172.23.2.65.  [16 May 2019 06:41:17,304] <10289372 CTRL  DEBUG  stgkf201> '  [16 May 2019 06:41:17,304] <10289372 CTRL  DEBUG  stgkf201>                             },  [16 May 2019 06:41:17,304] <10289372 CTRL  DEBUG  stgkf201>           'net_adapter8' => {  [16 May 2019 06:41:17,304] <10289372 CTRL  DEBUG  stgkf201>                               'success' => '1',  [16 May 2019 06:41:17,304] <10289372 CTRL  DEBUG  stgkf201>                               'message' => 'Update status: Failed to unpack the adapter firmware file /BCU_share/FP6_FP2/firmware/net_ada  pter/a21910071410d103/image/a21910071410d103.0400401800009.aix.rpm on 172.23.2.63.  [16 May 2019 06:41:17,304] <10289372 CTRL  DEBUG  stgkf201> '  [16 May 2019 06:41:17,304] <10289372 CTRL  DEBUG  stgkf201>                             }  [16 May 2019 06:41:17,305] <10289372 CTRL  DEBUG  stgkf201>         };
Successful Examples:
  [16 May 2019 07:12:41,834] <11600048 CTRL  DEBUG  stgkf201> FCAdapterFW-apply: Convert status info file to ret hash  [16 May 2019 07:12:41,835] <11600048 CTRL  DEBUG  stgkf201> FCAdapterFW-apply: Processing line: net_adapter8==0==Update status: The adapter update is successful for the adapter firmware image /BCU_share/FP6_FP2/firmware/net_adapter/a21910071410d103/image/a21910071410d103.0400401800009.aix.rpm on ent2(172.23.2.63).Update status: The adapter update is successful for the adapter firmware image /BCU_share/FP6_FP2/firmware/net_adapter/a21910071410d103/image/a21910071410d103.0400401800009.aix.rpm on ent3(172.23.2.63).  [16 May 2019 07:12:41,835] <11600048 CTRL  DEBUG  stgkf201>  of file /tmp/status_adapters.txt  [16 May 2019 07:12:41,835] <11600048 CTRL  DEBUG  stgkf201> FCAdapterFW-apply: Processing line: net_adapter0==0==Update status: The adapter update is successful for the adapter firmware image /BCU_share/FP6_FP2/firmware/net_adapter/a21910071410d103/image/a21910071410d103.0400401800009.aix.rpm on ent0(172.23.2.65).Update status: The adapter update is successful for the adapter firmware image /BCU_share/FP6_FP2/firmware/net_adapter/a21910071410d103/image/a21910071410d103.0400401800009.aix.rpm on ent1(172.23.2.65).  [16 May 2019 07:12:41,835] <11600048 CTRL  DEBUG  stgkf201>  of file /tmp/status_adapters.txt  [16 May 2019 07:12:41,836] <11600048 CTRL  DEBUG  stgkf201> FCAdapterFW-apply: Processing line: net_adapter9==0==Update status: The adapter update is successful for the adapter firmware image /BCU_share/FP6_FP2/firmware/net_adapter/a21910071410d103/image/a21910071410d103.0400401800009.aix.rpm on ent0(172.23.2.63).Update status: The adapter update is successful for the adapter firmware image /BCU_share/FP6_FP2/firmware/net_adapter/a21910071410d103/image/a21910071410d103.0400401800009.aix.rpm on ent1(172.23.2.63).  [16 May 2019 07:12:41,836] <11600048 CTRL  DEBUG  stgkf201>  of file /tmp/status_adapters.txt  [16 May 2019 07:12:41,837] <11600048 CTRL  DEBUG  stgkf201> FCAdapterFW-apply: Processing line: net_adapter1==0==Update status: The adapter update is successful for the adapter firmware image /BCU_share/FP6_FP2/firmware/net_adapter/a21910071410d103/image/a21910071410d103.0400401800009.aix.rpm on ent2(172.23.2.65).Update status: The adapter update is successful for the adapter firmware image /BCU_share/FP6_FP2/firmware/net_adapter/a21910071410d103/image/a21910071410d103.0400401800009.aix.rpm on ent3(172.23.2.65).  [16 May 2019 07:12:41,837] <11600048 CTRL  DEBUG  stgkf201>  of file /tmp/status_adapters.txt  [16 May 2019 07:12:41,838] <11600048 CTRL  DEBUG  stgkf201> FCAdapterFW-apply: Deleting statusfile /tmp/status_adapters.txt  [16 May 2019 07:12:41,857] <11600048 CTRL  DEBUG  stgkf201> FCAdapterFW-apply: Contents of ret hash$VAR1 = {  [16 May 2019 07:12:41,857] <11600048 CTRL  DEBUG  stgkf201>           'net_adapter9' => {  [16 May 2019 07:12:41,857] <11600048 CTRL  DEBUG  stgkf201>                               'success' => '0',  [16 May 2019 07:12:41,858] <11600048 CTRL  DEBUG  stgkf201>                               'message' => 'Update status: The adapter update is successful for the adapter firmware image /BCU_share/FP6_FP2/firmware/net_adapter/a21910071410d103/image/a21910071410d103.0400401800009.aix.rpm on ent0(172.23.2.63).Update status: The adapter update is successful for the adapter firmware image /BCU_share/FP6_FP2/firmware/net_adapter/a21910071410d103/image/a21910071410d103.0400401800009.aix.rpm on ent1(172.23.2.63).  
[16 May 2019 07:12:41,858] <11600048 CTRL  DEBUG  stgkf201> '  [16 May 2019 07:12:41,858] <11600048 CTRL  DEBUG  stgkf201>                             },  [16 May 2019 07:12:41,858] <11600048 CTRL  DEBUG  stgkf201>           'net_adapter0' => {  [16 May 2019 07:12:41,858] <11600048 CTRL  DEBUG  stgkf201>                               'success' => '0',  [16 May 2019 07:12:41,858] <11600048 CTRL  DEBUG  stgkf201>                               'message' => 'Update status: The adapter update is successful for the adapter firmware image /BCU_share/FP6_FP2/firmware/net_adapter/a21910071410d103/image/a21910071410d103.0400401800009.aix.rpm on ent0(172.23.2.65).Update status: The adapter update is successful for the adapter firmware image /BCU_share/FP6_FP2/firmware/net_adapter/a21910071410d103/image/a21910071410d103.0400401800009.aix.rpm on ent1(172.23.2.65).  [16 May 2019 07:12:41,858] <11600048 CTRL  DEBUG  stgkf201> '  [16 May 2019 07:12:41,858] <11600048 CTRL  DEBUG  stgkf201>                             },  [16 May 2019 07:12:41,858] <11600048 CTRL  DEBUG  stgkf201>           'net_adapter1' => {  [16 May 2019 07:12:41,858] <11600048 CTRL  DEBUG  stgkf201>                               'success' => '0',  [16 May 2019 07:12:41,858] <11600048 CTRL  DEBUG  stgkf201>                               'message' => 'Update status: The adapter update is successful for the adapter firmware image /BCU_share/FP6_FP2/firmware/net_adapter/a21910071410d103/image/a21910071410d103.0400401800009.aix.rpm on ent2(172.23.2.65).Update status: The adapter update is successful for the adapter firmware image /BCU_share/FP6_FP2/firmware/net_adapter/a21910071410d103/image/a21910071410d103.0400401800009.aix.rpm on ent3(172.23.2.65).  [16 May 2019 07:12:41,858] <11600048 CTRL  DEBUG  stgkf201> '  [16 May 2019 07:12:41,858] <11600048 CTRL  DEBUG  stgkf201>                             },  [16 May 2019 07:12:41,858] <11600048 CTRL  DEBUG  stgkf201>           'net_adapter8' => {  [16 May 2019 07:12:41,858] <11600048 CTRL  DEBUG  stgkf201>                               'success' => '0',  [16 May 2019 07:12:41,858] <11600048 CTRL  DEBUG  stgkf201>                               'message' => 'Update status: The adapter update is successful for the adapter firmware image /BCU_share/FP6_FP2/firmware/net_adapter/a21910071410d103/image/a21910071410d103.0400401800009.aix.rpm on ent2(172.23.2.63).Update status: The adapter update is successful for the adapter firmware image /BCU_share/FP6_FP2/firmware/net_adapter/a21910071410d103/image/a21910071410d103.0400401800009.aix.rpm on ent3(172.23.2.63).  [16 May 2019 07:12:41,858] <11600048 CTRL  DEBUG  stgkf201> '  [16 May 2019 07:12:41,858] <11600048 CTRL  DEBUG  stgkf201>                             }  [16 May 2019 07:12:41,858] <11600048 CTRL  DEBUG  stgkf201>         };  
Fixed:
----------
N/A.
KI007616
scp fails to complete on some files to and from PDOA AIX servers. [ Added 2019-09-12 ]
Fixpack
I_V1.0.0.6
I_V1.1.0.2
scp fails to complete on some files to and from PDOA AIX servers.
When attempting to use scp to copy files into and out of PDOA AIX hosts after updating to IF01, V1.0.0.6, V1.1.0.2, the scp command will run for a short time and then fail on certain files and succeed on others.
This symptom matches this APAR found in AIX 7.2 https://www-01.ibm.com/support/entdocview.wss?uid=isg1IJ03680.
When debugging the case where AIX is the target server running sshd on port 9022, the server shows the following output for some failing scp commands.
# As root on any PDOA AIX server, run the following. This requires port 9022 to be available. Then connect to that port with a client scp session that fails.
$(which sshd) -o Port=9022 -ddd

debug2: channel 0: rcvd adjust 98304
ssh_packet_send: invalid format
debug1: do_cleanup
debug1: audit event euid 0 user root event 12 (SSH_connabndn)
debug1: Return Val-1 for auditproc:0
Workaround:
-----------
If there are failures during scp activities to or from the AIX hosts, add the following option to the scp (ssh) command: '-o Compression=off'. An example is shown below.
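For example (the file name and target host here are placeholders):
  # Sketch: copy a file with compression disabled to avoid the failure.
  $ scp -o Compression=off /tmp/example_file.tar root@kf5hostname01:/tmp/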
Fixed:
----------
Unknown at this time.
KI007598
apply: storage1: apply failed
Fixpack
I_V1.1.0.2
apply: storage1: apply failed
During Stage 4, a drive module failure has been seen during Flash900 firmware updates while attempting to apply 1.4.7.1 to a Flash900 enclosure. This is not a common error.
pflayer log entries may include lines such as the following.
  $  grep "apply: " platform_layer.trace.6  [18 Apr 2018 07:28:14,786] <5701764 CTRL  DEBUG  kf5hostname01> apply: 172.23.1.209: waiting for upgrade to complete, iteration <4>: update_status=<upgrading 23>  [18 Apr 2018 07:28:16,956] <4129530 CTRL  DEBUG  kf5hostname01> apply: 172.23.1.205: waiting for upgrade to complete, iteration <4>: update_status=<upgrading 23>  [18 Apr 2018 07:33:16,374] <5701764 CTRL  DEBUG  kf5hostname01> apply: 172.23.1.209: waiting for upgrade to complete, iteration <5>: update_status=<upgrading 23>  [18 Apr 2018 07:33:20,170] <4129530 CTRL  DEBUG  kf5hostname01> apply: 172.23.1.205: waiting for upgrade to complete, iteration <5>: update_status=<upgrading 23>  [18 Apr 2018 07:38:18,915] <5701764 CTRL  DEBUG  kf5hostname01> apply: 172.23.1.209: waiting for upgrade to complete, iteration <6>: update_status=<committing 47>  [18 Apr 2018 07:38:22,543] <4129530 CTRL  DEBUG  kf5hostname01> apply: 172.23.1.205: waiting for upgrade to complete, iteration <6>: update_status=<upgrading 23>  [18 Apr 2018 07:43:22,314] <5701764 CTRL  DEBUG  kf5hostname01> apply: 172.23.1.209: waiting for upgrade to complete, iteration <7>: update_status=<updating_hardware 50>  [18 Apr 2018 07:43:25,034] <4129530 CTRL  DEBUG  kf5hostname01> apply: 172.23.1.205: waiting for upgrade to complete, iteration <7>: update_status=<upgrading 23>  [18 Apr 2018 07:48:24,577] <5701764 CTRL  DEBUG  kf5hostname01> apply: 172.23.1.209: waiting for upgrade to complete, iteration <8>: update_status=<updating_hardware 56>  [18 Apr 2018 07:48:27,757] <4129530 CTRL  DEBUG  kf5hostname01> apply: 172.23.1.205: waiting for upgrade to complete, iteration <8>: update_status=<committing 47>  [18 Apr 2018 07:53:26,636] <5701764 CTRL  DEBUG  kf5hostname01> apply: 172.23.1.209: waiting for upgrade to complete, iteration <9>: update_status=<updating_hardware 77>  [18 Apr 2018 07:53:30,571] <4129530 CTRL  DEBUG  kf5hostname01> apply: 172.23.1.205: waiting for upgrade to complete, iteration <9>: update_status=<hardware_failed 50>  [18 Apr 2018 07:58:29,505] <5701764 CTRL  DEBUG  kf5hostname01> apply: 172.23.1.209: waiting for upgrade to complete, iteration <10>: update_status=<updating_hardware 79>  [18 Apr 2018 07:58:30,572] <4129530 CTRL  DEBUG  kf5hostname01> apply: 172.23.1.205: broke out of timed wait after 10 iterations of maximum 48. update status is <hardware_failed 50>  [18 Apr 2018 07:58:30,573] <4129530 CTRL  DEBUG  kf5hostname01> Extracted msg from NLS: apply: 172.23.1.205 Error: The update status of the end point is <hardware_failed 50>.  [18 Apr 2018 07:58:30,573] <4129530 CTRL  DEBUG  kf5hostname01> apply: 172.23.1.205: error: error state, update status is <hardware_failed 50>  [18 Apr 2018 07:58:30,616] <3146728 CTRL  DEBUG  kf5hostname01> apply: storage1: apply failed  [18 Apr 2018 08:03:32,417] <5701764 CTRL  DEBUG  kf5hostname01> apply: 172.23.1.209: waiting for upgrade to complete, iteration <11>: update_status=<updating_hardware 85>    (0) root @ kf5hostname01: 7.1.0.0: /BCU_share/aixappl/pflayer/log    [18 Apr 2018 07:58:29,504] <5701764 CTRL  TRACE  kf5hostname01> Exiting Ctrl::Updates::Storage::get_update_status }  [18 Apr 2018 07:58:29,505] <5701764 CTRL  DEBUG  kf5hostname01> apply: 172.23.1.209: waiting for upgrade to complete, iteration <10>: update_status=<updating_hardware 79>  [18 Apr 2018 07:58:30,572] <4129530 CTRL  DEBUG  kf5hostname01> apply: 172.23.1.205: broke out of timed wait after 10 iterations of maximum 48. 
update status is <hardware_failed 50>  [18 Apr 2018 07:58:30,573] <4129530 CTRL  DEBUG  kf5hostname01> Extracted msg from NLS: apply: 172.23.1.205 Error: The update status of the end point is <hardware_failed 50>.  [18 Apr 2018 07:58:30,573] <4129530 CTRL  DEBUG  kf5hostname01> apply: 172.23.1.205: error: error state, update status is <hardware_failed 50>  [18 Apr 2018 07:58:30,616] <3146728 CTRL  DEBUG  kf5hostname01> apply: storage1: apply failed
Accessing the Flash900 CLI via ssh may reveal the following.  The event log shows 'Array rebuild complete' and 'Array mdisk is not protected by sufficient spares' right after the log shows the update failure.
IBM_FlashSystem:Flash_00:admin>lseventlog
sequence_number last_timestamp object_type object_id object_name copy_id status  fixed event_id error_code description                                       secondary_object_type secondary_object_id
102             150901084131   node        2         node2               message no    980349              Node added
103             150901090631   enclosure   0                             message no    988113              Internal hardware update completed
114             150903035055   drive       0                             message no    988024              Flash module format complete
115             150903035055   drive       4                             message no    988024              Flash module format complete
116             150903035100   drive       1                             message no    988024              Flash module format complete
117             150903035100   drive       9                             message no    988024              Flash module format complete
118             150903035100   drive       2                             message no    988024              Flash module format complete
119             150903035100   drive       5                             message no    988024              Flash module format complete
120             150903035100   drive       6                             message no    988024              Flash module format complete
121             150903035100   drive       7                             message no    988024              Flash module format complete
122             150903035100   drive       8                             message no    988024              Flash module format complete
123             150903035105   drive       3                             message no    988024              Flash module format complete
124             160111081546   cluster               Flash_00            message no    980506              Update prepared
125             160111082846   node        2         node2               message no    980349              Node added
126             160111084312   node        1         node1               message no    980349              Node added
9000005         161003035616   enclosure   2                             alert   no    045071   1037       Unsupported canister combination
9000006         161003035656   node        1         node1               alert   no    077187   1092       Temperature critical threshold exceeded
147             170102161358   cluster               Flash_00            message no    980506              Update prepared
148             170102162543   node        1         node1               message no    980349              Node added
151             170102163830   node        2         node2               message no    980349              Node added
156             170213164006   node        2         node2               message no    980349              Node added
159             180418070758   cluster               Flash_00            message no    980506              Update prepared
160             180418072508   node        2         node2               message no    980349              Node added
164             180418074545   node        1         node1               message no    980349              Node added
165             180418075055   enclosure   1                             alert   no    085048   2060       Reconditioning of batteries required
166             180418075055   enclosure   1                             alert   no    085048   2060       Reconditioning of batteries required
167             180418075220   enclosure   0                             alert   no    085118   2010       Update process failed
168             180418083654   mdisk       0         array0              message no    988023              Array rebuild complete
169             180418083654   mdisk       0         array0              alert   no    085031   1690       Array mdisk is not protected by sufficient spares
IBM_FlashSystem:Flash_00:admin>lseventlog 167
sequence_number 167
first_timestamp 180418075220
first_timestamp_epoch 1524048740
last_timestamp 180418075220
last_timestamp_epoch 1524048740
object_type enclosure
object_id 0
object_name
copy_id
reporting_node_id 1
reporting_node_name node1
root_sequence_number
event_count 1
status alert
fixed no
auto_fixed no
notification_type error
event_id 085118
event_id_text System update halted
error_code 2010
error_code_text Update process failed
machine_type 9840AE2
serial_number 1351337
FRU None
fixed_timestamp
fixed_timestamp_epoch
callhome_type hardware
sense1 01 00 31 33 35 31 33 33 37 00 00 00 00 00 00 00
sense2 00 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00
sense3 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
sense4 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
sense5 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
sense6 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
sense7 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
sense8 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
secondary_object_type
secondary_object_id
IBM_FlashSystem:Flash_00:admin>
IBM_FlashSystem:Flash_00:superuser>lsenclosureslot
enclosure_id slot_id port_1_status port_2_status drive_present drive_id
1 1 offline offline no
1 2 online online yes 0
1 3 online online yes 1
1 4 online online yes
1 5 online online yes 3
1 6 online online yes 4
1 7 online online yes 5
1 8 online online yes 6
1 9 online online yes 7
1 10 online online yes 8
1 11 online online yes 9
1 12 offline offline no

 
Workaround:
-----------
Immediately contact IBM Support.
Consult the mustgather link at the end of this document to collect the data that IBM Support will require.
Fixed:
----------
N/A.
KI007658
Missing license gpfs.license.std
Fixpack
I_V1.1.0.2
I_V1.0.0.6
Missing license gpfs.license.std
During fixpack application for PDOA, when installp is run to update GPFS as documented in Appendix F (Stage 6 and Stage 8) of the readme, the following message will appear.
stgkf203: FAILURES
stgkf203: --------
stgkf203:   Filesets listed in this section failed pre-installation verification
stgkf203:   and will not be installed.
stgkf203:
stgkf203:   Requisite Failures
stgkf203:   ------------------
stgkf203:   SELECTED FILESETS:  The following is a list of filesets that you asked to
stgkf203:   install.  They cannot be installed until all of their requisite filesets
stgkf203:   are also installed.  See subsequent lists for details of requisites.
stgkf203:
stgkf203:     gpfs.license.std 4.2.3.0                  # IBM Spectrum Scale Standard ...
stgkf203:
stgkf203:   MISSING REQUISITES:  The following filesets are required by one or more
stgkf203:   of the selected filesets listed above.  They are not currently installed
stgkf203:   and could not be found on the installation media.
stgkf203:
stgkf203:     gpfs.license.std 4.2.0.0                  # Base Level Fileset
stgkf203:
stgkf203:   << End of Failure Section >>
This is consistent with what we experienced in the lab during testing. The rest of the installation log should show success for the remaining filesets, and the verification steps with the lslpp -l display should be consistent with the documentation.
stgkf201: Name                        Level           Part        Event       Result
stgkf201: -------------------------------------------------------------------------------
stgkf201: gpfs.msg.en_US              4.2.3.0         USR         APPLY       SUCCESS
stgkf201: gpfs.base                   4.2.3.0         USR         APPLY       SUCCESS
stgkf201: gpfs.base                   4.2.3.0         ROOT        APPLY       SUCCESS
stgkf201: gpfs.ext                    4.2.3.0         USR         APPLY       SUCCESS
stgkf201: gpfs.docs.data              4.2.3.0         SHARE       APPLY       SUCCESS
stgkf201: gpfs.gskit                  8.0.50.75       USR         APPLY       SUCCESS
 
$ dsh -n ${BCUMGMT},${BCUMGMTSTDBY} 'lslpp -l "*gpfs*"' | dshbak -c
HOSTS -------------------------------------------------------------------------
kf5hostname01, kf5hostname03
-------------------------------------------------------------------------------
  Fileset                      Level  State      Description
  ----------------------------------------------------------------------------
Path: /usr/lib/objrepos
  gpfs.base                  4.2.3.7  APPLIED    GPFS File Manager
  gpfs.ext                   4.2.3.7  APPLIED    GPFS Extended Features
  gpfs.gskit               8.0.50.75  APPLIED    GPFS GSKit Cryptography
                                                 Runtime
  gpfs.msg.en_US             4.2.3.6  APPLIED    GPFS Server Messages - U.S.
                                                 English
Path: /etc/objrepos
  gpfs.base                  4.2.3.7  APPLIED    GPFS File Manager
Path: /usr/share/lib/objrepos
  gpfs.docs.data             4.2.3.6  APPLIED    GPFS Server Manpages and
                                                 Documentation
Proceed with the update as long as the verification steps are consistent.
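For a quick check of the license fileset state itself, a variation of the verification command above can be used. This is a sketch only; hosts where gpfs.license.std was never installed will simply report that no matching fileset is installed.
$ dsh -n ${BCUMGMT},${BCUMGMTSTDBY} 'lslpp -L "gpfs.license*"' 2>&1 | dshbak -c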
Workaround:
-----------
N/A
Fixed:
----------
N/A
KI007677
ssh Connection reset by peer to BNT switches [ Added 2019-09-17 ]
General
I_V1.1
ssh Connection reset by peer to BNT switches [ Added 2019-09-17 ]
This can lead to miauth failures when attempting to change the password on the BNT switches or simply accessing the switches via ssh on the command line.
When trying to connect via ssh from the root user on the management host to one of the BNT network switches in a PDOA environment, ssh can fail with errors similar to below either on the ssh command line or in the platform layer log files that use ssh.
$ ssh -o Protocol=2,1 admin@172.23.1.254
Read from socket failed: Connection reset by peer
In PDOA V1.1 environments from GA to FP2 the use of the /etc/ssh/ssh_config setting of Protocol=2,1 leads to connection issues to the BNT switches.
On the same system the following works:
$ ssh -o Protocol=2 admin@172.23.1.254
Enter login password:
IBM Networking Operating System RackSwitch G8052.
Workaround:
-----------
Update /etc/ssh/ssh_config to set the value of Protocol to 2, or comment out the Protocol line and allow the ssh client to follow its default behavior.
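For example, a minimal sketch (assuming the Protocol line is present and uncommented in /etc/ssh/ssh_config on the management host) of checking for the setting and commenting it out:
$ grep -n "^Protocol" /etc/ssh/ssh_config
$ cp /etc/ssh/ssh_config /etc/ssh/ssh_config.bak
$ sed 's/^Protocol/#Protocol/' /etc/ssh/ssh_config.bak > /etc/ssh/ssh_config
### Verify that the Protocol line is now commented out.
$ grep -n "Protocol" /etc/ssh/ssh_config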
Fixed:
----------
PDOA was not shipped with this value uncommented, so as fixpacks are applied the default behavior of the ssh client in PDOA will change due to updates in the security defaults.
KI007679
FP6_FP2 Readme Appendix I is not clear about Installation Manager removal [ Added 2019-09-20 ]
Fixpack
I_V1.0.0.6
I_V1.1.0.2
FP6_FP2 Readme Appendix I is not clear about Installation Manager removal [ Added 2019-09-20 ]
There are two issues with the Appendix that have been identified.
The first issue is related to formatting.
In step 'v.' the following instruction appears as a single command in the document instead of a command followed by its output. The documentation standard is that commands appear in bold.
$ /opt/IBM/InstallationManager/eclipse/tools/imcl listInstalledPackages
com.ibm.cic.agent_1.8.5000.20160506_1125
The second issue is that the command to remove Installation Manager is missing.
Workaround:
-----------
This step can be performed at any time during the fixpack or after it has completed.
There is no outage required and no impact on the appliance function.
Issue #1:
The command is:
$ /opt/IBM/InstallationManager/eclipse/tools/imcl listInstalledPackages
The output is:
com.ibm.cic.agent_1.8.5000.20160506_1125
Issue #2:
As the root user on the management host, run the following command (shown in bold):
$ time /opt/IBM/InstallationManager/eclipse/tools/imcl uninstall  com.ibm.cic.agent_1.8.5000.20160506_1125
Uninstalled com.ibm.cic.agent_1.8.5000.20160506_1125 from the /opt/IBM/InstallationManager/eclipse directory.

real    0m29.70s
user    0m4.03s
sys     0m0.43s
### Verify that Installation Manager is uninstalled.
$ ls -la /opt/IBM/InstallationManager
ls: 0653-341 The file /opt/IBM/InstallationManager does not exist.
Fixed:
----------
N/A.
KI007473
Mail to root user with subject "Electronic Service Agent not" received on AIX hosts. [ Added 2019-09-23 ]
General
Fixpack
I_V1.0
I_V1.0.0.5_IF01
I_V1.0.0.6
I_V1.1.0.1_IF01
I_V1.1.0.2
Mail to root user with subject "Electronic Service Agent not" received on AIX hosts.
The root user on all AIX hosts may receive the following e-mail:
Message 13:
From esaadmin Sun Aug 26 03:01:01 2018
Date: Sun, 26 Aug 2018 03:01:01 -0400
From: esaadmin
To: root
Subject: Electronic Service Agent not activated
Electronic Service Agent has not been activated.  To activate Electronic Service Agent, do the following:  From the SMIT main menu, select Electronic Service Agent, then select Configure Electronic Service Agent.
For information about Electronic Service Agent, including the benefits of activating it, see the following:
http://publib.boulder.ibm.com/infocenter/eserver/v1r2/topic/eicbd/eicbdkickoff.htm
To discontinue this periodic reminder message, execute the command /usr/esa/sbin/rmESAReminder.
The e-mail originates from an AIX component called the Electronic Service Agent. This is documented in the following Knowledge Center for AIX 7.1 link. https://www.ibm.com/support/knowledgecenter/en/ssw_aix_71/electronicserviceagent/eicbdkickoff.html
PDOA includes the bos.esagent filesets.
In V1.0 systems the esaadmin user was also defined, as was the cron job associated with that user that generated the above mail.
In V1.1 systems the esaadmin user was removed.
For customers who resolved the issue before applying a fixpack that includes an AIX 7.1 TL5 based update (IF01/FP6_FP2), it is believed that applying those updates recreates the esaadmin user and resets the cron job that notifies root that the Electronic Service Agent is not configured.
The following commands can be used to check the appliance state:
$ dsh -n ${ALL} 'lslpp -l bos.esagent' | dshbak -c
HOSTS -------------------------------------------------------------------------
reverseflash01, reverseflash02, reverseflash03, reverseflash04, reverseflash05, reverseflash06
-------------------------------------------------------------------------------
  Fileset                      Level  State      Description
  ----------------------------------------------------------------------------
Path: /usr/lib/objrepos
  bos.esagent                7.1.5.0  COMMITTED  Electronic Service Agent
Path: /etc/objrepos
  bos.esagent                7.1.5.0  COMMITTED  Electronic Service Agent
$ dsh -n ${ALL} 'crontab -l esaadmin' | dshbak -c
HOSTS -------------------------------------------------------------------------
reverseflash01, reverseflash02, reverseflash03, reverseflash04, reverseflash05, reverseflash06
-------------------------------------------------------------------------------
0 3 * * 0 /usr/esa/sbin/esa_awareness
$ dsh -n ${ALL} 'mail -H | grep -c "Electronic Service Agent not"' | sort
reverseflash01: 7
reverseflash02: 20
reverseflash03: 20
reverseflash04: 20
reverseflash05: 20
reverseflash06: 20
Workaround:
-----------
The Electronic Service Agent is not configured or needed on PDOA environments because PDOA uses the HMCs for alerting and call home. To remediate this issue, follow the instructions in the e-mail to disable the reminder.
$ dsh -n ${ALL} '/usr/esa/sbin/rmESAReminder'
reverseflash01: ...checking for user esaadmin
reverseflash01: esaadmin id=12 pgrp=system groups=system,staff home=/var/esa shell=/usr/bin/ksh login=false su=true rlogin=false daemon=true admin=true sugroups=ALL admgroups= tpath=nosak ttys=ALL expires=0 auth1=SYSTEM auth2=NONE umask=22 registry=files SYSTEM=compat logintimes= loginretries=0 pwdwarntime=0 account_locked=false minage=0 maxage=0 maxexpired=-1 minalpha=0 minloweralpha=0 minupperalpha=0 minother=0 mindigit=0 minspecialchar=0 mindiff=0 maxrepeats=8 minlen=0 histexpire=0 histsize=0 pwdchecks= dictionlist= default_roles=SysConfig efs_initialks_mode=admin efs_keystore_algo=RSA_2048 efs_keystore_access=file efs_adminks_access=file efs_allowksmodechangebyuser=true efs_file_algo=AES_128_CBC fsize=-1 cpu=-1 data=-1 stack=393216 core=-1 rss=-1 nofiles=-1 stack_hard=393216 roles=SysConfig
reverseflash01: ...checking existence of crontab file
reverseflash01: ...removing the crontab entry
reverseflash06: ...checking for user esaadmin
reverseflash06: esaadmin id=12 pgrp=system groups=system,staff home=/var/esa shell=/usr/bin/ksh login=false su=true rlogin=false daemon=true admin=true sugroups=ALL admgroups= tpath=nosak ttys=ALL expires=0 auth1=SYSTEM auth2=NONE umask=22 registry=files SYSTEM=compat logintimes= loginretries=0 pwdwarntime=0 account_locked=false minage=0 maxage=0 maxexpired=-1 minalpha=0 minloweralpha=0 minupperalpha=0 minother=0 mindigit=0 minspecialchar=0 mindiff=0 maxrepeats=8 minlen=0 histexpire=0 histsize=0 pwdchecks= dictionlist= default_roles=SysConfig fsize=-1 cpu=-1 data=-1 stack=393216 core=-1 rss=-1 nofiles=-1 stack_hard=393216 roles=SysConfig
reverseflash06: ...checking existence of crontab file
reverseflash06: ...removing the crontab entry
reverseflash02: ...checking for user esaadmin
reverseflash02: esaadmin id=12 pgrp=system groups=system,staff home=/var/esa shell=/usr/bin/ksh login=false su=true rlogin=false daemon=true admin=true sugroups=ALL admgroups= tpath=nosak ttys=ALL expires=0 auth1=SYSTEM auth2=NONE umask=22 registry=files SYSTEM=compat logintimes= loginretries=0 pwdwarntime=0 account_locked=false minage=0 maxage=0 maxexpired=-1 minalpha=0 minloweralpha=0 minupperalpha=0 minother=0 mindigit=0 minspecialchar=0 mindiff=0 maxrepeats=8 minlen=0 histexpire=0 histsize=0 pwdchecks= dictionlist= default_roles=SysConfig fsize=-1 cpu=-1 data=-1 stack=393216 core=-1 rss=-1 nofiles=-1 stack_hard=393216 roles=SysConfig
reverseflash02: ...checking existence of crontab file
reverseflash02: ...removing the crontab entry
reverseflash03: ...checking for user esaadmin
reverseflash03: esaadmin id=12 pgrp=system groups=system,staff home=/var/esa shell=/usr/bin/ksh login=false su=true rlogin=false daemon=true admin=true sugroups=ALL admgroups= tpath=nosak ttys=ALL expires=0 auth1=SYSTEM auth2=NONE umask=22 registry=files SYSTEM=compat logintimes= loginretries=0 pwdwarntime=0 account_locked=false minage=0 maxage=0 maxexpired=-1 minalpha=0 minloweralpha=0 minupperalpha=0 minother=0 mindigit=0 minspecialchar=0 mindiff=0 maxrepeats=8 minlen=0 histexpire=0 histsize=0 pwdchecks= dictionlist= default_roles=SysConfig fsize=-1 cpu=-1 data=-1 stack=393216 core=-1 rss=-1 nofiles=-1 stack_hard=393216 roles=SysConfig
reverseflash03: ...checking existence of crontab file
reverseflash03: ...removing the crontab entry
reverseflash04: ...checking for user esaadmin
reverseflash04: esaadmin id=12 pgrp=system groups=system,staff home=/var/esa shell=/usr/bin/ksh login=false su=true rlogin=false daemon=true admin=true sugroups=ALL admgroups= tpath=nosak ttys=ALL expires=0 auth1=SYSTEM auth2=NONE umask=22 registry=files SYSTEM=compat logintimes= loginretries=0 pwdwarntime=0 account_locked=false minage=0 maxage=0 maxexpired=-1 minalpha=0 minloweralpha=0 minupperalpha=0 minother=0 mindigit=0 minspecialchar=0 mindiff=0 maxrepeats=8 minlen=0 histexpire=0 histsize=0 pwdchecks= dictionlist= default_roles=SysConfig fsize=-1 cpu=-1 data=-1 stack=393216 core=-1 rss=-1 nofiles=-1 stack_hard=393216 roles=SysConfig
reverseflash04: ...checking existence of crontab file
reverseflash04: ...removing the crontab entry
reverseflash05: ...checking for user esaadmin
reverseflash05: esaadmin id=12 pgrp=system groups=system,staff home=/var/esa shell=/usr/bin/ksh login=false su=true rlogin=false daemon=true admin=true sugroups=ALL admgroups= tpath=nosak ttys=ALL expires=0 auth1=SYSTEM auth2=NONE umask=22 registry=files SYSTEM=compat logintimes= loginretries=0 pwdwarntime=0 account_locked=false minage=0 maxage=0 maxexpired=-1 minalpha=0 minloweralpha=0 minupperalpha=0 minother=0 mindigit=0 minspecialchar=0 mindiff=0 maxrepeats=8 minlen=0 histexpire=0 histsize=0 pwdchecks= dictionlist= default_roles=SysConfig fsize=-1 cpu=-1 data=-1 stack=393216 core=-1 rss=-1 nofiles=-1 stack_hard=393216 roles=SysConfig
reverseflash05: ...checking existence of crontab file
reverseflash05: ...removing the crontab entry
$ dsh -n ${ALL} 'crontab -l esaadmin' 2>&1 | dshbak -c
HOSTS -------------------------------------------------------------------------
reverseflash01, reverseflash02, reverseflash03, reverseflash04, reverseflash05, reverseflash06
-------------------------------------------------------------------------------
0481-103 Cannot open a file in the /var/spool/cron/crontabs directory.
A file or directory in the path name does not exist.
Verify the esaadmin user details and that the esaadmin user is not allowed to log in.
$ dsh -n ${ALL} 'lsuser -a login rlogin esaadmin' | dshbak -c
HOSTS -------------------------------------------------------------------------
reverseflash01, reverseflash02, reverseflash03, reverseflash04, reverseflash05, reverseflash06
-------------------------------------------------------------------------------
esaadmin login=false rlogin=false

Questions:

Can the esaadmin user be removed?

In V1.1 this user was removed and FP1/FP2 did not appear to be impacted, nor were runtimes. It should therefore be safe to remove this user.
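If removal is agreed with IBM Support, a minimal sketch (using the ${ALL} dsh node group referenced elsewhere in this document) of removing the user and verifying the result:
$ dsh -n ${ALL} 'rmuser -p esaadmin' | dshbak -c
$ dsh -n ${ALL} 'lsuser esaadmin' 2>&1 | dshbak -c
### The lsuser command should report that the esaadmin user does not exist.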

Can the bos.esagent filesets be removed?

The effect of removing these filesets hasn't been verified within our appliance. However there is documentation in the AIX knowledge center describing these steps:

https://www.ibm.com/support/knowledgecenter/en/ssw_aix_71/electronicserviceagent/eicbduninstallserviceagent_aix.html

PDOA has not tested whether removing these filesets will prevent the issue from returning after an AIX update is performed. However, since this is a documented AIX procedure and PDOA uses neither the user nor the package, it should be safe to remove.
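If removal is agreed with IBM Support, a minimal sketch (again using the ${ALL} node group) of removing the filesets with installp and verifying the result:
$ dsh -n ${ALL} 'installp -ug bos.esagent' | dshbak -c
$ dsh -n ${ALL} 'lslpp -l bos.esagent' 2>&1 | dshbak -c
### The lslpp command should report that the bos.esagent fileset is not installed.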

Fixed:
----------
N/A. Follow the workaround as needed.
KI007701
IBM PureData System for Operational Analytics V1.1 Appliances May experience an Internal Raid Card Failure During A Power Cycle [ Added 2019-10-18 ]
General
Fixpack
I_V1.1
I_V1.1.0.1
I_V1.1.0.2
IBM PureData System for Operational Analytics V1.1 Appliances May experience an Internal Raid Card Failure During A Power Cycle
The most common symptom is that an LPAR fails to boot after a power cycle of the LPAR or CEC. The RAID adapter appears to be missing.
See this technote for more information about this issue:
https://www.ibm.com/support/pages/node/1088866
Workaround:
-----------
The following technote illustrates how to reduce the risk of this issue prior to FP7_FP3. https://www.ibm.com/support/pages/node/1088866
Fixed:
----------
FP7_FP3
KIG00052
FP7_FP3 gen_update_script.sh does not compare version places with double digits correctly. [ Added 2020-02-14 ]
Fixpack
I_V1.0.0.7
I_V1.1.0.3
FP7_FP3 gen_update_script.sh does not compare version places with double digits correctly.
From Stage 6 Phase 3 in the FP7_FP3 Readme, when directed to Appendix H - DB2 Update, the gen_update_script.sh script does not generate the mgmtfixpack.sh or corefixpack.sh files.
This is due to an issue in the version comparison algorithm, which does not correctly compare a double-digit field against a single-digit field in the same position of the DB2 version string.
For example, for DB2 10.5 updates from V1.1 FP1 to V1.1 FP3, the comparison between
  unpack_10.5.0.5..1  and  unpack_10.5.0.10..6
will not recognize that 10.5.0.10 is higher than 10.5.0.5.
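For illustration only, a minimal ksh sketch (not part of the shipped gen_update_script.sh) of comparing dotted version strings numerically field by field, which avoids the single-digit versus double-digit pitfall described above:
# print the higher of two dotted version strings by sorting each field numerically
compare_versions() {
  printf '%s\n%s\n' "$1" "$2" | sort -t. -k1,1n -k2,2n -k3,3n -k4,4n | tail -1
}
compare_versions 10.5.0.5 10.5.0.10
### prints 10.5.0.10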
Workaround:
-----------
This issue should only impact DB2 10.5 customers.
Use the following command for 'mgmtfixpack.sh' for DB2 10.5.0.10..6 updates to the management host.
$ cat mgmtfixpack.sh
/BCU_share/FP7_FP3/software/DB2/unpack_10.5.0.10..6/universal/installFixPack -n -b /usr/IBM/dwe/mgmt_db2/V10.5 -c /BCU_share/FP7_FP3/software/DB2/unpack_nlpack_10.5.0.10..0/nlpack -f NOTSAMP -f update -t /tmp/$(hostname)_db2_10.5.0.5..1_$(date +%Y%m%d_%H%M%S).trc -n
The corefixpack.sh is a little more complicated because it varies depending on the source path and the DB2 version. Replace <current db2 10.5 copy> with the copy directory associated with your core instance name.
$ cat corefixpack.sh
/BCU_share/FP7_FP3/software/DB2/unpack_10.5.0.10..6/universal/installFixPack -n -b <current db2 10.5 copy> -p /usr/IBM/dwe/db2/V10.5.0.10..6 -c /BCU_share/FP7_FP3/software/DB2/unpack_nlpack_10.5.0.10..0/nlpack -f NOTSAMP -f update -t /tmp/$(hostname)_db2_10.5.0.10..0_$(date +%Y%m%d_%H%M%S).trc -n
In the example below, the copy would be /usr/IBM/dwe/db2/V10.5.0.10..0. This represents the copy that the installer will use to determine the components and licenses to carry over to the updated db2 copy as part of the -p option.
$  dsh -n ${ALL} -e $(pwd)/scripts/get_db2_data.sh | dshbak -c
HOSTS -------------------------------------------------------------------------
reverseflash01, reverseflash03
-------------------------------------------------------------------------------
/usr/IBM/dwe/mgmt_db2/V10.5|10.5|10.5.0.10|10.5.0.10..0|db2opm,dweadmin|db2opm,dweadmin||
HOSTS -------------------------------------------------------------------------
reverseflash02, reverseflash04, reverseflash05, reverseflash06
-------------------------------------------------------------------------------
/usr/IBM/dwe/db2/V10.5.0.10..0|10.5|10.5.0.10|10.5.0.10..0|bcuaix|bcuaix||
OR
Contact IBM Support to obtain an updated gen_update_script.sh.
Back up the gen_update_script.sh file in /BCU_share/FP7_FP3/software/DB2/scripts and replace it with the script obtained from IBM Support.
Update the permissions to read and execute for the user.
chmod 500 gen_update_script.sh
Fixed:
----------
FP8_FP4
KIG00063
After the FP7_FP3 update the HMC call home settings are lost and call home tickets for HMC or Power Server issues are not submitted to IBM. (Added: 2020-03-12)
Fixpack
I_V1.0.0.7
I_V1.1.0.3
After the FP7_FP3 update the HMC call home settings are lost and call home tickets for HMC or Power Server issues are not submitted to IBM. (Added: 2020-03-12)
During Stage 3 of the PDOA FP7_FP3 fixpack the HMC FW level is updated to V9R1.930.0. Any time after this update, if there is a Call Home event, the information sent to IBM does not include the Solution MTM or the Solution Serial number. Instead, the device MTM and Serial number are sent. Without the Solution Serial Number and the Solution Machine Type information, these tickets will fail entitlement and will not result in a HW ticket.
Other symptoms:
hscroot@dsshmc49:~> ls -l /opt/hsc/data/ISASconfig
ls: cannot access /opt/hsc/data/ISASconfig: No such file or directory
hscroot@dsshmc49:~> cat /opt/ccfw/data/ecc/hmc_ecc.properties
cat: /opt/ccfw/data/ecc/hmc_ecc.properties: No such file or directory
Workaround:
-----------
If this is part of a planning exercise before the fixpack, verify that you have a copy of the isas_config.xml file that was used to set up call home for the HMC.
This file may be found in one or more of the following locations:
Management host:
                                /pschome/config/isas_config.xml
HMC:
                                /hscroot/config/isas_config.xml
                                /opt/hsc/data/ISASconfig/isas_config.xml
                               /opt/sfp/data/service/ISASconfig/isas_config.xml
If you cannot find your isas_config.xml file then this can be generated with help from IBM Support.
Verify the contents of the isas_config.xml:
The file should include a solution entry for each Solution Serial.
Each solution entry should include the type (8279/8280), the model, and the solution serial number for all the components in that rack.
IBM Support can look up your solution serial numbers and rack assignments.
<ISASSolution>
        <solution>
                <type>8280</type>
                <model>A02</model>
                <serial>RACK01SSN1</serial>
                <groupname>ISASHMC</groupname>
                <server>
                        <type>8286</type>
                        <model>42A</model>
                        <serial>2168C1V</serial>
                </server>
                <server>
                        <type>8286</type>
                        <model>42A</model>
                        <serial>2168BFV</serial>
                </server>
                <server>
                        <type>7042</type>
                        <model>CR8</model>
                        <serial>21E719C</serial>
                </server>
                <server>
                        <type>7042</type>
                        <model>CR8</model>
                        <serial>21E717C</serial>
                </server>
        </solution>
        <solution>
                <type>8280</type>
                <model>A02</model>
                <serial>RACK01SSN1</serial>
                <groupname>ISASHMC</groupname>
                <server>
                        <type>8284</type>
                        <model>22A</model>
                        <serial>216B47V</serial>
                </server>
                <server>
                        <type>8284</type>
                        <model>22A</model>
                        <serial>216B42V</serial>
                </server>
                <server>
                        <type>8231</type>
                        <model>22A</model>
                        <serial>216B44V</serial>
                </server>
        </solution>
</ISASSolution>
The following information will be needed if the isas support mode is disabled.
Work with IBM Support to verify that the hscpe user is configured and you know the password.
Work with IBM Support to verify that you know the root password.
Verify the HMC Serial Number and the current Date on both HMC servers.
Ensure the isas_config.xml is located in the hscroot users home directory on both HMCs. This file can be copied from the management host as the root user to each HMC.
cd <path to xml file>
scp -p isas_config.xml hscroot@hmc:
### Replace hmc with ip addresses of the two HMCs.
Run this command to import the isas_config.xml file into each hmc.
ssh hscroot@hmc 'cpfile -t modelmap -l l -o import -f /home/hscroot/isas_config.xml'
### Replace hmc with the ip addresses of the two HMCs.
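As an illustration only, the copy and the import can be combined into one pass over both HMCs. In this sketch the addresses 172.23.1.245 and 172.23.1.246 are placeholders; substitute the addresses of your two HMCs.
$ for hmc in 172.23.1.245 172.23.1.246; do scp -p isas_config.xml hscroot@${hmc}:; ssh hscroot@${hmc} 'cpfile -t modelmap -l l -o import -f /home/hscroot/isas_config.xml'; done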
Verify that the file was imported. The directory where this file is imported has changed since PDOA FP6_FP2.
 ssh hscroot@hmc1 'ls -la /opt/sfp/data/service/ISASconfig/isas_config.xml'
-rw-r--r-- 1 root sfp 948 Mar 12 15:00 /opt/sfp/data/service/ISASconfig/isas_config.xml
Verify that the isas support mode is enabled.
ssh hscroot@hmc 'ls -la /opt/sfp/data/sa/hmc_ecc.properties'
-rw-rw-rw- 1 sfp sfp 66 Feb 12 14:39 /opt/sfp/data/sa/hmc_ecc.properties
ssh hscroot@hmc 'cat /opt/sfp/data/sa/hmc_ecc.properties'
# This flag controls iSAS problem reporting mode
isasmode = False
If isasmode = False, then it is necessary to work with support to obtain the PESH password.
Once you have the PESH password do the following.
## As root on the management host, or through a PuTTY session, log in as the hscpe user.
$ ssh hscpe@172.23.1.245
Password:
## Using the PESH password requested from support, enter PESH mode with pesh <HMC SERIAL>
hscpe@dsshmc49:~> pesh 840A7BD
Password:
## Login as root with 'su -'
[hscpe@dsshmc49 ~] $ su -
Password:
## Enable ISAS / PDOA reporting mode.
[root@dsshmc49 ~] # /opt/hsc/bin/chisascfg --mode enable
iSAS problem reporting mode enabled. Please reboot the HMC.
[root@dsshmc49 ~] # cat /opt/sfp/data/sa/hmc_ecc.properties
# This flag controls iSAS problem reporting mode
isasmode = True
## Reboot the HMC and work with support to submit a test ticket to verify tickets are working again.
Fixed:
----------
N/A.
KIG00064

PDOA rsct filesets are in APPLIED state. (Added: 2020-03-18)

Fixpack
I_V1.0.0.6
I_V1.0.0.7
I_V1.1.0.2
I_V1.1.0.3
PDOA FP7_FP3 rsct filesets are in APPLIED state. (Added: 2020-03-18)
In Appendix F of the PDOA FP6_FP2 and FP7_FP3 readme documents there is a check to see what AIX filesets are in the APPLIED state. This check is part of the instructions to commit GPFS filesets. If this check is run after TSA is updated, then it will reveal several RSCT filesets that are in the APPLIED state. Below is an excerpt from a V1.1 FP3 environment after TSA has been upgraded and committed.
$ time dsh -n ${ALL} 'installp -s' 2>&1 | dshbak -c
HOSTS -------------------------------------------------------------------------
hostname01, hostname02, hostname03, hostname04, dancehostname05, dancehostname06, dancehostname07
-------------------------------------------------------------------------------
 rsct.basic.rte                      USR     3.2.4.1                 APPLIED
 rsct.basic.rte                      ROOT    3.2.4.1                 APPLIED
 rsct.basic.rte                      USR     3.2.4.2                 APPLIED
 rsct.basic.rte                      ROOT    3.2.4.2                 APPLIED
 rsct.core.rmc                       ROOT    3.2.4.1                 APPLIED
 rsct.core.rmc                       USR     3.2.4.1                 APPLIED
 rsct.core.rmc                       ROOT    3.2.4.2                 APPLIED
 rsct.core.rmc                       USR     3.2.4.2                 APPLIED
 rsct.core.rmc                       USR     3.2.4.3                 APPLIED
 rsct.core.rmc                       ROOT    3.2.4.3                 APPLIED
 rsct.core.utils                     USR     3.2.4.1                 APPLIED
 rsct.core.utils                     ROOT    3.2.4.1                 APPLIED
 rsct.core.utils                     ROOT    3.2.4.2                 APPLIED
 rsct.core.utils                     USR     3.2.4.2                 APPLIED
 rsct.core.utils                     USR     3.2.4.3                 APPLIED
 rsct.core.utils                     ROOT    3.2.4.3                 APPLIED
 rsct.opt.stackdump                  ROOT    3.2.4.1                 APPLIED
 rsct.opt.stackdump                  USR     3.2.4.1                 APPLIED
 Installp Status
 ---------------
 Name                                Part    Level                   State
 ---------------------------------------------------------------------------
real    0m1.02s
user    0m0.00s
sys     0m0.00s
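To narrow the check to the RSCT filesets only, the same installp -s command can be given explicit fileset names (a sketch using the ${ALL} node group):
$ dsh -n ${ALL} 'installp -s rsct.basic.rte rsct.core.rmc rsct.core.utils rsct.opt.stackdump' 2>&1 | dshbak -c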
Workaround:
-----------

There is no workaround as yet. As part of FP8_FP4 we will look at this state to determine whether any actions need to be taken. PDOA has used the same TSA update model for many years, and we expect that this issue has existed through all of those fixpacks; it was only the additional checking introduced in FP6_FP2 that allowed the issue to be recognized.

There is no functionality issue with APPLIED filesets as the latest filesets are in use. This is evidenced by the checks performed during the TSA updates.

Fixed:
----------
N/A
KIG00058
PDOA FP7_FP3 has failures when running Power Firmware (PFW) updates in parallel on the same server (MTM) type. (Added: 2020-03-18)
Fixpack
I_V1.0.0.7
I_V1.1.0.3
PDOA FP7_FP3 has failures when running Power Firmware (PFW) updates in parallel on the same server (MTM) type.
In Stage 7 of the FP7_FP3 Readme there are instructions to update the Power Firmware (PFW) for one or more servers in the environment. Many customers are choosing to take a full outage during this time and are planning to apply the PFW in parallel on all of the servers instead of following the model of only updating the power firmware for Standby servers.
While this issue impacts all customers, the cost is felt most by customers who take full outage windows and have large environments.
During Stage 7 the customer may experience the following symptom:
fsp=server_fsp1,server_fsp2,server_fsp3,server_fsp4;s=$(date);echo "Starting at ${s}.";$PL_ROOT/bin/icmds/appl_ctrl_fsp update -install -l ${fsp} -f /BCU_share/FP7_FP3/firmware/server_fsp/22A_42A/image/imports;e=$(date);echo "Started: ${s} Ended: ${e}."
Starting at Thu Feb 20 23:47:55 IST 2020.
PFW:server_fsp1:0:Successfully updated
PFW:server_fsp4:0:Successfully updated
PFW:server_fsp2:1:update failed for server server_fsp2
PFW:server_fsp3:1:update failed for server server_fsp3
Updates failed for one or more CECs
The text above shows that two servers, server_fsp1 and server_fsp4, succeeded, while server_fsp2 and server_fsp3 failed. While it appears that fsp1 and fsp4 ran in parallel, the pflayer actually runs updates in sets based on the MTM of the server. All PDOA environments at 1.5 DNs or higher have two different Power server types in the environment, one 4U and one 2U, which have different MTMs. FSP1 and FSP4 have different MTMs, and a close examination of the log would show that they ran about 30 to 40 minutes apart. The log will also show that fsp2 and fsp3 failed almost immediately.
This is a confirmed limitation of the HMC firmware level shipped in FP7_FP3, which does not allow the updlic command to be run in parallel.
Examination of the platform_layer.log will show the following messages:
[21 Feb 2020 00:55:17,846] <2360134 CTRL DEBUG flashdancehostname01> Successfully changed CEC power start policy as autostart
[21 Feb 2020 00:55:17,883] <1508172 CTRL DEBUG flashdancehostname01> Starting update for -> server_fsp3
[21 Feb 2020 00:55:17,886] <1508172 CTRL DEBUG flashdancehostname01> Executing command on hmc 172.23.1.245 => LANG=en_US /usr/bin/ssh hscroot@172.23.1.245 updlic -m Data1-8284-22A-SN216B47V -o u -t sys -l 01SV860_205 -r mountpoint -d /home/hscroot/01SV860_205_165
[21 Feb 2020 00:55:18,263] <2360134 CTRL DEBUG flashdancehostname01> update failed for server_fsp3
This is not a failure in terms of applying the power firmware but a failure from the updlic command which won't allow this update to proceed while another update is running.
This scenario has not impacted any of the previous PDOA fixpacks.
Workaround:
-----------
This is not a functional issue, but rather a time issue.
There is no need for a workaround for 0.5 DN systems as PDOA updates the foundation hosts serially.
For smaller systems consider adding the time costs to the outage windows, about 1 hour per Server.
One workaround is to follow the Stage 7 model for those updates and only update Quiesced Servers. This is more tedious but allows the updates to happen outside of outage windows.
Another workaround is to use the HMC GUI to run the updates. The HMC GUI should support running the PFW updates in parallel for servers of the same type.
PDOA development has not tested this method.
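For reference, if the command line is still preferred during a full outage, the same update command shown in the symptom can be run serially, one FSP at a time, which avoids the parallel updlic limitation. This is a sketch only; the firmware image directory may differ by server type.
$ for fsp in server_fsp1 server_fsp2 server_fsp3 server_fsp4; do $PL_ROOT/bin/icmds/appl_ctrl_fsp update -install -l ${fsp} -f /BCU_share/FP7_FP3/firmware/server_fsp/22A_42A/image/imports; done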
Fixed:
----------
N/A.
Outside of the GUI workaround to run in parallel there is no fix from the HMC team available to address this at the command line.
KIG00004

alt_disk_install fails with long lv names.(Added: 2020-03-18)

Fixpack
I_V1.0.0.5
I_V1.0.0.6
I_V1.0.0.7
I_V1.1.0.1
I_V1.1.0.2
I_V1.1.0.3
alt_disk_install fails with long lv names.(Added: 2020-03-18)
PDOA fixpack processes have used hdisk cloning from AIX as part of the fixpack procedures for a long time. This is a known model for updating AIX where, in a two-disk rootvg mirror, the mirror is broken and a copy or clone of the running disk is created.
In the event of a failure or corruption during the AIX update (or any change to rootvg) it is then possible to boot using the cloned disk to get back to the original rootvg.
As part of this process, however, there is some potential for LVs created after PDOA was shipped to the customer to cause the cloning step to fail. In this case, LVs with names that are too long may cause a failure like the one below.
0505-129 alt_disk_install: The rootvg contains logical
volume name(s):
01234567890lv,
which exceed the 11 character limitation. To correct this
problem, unmount the logical volume(s). Then, rename and
mount the logical volume(s) and retry the command.

 
For customers applying V1.0 FP5 or earlier or V1.1 FP1, this will prevent the fixpack from proceeding during the management and core phases.
For customers applying V1.0 FP6+ or V1.1 FP2+, this will prevent updates on that LPAR in Stage 6 (management) or Stage 7 (core).
It is possible to check this in a PDOA environment using the following command, which reports LVs whose names are longer than the max value. With max=8 the command returns results on all PDOA servers, which validates that it works; change it to max=11 to find LVs that are actually problematic.
dsh -n ${ALL} 'max=8;lsvg -l rootvg | egrep -v "^rootvg:|^LV NAME" | while read lvn rest;do l=$(echo ${lvn} | wc -c);if [ ${l} -gt ${max} ];then echo "${lvn} exceeds ${max} chars at ${l} characters long.";fi;done' | dshbak -c
HOSTS -------------------------------------------------------------------------
reverseflash01
-------------------------------------------------------------------------------
hd11admin exceeds 8 chars at 10 characters long.
lg_dumplv exceeds 8 chars at 10 characters long.
livedump exceeds 8 chars at 9 characters long.
paging00 exceeds 8 chars at 9 characters long.
gsacache exceeds 8 chars at 9 characters long.
HOSTS -------------------------------------------------------------------------
reverseflash02, reverseflash03, reverseflash04, reverseflash05, reverseflash06
-------------------------------------------------------------------------------
paging00 exceeds 8 chars at 9 characters long.
hd11admin exceeds 8 chars at 10 characters long.
lg_dumplv exceeds 8 chars at 10 characters long.
livedump exceeds 8 chars at 9 characters long.
Workaround:
-----------

To proceed, rename any logical volumes that exceed the 11 character limit.
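A minimal sketch of renaming an over-long logical volume, using the hypothetical LV name 01234567890lv and a hypothetical mount point /longlvfs; adapt the names to your environment.
$ umount /longlvfs
$ chlv -n shortlv 01234567890lv
### Update the dev = entry for /longlvfs in /etc/filesystems to /dev/shortlv, then remount.
$ mount /longlvfs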

For FP5_FP1 and earlier, resume the update. For FP6_FP2 and higher, rerun the alt_disk_install step to clone rootvg once the LVs have been renamed.

Fixed:
----------
N/A. There is no fix for this issue as it is a limitation of the cloning procedure used by PDOA.
KIG00057
PDOA Power Firmware Update process leaves LPAR with a BA218001 error code. (Added: 2020-03-18)
Fixpack I_V1.1.0.3
PDOA Power Firmware Update process leaves LPAR with a BA218001 error code. (Added: 2020-03-18)
During the Power Firmware update stage on V1.1.0.3, a customer experienced an LPAR that did not start. In this case the SRC was BA218001.
This issue is rare, but it matched another issue with a Power 9 system in a similar scenario. The root cause appears to be an incompatibility between the Fibre Channel firmware and the Power Firmware, which results in a stack underflow during POST.
Workaround:
-----------
Once this issue is hit, it is important to open a ticket with IBM as this remedy may not fix all issues matching this SRC.
Once the ticket is opened, it is possible to attempt the workaround. This is the advice from a similar ticket.
The workaround that has been devised and found to work for this problem, until the root cause is established and fixes are released, is as follows:
1. Shut down the partition
2. Change the partition profile to remove any/all fibre channel adapters
3. Activate the partition being sure to specify the partition profile with the change (the one with no fibre adapters)
4. Let partition boot - it will likely stop with a CA00E175 or AA00E1A9. Make sure it does not hang with any BA21xxxx condition.
5. Again shut down the partition
6. Modify the partition profile to add the desired fibre channel adapters back
7. Activate the partition again being sure to specify the partition profile with the change
8. Make sure the partition does not hang with BA210001 or BA218001
9. Retry the VIOS install
For PDOA systems we followed something slightly different.
Pre-Requisites:
You will need the hscroot password for one of the HMCs on the PDOA environment.
You will need browser access to the HMC and the ability to log in as hscroot.
You will need to identify the Server and associated LPAR. For Foundation Servers there are two LPARs per Server.
Log in to the HMC as hscroot.
1. Find the Server hosting the LPAR with the SRC code.
2. Shut down the LPAR in the HMC.
3. Find the Managed Profiles window for the LPAR.
4. Create a copy of the current profile. For example, if the profile name is 'adm_node' create a copy called 'adm_node_nofcs'
5. Edit the profile copy to remove all HBA adapters from the LPAR assignment. In the I/O tab there will be slots listed as 'Required'. Select all Fibre Channel Adapters that are currently required and click the Remove button.
6. Attempt to start or activate the LPAR and choose the newly copied profile.
7. If the system boots, log in to the environment. If the LPAR is still not active, stop and work with support for further troubleshooting.
8. On boot, GPFS will start automatically; you may want to run '/usr/lpp/mmfs/bin/mmumount all' and then '/usr/lpp/mmfs/bin/mmshutdown' to stop it cleanly.
9. Shut down the server from the command line but do not reboot it: 'shutdown +0'.
10. In the HMC verify the LPAR is not active.
11. Activate the LPAR, this time choosing the original profile.
12. Verify that the LPAR has booted and all FC cards are available and not Defined.
13. Delete the copied profile from the Managed Profiles page for that LPAR.
Fixed:
----------
There is no fix available for PDOA V1.1.0.3's fixpack application.
KIG00066

XML Load In PDOA Systems Returns SQL1406N Shared sort memory cannot be allocated for this utility. (Added: 2020-03-20)

General
I_V1.0
I_V1.1

XML Load In PDOA Systems Returns SQL1406N Shared sort memory cannot be allocated for this utility.

When trying to execute a LOAD into an XML table, you may see the following errors.

Agent Type   Node   SQL Code    Result
LOAD         001    -00001406   Error. RESTART required.
LOAD         002    -00001406   Error. RESTART required.
LOAD         003    -00001406   Error. RESTART required.
LOAD         004    -00001406   Error. RESTART required.
LOAD         005    -00001406   Error. RESTART required.
LOAD         006    -00001406   Error. RESTART required.
LOAD         007    -00001406   Error. RESTART required.
LOAD         008    -00001406   Error. RESTART required.
LOAD         009    -00001406   Error. RESTART required.
LOAD         010    -00001406   Error. RESTART required.
LOAD         011    -00001406   Error. RESTART required.
LOAD         012    -00001406   Error. RESTART required.
LOAD         013    -00001406   Error. RESTART required.
LOAD         014    -00001406   Error. RESTART required.
LOAD         015    -00001406   Error. RESTART required.
LOAD         016    -00001406   Error. RESTART required.
LOAD         017    -00001406   Error. RESTART required.
LOAD         018    -00001406   Error. RESTART required.
LOAD         019    -00001406   Error. RESTART required.
LOAD         020    -00001406   Error. RESTART required.
LOAD         021    -00001406   Error. RESTART required.
LOAD         022    -00001406   Error. RESTART required.
LOAD         023    -00001406   Error. RESTART required.
LOAD         024    -00001406   Error. RESTART required.
LOAD         025    -00001406   Error. RESTART required.
RESULTS: 0 of 25 LOADs completed successfully.

This appears to be a memory issue, but even with 0 rows to load the error will appear.

Workaround:
-----------

This type of load operation is not supported on PDOA environments in general because it requires shared sort memory to be enabled. PDOA environments are primarily configured to use private sorts. Changing the memory parameters to use shared sorts will allow this type of operation to work; however, doing so should be considered a tuning exercise to ensure existing workloads and SLAs are not impacted.

Primarily this is due to the following settings: (examples are from V1.1 systems)

DBM CFG:

SHEAPTHRES 2800000

DB CFG:

INTRA_PARALLEL NO
SHEAPTHRES_SHR 5000
SORTHEAP 50000
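A sketch of checking the current values on a system, assuming a hypothetical database name BCUDB and that the commands are run as the instance owner:
$ db2 get dbm cfg | grep -i SHEAPTHRES
$ db2 get db cfg for BCUDB | egrep -i "SHEAPTHRES_SHR|SORTHEAP|INTRA_PARALLEL"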

See https://www.ibm.com/support/knowledgecenter/en/SSH2TE_1.1.0/com.ibm.7700.r2.common.doc/doc/c00000109.html for more information.

For Db2 Requirements for XML Load refer to this link.

https://www.ibm.com/support/knowledgecenter/en/SSEPGG_11.1.0/com.ibm.db2.luw.xml.doc/doc/c0024119.html

Fixed:
----------
N/A
KIG00067
/var/log/syslog.out shows new messages for ssh. (Added: 2020-03-24)
Fixpack
I_V1.0.0.5_IF01
I_V1.0.0.6
I_V1.0.0.7
I_V1.1.0.1_IF01
I_V1.1.0.2
I_V1.1.0.3
/var/log/syslog.out shows new messages for ssh.
After applying a fixpack or interim fix that updates AIX to 7.1 TL5, sshd reports several deprecated or unsupported options in the system log.
Oct  4 18:01:28 kf5hostname04 auth|security:info sshd[6554284]: rexec line 31: Deprecated option KeyRegenerationInterval
Oct  4 18:01:28 kf5hostname04 auth|security:info sshd[6554284]: rexec line 47: Deprecated option RSAAuthentication
Oct  4 18:01:28 kf5hostname04 auth|security:info sshd[6554284]: rexec line 52: Deprecated option RhostsRSAAuthentication
Oct  4 18:01:28 kf5hostname04 auth|security:info sshd[6554284]: rexec line 96: Unsupported option PrintLastLog
Oct  4 18:01:28 kf5hostname04 auth|security:info sshd[6554284]: rexec line 99: Deprecated option UsePrivilegeSeparation
Oct  4 18:01:28 kf5hostname04 auth|security:info sshd[6554284]: reprocess config line 47: Deprecated option RSAAuthentication
Oct  4 18:01:28 kf5hostname04 auth|security:info sshd[6554284]: reprocess config line 52: Deprecated option RhostsRSAAuthentication
The AIX update includes updates to OpenSSH where these options are either deprecated or unsupported.
These options were originally set up in the default /etc/ssh/sshd_config and /etc/ssh/ssh_config files as part of the PDOA deployment.
$ dsh -n ${ALL} 'egrep "KeyRegenerationInterval|RSAAuthentication|RhostsRSAAuthentication|PrintLastLog|UsePrivilegeSeparation" /etc/ssh/ssh*_config' | dshbak -c
HOSTS -------------------------------------------------------------------------
flashdancehostname01, flashdancehostname02, flashdancehostname03, flashdancehostname04, flashdancehostname05, flashdancehostname06, flashdancehostname07
-------------------------------------------------------------------------------
/etc/ssh/ssh_config:#   RhostsRSAAuthentication no
/etc/ssh/ssh_config:#   RSAAuthentication yes
/etc/ssh/sshd_config:KeyRegenerationInterval 1h
/etc/ssh/sshd_config:RSAAuthentication yes
/etc/ssh/sshd_config:RhostsRSAAuthentication no
/etc/ssh/sshd_config:# RhostsRSAAuthentication and HostbasedAuthentication
/etc/ssh/sshd_config:PrintLastLog yes
/etc/ssh/sshd_config:UsePrivilegeSeparation yes
These messages will appear for each ssh session initiated on the server, which can lead to unnecessarily increased syslog traffic.
Workaround:
-----------
These options can be commented out in the /etc/ssh/sshd_config and /etc/ssh/ssh_config files.
Great care should be taken before updating /etc/ssh/sshd_config files and refreshing the sshd daemon as this could cause unexpected outages if there are errors in the config files. GPFS and Db2 are two appliance components dependent on ssh.
Always ensure that it is possible to log in to this LPAR via the HMC, using vtmenu on the command line or a console connection through the HMC GUI.
To test sshd_config file edits:
As root create a sandbox location to test sshd_config updates.
$ mkdir /tmp/ssh_test
$ cp /etc/ssh/sshd_config /tmp/ssh_test/
$ cd /tmp/ssh_test/
Edit the file by commenting out KeyRegenerationInterval, RSAAuthentication, RhostsRSAAuthentication, PrintLastLog, and UsePrivilegeSeparation.
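As an illustration, the edit can also be made with sed on the sandbox copy (a sketch; review the result before going further):
$ cd /tmp/ssh_test
$ cp sshd_config sshd_config.orig
$ sed -e 's/^KeyRegenerationInterval/#KeyRegenerationInterval/' \
      -e 's/^RSAAuthentication/#RSAAuthentication/' \
      -e 's/^RhostsRSAAuthentication/#RhostsRSAAuthentication/' \
      -e 's/^PrintLastLog/#PrintLastLog/' \
      -e 's/^UsePrivilegeSeparation/#UsePrivilegeSeparation/' sshd_config.orig > sshd_config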
$ egrep "KeyRegenerationInterval|RSAAuthentication|RhostsRSAAuthentication|PrintLastLog|UsePrivilegeSeparation" /tmp/ssh_test/sshd_config
#KeyRegenerationInterval 1h
#RSAAuthentication yes
#RhostsRSAAuthentication no
# RhostsRSAAuthentication and HostbasedAuthentication
#PrintLastLog yes
#UsePrivilegeSeparation yes
Create a separate sshd session on port 10022 using the new configuration file.
This will not run in the background and will not fork any processes.
$ $(which sshd) -d -D -p 10022 -f /tmp/ssh_test/sshd_config
debug1: sshd version OpenSSH_7.5, OpenSSL 1.0.2o  27 Mar 2018
debug1: private host key #0: ssh-rsa SHA256:1TkrCRt7BFWLrPs+LIi51tmoChteTRNLKQ9LUszCBXk
debug1: private host key #1: ssh-dss SHA256:CHyQxuRUB2No3hP5k+Bj8GPeYzFKGdiSlnKt5oU/SE8
debug1: rexec_argv[0]='/usr/sbin/sshd'
debug1: rexec_argv[1]='-d'
debug1: rexec_argv[2]='-D'
debug1: rexec_argv[3]='-p'
debug1: rexec_argv[4]='10022'
debug1: rexec_argv[5]='-f'
debug1: rexec_argv[6]='/tmp/ssh_test/sshd_config'
debug1: Bind to port 10022 on 0.0.0.0.
Server listening on 0.0.0.0 port 10022.
debug1: Bind to port 10022 on ::.
Server listening on :: port 10022.
In a separate window, use ssh to log in to this sshd daemon.
$ ssh -p 10022 flashdancehostname01
debug1: AIX/loginsuccess: msg Last unsuccessful login: Sun Mar  8 13:43:58 IST 2020 on rexec from motte2.canlab.ibm.com
Last login: Tue Mar 24 19:17:57 IST 2020 on ssh from 172.23.1.1
debug1: audit session open euid 0 user root tty name /dev/pts/6
Last unsuccessful login: Sun Mar  8 13:43:58 IST 2020 on rexec from motte2.canlab.ibm.com
Last login: Tue Mar 24 19:16:06 IST 2020 on /dev/pts/6 from 172.23.1.1
*******************************************************************************
*                                                                             *
*                                                                             *
*  Welcome to AIX Version 7.1!                                                *
*                                                                             *
*                                                                             *
*  Please see the README file in /usr/lpp/bos for information pertinent to    *
*  this release of the AIX Operating System.                                  *
*                                                                             *
*                                                                             *
*******************************************************************************
debug1: ACCESS KEy before calling efslogin:
debug1: permanently_set_uid: 0/0
Environment:
  USER=root
  LOGNAME=root
  LOGIN=root
  HOME=/
  PATH=/usr/bin:/etc:/usr/sbin:/usr/ucb:/usr/bin/X11:/sbin:/usr/java5/jre/bin:/usr/java5/bin:/usr/lpp/htx/etc/scripts:/test/tools:/usr/lpp/htx/test/tools:/home/monitor/test/tools:/nim/build_net/tools:/u
  MAIL=/var/spool/mail/root
  SHELL=/usr/bin/ksh
  TZ=Asia/Calcutta
  SSH_CLIENT=172.23.1.1 65207 10022
  SSH_CONNECTION=172.23.1.1 65207 172.23.1.1 10022
  SSH_TTY=/dev/pts/6
  TERM=xterm
  AUTHSTATE=compat
  LANG=en_US
  LOCPATH=/usr/lib/nls/loc
  NLSPATH=/usr/lib/nls/msg/%L/%N:/usr/lib/nls/msg/%L/%N.cat:/usr/lib/nls/msg/%l.%c/%N:/usr/lib/nls/msg/%l.%c/%N.cat
  LC__FASTMSG=true
  ODMDIR=/etc/objrepos
  CLCMD_PASSTHRU=1
  MANPATH=/opt/ibm/director/man
  NUM_PARALLEL_LPS=2
(0) root @ flashdancehostname01: 7.1.0.0: /
$ exit
Connection to flashdancehostname01 closed.
After logging in and exiting the following messages will appear on your sshd debug session and it will exit.
debug1: fd 5 clearing O_NONBLOCK
debug1: Server will not fork when running in debugging mode.
debug1: rexec start in 5 out 5 newsock 5 pipe -1 sock 8
debug1: inetd sockets after dupping: 3, 3
debug1: audit connection from 172.23.1.1 port 65207 euid 0
Connection from 172.23.1.1 port 65207 on 172.23.1.1 port 10022
debug1: Client protocol version 2.0; client software version OpenSSH_7.5
debug1: match: OpenSSH_7.5 pat OpenSSH* compat 0x04000000
debug1: Local version string SSH-2.0-OpenSSH_7.5
debug1: Enabling compatibility mode for protocol 2.0
debug1: Failed dlopen: /usr/krb5/lib/libkrb5.a(libkrb5.a.so):   0509-022 Cannot load module /usr/krb5/lib/libkrb5.a(libkrb5.a.so).
        0509-026 System error: A file or directory in the path name does not exist.
debug1: Error loading Kerberos, disabling the Kerberos auth
debug1: permanently_set_uid: 202/201 [preauth]
debug1: list_hostkey_types: ssh-rsa,rsa-sha2-512,rsa-sha2-256 [preauth]
debug1: SSH2_MSG_KEXINIT sent [preauth]
debug1: SSH2_MSG_KEXINIT received [preauth]
debug1: kex: algorithm: curve25519-sha256 [preauth]
debug1: kex: host key algorithm: rsa-sha2-512 [preauth]
debug1: kex: client->server cipher: aes128-ctr MAC: umac-64-etm@openssh.com compression: none [preauth]
debug1: kex: server->client cipher: aes128-ctr MAC: umac-64-etm@openssh.com compression: none [preauth]
debug1: expecting SSH2_MSG_KEX_ECDH_INIT [preauth]
debug1: rekey after 4294967296 blocks [preauth]
debug1: SSH2_MSG_NEWKEYS sent [preauth]
debug1: expecting SSH2_MSG_NEWKEYS [preauth]
debug1: SSH2_MSG_NEWKEYS received [preauth]
debug1: rekey after 4294967296 blocks [preauth]
debug1: KEX done [preauth]
debug1: userauth-request for user root service ssh-connection method none [preauth]
debug1: attempt 0 failures 0 [preauth]
debug1: userauth-request for user root service ssh-connection method publickey [preauth]
debug1: attempt 1 failures 0 [preauth]
debug1: userauth_pubkey: test whether pkalg/pkblob are acceptable for RSA SHA256:1TkrCRt7BFWLrPs+LIi51tmoChteTRNLKQ9LUszCBXk [preauth]
debug1: temporarily_use_uid: 0/0 (e=0/0)
debug1: trying public key file //.ssh/authorized_keys
debug1: fd 5 clearing O_NONBLOCK
debug1: matching key found: file //.ssh/authorized_keys, line 1 RSA SHA256:1TkrCRt7BFWLrPs+LIi51tmoChteTRNLKQ9LUszCBXk
debug1: restore_uid: 0/0
debug1: Failed to collect Cookie from Keystore
debug1: Keystore Opening wil be failed after login
debug1: Cookie received :
 [preauth]
Postponed publickey for root from 172.23.1.1 port 65207 ssh2 [preauth]
debug1: userauth-request for user root service ssh-connection method publickey [preauth]
debug1: attempt 2 failures 0 [preauth]
debug1: temporarily_use_uid: 0/0 (e=0/0)
debug1: trying public key file //.ssh/authorized_keys
debug1: fd 8 clearing O_NONBLOCK
debug1: matching key found: file //.ssh/authorized_keys, line 1 RSA SHA256:1TkrCRt7BFWLrPs+LIi51tmoChteTRNLKQ9LUszCBXk
debug1: restore_uid: 0/0
debug1: Failed to collect Cookie from Keystore
debug1: Keystore Opening wil be failed after login
debug1: Cookie received :
 [preauth]
Accepted publickey for root from 172.23.1.1 port 65207 ssh2: RSA SHA256:1TkrCRt7BFWLrPs+LIi51tmoChteTRNLKQ9LUszCBXk
debug1: AIX/loginsuccess: msg Last unsuccessful login: Sun Mar  8 13:43:58 IST 2020 on rexec from motte2.canlab.ibm.com
Last login: Tue Mar 24 19:16:06 IST 2020 on /dev/pts/6 from 172.23.1.1
debug1: monitor_child_preauth: root has been authenticated by privileged process
debug1: Entering sshefs_option_check [preauth]
debug1: AllowPkcs12KeystoreAutoOpen option not set [preauth]
debug1: EFS ACESS KEY:
 [preauth]
debug1: monitor_read_log: child log fd closed
debug1: audit event euid 0 user root event 2 (SSH_authsuccess)
debug1: Return Val-1 for auditproc:0
debug1: rekey after 4294967296 blocks
debug1: rekey after 4294967296 blocks
debug1: ssh_packet_set_postauth: called
debug1: Entering interactive session for SSH2.
debug1: server_init_dispatch
debug1: server_input_channel_open: ctype session rchan 0 win 1048576 max 16384
debug1: input_session_request
debug1: channel 0: new [server-session]
debug1: session_new: session 0
debug1: session_open: channel 0
debug1: session_open: session 0: link with channel 0
debug1: server_input_channel_open: confirm session
debug1: server_input_global_request: rtype no-more-sessions@openssh.com want_reply 0
debug1: server_input_channel_req: channel 0 request pty-req reply 1
debug1: session_by_channel: session 0 channel 0
debug1: session_input_channel_req: session 0 req pty-req
debug1: Allocating pty.
debug1: session_pty_req: session 0 alloc /dev/pts/6
debug1: server_input_channel_req: channel 0 request shell reply 1
debug1: session_by_channel: session 0 channel 0
debug1: session_input_channel_req: session 0 req shell
debug1: Values: options.num_allow_users: 0
debug1: RLOGIN VALUE  :1
debug1: AIX/loginsuccess: msg Last unsuccessful login: Sun Mar  8 13:43:58 IST 2020 on rexec from motte2.canlab.ibm.com
Last login: Tue Mar 24 19:17:57 IST 2020 on ssh from 172.23.1.1
Starting session: shell on pts/6 for root from 172.23.1.1 port 65207 id 0
setsid: Operation not permitted.
debug1: Received SIGCHLD.
debug1: session_by_pid: pid 5177780
debug1: session_exit_message: session 0 channel 0 pid 5177780
debug1: session_exit_message: release channel 0
debug1: session_pty_cleanup: session 0 release /dev/pts/6
debug1: audit session close euid 0 user root tty name /dev/pts/6
Received disconnect from 172.23.1.1 port 65207:11: disconnected by user
Disconnected from user root 172.23.1.1 port 65207
debug1: do_cleanup
debug1: audit event euid 0 user root event 12 (SSH_connabndn)
debug1: Return Val-1 for auditproc:0
(255) root @ flashdancehostname01: 7.1.0.0: /tmp/ssh_test
$
Once a file is known to be good, it can be copied to /etc/ssh/sshd_config and the sshd daemon can be restarted.
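A sketch only, assuming the validated configuration file is /tmp/ssh_test/sshd_config (substitute the actual file that was tested) and that sshd is managed by the AIX System Resource Controller on this host:
cp /tmp/ssh_test/sshd_config /etc/ssh/sshd_config
stopsrc -s sshd
startsrc -s sshd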
Fixed:
----------
N/A
KIG00072

FP7_FP3 Readme Appendix M: mksysb command does not capture /bosinst.data for host

 (Added: 2020-04-15)
Fixpack
I_V1.0.0.7
I_V1.1.0.3
FP7_FP3 Readme Appendix M: mksysb command does not capture /bosinst.data for host
The command documented in V101 of the FP7_FP3 Readme does not copy the /bosinst.data file along with the mksysb and image.data files. When registering a mksysb image in NIM, all three files are necessary.
The documented command is:
time dsh -n ${ALL} 'dir=/stage/backups/FP7_FP3/kf1/$(hostname);mkdir -p ${dir};mksysb -ip ${dir}/$(hostname).mksysb;cp /image.data ${dir}'
Workaround:
-----------
a) If a mksysb has already been taken, the /bosinst.data file cannot be retrieved from the host, and the host must be restored from that mksysb.
The bosinst.data file is documented in the AIX Knowledge Center.
While the /bosinst.data file contains host-specific information, that information can be omitted.
In a PDOA environment you can see the differences per host by running the following as root on the management host.
dsh -n ${ALL} 'ssh 172.23.1.1 "cat /bosinst.data" | diff /bosinst.data -' | dshbak -c
Here is an example of the difference between a management host and the standby management host.
HOSTS -------------------------------------------------------------------------
flashdancehostname03
-------------------------------------------------------------------------------
4c4
<     CONSOLE = /dev/vty0
---
>     CONSOLE = Default
7c7
<     PROMPT = no
---
>     PROMPT = yes
22c22
<     DESKTOP =
---
>     DESKTOP = NONE
48c48
<     BOSINST_LANG = en_US
---
>     BOSINST_LANG = C
55,57c55,57
<       PVID = 00f968c1cfa4c789
<   PHYSICAL_LOCATION = U78C9.001.WZS02HM-P1-C14-T1-L205DA5D000-L0
<       CONNECTION = sas0//205da5d000,0
---
>       PVID = 00f968bfcf3f7fcc
>   PHYSICAL_LOCATION = U78C9.001.WZS02F5-P1-C14-T1-L205DA55500-L0
>       CONNECTION = sas0//205da55500,0
Management host:
-----------------------
(0) root @ flashdancehostname01: 7.1.0.0: /
$ cat /bosinst.data
# Basic bosinst_data file created by NIM
control_flow:
    CONSOLE = Default
    INSTALL_METHOD = overwrite
    INSTALL_EDITION = standard
    PROMPT = yes
    EXISTING_SYSTEM_OVERWRITE = yes
    INSTALL_X_IF_ADAPTER = yes
    RUN_STARTUP = yes
    RM_INST_ROOTS = no
    ERROR_EXIT =
    CUSTOMIZATION_FILE =
    TCB = no
    INSTALL_TYPE =
    BUNDLES =
    SWITCH_TO_PRODUCT_TAPE =
    RECOVER_DEVICES = Default
    BOSINST_DEBUG = no
    ACCEPT_LICENSES = yes
    ACCEPT_SWMA =
    DESKTOP = NONE
    INSTALL_DEVICES_AND_UPDATES = yes
    IMPORT_USER_VGS =
    CREATE_JFS2_FS = yes
    ALL_DEVICES_KERNELS = yes
    GRAPHICS_BUNDLE =
    SYSTEM_MGMT_CLIENT_BUNDLE =
    FIREFOX_BUNDLE =
    KERBEROS_5_BUNDLE =
    SERVER_BUNDLE =
    ALT_DISK_INSTALL_BUNDLE =
    REMOVE_JAVA_118 =
    HARDWARE_DUMP =
    ADD_CDE =
    ADD_GNOME =
    ADD_KDE =
    ERASE_ITERATIONS = 0
    ERASE_PATTERNS =
    MKSYSB_MIGRATION_DEVICE =
    TRUSTED_AIX =
    TRUSTED_AIX_LSPP =
    TRUSTED_AIX_SYSMGT =
    SECURE_BY_DEFAULT =
    ADAPTER_SEARCH_LIST =
locale:
    BOSINST_LANG = C
    CULTURAL_CONVENTION = en_US
    MESSAGES = en_US
    KEYBOARD = en_US

target_disk_data:
        PVID = 00f968bfcf3f7fcc
  PHYSICAL_LOCATION = U78C9.001.WZS02F5-P1-C14-T1-L205DA55500-L0
        CONNECTION = sas0//205da55500,0
        LOCATION = 03-00-00
        SIZE_MB = 544792
        HDISKNAME = hdisk0
Standby Management Host:
---------------------------------
$ ssh flashdancehostname03 cat /bosinst.data
# Basic bosinst_data file created by NIM
control_flow:
    CONSOLE = /dev/vty0
    INSTALL_METHOD = overwrite
    INSTALL_EDITION = standard
    PROMPT = no
    EXISTING_SYSTEM_OVERWRITE = yes
    INSTALL_X_IF_ADAPTER = yes
    RUN_STARTUP = yes
    RM_INST_ROOTS = no
    ERROR_EXIT =
    CUSTOMIZATION_FILE =
    TCB = no
    INSTALL_TYPE =
    BUNDLES =
    SWITCH_TO_PRODUCT_TAPE =
    RECOVER_DEVICES = Default
    BOSINST_DEBUG = no
    ACCEPT_LICENSES = yes
    ACCEPT_SWMA =
    DESKTOP =
    INSTALL_DEVICES_AND_UPDATES = yes
    IMPORT_USER_VGS =
    CREATE_JFS2_FS = yes
    ALL_DEVICES_KERNELS = yes
    GRAPHICS_BUNDLE =
    SYSTEM_MGMT_CLIENT_BUNDLE =
    FIREFOX_BUNDLE =
    KERBEROS_5_BUNDLE =
    SERVER_BUNDLE =
    ALT_DISK_INSTALL_BUNDLE =
    REMOVE_JAVA_118 =
    HARDWARE_DUMP =
    ADD_CDE =
    ADD_GNOME =
    ADD_KDE =
    ERASE_ITERATIONS = 0
    ERASE_PATTERNS =
    MKSYSB_MIGRATION_DEVICE =
    TRUSTED_AIX =
    TRUSTED_AIX_LSPP =
    TRUSTED_AIX_SYSMGT =
    SECURE_BY_DEFAULT =
    ADAPTER_SEARCH_LIST =
locale:
    BOSINST_LANG = en_US
    CULTURAL_CONVENTION = en_US
    MESSAGES = en_US
    KEYBOARD = en_US

target_disk_data:
        PVID = 00f968c1cfa4c789
  PHYSICAL_LOCATION = U78C9.001.WZS02HM-P1-C14-T1-L205DA5D000-L0
        CONNECTION = sas0//205da5d000,0
        LOCATION = 03-00-00
        SIZE_MB = 544792
        HDISKNAME = hdisk0
If the /bosinst.data file is missing, it is possible to use one from another host and blank out the 'target_disk_data' stanza to remove the system-specific details:
target_disk_data:
    LOCATION =
    SIZE_MB =
    HDISKNAME =
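A sketch only (assuming another host such as the standby management host flashdancehostname03 is reachable and the temporary path /tmp/bosinst.data is acceptable) of retrieving a copy of the file as root:
ssh flashdancehostname03 "cat /bosinst.data" > /tmp/bosinst.data
Then edit /tmp/bosinst.data and replace the target_disk_data stanza with the blank stanza shown above before registering the file with the mksysb image.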
b) If the host is still available and does not need to be restored from mksysb.
i. If the mksysb has already been taken, copy the /bosinst.data file to the mksysb directory on /stage. On the host as root:
dir=/stage/backups/FP7_FP3/kf1/$(hostname)
cp /bosinst.data ${dir}
ii. If the mksysb has not been taken yet, use the following command instead.
dsh -n ${ALL} 'dir=/stage/backups/FP7_FP3/FP3/$(hostname);mkdir -p ${dir};mksysb -ip ${dir}/$(hostname).mksysb;cp /image.data ${dir};cp /bosinst.data ${dir}'
Fixed:
----------
Targeted to be fixed in FP8_FP4 Readme.
If there is an update to the FP7_FP3 readme it will be addressed in that update.
KIG00073
FP7_FP3: BNT Update May Fail on one or more switches with: BNT:net2:172.23.1.252:1:Compare of firmware failed for switch after copy. (Added: 2020-04-18)
Fixpack
I_V1.0.0.7
I_V1.1.0.3
FP7_FP3: BNT Update May Fail on one or more switches with: BNT:net2:172.23.1.252:1:Compare of firmware failed for switch after copy.
As part of Stage 8 the BNT switches are updated. In some environments one or more BNT switches will report a failure message after attempting to update.
$ /opt/ibm/aixappl/pflayer/bin/icmds/appl_ctrl_net update -install -l "net0,net1,net2,net3" -f /BCU_share/FP7_FP3/firmware/net
BNT:net3:172.23.1.251:0:Compare of firmware success for switch after copy.
BNT:net1:172.23.1.253:0:Compare of firmware success for switch after copy.
BNT:net0:172.23.1.254:0:Compare of firmware success for switch after copy.
BNT:net2:172.23.1.252:1:Compare of firmware failed for switch after copy.
Verify the symptom by logging in to the switch that reported the failure and confirming that it matches the output below. In the above case the switch is 172.23.1.252.
$ ssh admin@172.23.1.252
flashdance64c1>show boot
Current running image version:7.11.8
Currently set to boot software image1, active config block.
NetBoot: disabled, NetBoot tftp server:  , NetBoot cfgfile:
Current boot Openflow protocol version: 1.0
USB Boot: disabled
Currently profile is default, set to boot with default profile next time.
Current FLASH software:
  image1: version 7.11.8, downloaded 23:55:36 Wed Feb  5, 2020
          NormalPanel
  image2: version 7.11.15, downloaded  3:15:03 Thu Feb 20, 2020
          NormalPanel
  boot kernel: version 7.11.8
Currently scheduled reboot time: none
The 'boot kernel: version' line says 7.11.8 (if updating from FP5_FP1) or 7.11.11 (if updating from FP6_FP2).
During the attempt to copy the boot kernel update to the switch, the scp or the outer pflayer call failed and the image was not fully copied.
Verify whether the boot kernel version was updated.
Workaround:
-----------
1. Manually copy the boot kernel image to the affected switch. As root on the management host:
Replace 172.23.1.252 with the IP address of the switch with the issue. Note that scp will report 100% but will not complete right away.
The file putboot is a /proc filesystem link, so the switch will post-process the file during the scp session. The timing shown below indicates about how long the scp session should last.
scp -l 500 /BCU_share/FP7_FP3/firmware/net/8264/boot_image/G8264-RS-7.11.15.0_Boot.img admin@172.23.1.252:putboot
Enter login password:
Switch: executing scp command - putboot.
G8264-RS-7.11.15.0_Boot.img                                                                                                                                                                              100%   10MB  45.7KB/s   03:53
2. Login to the switch to verify the boot kernel image is updated.
$ ssh admin@172.23.1.252
flashdance64c1>show boot
Current running image version:7.11.8
Currently set to boot software image1, active config block.
NetBoot: disabled, NetBoot tftp server:  , NetBoot cfgfile:
Current boot Openflow protocol version: 1.0
USB Boot: disabled
Currently profile is default, set to boot with default profile next time.
Current FLASH software:
  image1: version 7.11.8, downloaded 23:55:36 Wed Feb  5, 2020
          NormalPanel
  image2: version 7.11.15, downloaded  3:15:03 Thu Feb 20, 2020
          NormalPanel
  boot kernel: version 7.11.15
Currently scheduled reboot time: none
3. Rerun the command to update the switches. It should proceed and update the switches, which includes a switch reboot. Be sure to run the command in a screen session or a console session from the HMC.
Fixed:
----------
N/A.
KIG00093
The df command shows inconsistent or incorrect results on PDOA AIX hosts for GPFS or Spectrum Scale filesystems.
(Added: 2020-06-23)
General
I_V1.0.0.6
I_V1.0.0.7
I_V1.1.0.2
I_V1.1.0.3
On PDOA systems at V1.0.0.6 or V1.1.0.2 and higher, customers who use the df command for checking disk usage have opened tickets indicating one or more of the following symptoms:
a) The df command reports different disk and inode usage across multiple hosts for the same filesystem.
b) The df command incorrectly reports 100% disk usage for a filesystem that, when checked with the GPFS command /usr/lpp/mmfs/bin/mmdf, is shown not to be full.
c) The df command incorrectly reports 100% inode usage for a filesystem that, when checked with the GPFS command /usr/lpp/mmfs/bin/mmdf, is shown not to be out of inodes.
Since df is used as a health check mechanism, this discrepancy can lead to unnecessary alerts or to actions taken in an attempt to alleviate it.
Workaround:
-----------
To work around this issue it is possible to use the 'mmdf' command to synchronize the data provided to the df command for a particular filesystem on a particular host. By default this command is not in the path for any user in the PDOA environment.
/usr/lpp/mmfs/bin/mmdf <filesystem>
Where <filesystem> is replaced with the filesystem device name.
There are two options for using this command as described below.
a) For any filesystem that is reporting 100% disk or 100% inode usage, use the following command on the host.
Replace '/db2home' with the absolute filesystem path. Be sure to keep the "^" and the ":" at the beginning and end of the filesystem path so that only the filesystem with the discrepancy is selected. This command retrieves the device name and passes it to the mmdf command.
$ lsfs -c | grep "^/db2home:" | cut -d: -f 2 | xargs -n 1 /usr/lpp/mmfs/bin/mmdf
disk                disk size  failure holds    holds              free KB             free KB
name                    in KB    group metadata data        in full blocks        in fragments
--------------- ------------- -------- -------- ----- -------------------- -------------------
Disks in storage pool: system (Maximum disk size allowed is 2.4 TB)
nsddb2home          314572800       -1 yes      yes       175472384 ( 56%)         38008 ( 0%)
                -------------                         -------------------- -------------------
(pool total)        314572800                             175472384 ( 56%)         38008 ( 0%)
                =============                         ==================== ===================
(total)             314572800                             175472384 ( 56%)         38008 ( 0%)
Inode Information
-----------------
Number of used inodes:           10259
Number of free inodes:          297005
Number of allocated inodes:     307264
Maximum number of inodes:       307264
After running mmdf on the host with the discrepancy, the df command should show the correct values.
b) For a broader method, run the following command on a daily basis.
$ time dsh -n ${ALL} -f 1 'lsfs -c | grep mmfs | cut -d: -f2 | while read x;do echo "${x}"; /usr/lpp/mmfs/bin/mmlsfs $x > /dev/null 2>&1 && /usr/lpp/mmfs/bin/mmdf $x;done'
This command is typically run as root unless another user has been enabled for dsh. It takes about 35 minutes on a 2.5 DN V1.1 environment, so it is a long-running command. The mmdf command is governed by locking mechanisms in GPFS that prevent multiple mmdf commands from running on the same filesystem at the same time across the clusters.
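One possible way to schedule this is a root crontab entry on the management host that calls a small wrapper script. This is a sketch only; the script path /usr/local/bin/mmdf_sync.sh, the 01:00 start time, and the log file location are assumptions, and it assumes the ${ALL} node group variable is defined in root's /.profile (adjust to match your environment):
#!/usr/bin/ksh
# /usr/local/bin/mmdf_sync.sh - run mmdf against every GPFS filesystem on every host
. /.profile
dsh -n ${ALL} -f 1 'lsfs -c | grep mmfs | cut -d: -f2 | while read x;do /usr/lpp/mmfs/bin/mmlsfs $x > /dev/null 2>&1 && /usr/lpp/mmfs/bin/mmdf $x;done'
The corresponding crontab entry for root could then be:
0 1 * * * /usr/local/bin/mmdf_sync.sh > /tmp/mmdf_sync.log 2>&1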
Fixed:
----------
For IBM Spectrum Scale (GPFS) based filesystems, the /usr/lpp/mmfs/bin/mmdf command gives the most accurate results when checking filesystem health.
The IBM Spectrum Scale Knowledge Center describes how to query filesystem space and documents the mmdf command.
In PDOA systems the following command line illustrates how to report filesystem disk and inode usage for a single host. In the output, the first line of each group is the filesystem, the second line is the disk usage and free statistics, and the third line is the inode usage and free statistics.
$ mount | grep -i " mmfs " | while read fd fs rest;do echo "$fs:";/usr/lpp/mmfs/bin/mmdf $fd -Y | egrep 'fsTotal:|inode:' | grep -v HEADER | cut -d: -f 7,8;done
/opmfs:
838860800:830921984
18212:481820
/db2home:
314572800:175472384
10259:297005
/dwhome:
10485760:10018304
4044:61748
/stage:
10482614272:2211388416
37465:462567
/usr/IBM/dwe/appserver_001:
209715200:203930880
5371:199493
Third-party monitoring tools may rely on the more familiar 'df' command. For cases where 'df' must be used, refer to the workarounds: run the mmdf command on a periodic basis to synchronize the data for df on that host, and/or run the mmdf command as a response to verify any disk or inode threshold alerts.
KIG00107
The High Availability Toolkit may not execute a script defined in RESOURCE_MOVE_TARGET_SCRIPT, or a script defined in SUCCESSFUL_FAILOVER_SCRIPT may not run on a successful failover.
(Added: 2020-11-05)
General
I_V1.0.0.4
I_V1.0.0.5
I_V1.0.0.6
I_V1.0.0.7
I_V1.1.0.0
I_V1.1.0.1
I_V1.1.0.2
I_V1.1.0.3
The High Availability Toolkit may not execute a script defined in RESOURCE_MOVE_TARGET_SCRIPT, or a script defined in SUCCESSFUL_FAILOVER_SCRIPT may not run on a successful failover.
Workaround:
-----------
N/A
Fixed:
----------
V1.0. Contact IBM Support.
V1.1: Fixed in HA Tools 2.0.8.0 which is available by technote or as part of PDOA V1.1 FP4. See IBM PureData System for Operational Analytics High Availability toolkit component 2.0.8.0 update.
However, this only reduces the number of cases where failovers do not generate callout actions. Successful failover attempts that take more than an hour to resolve will not initiate a callout, and some support-assisted starts or failovers may not be recognized as failovers. This is a limitation of the implementation.
 
If using the following features:
EMAIL_ADDRESS
EMAIL_ON_MOVE=1
EMAIL_ON_SUCCESSFUL_FAILOVER=1
EMAIL_ON_UNSUCCESSFUL_FAILOVER=1
in the hatools.conf file, then the HA tools will send e-mail alerts to the EMAIL_ADDRESS every 10 minutes warning that a partition set has not started, as well as a final warning and callout that the failover was unsuccessful. In these rare cases this should ensure administrators are notified and can take appropriate actions.
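For reference, a minimal sketch of the corresponding hatools.conf entries (the address dba-team@example.com is a placeholder; verify the exact syntax against the existing entries in your hatools.conf):
EMAIL_ADDRESS=dba-team@example.com
EMAIL_ON_MOVE=1
EMAIL_ON_SUCCESSFUL_FAILOVER=1
EMAIL_ON_UNSUCCESSFUL_FAILOVER=1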
KIG00109
The High Availability Toolkit may not attempt to stop all resources on a failed start attempt leading to a never ending transitional state.
(Added: 2020-11-05)
General
I_V1.0.0.4
I_V1.0.0.5
I_V1.0.0.6
I_V1.0.0.7
I_V1.1.0.0
I_V1.1.0.1
I_V1.1.0.2
I_V1.1.0.3
Partitions may fail to start due to a timeout or may fail outright due to a db2sysc error. When partitions fail due to a timeout, the automation tool is able to attempt to fail over those partitions if another host is available. If the partitions fail outright, the HA tools do not pass the failure notification to TSA, which prevents TSA from taking the appropriate actions to fail over. This leaves the partitions in a transitional state that can never be resolved.
Workaround:
-----------
If this scenario is encountered contact IBM Support to help diagnose why Db2 was not able to start.
Fixed:
----------
V1.0. Contact IBM Support.
V1.1: Fixed in HA Tools 2.0.8.0 which is available by technote or as part of PDOA V1.1 FP4. See IBM PureData System for Operational Analytics High Availability toolkit component 2.0.8.0 update.
Two changes to HA Tools also help with this scenario.
 
1. Added to hatools.conf is the ability to specify two callout scripts using two new variables: SUCCESSFUL_START_SCRIPT and FAILED_START_SCRIPT. For cases of outright success or outright failure these callouts can be used to notify an administrator. These callouts will not fire in the event of a failure to start due to a timeout; however, if a timeout is hit while trying to start Db2, TSA is able to fail over if a standby or the original primary is available again. There are also changes to alerting behavior in 2.0.8.0 that are explained in item 2 below.
 
2. As part of KIG00107, if a partition set takes more than 10 minutes to start, a warning is issued if the EMAIL settings are configured, and the warning is also written to the syslog on the primary and standby hosts for that resource. This occurs every 10 minutes until an hour has passed, at which point a final warning is issued.
 
Using these features can help ensure administrators are notified and can help address these issues.
KIG00110
The High Availability Toolkit may start database partitions on a standby host before those partitions are fully stopped on the primary host.
(Added: 2020-11-05)
General
I_V1.0.0.4
I_V1.0.0.5
I_V1.0.0.6
I_V1.0.0.7
I_V1.1.0.0
I_V1.1.0.1
I_V1.1.0.2
I_V1.1.0.3
This scenario occurs when Db2 cannot be stopped by the HA tools to facilitate a failover. However, hatools does not communicate this failure to TSA, which will initiate a failover if the standby is available.
The case where this occurred was a GPFS node expulsion scenario where the db2sysc process could not be killed by a kill -9 command. While GPFS was recovering, db2sysc was still detected, leading to multiple attempts by the automation to stop those processes, all of which failed. Once GPFS recovered, the db2sysc process could be killed, but this leads to churn and can lead to KIG00112, which will force down partitions that have already failed over.
Workaround:
-----------
If this scenario is detected the best approach is to reboot or shutdown the host with the hang as soon as possible. This should allow TSA and GPFS to recover and reach a stable state.
Fixed:
----------
V1.0. Contact IBM Support.
V1.1: Fixed in HA Tools 2.0.8.0 which is available by technote or as part of PDOA V1.1 FP4. See IBM PureData System for Operational Analytics High Availability toolkit component 2.0.8.0 update.
The monitoring algorithms have been updated to avoid using an optimistic algorithm that assumes it is possible to kill Db2 processes as part of stop orders from TSAMP. Instead, the monitor may require another monitoring period (30 seconds) to ensure that all database partitions are stopped.
KIG00111
The High Availability policies may shut down partition resources on a healthy node if another node in the same domain is expelled from GPFS.
(Added: 2020-11-05)
General
I_V1.0.0.4
I_V1.0.0.5
I_V1.0.0.6
I_V1.0.0.7
I_V1.1.0.0
I_V1.1.0.1
I_V1.1.0.2
I_V1.1.0.3
This is a tradeoff within the policy design. It occurs in systems that are 2.5 DN or larger and appears to only be a factor in domains with at least 2 active data nodes. The expected action is that the database partitions on the non-expelled node will be able to restart. This may prolong a failover, but there are changes to the behavior in HA Tools 2.0.8.0 that significantly improve the ability to resolve this scenario. This symptom can be easy to miss as it is usually part of a larger failover event and will likely lengthen the failover time.
Workaround:
-----------
N/A
Fixed:
----------
N/A
KIG00112
The High Availability Toolkit may stop database partitions on the wrong host if the host running the stop is not associated with the partitions in db2nodes.cfg being stopped.
(Added: 2020-11-05)
General
I_V1.0.0.4
I_V1.0.0.5
I_V1.0.0.6
I_V1.0.0.7
I_V1.1.0.0
I_V1.1.0.1
I_V1.1.0.2
I_V1.1.0.3
This issue can occur if KIG00109 happens or if a false positive db2sysc process appears on a standby host.  This symptom can be easy to miss as it usually happens in the larger context of a failover.
Workaround:
-----------
N/A
Fixed:
----------
V1.0. Contact IBM Support.
V1.1: Fixed in HA Tools 2.0.8.0 which is available by technote or as part of PDOA V1.1 FP4. See IBM PureData System for Operational Analytics High Availability toolkit component 2.0.8.0 update.
KIG00116
The High Availability Toolkit command hastartdb2 may fail to start partitions correctly if they will start as a failover.
(Added: 2020-11-05)
General
I_V1.0.0.4
I_V1.0.0.5
I_V1.0.0.6
I_V1.0.0.7
I_V1.1.0.0
I_V1.1.0.1
I_V1.1.0.2
I_V1.1.0.3
When running 'hastartdb2' the script verifies that the primary host definitions in the high availability policy match the partition assignments in db2nodes.cfg. If they do not match, it attempts to update the policy to match db2nodes.cfg. Prior to 2.0.8.0 this algorithm failed and could leave the domain in an odd state.
The most common scenario is a failover as a result of a host rebooting in a two-node domain. For example, all PDOA environments have their admin and admin standby nodes in a single domain. A failover when one node is no longer in the domain does not allow the roving HA algorithm to update the primary and standby hosts, leading to the discrepancy. The goal of this update is to prevent a second failover (after stopping and starting the database).
Workaround:
-----------
Contact IBM Support.
Fixed:
----------
V1.0. Contact IBM Support.
V1.1: Fixed in HA Tools 2.0.8.0 which is available by technote or as part of PDOA V1.1 FP4. See IBM PureData System for Operational Analytics High Availability toolkit component 2.0.8.0 update.
In 2.0.8.0 this algorithm is improved and should not result in bad domain states; however, if there are many inconsistencies hatools may not be able to rectify all cases to prevent failovers. This can lead to much longer start times and potential Failed Offline states due to timeouts. In those cases it may be necessary to use 'hareset' or 'hachkconfig' to resolve those inconsistencies.
KIG00123
The High Availability Toolkit may leave .failoverInProgress semaphore files that can prevent other failovers from completing.
(Added: 2020-11-05)
General
I_V1.0.0.4
I_V1.0.0.5
I_V1.0.0.6
I_V1.0.0.7
I_V1.1.0.0
I_V1.1.0.1
I_V1.1.0.2
I_V1.1.0.3
Failover attempts are serialized so that no two partition sets attempt to fail over at the same time. This is controlled using .failoverInProgress files, which act as a semaphore. In some cases this file may not be removed, which prevents all other failovers from starting. This leads to TSA timeouts and Failed Offline states.
Workaround:
-----------
Once TSA has reached a steady state, remove the .failoverInProgress files in the instance owner's home directory. This will allow failovers to proceed. Note it may be necessary to run hareset to clear any Failed Offline states.
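A sketch only (assuming the default instance owner bcuaix with home directory /db2home/bcuaix; adjust for your environment), run as root on the host that holds the stale semaphore files:
ls -a /db2home/bcuaix | grep failoverInProgress
rm /db2home/bcuaix/.failoverInProgress*
Review the files that the ls command reports before removing them.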
Fixed:
----------
V1.0. Contact IBM Support.
V1.1: Fixed in HA Tools 2.0.8.0 which is available by technote or as part of PDOA V1.1 FP4. See IBM PureData System for Operational Analytics High Availability toolkit component 2.0.8.0 update.
KIG00124
The High Availability Toolkit hals command may show a message like this  "halscore[74]: db2_bcuaix_0_1_2_3_4_5-rg: bad number" if a node is Pending Offline in the domain.
(Added: 2020-11-05)
General
I_V1.0.0.4
I_V1.0.0.5
I_V1.0.0.6
I_V1.0.0.7
I_V1.1.0.0
I_V1.1.0.1
I_V1.1.0.2
I_V1.1.0.3
This is a transitory issue while the node is pending offline and only occurs if hatools confirmed that it was a connected node. Re-run hals until the node is offline.
Workaround:
-----------
Rerun hals until the pending state is resolved.
Fixed:
----------
NA
KIG00125
The High Availability Toolkit hals command does not show a standby node as unavailable if that node is excluded from its TSA domain.
(Added: 2020-11-05)
General
I_V1.0.0.4
I_V1.0.0.5
I_V1.0.0.6
I_V1.0.0.7
I_V1.1.0.0
I_V1.1.0.1
I_V1.1.0.2
I_V1.1.0.3
TSA provides the ability to exclude a host from hosting managed resources in a domain. In PDOA this only happens as part of support scenarios, as there are no hatools or HA scenarios that lead to this state. If a node is excluded, all managed GPFS filesystems (such as /db2home) will be unmounted on that host. Since /stage and /dwhome are not managed, these filesystems will continue to be available on that host. Any attempt to mount a managed filesystem will result in that filesystem being unmounted as soon as TSA detects it is up. Any attempt to fail over to that host will result in a restart on the same host.
The hals utility does not detect this state, so it may be hard for PDOA customers to diagnose.
Workaround:
-----------
Use the following command as root on any host to determine if there are excluded nodes in the environment. Any nodes listed in the brackets next to 'ExcludedNodes' are excluded.
$ dsh -n ${ALL} 'lssamctrl ExcludedNodes 2> /dev/null' | dshbak -c
HOSTS -------------------------------------------------------------------------
b30i01, b30i02, b30i05, b30i06
-------------------------------------------------------------------------------
Displaying SAM Control information:
SAMControl:
        ExcludedNodes = {}
Contact IBM support to help diagnose why the nodes are excluded and what actions to take next.
Fixed:
----------
V1.1: Fixed in V1.1 FP4 with HA Tools 2.0.8.1; see the V1.1 FP4 Readme.
KIG00129
The High Availability Toolkit does not have callouts for successful partition set starts or partition set stops.
(Added: 2020-11-05)
General
I_V1.0.0.4
I_V1.0.0.5
I_V1.0.0.6
I_V1.0.0.7
I_V1.1.0.0
I_V1.1.0.1
I_V1.1.0.2
I_V1.1.0.3
The callout mechanism described at https://www.ibm.com/support/pages/node/259139 does not provide notifications of successful or failed starts, as the focus is only on failovers.
There are several cases where failovers can go undetected.
Workaround:
-----------
N/A until HA Tools 2.0.8.0.
Fixed:
----------
V1.0. Contact IBM Support.
V1.1: Fixed in HA Tools 2.0.8.0 which is available by technote or as part of PDOA V1.1 FP4. See IBM PureData System for Operational Analytics High Availability toolkit component 2.0.8.0 update.
Two new variables were added to hatools.conf: SUCCESSFUL_START_SCRIPT and FAILED_START_SCRIPT. These callouts use a different mechanism than the failover callout scripts and provide an alternative to those callouts. These scripts are called without arguments whenever a partition set is successfully started (regardless of failover or regular start), or explicitly fails to start on a host. If TSAMP kills the process due to a start timeout then no callouts are made. However, when used in conjunction with the EMAIL alerting features, the TSAMP timeouts should result in e-mail warnings every 10 minutes.
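As an illustration only (the script paths and the use of the AIX logger command are assumptions, not part of the product), the new variables could point at simple executable scripts:
SUCCESSFUL_START_SCRIPT=/usr/local/bin/ha_start_ok.sh
FAILED_START_SCRIPT=/usr/local/bin/ha_start_failed.sh
where the hypothetical /usr/local/bin/ha_start_failed.sh might contain:
#!/usr/bin/ksh
# hypothetical callout script; called with no arguments when a partition set explicitly fails to start
logger -p user.err "hatools: a partition set failed to start on $(hostname)"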
KIG00133
The High Availability Toolkit command hareset -restore will not work correctly when run on management nodes after FP6_FP2 (V1.0 FP6/V1.1 FP2) is applied.
(Added: 2020-11-05)
General I_V1.0.0.6
I_V1.0.0.7
I_V1.1.0.2
I_V1.1.0.3
When attempting to run 'hareset -restore' there will be an error saying that no backups are found. This is a bug that appears after V1.0 FP6/V1.1 FP2, as that update creates management domain backup images that interfere with the ability of hareset to find core domain backups.
Workaround:
-----------
Try running hareset -restore from the admin node. This prevents hareset from seeing the management domain backups.
Fixed:
----------
V1.1: Fixed in V1.1 FP4 with HA Tools 2.0.8.1; see the V1.1 FP4 Readme.
KIG00135
The High Availability Toolkit command hastartdb2 may return SQL1035N when trying to activate the database when failovers are encountered.
(Added: 2020-11-05)
General
I_V1.0.0.4
I_V1.0.0.5
I_V1.0.0.6
I_V1.0.0.7
I_V1.1.0.0
I_V1.1.0.1
I_V1.1.0.2
I_V1.1.0.3
This issue occurs when there is a failover in the admin nodes and the former primary node for the admin partition set is still running. This only impacts the explicit activation call as part of hastartdb2.
Workaround:
-----------
If detected early enough, run 'activate database' explicitly as the instance owner, for example as sketched below. Otherwise the database will be implicitly activated by connecting applications.
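A sketch only, assuming the default instance owner bcuaix and a database named BCUDB (substitute the names used in your environment), run on the host that now owns the admin partition set:
su - bcuaix -c "db2 activate database BCUDB"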
Fixed:
----------
NA
KIG00136
The High Availability Toolkit command hareset -rebuild does not complete the rebuild, leaving the domain in an incomplete state.
(Added: 2020-11-05)
General
I_V1.0.0.4
I_V1.0.0.5
I_V1.0.0.6
I_V1.0.0.7
I_V1.1.0.0
I_V1.1.0.1
I_V1.1.0.2
I_V1.1.0.3
This is caused by a timing issue and seems to be more prevalent at earlier PDOA fixpack levels such as V1.1 GA.
Workaround:
-----------
Try 'hareset -restore' instead of rebuild.
Fixed:
----------
V1.0. Contact IBM Support.
V1.1: Fixed in HA Tools 2.0.8.0 which is available by technote or as part of PDOA V1.1 FP4. See IBM PureData System for Operational Analytics High Availability toolkit component 2.0.8.0 update.
KIG00137

After an upgrade to HMC V9R1.941.x from HMC V9R1.930.0 (MH01810), daily SRC E3325009 errors occur; the domain suffix needs to be populated (Added: 2020-11-25)

General
I_V1.0.0.7
I_V1.1.0.3
I_V1.1.0.4
Some PDOA V1.1.0.3 customers have experienced daily SRC E3325009 errors after updating their HMC levels.
In some cases the HMC hostnames and DNS settings may not have been set up so that the HMC hostname is resolvable through DNS.
This may also impact V1.1 FP4 customers.
On PDOA, one way to verify the host settings is to run the following command as root from the management host. This connects to both HMCs and runs the host command against each HMC hostname.
$ appl_ls_hw -r hmc -A M_IP_address | sed 's|"||g' | while read ip;do echo "*** ${ip} ***";ssh -n hscroot@${ip} 'lshmc -n -Fhostname | while read h;do host $h;done';done
*** 172.23.1.245 ***
dsshmc49.torolab.ibm.com has address 9.26.18.135
*** 172.23.1.246 ***
dsshmc50.torolab.ibm.com has address 9.26.18.136
(0) root @ flashdancehostname01: 7.1.0.0: /
Another test from the technote can be run to verify that both HMCs see the same primary HMC assignments for each server.
$ appl_ls_hw -r hmc -A M_IP_address | sed 's|"||g' | while read ip;do echo "*** ${ip} ***";ssh -n hscroot@${ip} 'lssyscfg -r sys -F name | sort| while read m;do printf "$m: ";lsprimhmc -m $m;done';done
*** 172.23.1.245 ***
Data1-8284-22A-SN216B47V: is_primary=1,primary_hmc_mtms=7042-CR8/840A7BD,"primary_hmc_ipaddr=172.23.1.245,172.16.0.1,9.26.18.135",primary_hmc_hostname=dsshmc49,primary_hmc_ipv6addr=
Data2-8284-22A-SN216B44V: is_primary=1,primary_hmc_mtms=7042-CR8/840A7BD,"primary_hmc_ipaddr=172.23.1.245,172.16.0.1,9.26.18.135",primary_hmc_hostname=dsshmc49,primary_hmc_ipv6addr=
FDNactive-8286-42A-SN2168BFV: is_primary=1,primary_hmc_mtms=7042-CR8/840A7BD,"primary_hmc_ipaddr=172.23.1.245,172.16.0.1,9.26.18.135",primary_hmc_hostname=dsshmc49,primary_hmc_ipv6addr=
FDNstby-8286-42A-SN2168C1V: is_primary=1,primary_hmc_mtms=7042-CR8/840A7BD,"primary_hmc_ipaddr=172.23.1.245,172.16.0.1,9.26.18.135",primary_hmc_hostname=dsshmc49,primary_hmc_ipv6addr=
Stby-Data-8284-22A-SN216B42V: is_primary=1,primary_hmc_mtms=7042-CR8/840A7BD,"primary_hmc_ipaddr=172.23.1.245,172.16.0.1,9.26.18.135",primary_hmc_hostname=dsshmc49,primary_hmc_ipv6addr=
*** 172.23.1.246 ***
Data1-8284-22A-SN216B47V: is_primary=0,primary_hmc_mtms=7042-CR8/840A7BD,"primary_hmc_ipaddr=172.23.1.245,172.16.0.1,9.26.18.135",primary_hmc_hostname=dsshmc49.torolab.ibm.com,primary_hmc_ipv6addr=
Data2-8284-22A-SN216B44V: is_primary=0,primary_hmc_mtms=7042-CR8/840A7BD,"primary_hmc_ipaddr=172.23.1.245,172.16.0.1,9.26.18.135",primary_hmc_hostname=dsshmc49.torolab.ibm.com,primary_hmc_ipv6addr=
FDNactive-8286-42A-SN2168BFV: is_primary=0,primary_hmc_mtms=7042-CR8/840A7BD,"primary_hmc_ipaddr=172.23.1.245,172.16.0.1,9.26.18.135",primary_hmc_hostname=dsshmc49.torolab.ibm.com,primary_hmc_ipv6addr=
FDNstby-8286-42A-SN2168C1V: is_primary=0,primary_hmc_mtms=7042-CR8/840A7BD,"primary_hmc_ipaddr=172.23.1.245,172.16.0.1,9.26.18.135",primary_hmc_hostname=dsshmc49.torolab.ibm.com,primary_hmc_ipv6addr=
Stby-Data-8284-22A-SN216B42V: is_primary=0,primary_hmc_mtms=7042-CR8/840A7BD,"primary_hmc_ipaddr=172.23.1.245,172.16.0.1,9.26.18.135",primary_hmc_hostname=dsshmc49.torolab.ibm.com,primary_hmc_ipv6addr=
Workaround:
--------------
Contact IBM support.
Fixed:
--------------
NA
KIG00138
The High Availability Toolkit command hareset -rebuild does not support restoring third tier storage management resources.
(Added: 2020-11-05)
General
I_V1.0.0.4
I_V1.0.0.5
I_V1.0.0.6
I_V1.0.0.7
I_V1.1.0.0
I_V1.1.0.1
I_V1.1.0.2
I_V1.1.0.3
After 'hareset -rebuild' completes, if third tier storage was placed under HA management control, the resource definitions will not be recreated. This requires manual intervention from support to restore them.
Workaround:
-----------
Use 'hareset -restore', which restores the domain rather than rebuilding it. As a practice, any changes to the domain should be backed up.
Fixed:
----------
V1.0. Contact IBM Support.
V1.1: Fixed in HA Tools 2.0.8.0 which is available by technote or as part of PDOA V1.1 FP4. See IBM PureData System for Operational Analytics High Availability toolkit component 2.0.8.0 update.
A keyword called 'TIERSTORAGE' was added to hatools.conf; it can be used to specify additional storage tiers that should be managed as high availability resources.
 
For example, if each partition has additional cool and cold storage tiers as specified by this pattern, where <part> is the 4-digit zero-padded partition number:
 
/db2fscool/<instance>/NODE<part>
/db2fscold/<instance>/NODE<part>
 
Then specify the following in hatools.conf:
 
TIERSTORAGE="db2fscool/${INSTANCE} db2fscold/${INSTANCE}"
 
This will allow hachkconfig -restore and hareset -rebuild to correct/add/rebuild the domains to include those paths per partition.
Third tier filesystems must match the PDOA 1 NSD to 1 filesystem ratio as well as meet the PDOA filesystem and NSD naming conventions.
KIG00139
The High Availability Toolkit command hachkconfig -repair cannot repair corporate network relationships that use names like 'db2_bcuaix_0_1_2_3_4_5-rs_DependsOn_db2_VLAN501_network'
(Added: 2020-11-05)
General
I_V1.1.0.0
I_V1.1.0.1
I_V1.1.0.2
I_V1.1.0.3
The High Availability Toolkit command hachkconfig -repair cannot repair corporate network relationships that use names like 'db2_bcuaix_0_1_2_3_4_5-rs_DependsOn_db2_VLAN501_network', which may have been created as part of the corporate network configuration process in early V1.1 systems. The hatools expect the name to be 'db2_bcuaix_0_1_2_3_4_5-rs_DependsOn_db2_VLAN501_network-rel'. This has no impact on the function of the policy, but it will cause hachkconfig to fail as it cannot correct it.
Workaround:
-----------
This workaround can be applied with resources online or offline during an outage window.
If the Opstates are Offline then it is not necessary to put the domain in manual mode.
Put the domain in manual mode.
 
(0) root @ b30i04: 7.1.0.0: /tmp/halog
$ hadomain -core manual

All core domains set to Manual mode.
(0) root @ b30i04: 7.1.0.0: /tmp/halog
$ hals
CORE DOMAIN
+============+=========+=========+=============+=================+=================+=============+
| PARTITIONS | CURRENT | STANDBY | DOMAIN      | OPSTATE         | HA STATUS       | RG REQUESTS |
+============+=========+=========+=============+=================+=================+=============+
| 0-5        | b30i04  | b30i02  | bcudomain01 | Online          | MANUAL MODE     | -           |
| 6-15       | b30i05  | b30i07  | bcudomain02 | Online          | MANUAL MODE     | -           |
| 16-25      | b30i06  | b30i07  | bcudomain02 | Online          | MANUAL MODE     | -           |
+============+=========+=========+=============+=================+=================+=============+


 
Find the relationships that are improperly named.
 
$ dsh -n ${BCUDB2ALL} "lsrel -D@ -s 'Name like \"%network\"' Name | grep network | cut -d@ -f1" 2> /dev/null | dshbak -c
HOSTS -------------------------------------------------------------------------
b30i02, b30i04
-------------------------------------------------------------------------------
db2_bcuaix_0_1_2_3_4_5-rs_DependsOn_db2_VLAN501_network
 
Rename those relationships.
 
$ dsh -n ${BCUDB2ALL} "lsrel -D@ -s 'Name like \"%network\"' Name 2> /dev/null | grep network | cut -d@ -f1 | while read f;do echo \$f;chrel -c \$f-rel \$f;done" | dshbak -c
HOSTS -------------------------------------------------------------------------
b30i04
-------------------------------------------------------------------------------
db2_bcuaix_0_1_2_3_4_5-rs_DependsOn_db2_VLAN501_network
 
Verify that the relationships are renamed. The following should return blank output.
 
$  dsh -n ${BCUDB2ALL} "lsrel -D@ -s 'Name like \"%network\"' Name | grep network | cut -d@ -f1" 2> /dev/null | dshbak -c
(0) root @ b30i04: 7.1.0.0: /tmp/halog
$
Verify that the domains are still in Manual Mode with the OPSTATE column showing Online. If there are Pending Offline states contact support.
$ hals
CORE DOMAIN
+============+=========+=========+=============+=================+=================+=============+
| PARTITIONS | CURRENT | STANDBY | DOMAIN      | OPSTATE         | HA STATUS       | RG REQUESTS |
+============+=========+=========+=============+=================+=================+=============+
| 0-5        | b30i04  | b30i02  | bcudomain01 | Online          | MANUAL MODE     | -           |
| 6-15       | b30i05  | b30i07  | bcudomain02 | Online          | MANUAL MODE     | -           |
| 16-25      | b30i06  | b30i07  | bcudomain02 | Online          | MANUAL MODE     | -           |
+============+=========+=========+=============+=================+=================+=============+

 
 
Restore the domains to automation mode.
 
$ hadomain -core auto
All core domains set to Auto mode.
(0) root @ b30i04: 7.1.0.0: /tmp/halog
$ hals
CORE DOMAIN
+============+=========+=========+=============+=================+=================+=============+
| PARTITIONS | CURRENT | STANDBY | DOMAIN      | OPSTATE         | HA STATUS       | RG REQUESTS |
+============+=========+=========+=============+=================+=================+=============+
| 0-5        | b30i04  | b30i02  | bcudomain01 | Online          | Normal          | -           |
| 6-15       | b30i05  | b30i07  | bcudomain02 | Online          | Normal          | -           |
| 16-25      | b30i06  | b30i07  | bcudomain02 | Online          | Normal          | -           |
+============+=========+=========+=============+=================+=================+=============+
Rerun 'hachkconfig' to verify there are no more errors related to these relationships.
Fixed:
----------
NA (The workaround is a permanent solution).
KIG00148
The High Availability Toolkit command hasetuptemp will fail if DB2OPTIONS includes -v. This only occurs in V1.0 environments.
(Added: 2020-11-05)
General I_V1.0.0.4
I_V1.0.0.5
I_V1.0.0.6
I_V1.0.0.7
This symptom only impacts V1.0 customers, as hasetuptemp is required when the temporary tablespace, which resides on local SSD (/db2ssd) filesystems, is recreated. This symptom was not seen in the field, so it is likely the '-v' option is not used, or not used by customers who have needed to recreate their system temporary tablespace.
Workaround:
-----------
Remove the '-v' option from DB2OPTIONS and rerun the command, for example as sketched below.
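A sketch of checking and clearing the option in the instance owner's session before rerunning hasetuptemp (the exact DB2OPTIONS value will vary by environment):
echo ${DB2OPTIONS}
export DB2OPTIONS=$(echo "${DB2OPTIONS}" | sed 's/ *-v//')
echo ${DB2OPTIONS}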
Fixed:
----------
NA
KIG00150
High Availability Toolkit commands such as hareset and hachkconfig may fail or corrupt the domain on V1.0 systems at 12.5 DNs or higher and V1.1 systems at 10.5 DNs or higher due to TSA commands truncating long object names.
(Added: 2020-11-05)
General I_V1.0.0.4
I_V1.0.0.5
I_V1.0.0.6
I_V1.0.0.7
I_V1.1.0.0
I_V1.1.0.1
I_V1.1.0.2
I_V1.1.0.3
On larger PDOA environments with three-digit partition numbers, some of the resource names are truncated when using TSA commands in table format. This causes hatools to get incorrect names for some resources, which can lead to failures that leave domains corrupted.
Workaround:
-----------
Run 'hareset -restore' to restore the domain from a good copy. Contact IBM Support to verify the health of the domain.
Fixed:
----------
V1.0. Contact IBM Support.
V1.1: Fixed in HA Tools 2.0.8.0 which is available by technote or as part of PDOA V1.1 FP4. See IBM PureData System for Operational Analytics High Availability toolkit component 2.0.8.0 update.
Replaced TSA table based commands with delimited options to prevent truncation.
KIG00157
The High Availability Toolkit hals command does not recognize a standby node is not available if GPFS is not started on the node but the node is online in the domain.
(Added: 2020-11-05)
General I_V1.0.0.4
I_V1.0.0.5
I_V1.0.0.6
I_V1.0.0.7
I_V1.1.0.0
I_V1.1.0.1
I_V1.1.0.2
I_V1.1.0.3
The algorithm that determines whether a node is available as a standby does not check whether GPFS is online; it only checks whether the domain is online and there are no Failed, Stuck or Unknown states. The filesystem monitors do check GPFS, but when GPFS is Offline they report that the filesystem is also Offline.
Workaround:
-----------
If failovers are not working and hals does not indicate a failure, verify the GPFS state on each node in the system using this command.
$ dsh -n ${ALL} '/usr/lpp/mmfs/bin/mmgetstate -a' | dshbak -c
HOSTS -------------------------------------------------------------------------
b30i01, b30i03
-------------------------------------------------------------------------------
 Node number  Node name        GPFS state
------------------------------------------
       1      b30i01           active
       2      b30i03           active
HOSTS -------------------------------------------------------------------------
b30i02, b30i04, b30i05, b30i06, b30i07
-------------------------------------------------------------------------------
 Node number  Node name        GPFS state
------------------------------------------
       1      b30i02           active
       2      b30i04           active
       3      b30i05           active
       4      b30i06           active
       5      b30i07           active
Contact IBM Support to help determine why a host may not be active.
Fixed:
----------
NA
KIG00168
The High Availability Toolkit commands hachkconfig and hareset -rebuild will only check or create the first corporate VLAN set of resources as specified in hatools.conf.
(Added: 2020-11-05)
General I_V1.0.0.4
I_V1.0.0.5
I_V1.0.0.6
I_V1.0.0.7
I_V1.1.0.0
I_V1.1.0.1
I_V1.1.0.2
I_V1.1.0.3
This can lead to inconsistencies in the ServiceIP settings when more than one corporate service IP is defined. While generally harmless, in some cases this can prevent Db2 from starting.
Workaround:
-----------
Contact IBM Support to help fix any inconsistency.
Fixed:
----------
V1.0. Contact IBM Support.
V1.1: Fixed in HA Tools 2.0.8.0 which is available by technote or as part of PDOA V1.1 FP4. See IBM PureData System for Operational Analytics High Availability toolkit component 2.0.8.0 update.
HA tools now check and are able to address multiple corporate service IPs.
KIG00222
Management host syslog has "Crypto library (CLiC) error: Wrong signature" and "Keystore doesn't contain ssh public key cookie" sshd errors. (Added: 2020-11-05)
General
I_V1.0.0.3
I_V1.0.0.4
I_V1.0.0.5
I_V1.0.0.6
I_V1.0.0.7
I_V1.1.0.0
I_V1.1.0.1
I_V1.1.0.2
I_V1.1.0.3
SSH logins on the management nodes will generate many messages like the ones below:
Nov 26 01:56:03 flashdancehostname01 auth|security:err|error sshd[4653944]: Keystore doesn't contain ssh public key cookie
Nov 26 01:56:04 flashdancehostname01 auth|security:err|error sshd[2950080]: Crypto library (CLiC) error: Wrong object type
This is related to the use of EFS on the management host which is used by the platform layer.
PDOA does not allow EFS to be automatically opened through SSH connections.
Workaround:
-----------
N/A
This is a tradeoff in the way PDOA uses EFS on the management host.
Fixed:
----------
N/A
KIG00006
PDOA FP7_FP3/FP6_FP2 Readmes Appendix H incorrectly use the "-d" option with "db2fm" to disable the Db2 Fault Monitor on startup.
[ Added 2021-03-12]
Fixpack
I_V1.0.0.6
I_V1.0.0.7
I_V1.1.0.2
I_V1.1.0.3
When Db2 is updated in PDOA environments the Db2 Fault Monitor may be enabled. The readmes provide instructions in Appendix H on how to stop and disable the Db2 fault monitor.
The command listed in Appendix H to do this is as follows.
dsh -n ${BCUDB2ALL} "/usr/IBM/dwe/db2/V10.5.0.10..0/bin/db2fm -i bcuaix -d" | dshbak -c
The '-d' option will bring the instance down. The option that should be used is '-D'.
dsh -n ${BCUDB2ALL} "/usr/IBM/dwe/db2/V10.5.0.10..0/bin/db2fmcu -D" | dshbak -c
Workaround:
-----------
In Db2 10.5 and 11.1 systems the fault monitor daemon does not seem to be enabled as part of updating Db2 and this command may not be necessary.
Use '-D' when attempting to stop the db2fm fault monitor daemon.
dsh -n ${BCUDB2ALL} "/usr/IBM/dwe/db2/V10.5.0.10..0/bin/db2fmcu -D" | dshbak -c
Fixed:
----------
N/A
KIG00026
When the starting point for a fixpack is V1.0.0.5 or V1.1.0.1, updating AIX to Version 7.1 TL5 before Stage 4 is completed may prevent Stage 4 from completing successfully due to SSH connectivity issues.
[ Added 2021-03-12]
Fixpack
I_V1.0.0.6
I_V1.0.0.7
I_V1.1.0.2
I_V1.1.0.3
I_V1.1.0.4
Scenario:
V1.1 FP1->FP3 update.
Instead of updating AIX in Stage 6 on the management host, AIX was updated much earlier in the cycle before Stage 4 had completed.
During Stage 4 the V7000 enclosures are updated. In the update process one canister is updated and then recycled; after that canister boots, the configuration role moves to the updated canister. At this point the platform layer becomes unable to monitor the enclosure and eventually times out and reports a failure. In the meantime, any enclosures that were in the process of updating will continue to update the canister firmware.
The exact reason why this fails is not known, but because the V7000 presents ECDSA keys after the updated canister becomes the configuration node, the speculation is that the tighter security settings in AIX no longer allow the connection, leading to SSH connection issues.
Between V1.0 FP7/V1.1 FP1 and V1.0 FP7/V1.1 FP3 there were issues with SSH during each update which, if fixed, may prevent this issue from occurring for some customers.
Workaround:
-----------
If there is a Stage 4 failure and examination of the platform layer logs and trace files shows connectivity issues to the storage, then do the following.
These commands are run as root on the management host.
1. Check the current keys for the V7000 enclosures as listed in root's 'known_hosts' file. There are two examples below; it is more likely that the keys shown are RSA keys.
$ appl_ls_hw -r storage -A M_IP_address,Machine_type < /dev/null | grep "2076" | sed 's|"||g' | cut -d, -f1 | while read ip;do echo " *** ${ip} ***";ssh-keygen -F ${ip};done
 
# Host 172.24.1.181 found: line 15
172.24.1.181 ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDhFYisNOOUPZKXOljT3OH/jkmCiRVS8hthriLJeb4E4P5XMRMCf5HjMr9bTSRXTT+TU7j+e0oIzFz1lpPtMC3KVhLBiwuGuT38PClvufMxEJCn9zdGLcxy9CLmwRwT/UkRPfxioG8z+TPx677BW34JZs+QVAqmeCU9wsDvrm7g5I/Osj4dqSHkEcwhzrO7A1jNZNoxLGsYSrtfhkPQxP+gvf9hiq3lN/MAYKqJD8w2Au1I0iz4xJmkpnokYpmikpiQTk1Mr7YrTEJpQOwBsaPwooYO1mUeN3N8GU4oeI3hqnFoZSQEzIzBjtIIwml3ZDM6Fz+FGO20kRWYM/zIiIGJ
 *** 172.24.1.183 ***
# Host 172.24.1.183 found: line 13
172.24.1.183 ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDFNIHVGXRfiEWAdGWjoQo3jfrBjTpymJo5bVUnNsUe9Ru2AU1VRvqyWaEpbeDSTfj6lZBjhYbh2d9CoA0ZRD2HbAj6zF9YSZkt0elbEbBKFldFp/AFef0IARC7/xghGCkZRlCjkAB7DtfKObtrT2xfmax2abZy0wzzrOnL7qppdZ9KLCJ3uwbyBxTo7IrtGorejddh6fMKuGiaflStoWez5vzzdtYdPsXmbJ4Fibz5Td6gJyFFjkfRDwWnNmpNbGxRyc9pyTv2fhGZOmoqjZoERYFaxLJs46kcEYBOnl1xqpFs1BQVXAM/dn74DGCX9bCXKhSZ90icsbLVQldJF4HT
 *** 172.24.1.185 ***
# Host 172.24.1.185 found: line 14
172.24.1.185 ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDxiX5ztNtWuuvqGljV/8STvl85bKee7UFg/+dgxvU6dSylRLAkhf5GRQt3XTc0HWqPxRE2Tiag0Y30B4ryezzdZ7DW7w2RYhDY0T+S6w17hmZg2uPqAggw48QgOUTAD62+aDKHaL7ngHWydVKFybB//TT873kJ9/kJuR5bwX959MY18Dk6ZB/9CU2J+C5r6E5gVtniPepCGyWATGnwUSpJhPaYhbJp6BH6S0++QEHZ4CbQMlAkbLS2G30ELcNhQ492hWih0FH07Cw+g3Sa/npU/jZFYAfb4/v2Y+kWy/bXLxV6zRcxf+Ovn/FnOUt87e9If5tHXT9LyavCK4DMW75H
$ appl_ls_hw -r storage -A M_IP_address,Machine_type < /dev/null | grep "2076" | sed 's|"||g' | cut -d, -f1 | while read ip;do echo " *** ${ip} ***";ssh-keygen -F ${ip};done
 
 *** 172.23.1.204 ***
# Host 172.23.1.204 found: line 48
172.23.1.204 ecdsa-sha2-nistp521 AAAAE2VjZHNhLXNoYTItbmlzdHA1MjEAAAAIbmlzdHA1MjEAAACFBAAJ6D0nWA5A/z6XntN6JWxnf5EJ38GMdSDemVhlesLoIkLoSVdDXM0qfpsZ/fVhXLOAPmKQ68JxT9lk8oD/t3G3NAFUqEWzsUvpBxkPYeDHkGISe1My/Wnr4QN8L0ZtgsascB8V0QwYnugUCbx5nHVjvSdUTqx/96OVA5aFoPr/sAc6tg==
 *** 172.23.1.206 ***
# Host 172.23.1.206 found: line 45
172.23.1.206 ecdsa-sha2-nistp521 AAAAE2VjZHNhLXNoYTItbmlzdHA1MjEAAAAIbmlzdHA1MjEAAACFBAFyZIHqBO9Ay2u9aRbw3OSFASv86YO6fmyO/Ol53FEwNkWLlZeqqacivo+crZxO8J8m3BRdebeZ37clXEVFb+DNugHSscB1E6NIj90LkWFkn9kdIKKjP/gMsvty4I3palHxldqnHOcjfgqCEK5q9nII8mX3MddM9x6ItYytZVufaPmBLw==
 *** 172.23.1.208 ***
# Host 172.23.1.208 found: line 46
172.23.1.208 ecdsa-sha2-nistp521 AAAAE2VjZHNhLXNoYTItbmlzdHA1MjEAAAAIbmlzdHA1MjEAAACFBAAgUPvhy+zhJBMUA0RPsAS3+XDTlaz2ryQ6n1ZZER52/tiUZldBw6uNidBTWFBSgCs5XPlnpKsucS0lmi9ju0FASwHfjGBjO1XN2eVXphSN9e2jDjJEA4lotB126Hhbb1rTVNEKO2OG00YazvSX/1Ua7Mxml0cZ4l1kfXIpZKh+ByoDjA==
2. Remove the problematic SSH keys for the storage from root's 'known_hosts' file.
$ appl_ls_hw -r storage -A M_IP_address,Machine_type < /dev/null | grep "2076" | sed 's|"||g' | cut -d, -f1 | while read ip;do echo " *** ${ip} ***";ssh-keygen -R ${ip};done
 *** 172.23.1.204 ***
# Host 172.23.1.204 found: line 48
/.ssh/known_hosts updated.
Original contents retained as /.ssh/known_hosts.old
 *** 172.23.1.206 ***
# Host 172.23.1.206 found: line 45
/.ssh/known_hosts updated.
Original contents retained as /.ssh/known_hosts.old
 *** 172.23.1.208 ***
# Host 172.23.1.208 found: line 45
/.ssh/known_hosts updated.
Original contents retained as /.ssh/known_hosts.old
3. Recheck the keys. All should return empty output.
$ appl_ls_hw -r storage -A M_IP_address,Machine_type < /dev/null | grep "2076" | sed 's|"||g' | cut -d, -f1 | while read ip;do echo " *** ${ip} ***";ssh-keygen -F ${ip};done
 *** 172.23.1.204 ***
 *** 172.23.1.206 ***
 *** 172.23.1.208 ***
4. Run the following to re-populate the known_hosts file. You will need to reply 'yes' to each of the prompts. This will add the ECDSA keys to known_hosts, replacing the RSA keys. It will also show the update status.
$ appl_ls_hw -r storage -A M_IP_address,Machine_type < /dev/null | grep "2076" | sed 's|"||g' | cut -d, -f1 | while read ip;do echo " *** ${ip} ***";ssh -n superuser@${ip} 'lsupdate';done
 *** 172.23.1.204 ***
The authenticity of host '172.23.1.204 (172.23.1.204)' can't be established.
ECDSA key fingerprint is SHA256:dUaEPdNk5MnzHdIXSFip+mms61SEBt6ASBI2Y3ldPxU.
Are you sure you want to continue connecting (yes/no/[fingerprint])? yes
Warning: Permanently added '172.23.1.204' (ECDSA) to the list of known hosts.
status success
event_sequence_number
progress
estimated_completion_time
suggested_action start
system_new_code_level
system_forced no
system_next_node_status none
system_next_node_time
system_next_node_id
system_next_node_name
system_next_pause_time
 *** 172.23.1.206 ***
The authenticity of host '172.23.1.206 (172.23.1.206)' can't be established.
ECDSA key fingerprint is SHA256:TsKafP3voTG8kGkqhEIsGa0jhEJO9wwgI15t7jb0KiU.
Are you sure you want to continue connecting (yes/no/[fingerprint])? yes
Warning: Permanently added '172.23.1.206' (ECDSA) to the list of known hosts.
status success
event_sequence_number
progress
estimated_completion_time
suggested_action start
system_new_code_level
system_forced no
system_next_node_status none
system_next_node_time
system_next_node_id
system_next_node_name
system_next_pause_time
 *** 172.23.1.208 ***
The authenticity of host '172.23.1.208 (172.23.1.208)' can't be established.
ECDSA key fingerprint is SHA256:NSdzP0Idc5wer2dFhr9ESYKMjQgBU+ycYluIfwI/414.
Are you sure you want to continue connecting (yes/no/[fingerprint])? yes
Warning: Permanently added '172.23.1.208' (ECDSA) to the list of known hosts.
status success
event_sequence_number
progress
estimated_completion_time
suggested_action start
system_new_code_level
system_forced no
system_next_node_status none
system_next_node_time
system_next_node_id
system_next_node_name
system_next_pause_time
5. Check the status of the updates on the V7000s again. The following command, run as root on the management host, collects the IP addresses of all of the V7000s and runs 'lsupdate' to show the current status. This time there should be no prompting.
$ appl_ls_hw -r storage -A M_IP_address,Machine_type < /dev/null | grep "2076" | sed 's|"||g' | cut -d, -f1 | while read ip;do echo " *** ${ip} ***";ssh -n superuser@${ip} 'lsupdate';done
 *** 172.23.1.204 ***
status success
event_sequence_number
progress
estimated_completion_time
suggested_action start
system_new_code_level
system_forced no
system_next_node_status none
system_next_node_time
system_next_node_id
system_next_node_name
system_next_pause_time
 *** 172.23.1.206 ***
status success
event_sequence_number
progress
estimated_completion_time
suggested_action start
system_new_code_level
system_forced no
system_next_node_status none
system_next_node_time
system_next_node_id
system_next_node_name
system_next_pause_time
 *** 172.23.1.208 ***
status success
event_sequence_number
progress
estimated_completion_time
suggested_action start
system_new_code_level
system_forced no
system_next_node_status none
system_next_node_time
system_next_node_id
system_next_node_name
system_next_pause_time
6. After all updates are completed, rerun the update to ensure the hard drive firmware update completes. The platform layer will detect that the canister firmware is updated and will then verify that the hard disk firmware is updated if needed.
Fixed:
----------
See workaround.
KIG00091
When running db2_all or rah commands the following error is returned:
stty: tcgetattr: A specified file does not support the ioctl system call.
[ Added 2021-03-12]
General All Versions
When running db2_all or rah commands the following error is returned:
stty: tcgetattr: A specified file does not support the ioctl system call.
This error may appear when customers add 'stty'-based commands to shell profiles without first checking whether the session is interactive.
One common technique is to map the backspace character to the BACKSPACE key using 'stty erase ^?' in the .profile or .bashrc file. While this works for interactive sessions, it can cause messages like the one above for non-interactive sessions.
These messages can impact non-interactive sessions in ways that range from adding unexpected output that must be parsed to overloading log files.
Workaround:
-----------
One way to avoid this is to use a check to make sure the terminal is interactive.
tty > /dev/null 2>&1 &&  <command>
The tty command returns a non-zero exit code when run without a tty attached, so <command> only runs when the session is interactive.
The same check can also be used in an 'if' block, as shown below, when more complex commands are needed for interactive sessions.
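For example, a minimal sketch of such a guard in a .profile (the 'stty erase ^?' line is only an illustration; substitute the commands that apply to your environment):

if tty > /dev/null 2>&1; then
    # Interactive session: safe to run terminal-dependent commands.
    stty erase ^?
fi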
Fixed:
----------
See workaround.
KIG00092
Db2 installation, db2_all, rah or db2iupdate on PDOA appears to hang.
[ Added 2021-03-12]
General All Versions
When attempting to install a Db2 fixpack, or when running the db2_all, rah, or db2iupdate commands, the operation appears to hang.
The default shell for the instance owner (bcuaix by default) is 'ksh'.
In the field we found that some customers prefer to use the 'bash' shell for their instance owner.
PDOA does not ship with bash, but some customers may download it from the AIX Toolbox for Linux site.
The user that manages the instance owner may add bash at the end of their .profile file.
This has the effect of changing the shell to bash at login. The problem is that it causes non-interactive shells to appear to hang.
In fact, the non-interactive commands are waiting for the 'bash' shell to do something, but since there is no interaction, a hang occurs that prevents the .profile file from finishing.
While this was discovered while performing a Db2 update, this scenario can cause issues for any user who runs an interactive shell command in their .profile when non-interactive shell sessions are used.
Workaround:
-----------
a) Remove 'bash' from the .profile script and have the user run bash manually after login.
b) Only run 'bash' when the shell is interactive.
Similar to the solution for KIG00091, only invoke bash when the session is interactive:
tty > /dev/null 2>&1 && bash
c) Change the default shell from ksh to bash for the instance owner.
A different option is to change the login shell for the instance owner from ksh to bash so that the check is not needed, as sketched below. This change may be simple, but it is not recommended without understanding the implications and doing careful planning.
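A minimal sketch of option (c) on AIX, assuming bash was installed from the AIX Toolbox (verify the actual bash path on your system before changing the shell):

# Run as root on the affected host. The AIX Toolbox typically installs bash
# under /opt/freeware/bin with a symlink in /usr/bin; confirm which path exists.
ls -l /usr/bin/bash /opt/freeware/bin/bash
# Change the login shell for the instance owner (bcuaix in this example).
chuser shell=/usr/bin/bash bcuaix
# Verify the change.
lsuser -a shell bcuaix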
Fixed:
----------
N/A
KIG00316
The message "Unable to find host in active machine list. Exiting." was encountered when running 'hafailover host'.
[Added 2021-03-25]
General All Versions
In the following PDOA V1.1 0.5DN scenario, partition set 0-5 is active on 'host04' and has a standby of 'host02'. The 'hals' command shows:

$ hals
MANAGEMENT DOMAIN
+============+=========+=========+=========+=================+=================+=============+
| COMPONENT  | PRIMARY | STANDBY | CURRENT | OPSTATE         | HA STATUS       | RG REQUESTS |
+============+=========+=========+=========+=================+=================+=============+
| WASAPP     | host01  | host03  | host01  | Online          | Normal          | -           |
| DB2APP     | host01  | host03  | host01  | Online          | Normal          | -           |
| DPM        | host01  | host03  | host01  | Online          | Normal          | -           |
| DB2DPM     | host01  | host03  | host01  | Online          | Normal          | -           |
+============+=========+=========+=========+=================+=================+=============+
CORE DOMAIN
+============+=========+=========+=============+=================+=================+=============+
| PARTITIONS | CURRENT | STANDBY | DOMAIN      | OPSTATE         | HA STATUS       | RG REQUESTS |
+============+=========+=========+=============+=================+=================+=============+
| 0-5        | host04  | host02  | bcudomain01 | Online          | Normal          | -           |
+============+=========+=========+=============+=================+=================+=============+

 
An attempt to fail over to host02 is made using hafailover.
$ hafailover host02
Unable to find host02 in active machine list. Exiting
This fails because the hafailover command expects the name of a currently active host, as shown in the CURRENT column, that is, the host to fail over FROM rather than the host to fail over TO.
This can be confusing.
Workaround:
-----------
No workaround is needed; instead, pass only hostnames that appear in the CURRENT column for the partition set to be failed over, as shown below.
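In this scenario, the correct call names the currently active host for partitions 0-5 (host04); hafailover then moves those partitions to the standby, host02:

$ hafailover host04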
For more information about hafailover, see the PDOA V1.1 Knowledge Center failover documentation.
Fixed:
----------
N/A
KIG00384
The update_pfw.sh script fails when attempting to update more than one server during the power firmware Stage 8 update step in V1.1 FP4.
[ Added 2021-08-30]
[ Updated 2022-06-30 ]
Fixpack I_V1.1.0.4
When attempting to update multiple servers of the same type (core servers with MT 8284), this script constructs an invalid command line for the platform layer.
Instead of looping through the servers, it runs a single platform layer update command with invalid parameters. The script should construct one platform layer command for each server.
This only impacts customers who run this update in a full outage scenario on 1.5DN or larger configurations, or customers with 5.5DN or larger configurations when using the failover scenario.
Here is an example of the output when failing with the incorrectly generated platform layer update command.
 ./update_pfw.sh update
20210826_151846 (mgmt:update_pfw.sh): Starting date: Thu Aug 26 15:18:46 BST 2021.
20210826_151846 (mgmt:update_pfw.sh): Running update.
20210826_151846 (mgmt:update_pfw.sh): Running validation.
20210826_151846 (mgmt:update_pfw.sh): Collecting servers types.
20210826_151847 (mgmt:update_pfw.sh): Loading available updates.
20210826_151848 (mgmt:update_pfw.sh): Found server 'server_fsp5' which requires an update.
20210826_151849 (mgmt:update_pfw.sh): Checking server LPARs/Hosts 'server08' to verify that they are quiesced and eligible for updates.
20210826_151853 (mgmt:update_pfw.sh): Host 'server08' is quiesced.
20210826_151853 (mgmt:update_pfw.sh): Found server 'server_fsp6' which requires an update.
20210826_151854 (mgmt:update_pfw.sh): Checking server LPARs/Hosts 'server09' to verify that they are quiesced and eligible for updates.
20210826_151857 (mgmt:update_pfw.sh): Host 'server09' is quiesced.
20210826_151858 (mgmt:update_pfw.sh): No eligible servers are available to update.
20210826_151858 (mgmt:update_pfw.sh): Found the following model:target versions.  22A:860.226:/BCU_share/FP8_FP4/firmware/server_fsp/22A_42A/image/imports:server_fsp5,server_fsp6.
20210826_151858 (mgmt:update_pfw.sh): Running '/opt/ibm/aixappl/pflayer/bin/icmds/appl_ctrl_fsp update -validate -l server_fsp5 -f server_fsp6/opt/ibm/aixappl/pflayer/bin/icmds/app               l_ctrl_fsp update /BCU_share/FP8_FP4/firmware/server_fsp/22A_42A/image/imports -l  -f '.
Can not get serial number of server
20210826_151858 (mgmt:update_pfw.sh): Error: Failed to validate server 'server_fsp5 server_fsp6' for power firmware update.
20210826_151859 (mgmt:update_pfw.sh): Error: Update was not able to proceed due to a failed validation.
20210826_151859 (mgmt:update_pfw.sh): Starting date: Thu Aug 26 15:18:46 BST 2021   Ending Date: Thu Aug 26 15:18:59 BST 2021.
To work around the issue, run the platform layer commands separately for each server.
Workaround:
-----------
Replace server_fsp5 with the correct platform layer name for the server to be updated.

To Validate:

/opt/ibm/aixappl/pflayer/bin/icmds/appl_ctrl_fsp update -validate -l server_fsp5 -f /BCU_share/FP8_FP4/firmware/server_fsp/22A_42A/image/imports
To Update:

/opt/ibm/aixappl/pflayer/bin/icmds/appl_ctrl_fsp update -install -l server_fsp5 -f /BCU_share/FP8_FP4/firmware/server_fsp/22A_42A/image/imports
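For environments with more than one eligible server, each server can be handled in a simple loop; this is a sketch only, using the server names and firmware import path from the example above (substitute the values for your system):

# Validate each server individually; change -validate to -install to perform the update.
for srv in server_fsp5 server_fsp6; do
    /opt/ibm/aixappl/pflayer/bin/icmds/appl_ctrl_fsp update -validate -l ${srv} -f /BCU_share/FP8_FP4/firmware/server_fsp/22A_42A/image/imports
done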
Fixed:
----------
Follow the instructions in the updated V1.1 FP8_FP4 Readme V229, which references this technote.
KIG00497
In V1.1 FP4 Stage 7 the command to quiesce nodes (quiesce_node.sh) may fail to detect active services, leading to an outage.
[ Added 2022-03-16 ]
[ Updated 2022-06-30 ]
Fixpack I_V1.1.0.4
V1.1 FP4 Stage 7 allows for updates to be applied to standby servers. The quiesce_nodes.sh command is
designed to run on all core hosts and to identify hosts that are currently standby servers. In one case the
command was run on all core servers but was cancelled using ctrl-c. After a few seconds the command was
issued again. While the first command correctly identified all active and standby hosts, the second command
failed to recognize some hosts as active, leading to those hosts being quiesced. This can lead to a prolonged
outage to troubleshoot and bring the system back online.
As long as the command is not killed, this scenario should not happen in the field; however, it does illustrate
that there is risk when running the Stage 7 quiesce steps.
Update 2022-03-18: This was encountered in a V1.1 FP2 to V1.1 FP4 scenario.
After some more investigation it appears that TSA's lsrg command may be an issue here after a node leaves the domain. The lsrg command is used by quiesce_nodes.sh and
hals. The output below was taken after completing a Stage 7 pass while the system was in manual mode. Notice that hals shows N/A and
the lsrg -m output shows no active host for some of the domains. The last command shows that db2sysc processes are still running on the active hosts. Individual lssam
output also shows the correct state of the resources.
$ dsh -n ${BCUDB2ALL} 'lsrg -m | grep "IBM.Application:db2_bcuaix" |sort' | dshbak -c
HOSTS -------------------------------------------------------------------------
host02, host04
-------------------------------------------------------------------------------
IBM.Application:db2_bcuaix_0_1_2_3_4_5-rs         True      db2_bcuaix_0_1_2_3_4_5-rg     Online
HOSTS -------------------------------------------------------------------------
host15, host16, host17, host18, host19
-------------------------------------------------------------------------------
IBM.Application:db2_bcuaix_106_107_108_109_110_111_112_11... True      db2_bcuaix_106_107_108_109_110_111_112_113_114_115-rg     Online  Nominal   host15
IBM.Application:db2_bcuaix_116_117_118_119_120_121_122_12... True      db2_bcuaix_116_117_118_119_120_121_122_123_124_125-rg     Online  Nominal   host16
IBM.Application:db2_bcuaix_86_87_88_89_90_91_92_93_94_95-rs  True      db2_bcuaix_86_87_88_89_90_91_92_93_94_95-rg               Online  Nominal   host19
IBM.Application:db2_bcuaix_96_97_98_99_100_101_102_103_10... True      db2_bcuaix_96_97_98_99_100_101_102_103_104_105-rg         Online  Nominal   host18
HOSTS -------------------------------------------------------------------------
host05, host06, host07, host08, host09
-------------------------------------------------------------------------------
IBM.Application:db2_bcuaix_16_17_18_19_20_21_22_23_24_25-rs True      db2_bcuaix_16_17_18_19_20_21_22_23_24_25-rg     Online
IBM.Application:db2_bcuaix_26_27_28_29_30_31_32_33_34_35-rs True      db2_bcuaix_26_27_28_29_30_31_32_33_34_35-rg     Online
IBM.Application:db2_bcuaix_36_37_38_39_40_41_42_43_44_45-rs True      db2_bcuaix_36_37_38_39_40_41_42_43_44_45-rg     Online
IBM.Application:db2_bcuaix_6_7_8_9_10_11_12_13_14_15-rs     True      db2_bcuaix_6_7_8_9_10_11_12_13_14_15-rg         Online
HOSTS -------------------------------------------------------------------------
host10, host11, host12, host13, host14
-------------------------------------------------------------------------------
IBM.Application:db2_bcuaix_46_47_48_49_50_51_52_53_54_55-rs True      db2_bcuaix_46_47_48_49_50_51_52_53_54_55-rg     Online  Nominal   host12
IBM.Application:db2_bcuaix_56_57_58_59_60_61_62_63_64_65-rs True      db2_bcuaix_56_57_58_59_60_61_62_63_64_65-rg     Online  Nominal   host13
IBM.Application:db2_bcuaix_66_67_68_69_70_71_72_73_74_75-rs True      db2_bcuaix_66_67_68_69_70_71_72_73_74_75-rg     Online  Nominal   host11
IBM.Application:db2_bcuaix_76_77_78_79_80_81_82_83_84_85-rs True      db2_bcuaix_76_77_78_79_80_81_82_83_84_85-rg     Online  Nominal   host10

$ hals
none are available... returning
CORE DOMAIN
+============+=========+=========+=============+=================+=================+=============+
| PARTITIONS | CURRENT | STANDBY | DOMAIN      | OPSTATE         | HA STATUS       | RG REQUESTS |
+============+=========+=========+=============+=================+=================+=============+
| 0-5        | N/A     | host02  | bcudomain01 | Online          | MANUAL MODE     | -           |
| 6-15       | N/A     | host05  | bcudomain02 | Online          | MANUAL MODE     | -           |
| 16-25      | N/A     | host05  | bcudomain02 | Online          | MANUAL MODE     | -           |
| 26-35      | N/A     | host05  | bcudomain02 | Online          | MANUAL MODE     | -           |
| 36-45      | N/A     | host05  | bcudomain02 | Online          | MANUAL MODE     | -           |
| 46-55      | host12  | host14  | bcudomain03 | Online          | MANUAL MODE     | -           |
| 56-65      | host13  | host14  | bcudomain03 | Online          | MANUAL MODE     | -           |
| 66-75      | host11  | host14  | bcudomain03 | Online          | MANUAL MODE     | -           |
| 76-85      | host10  | host14  | bcudomain03 | Online          | MANUAL MODE     | -           |
| 86-95      | host19  | host17  | bcudomain04 | Online          | MANUAL MODE     | -           |
| 96-105     | host18  | host17  | bcudomain04 | Online          | MANUAL MODE     | -           |
| 106-115    | host15  | host17  | bcudomain04 | Online          | MANUAL MODE     | -           |
| 116-125    | host16  | host17  | bcudomain04 | Online          | MANUAL MODE     | -           |
+============+=========+=========+=============+=================+=================+=============+
$ dsh -n ${ALL} 'ps -ef | grep -v grep | grep -c db2sysc ' | sort
host01: 0
host02: 0
host03: 0
host04: 6
host05: 0
host06: 10
host07: 10
host08: 10
host09: 10
host10: 10
host11: 10
host12: 10
host13: 10
host14: 0
host15: 10
host16: 10
host17: 0
host18: 10
host19: 10
$ dsh -n ${BCUDB2ALL} 'lsrg -m -d | grep IBM.Application:db2_bcuaix | sort | cut -d: -f4 | xargs -n 1 lssam -g | grep db2_bcuaix' | dshbak -c
HOSTS -------------------------------------------------------------------------
host02, host04
-------------------------------------------------------------------------------
Online IBM.ResourceGroup:db2_bcuaix_0_1_2_3_4_5-rg Automation=Manual Nominal=Online
        |- Online IBM.Application:db2_bcuaix_0_1_2_3_4_5-rs
                |- Offline IBM.Application:db2_bcuaix_0_1_2_3_4_5-rs:host02
                '- Online IBM.Application:db2_bcuaix_0_1_2_3_4_5-rs:host04
HOSTS -------------------------------------------------------------------------
host15, host16, host17, host18, host19
-------------------------------------------------------------------------------
Online IBM.ResourceGroup:db2_bcuaix_106_107_108_109_110_111_112_113_114_115-rg Automation=Manual Nominal=Online
        |- Online IBM.Application:db2_bcuaix_106_107_108_109_110_111_112_113_114_115-rs
                |- Online IBM.Application:db2_bcuaix_106_107_108_109_110_111_112_113_114_115-rs:host15
                '- Offline IBM.Application:db2_bcuaix_106_107_108_109_110_111_112_113_114_115-rs:host17
Online IBM.ResourceGroup:db2_bcuaix_116_117_118_119_120_121_122_123_124_125-rg Automation=Manual Nominal=Online
        |- Online IBM.Application:db2_bcuaix_116_117_118_119_120_121_122_123_124_125-rs
                |- Online IBM.Application:db2_bcuaix_116_117_118_119_120_121_122_123_124_125-rs:host16
                '- Offline IBM.Application:db2_bcuaix_116_117_118_119_120_121_122_123_124_125-rs:host17
Online IBM.ResourceGroup:db2_bcuaix_86_87_88_89_90_91_92_93_94_95-rg Automation=Manual Nominal=Online
        |- Online IBM.Application:db2_bcuaix_86_87_88_89_90_91_92_93_94_95-rs
                |- Offline IBM.Application:db2_bcuaix_86_87_88_89_90_91_92_93_94_95-rs:host17
                '- Online IBM.Application:db2_bcuaix_86_87_88_89_90_91_92_93_94_95-rs:host19
Online IBM.ResourceGroup:db2_bcuaix_96_97_98_99_100_101_102_103_104_105-rg Automation=Manual Nominal=Online
        |- Online IBM.Application:db2_bcuaix_96_97_98_99_100_101_102_103_104_105-rs
                |- Offline IBM.Application:db2_bcuaix_96_97_98_99_100_101_102_103_104_105-rs:host17
                '- Online IBM.Application:db2_bcuaix_96_97_98_99_100_101_102_103_104_105-rs:host18
HOSTS -------------------------------------------------------------------------
host10, host11, host12, host13, host14
-------------------------------------------------------------------------------
Online IBM.ResourceGroup:db2_bcuaix_46_47_48_49_50_51_52_53_54_55-rg Automation=Manual Nominal=Online
        |- Online IBM.Application:db2_bcuaix_46_47_48_49_50_51_52_53_54_55-rs
                |- Online IBM.Application:db2_bcuaix_46_47_48_49_50_51_52_53_54_55-rs:host12
                '- Offline IBM.Application:db2_bcuaix_46_47_48_49_50_51_52_53_54_55-rs:host14
Online IBM.ResourceGroup:db2_bcuaix_56_57_58_59_60_61_62_63_64_65-rg Automation=Manual Nominal=Online
        |- Online IBM.Application:db2_bcuaix_56_57_58_59_60_61_62_63_64_65-rs
                |- Online IBM.Application:db2_bcuaix_56_57_58_59_60_61_62_63_64_65-rs:host13
                '- Offline IBM.Application:db2_bcuaix_56_57_58_59_60_61_62_63_64_65-rs:host14
Online IBM.ResourceGroup:db2_bcuaix_66_67_68_69_70_71_72_73_74_75-rg Automation=Manual Nominal=Online
        |- Online IBM.Application:db2_bcuaix_66_67_68_69_70_71_72_73_74_75-rs
                |- Online IBM.Application:db2_bcuaix_66_67_68_69_70_71_72_73_74_75-rs:host11
                '- Offline IBM.Application:db2_bcuaix_66_67_68_69_70_71_72_73_74_75-rs:host14
Online IBM.ResourceGroup:db2_bcuaix_76_77_78_79_80_81_82_83_84_85-rg Automation=Manual Nominal=Online
        |- Online IBM.Application:db2_bcuaix_76_77_78_79_80_81_82_83_84_85-rs
                |- Online IBM.Application:db2_bcuaix_76_77_78_79_80_81_82_83_84_85-rs:host10
                '- Offline IBM.Application:db2_bcuaix_76_77_78_79_80_81_82_83_84_85-rs:host14
HOSTS -------------------------------------------------------------------------
host05, host06, host07, host08, host09
-------------------------------------------------------------------------------
Online IBM.ResourceGroup:db2_bcuaix_16_17_18_19_20_21_22_23_24_25-rg Automation=Manual Nominal=Online
        |- Online IBM.Application:db2_bcuaix_16_17_18_19_20_21_22_23_24_25-rs
                |- Offline IBM.Application:db2_bcuaix_16_17_18_19_20_21_22_23_24_25-rs:host05
                '- Online IBM.Application:db2_bcuaix_16_17_18_19_20_21_22_23_24_25-rs:host06
Online IBM.ResourceGroup:db2_bcuaix_26_27_28_29_30_31_32_33_34_35-rg Automation=Manual Nominal=Online
        |- Online IBM.Application:db2_bcuaix_26_27_28_29_30_31_32_33_34_35-rs
                |- Offline IBM.Application:db2_bcuaix_26_27_28_29_30_31_32_33_34_35-rs:host05
                '- Online IBM.Application:db2_bcuaix_26_27_28_29_30_31_32_33_34_35-rs:host07
Online IBM.ResourceGroup:db2_bcuaix_36_37_38_39_40_41_42_43_44_45-rg Automation=Manual Nominal=Online
Online IBM.ResourceGroup:db2_bcuaix_6_7_8_9_10_11_12_13_14_15-rg Automation=Manual Nominal=Online
        |- Online IBM.Application:db2_bcuaix_6_7_8_9_10_11_12_13_14_15-rs
                |- Offline IBM.Application:db2_bcuaix_6_7_8_9_10_11_12_13_14_15-rs:host05
                '- Online IBM.Application:db2_bcuaix_6_7_8_9_10_11_12_13_14_15-rs:host08
Workaround:
-----------
To work around this problem there are a couple of ways to approach Stage 7.
1. Take a full outage to apply the updates. This can be time consuming, as the power firmware updates must be applied serially since V1.1 FP3.
2. Do not issue a 'ctrl-c' while the quiesce_node.sh step is running.
3. Prior to quiescing the nodes, there is a step that checks for eligible hosts. Verify that this list includes only standby hosts as eligible. This step uses the same
helper script to determine whether a node is eligible.
4. Instead of using '${ALL}' in the quiesce_node.sh call, replace ${ALL} with a comma-separated list of the hosts that are currently standby nodes.
This list can be determined using the hals command, as sketched below.
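A sketch only, assuming the hals output format shown above: collect the unique STANDBY hosts into a comma-separated list, then substitute that list for ${ALL} in the quiesce_node.sh call.

# Extract the STANDBY column (field 4) from the CORE DOMAIN rows of the hals output.
STANDBY_HOSTS=$(hals | awk -F'|' '/bcudomain/ { gsub(/ /, "", $4); print $4 }' | sort -u | paste -s -d, -)
echo ${STANDBY_HOSTS}
# Expected for the example above: host02,host05,host14,host17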
 
Fixed:
----------
An update to the active node detection script will add a check for db2sysc processes in addition to the lsrg -m command. This is referenced in the updated V1.1 FP8_FP4 V229 readme along with the following technote.
KIG00599
In V1.1 FP2 and higher, an error occurs when running appl_conf to manage passwords. "The resource state should be on or online."
[Added: 2022-07-21]
General
I_V1.1.0.2
In V1.1 FP2 and higher the PDOA Console GUI and mi* layers are removed and the fixpack process was moved from full automation to a tooling model. However, some commands like appl_stop were still used in V1.1 FP2 steps. This changed the status of some servers from 'Online' to 'On', but the process did not ensure that they were changed back.
If customers have issues with password validation when they attempt to modify the passwords with appl_conf, they may get this failure if the state of the server(s) is still 'On' and not 'Online'.
Symptom:
appl_conf chpw -l server0 -u root -p <pass> -o <pass>
----------------------------------------------------------
SCHEMA::CHPW::LOGICAL_NAME::STATUS(PASS/FAIL)
CHPW::server0::fail::The resource '172.23.1.3' is not in a proper state. The resource state should be on or online.
----------------------------------------------------------
The following check, run as 'root' on the management host, shows that 'server1' is 'On' instead of 'Online'.
$ appl_ls_hw -v -r server_os | egrep 'Logical_name=|Status='
Logical_name=server0
Status=Online
Logical_name=server1
Status=On
Logical_name=server2
Status=Online
Logical_name=server3
Status=Online
Logical_name=server4
Status=Online
Logical_name=server5
Status=Online
Workaround:
-----------
For each server, run the following as root on the management host to update the server status to 'Online'. Be sure to replace server1 with the server to be updated. A sketch for updating all servers reporting 'On' follows the command below.
${PL_ROOT}/bin/icmds/appl_ch_hw -l server1 -c Status=Online
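A minimal sketch, assuming the appl_ls_hw output format shown above, that finds every server reporting Status=On and sets it back to Online:

# Run as root on the management host. Review the list of servers before applying the change.
appl_ls_hw -v -r server_os | awk -F= '/^Logical_name=/ { name = $2 } /^Status=On$/ { print name }' | while read srv; do
    echo "Setting ${srv} to Online"
    ${PL_ROOT}/bin/icmds/appl_ch_hw -l ${srv} -c Status=Online
done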
Fixed:
----------
This is addressed in V1.1 FP3's platform layer and higher. For customers applying V1.1 FP4 to a V1.1 FP2 system, it is advised to use the workaround, or the issue can be addressed after Stage 2 is completed.
KIG00602
When attempting configupload operations on a SAN switch, encountered "configUpload not permitted (scp failed)." [ Added: 2022-07-27 ]
General
I_V1.1.0.2
I_V1.1.0.3
I_V1.1.0.4
In Stage 2 of the V1.1 FP4 documentation, there is a step to back up the SAN configurations for all of the SAN switches in the environment. This requires logging into the SAN switch and running a utility that uses scp to copy the configuration to the management node. In some cases the SAN switches may have stored an ssh public key for the management host that is no longer valid. This may have been stored as part of the deployment, or the host key on the management node may have changed since deployment. If this key is incorrect, it causes all ssh-based operations from the switch to fail.
Here is an example of the failure in a V1.1 FP4 environment.
 
$ ssh admin@172.23.1.161
san01:FID128:admin> configupload -all -p scp "172.23.1.1","root","/BCU_share/san_switch_backup/172.23.1.161_20220727155649_prefp4.cfg"
lost connection
configUpload not permitted (scp failed).
Terminated
san01:FID128:admin>
Workaround:
-----------
This solution has only been tested in V1.1 FP4; however, it may be applicable on previous versions if supported by the SAN switch's firmware level. The solution is to remove the old public host key from the known_hosts file on the SAN switch using the sshutil delknownhost function.
The following shows how to clean the known hosts file on the SAN switch. Log in as the admin user from the root account on the management host; this account from this host has ssh key-based access to the SAN switches. A successful configuration file backup session is also shown.

 
$ ssh admin@172.23.1.161
san01:FID128:admin> sshutil delknownhost -all
All IP Address/Hostname deleted successfully.

$ ssh admin@172.23.1.161
san01:FID128:admin> configupload -all -p scp "172.23.1.1","root","/BCU_share/san_switch_backup/172.23.1.161_20220727161145_prefp4.cfg"
root@172.23.1.1's password:

configUpload complete: All selected config parameters are uploaded
san01:FID128:admin> exit
logout
Connection to 172.23.1.161 closed.


            
The following shows an attempt to remove only the management host entry using delknownhost without the '-all' option. This did not work, and it is not known why.

san01:FID128:admin> sshutil delknownhost 172.23.1.1
IP address/hostname not found.
san01:FID128:admin> sshutil delknownhost
172.23.1.1 ssh-rsa
IP Address/Hostname to be deleted:172.23.1.1
IP address/hostname not found.
san01:FID128:admin> exit
logout
Connection to 172.23.1.161 closed.


 
Fixed:
----------
N/A
KIG00604
hachkconfig hangs. [ Added 2022-08-01 ]
Fixpack
I_V1.1.0.4
In Stage 8 of V1.1 FP4 there is a step to update hatools that requires running 'hachkconfig' after unpacking and applying the updated hatools on all of the hosts. This command can hang when service IPs are defined in environments with 1.5DN or larger and those service IPs are not defined for all database partitions.
Typically the following output would be seen when the hang occurs:

 
Processing AC relationships: HA Group 1
Processing AC relationships: HA Group 2

Processing VLAN HA: HA Group 1
** Problem with equivalency: db2_CORP_network
Verified resource: db2ip_172_25_0_40-rs
** Problem with relationship: db2_bcuaix_0_1_2_3_4_5-rs_DependsOn_db2_CORP_network-rel

To identify whether the system is at risk, examine the file /usr/IBM/analytics/ha_tools/hatools.conf. This file should exist on all hosts and be synchronized whenever it is modified. The file is rarely modified, and the best practice is to only update it on the management host and then distribute the updated file to the rest of the hosts.
In this file look for AR_VLAN entries.
The following shows a 1.5DN scenario, with VLAN001 having a service IP on the admin node (0-5) and the first data node (6-15). Note that partitions 0-5 are in the TSA domain bcudomain01 and partitions 6-15 are in TSA domain bcudomain02.

set -A AR_VLAN001
AR_VLAN001[0]="0,1,2,3,4,5:CORP:en12:172.25.0.40:255.255.255.0"
AR_VLAN001[0]="6,7,8,9,10,11,12,13,14,15:CORP:en12:172.25.0.41:255.255.255.0"
The scenario above works because each domain has at least one service IP defined on the VLAN.
In the same environment, suppose one of the entries is removed, as shown below.

set -A AR_VLAN001
AR_VLAN001[0]="0,1,2,3,4,5:CORP:en12:172.25.0.40:255.255.255.0"
This would lead to a hang.
The issue is related to the processVLAN function in hafunctions. This function has a while loop that checks all VLANs named VLAN00${x}, where $x should be incremented on each pass. However, if a domain does not have a VLAN entry for at least one partition set, the result is an endless loop because x is not incremented.
Workaround:
-----------
First clean up the failed hachkconfig process. If you ran 'hachkconfig -repair' please stop and contact IBM support as it may have made modifications to the domains that need to be verified before proceeding.
1. Kill the current hachkconfig process. Note that domains will have resource group locks that will need to be removed.
2. Remove the resource group locks left over by the killed hachkconfig process. The following command purposely uses the '-f 1' fanout to run on one node at a time.

dsh -f 1 -n ${BCUDB2ALL} 'lsrgreq -L | grep "lock" | while read rg rest;do rgreq -o unlock ${rg};done'
3. Verify the build level. This workaround is only applicable to this version of hatools.

$ cat /usr/IBM/analytics/ha_tools/.buildinfo
hatools_2.0.8.1_20210109.081433
4. Find the file 'hafunctions' and make a backup.

cp /usr/IBM/analytics/ha_tools/hafunctions /usr/IBM/analytics/ha_tools/hafunctions.hatools_2.0.8.1_20210109.081433
5. Edit the file and find line 2029.

# if there are no hosts, then there there are no VLAN SIPs. Return
if [[ $hlc -eq 0 ]]; then
     continue
fi
6. Modify the if block to increment ${x} before continuing the while loop, as shown below. Save the file.

# if there are no hosts, then there there are no VLAN SIPs. Return
if [[ $hlc -eq 0 ]]; then
   ((x=x+1))
   continue
fi
7. Diff the new/old files to verify that was the only line changed.

$ diff hafunctions hafunctions.hatools_2.0.8.1_20210109.081433
2031d2030
<                                       ((x=x+1))
8. Rerun the hachkconfig command to verify it doesn't hang.
Fixed:
----------
Fix is targeted for V1.1 FP5
KIG00706
DPM (aka OPM) fails to start on the management host after Stage 2 of V1.1 FP5 [ Added 2022-11-14 ]
Fixpack I_V1.1.0.5
In V1.1 FP5 Stage 2 the management host is migrated from AIX 7.1 to AIX 7.2. After migration, if DPM is started with 'hastartdpm' it fails to start on the management host, and it does not fail over automatically.
The following error is seen. This has been a common error with DPM in the past due to a tight threshold when waiting for DPM to start; in most cases prior to V1.1 FP5, DPM would eventually start.
$ hastartdpm
Starting DPM and DB2 instance..................................................................................Failed to start all resources. Check state with hals -mgmt
When checking lssam, note the 'Sacrificed' and 'Failed offline' statuses:
$ lssam
Online IBM.ResourceGroup:db2_db2opm_0-rg Nominal=Online
        |- Online IBM.Application:db2_db2opm_0-rs
                |- Online IBM.Application:db2_db2opm_0-rs:host01
                '- Offline IBM.Application:db2_db2opm_0-rs:host03
        |- Online IBM.ServiceIP:db2ip_129_40_61_24-rs
                |- Online IBM.ServiceIP:db2ip_129_40_61_24-rs:host01
                '- Offline IBM.ServiceIP:db2ip_129_40_61_24-rs:host03
        '- Online IBM.ServiceIP:db2ip_172_24_1_41-rs
                |- Online IBM.ServiceIP:db2ip_172_24_1_41-rs:host01
                '- Offline IBM.ServiceIP:db2ip_172_24_1_41-rs:host03
Offline IBM.ResourceGroup:db2_perf-rg Binding=Sacrificed Nominal=Online
        |- Offline IBM.Application:db2_perf_monitor Control=StartInhibited
                |- Failed offline IBM.Application:db2_perf_monitor:host01
                '- Offline IBM.Application:db2_perf_monitor:host03
        '- Offline IBM.Application:db2_perf_web_app
                |- Offline IBM.Application:db2_perf_web_app:host01
                '- Offline IBM.Application:db2_perf_web_app:host03
Online IBM.ResourceGroup:db2mnt_opmfs-rg Nominal=Online
        '- Online IBM.Application:db2mnt_opmfs-rs
                |- Online IBM.Application:db2mnt_opmfs-rs:host01
                '- Online IBM.Application:db2mnt_opmfs-rs:host03
Online IBM.Equivalency:db2_FCM_network
        |- Online IBM.NetworkInterface:en11:host01
        '- Online IBM.NetworkInterface:en11:host03
Online IBM.Equivalency:db2_VLAN501_network
        |- Online IBM.NetworkInterface:en12:host01
        '- Online IBM.NetworkInterface:en12:host03
Online IBM.Equivalency:db2_db2opm_0-rg_group-equ
        |- Online IBM.PeerNode:host01:host01
        '- Online IBM.PeerNode:host03:host03
In some cases you may see this error from lssam and hals.
lssam: ERROR 2610-444 Cannot obtain values for some dynamic attributes. 0 attributes are not being monitored. 1 attributes have data pending.
The above error is most likely due to the '/tmp' file system becoming full because of files related to the java crash on the management node. The crash creates 'Snap*' and 'jitdump*' files and may also create 'core*' files in /tmp.
$ ls -lrt /tmp/Snap* /tmp/jitdump* /tmp/core*
ls: 0653-341 The file /tmp/core* does not exist.
-rw-r--r--    1 db2opm   db2opmgp     486376 Nov 14 11:45 /tmp/Snap.20221114.114508.3802200.0006.trc
-rw-r--r--    1 db2opm   db2opmgp  136558670 Nov 14 11:45 /tmp/jitdump.20221114.114508.3802200.0009.dmp
-rw-r--r--    1 db2opm   db2opmgp     461800 Nov 14 11:54 /tmp/Snap.20221114.115420.3474212.0004.trc
-rw-r--r--    1 db2opm   db2opmgp   10850734 Nov 14 11:54 /tmp/jitdump.20221114.115420.3474212.0005.dmp
-rw-r--r--    1 db2opm   db2opmgp     461800 Nov 14 12:03 /tmp/Snap.20221114.120313.4457430.0004.trc
-rw-r--r--    1 db2opm   db2opmgp  147200200 Nov 14 12:03 /tmp/jitdump.20221114.120313.4457430.0005.dmp
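To confirm whether a full /tmp is the cause of the lssam error and to reclaim space, the file system usage can be checked and the dump files removed once they are no longer needed for diagnostics. This is a sketch only and addresses only the lssam 2610-444 error above, not the DPM start failure; keep copies of the files if IBM Support is investigating the crash.

# Check /tmp usage (AIX, sizes in GB).
df -g /tmp
# Remove the db2opm java dump artifacts shown above after confirming they are not needed.
rm -f /tmp/Snap.*.trc /tmp/jitdump.*.dmp /tmp/core*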
Workaround:
-----------
None.
Fixed:
----------
None.
DPM was deprecated in V1.1 FP1, and instructions and guidance to remove DPM were provided in the V1.1 FP4 readme file. The associated IBM product, OPM, is not supported. While DPM can run on the management standby, that is only possible until Stage 6 of the fixpack.
After Stage 6 is applied, do not attempt to start the management domain or DPM. The instructions for DPM removal are provided in the V1.1 FP4 and V1.1 FP5 readme files as part of Stage 9.
The readme file for V1.1 FP5 will be updated in versions after version 101 to address this issue.

[{"Business Unit":{"code":"BU059","label":"IBM Software w\/o TPS"},"Product":{"code":"SSH2TE","label":"PureData System for Operational Analytics A1801"},"Component":"","Platform":[{"code":"PF002","label":"AIX"}],"Version":"All Versions","Edition":"","Line of Business":{"code":"LOB10","label":"Data and AI"}}]

Product Synonym

PureData System for Operational Analytics;PDOA

Document Information

Modified date:
14 November 2022

UID

ibm10872628