Fix Readme
Content
Readme file for: CSM-1.7.1.7-power-AIX
Product/Component Release: 1.7.1.7
Update Name: CSM-1.7.1.7-power-AIX
Fix ID: CSM-1.7.1.7-power-AIX
Publication Date: 29 June 2010
Last modified date: 29 June 2010
Installation information
Download location
Below is a list of components, platforms, and file names that apply to this Readme file.
Product/Component Name | Platform | Fix
---|---|---
IBM Cluster Systems Management (CSM) | AIX 5.3, AIX 6.1 | CSM-1.7.1.7-power-AIX
Prerequisites and co-requisites
None
Known issues
- Known Issues
- When the Live Partition Mobility (LPM) function is used to move a CSM managed node from one managed system to another, some of the node attributes (Lpar_ID, HWModel, HWType, and so on) get out of sync.
- When the rpower on or rpower off command is issued, the rpower query output does not show the true power status of nodes managed by IVM 2.1.1 or lower. This is caused by an event subscription module and will be fixed in IVM 2.1.2. The workaround for this issue includes the following steps on IVM:
- Stop cimserver by command: /usr/ios/sbin/climgr cimserver stop
- Delete files in /opt/freeware/cimom/pegasus/etc/repository/root#ibmsd/instances
- Start cimserver by command: /usr/ios/sbin/climgr cimserver start
- The rpower query output for nodes managed by IVM 2.0.0.0 or higher gets stuck after IVM is rebooted.
- If GFW340 firmware lower than 340_033 is installed on a POWER6 575 machine, the BPA connection has a problem: only one side of the BPA is shown as connected.
- If the BladeCenter Advanced Management Module firmware version is BPET46C, BPET46G or BPET46H, the rpower reboot command against the blades managed by the AMM will fail with error "Communications failed".
- The rfwflash command for BladeCenter JS nodes running AIX exits with return code 1, and a dsh timeout error appears in /var/log/csm/rfwfalsh_detail.log.
- When running DCEM in Korean, if you click the Reports tab, select a report, click the View button, and then click the Command tab, the strings "Device names" and "Device groups" are not displayed and fields are truncated because the window is too small.
Workaround:
Enlarge the window.
- When running DCEM in a language such as Korean that has no alphabetic characters, if you click the Browse button for Name, the column heading "Description" contains a shortcut character such as "korean_translation(D)". Some of the shortcut strings do not work. There are two "Name" strings on file; the string used for the column heading currently has no shortcut function.
Workaround:
There is no workaround for this problem.
- When a dsh command ends with a semicolon (';') that is not a shell command delimiter, dsh ignores the semicolon, which may cause the command to fail. For example, the following command fails:
"find /tmp -name RATSTempFile -exec ls -l {} \;"
Workaround:
Add an additional semicolon at the end of the shell command, for example:
"find /tmp -name RATSTempFile -exec ls -l {} \;;"
- The CRHS configuration does not show up on an HMC upgraded to V6.1 and later.
Upgrading an HMC in the ClusterPeerDomain requires that the addpeer command be run again for that HMC. The ClusterPeerDomain is not preserved after an upgrade.
Caution:
There is a known problem when upgrading the HMC from Version 5 and early Version 6 to Version 6.1 or later. In this scenario the hardware server resource manager on the HMC fails to start after the upgrade. This results in the HMC not displaying the hardware configuration.
To avoid this problem, ensure that the HMC resources in the ClusterPeerDomain that were created with addpeer have the Manager_Configured attribute set to 1 prior to upgrading.
On the CSM management server, enter lsrhws -m. This lists all of the HMCs that are defined in the ClusterPeerDomain. The output for one HMC would be similar to:
lsrhws -m
--------------------------------
Manager_Type = "HMC"
Manager_IP_A = "20.0.0.5"
Manager_IP_B = ""
Manager_Name = "40.0.0.5"
Manager_MTMS = "7315CR2*10407DA"
Manager_Configured =
If the HMC listed above was added to the ClusterPeerDomain using addpeer and the "Manager_Configured" attribute equals "0", then it needs to be changed to "1".
If all of the HMCs defined in the ClusterPeerDomain were added with addpeer, then all HMCs may be updated at once using the following command:
chrhws -s 'Manager_Type=="HMC"' -a Manager_Configured=1
If only one HMC resource needs to be updated, then use the following command:
chrhws -s 'Manager_Name=="10.0.0.71"' -a Manager_Configured=1
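The check described above can be scripted. The following is a minimal sketch and not part of CSM: the function name unconfigured_hmcs is hypothetical, and it assumes that lsrhws -m prints each record in the Name = "value" layout of the sample output, with Manager_Name appearing before Manager_Configured in each record.

```shell
#!/bin/sh
# Hypothetical helper (not a CSM command): reads `lsrhws -m` style output on
# stdin and prints the Manager_Name of every record whose Manager_Configured
# value is 0. Assumes the 'Name = "value"' layout shown in the sample output.
unconfigured_hmcs() {
    awk '
        # A line of dashes separates records; forget the previous name.
        /^[ \t]*-+[ \t]*$/ { name = ""; next }
        {
            pos = index($0, "=")
            if (pos == 0) next
            key = substr($0, 1, pos - 1)
            val = substr($0, pos + 1)
            gsub(/[ \t"]/, "", key)   # strip blanks and quotes
            gsub(/[ \t"]/, "", val)
            if (key == "Manager_Name") { name = val }
            if (key == "Manager_Configured" && val == "0" && name != "") {
                print name
            }
        }
    '
}

# Example (on the CSM management server):
#   lsrhws -m | unconfigured_hmcs
```

Each name printed could then be passed to the single-HMC chrhws command shown above. Only HMCs that were added with addpeer should be changed, which this sketch does not check.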
To preserve the Cluster Ready Hardware Server configuration on an HMC, be sure to perform "Save upgrade data" prior to upgrading the HMC. This data should automatically be restored after the HMC is upgraded and does not require a reboot of the HMC.
- When upgrading SLES nodes using the "you" InstallMethod, the SLES online_update tool sometimes does not upgrade all the RPMs to the latest levels, and the SPident tool reports that some of the packages are not at the correct service pack level. This is a SUSE problem, and the workaround is to use the rpm -U command to manually update the missing packages reported by SPident.
- After migrating to CSM 1.5.1.2 or a later version (from a version prior to CSM 1.5.1.2), both an "A" and a "B" side appear in the Cluster Peer Domain for single-sided FSP servers when using commands such as frame -l or lsrhws -e. For Power5 clusters with the High Performance Switch, having both the "A" and "B" side in the Cluster Peer Domain for single-sided FSP servers could affect the function and performance of the High Performance Switch Network Manager (HPSNM).
Use the following one-time workaround after upgrading CSM to permanently remove the duplicates from the Cluster Peer Domain and refresh the High Performance Switch Network Manager (if one is present). Run all of these commands on the Management Server.
- chswnm -d (Stops the High Performance Switch Network Manager.)
- rmrhws -s 'Element_Type=="FSP"' (Removes all FSPs from the Cluster Peer Domain.)
- hwsda -s SP (Adds the FSPs back to the Cluster Peer Domain correctly.)
- wait for 12 minutes (Allows hardware server time to re-establish connections to all FSPs.)
- chswnm -a (Starts the High Performance Switch Network Manager.)
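The five steps above could be wrapped in a small script so that they always run in the same order. This is a sketch only: the run wrapper, the DRY_RUN and WAIT_SECONDS variables, and the function name are hypothetical, and stopping at the first failed step is a design choice made here, not something the procedure above specifies.

```shell
#!/bin/sh
# DRY_RUN=1 (the default in this sketch) only prints each command.
# Set DRY_RUN=0 on the Management Server to actually run them.
DRY_RUN=${DRY_RUN:-1}
WAIT_SECONDS=${WAIT_SECONDS:-720}   # 12 minutes, as stated in the steps above

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Hypothetical wrapper for the one-time workaround. The chain stops at the
# first failing step, which can leave HPSNM stopped until chswnm -a is run.
remove_duplicate_fsps() {
    run chswnm -d &&                          # stop HPSNM
    run rmrhws -s 'Element_Type=="FSP"' &&    # remove all FSPs
    run hwsda -s SP &&                        # add the FSPs back
    run sleep "$WAIT_SECONDS" &&              # let connections re-establish
    run chswnm -a                             # start HPSNM
}
```

Calling remove_duplicate_fsps with the default DRY_RUN=1 prints the command sequence for review before anything is changed.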
- After installnode is run on an IVM-managed LPAR, the LPAR remains in Open Firmware boot mode and continues to boot to the Open Firmware prompt. To change the LPAR boot mode back to "Normal", after the install is complete and the node reboots to Open Firmware, run rpower off, wait until the LPAR powers off, and then issue the rpower on command.
- In the following scenario, CSM resources remain offline after the hams move:
node1 is the Active MS; it is powered off.
node2 becomes the Active MS; it is powered off.
node2 is powered on and becomes the Active MS again.
node1 is powered on and becomes the Inactive MS.
If you run hams -mv on node2, node1 becomes the Active MS but the Inactive* resources remain offline. The workaround is to stop hams (hams -S) and start hams (hams -s) at this point, after which all resources come online.
- lshwstat returns inaccurate readings for CPU temperature on System x 3455 servers. When the lshwstat command is run with the "temp" or "cputemp" option, an inaccurate reading may be returned for CPU temperature. For example, the command may return a temperature for CPU #1 of 3 degrees C. Also, in a single-CPU x3455 server, a temperature of -128 degrees C may be returned for the non-existent CPU.
- The CSM and AIX support for configuring secondary adapters may produce incorrect results when configuring dual port HPS switch adapters on Power5 systems.
The problem is due to the fact that the Power5 firmware causes the same location code to be assigned to both interfaces.
For example:
>lscfg | grep sn
* sni1 U787D.001.992081A-P2-C65 Switch Network Interface Adapter
* sni0 U787D.001.992081A-P2-C65 Switch Network Interface Adapter
These location codes are actually not complete. The codes should include an extension to indicate the different ports. For example,
"U787D.001.992081A-P2-C65-T1".
Neither the CSM nor the AIX support for secondary adapter configuration was designed to deal with this situation. If duplicate location codes are passed in, the result would be that IP addresses could get assigned incorrectly.
Workaround:
The CSM and AIX support will work by providing either the location code or the interface name (sn0, sn1 etc.) for the adapter interface. (If both are provided then the location code will be used by default since it is considered to be more reliable.)
A simple workaround is to remove the location codes from the stanza files that are used to pass adapter information to the CSM/AIX commands. In this case the "interface_name" value must be provided (for example, "interface_name=sn0") and is used to configure the adapter interfaces.
Workaround (Live Partition Mobility):
CSM does not adjust to dynamic changes, so users need to use the chnode, rmnode, and definenode commands as appropriate to manually update the CSM definitions to match the changes made with partition mobility. Otherwise, actions could be performed on the wrong LPARs.
Workaround (rpower status for nodes managed by IVM 2.1.1 or lower):
Restart IBM.HWCTRLRM to register the CSM event again after the event subscription is cleared on IVM. Follow these steps:
On IVM: run the cimserver stop, delete, and start steps listed under this issue in the list above.
On the CSM Management Server: restart IBM.HWCTRLRM.
Workaround (rpower query output stuck after IVM is rebooted):
Run rpower -n <node_name> refresh or restart IBM.HWCTRLRM to refresh the rpower state.
Workaround (GFW340 firmware lower than 340_033 on POWER6 575):
Update the firmware to 340_033 or higher.
Workaround (AMM firmware BPET46C, BPET46G, or BPET46H):
Upgrade the AMM firmware to BPET46J or higher.
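For the IVM 2.1.1 rpower status workaround, the IVM-side cimserver steps and the management-server restart could be scripted as shown here. This is a sketch under stated assumptions: the function names, the run wrapper, and the DRY_RUN and CIM_INSTANCES_DIR variables are hypothetical, "delete files" is taken to mean the files directly inside the instances directory, and "restart IBM.HWCTRLRM" is taken to mean stopsrc followed by startsrc.

```shell
#!/bin/sh
# DRY_RUN=1 (the default in this sketch) only prints each command.
DRY_RUN=${DRY_RUN:-1}
# Path taken from the known issue; quoted because it contains a '#'.
CIM_INSTANCES_DIR='/opt/freeware/cimom/pegasus/etc/repository/root#ibmsd/instances'

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Run on IVM: clear the stale event subscriptions.
ivm_reset_cim_subscriptions() {
    run /usr/ios/sbin/climgr cimserver stop &&
    run rm -f "$CIM_INSTANCES_DIR"/* &&
    run /usr/ios/sbin/climgr cimserver start
}

# Run on the CSM Management Server afterwards (assumed restart sequence).
ms_restart_hwctrlrm() {
    run stopsrc -s IBM.HWCTRLRM &&
    run startsrc -s IBM.HWCTRLRM
}
```

The two functions are meant for two different machines, so in practice each would live in its own script.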
1 "Linux Update Packages" refer to Linux-based BIOS update tools used by the rfwflash command. These packages are available for x86-based eServer and xSeries servers.
Known limitations
- Known Limitations
- When an IBM System p CEC has a large number of LPARs, for example 60 LPARs, it takes more than 4 minutes for the HMC to return the power status for each LPAR and the CEC. CSM treats this as a socket timeout and interrupts the communication to the HMC, which results in a subscription failure for power status changes. The user sees the following error message for the rpower query command: "2651-692 Power status event registration failed, retrying".
- In the CSM Cluster-Ready Hardware Server environment, beginning with HMC V7.3.5.0, the open source software "openslp" is used to set up the SLP service on the HMC. It does not support Broadcast packets for SLP discovery; instead, openslp uses Multicast packets.
As a result, CSM has changed the hardware discovery mechanism from Broadcast to Multicast, which requires Multicast support to be enabled on Ethernet switches.
- SELinux is disabled in the Red Hat EL 5.3 default kickstart templates. This means that SELinux is disabled on Red Hat EL 5.3 CSM managed nodes installed with the default kickstart template file. This is due to a Red Hat SELinux bug that prevents the CSM and RSCT packages from being installed after the operating system installation.
- When a dsh command is launched through DCEM, the dsh command always fails with the error "Host is not responding. No command will be issued to this host".
- If the BPA slot number changes for any reason, such as a firmware defect, the connection between the HMC and the frame reports a "Duplicate IP" error. The error looks like:
resource_type=frame,type_model_serial_num=9A00-100*XXXXXXX,
side=unavailable,ipaddr=10.0.0.1,alt_ipaddr=unavailable,
state=Connecting,connection_error_code=Duplicate IP 0008-0005-0000801E
- CSM does not support Java 5 communication to HMC Version 7 with SNIA/SSL.
- Cluster Ready Hardware Server (CRHS) redundancy support for POWER6 575/595 frames and servers requires that only one active network connection be used in the cluster service network that connects the CSM MS, HMCs, POWER6 575/595 frame BPA, and POWER6 575/595 server FSP.
This change is based on the design modification of the POWER6 Frame Bulk Power Controller (BPC) internal network, which now provides the redundancy support to the frame BPA and server FSP. Having two active network connections from the POWER6 575/595 frame and servers to the HMC and CSM MS may cause SLP network issues with CRHS.
- On an AIX management server, copycsmpkgs for SLES 10 SP2 nodes fails with errors that some RPMs could not be found. This problem is caused by issues that the AIX rpm command has parsing some SLES 10 SP2 RPMs.
- The rfwflash command cannot be used to update the power code for POWER6 575 and POWER6 595.
- In CRHS environments, frame -l does not properly show the frame IDs for POWER6 575 and POWER6 595 frames. frame -l always returns 0 for the frame IDs, although frame -i can set the frame IDs correctly. The Frame_BPA_MTMS may also be missing in some specific scenarios.
- System firmware cannot be updated for multiple POWER6 575 CECs in one rfwflash invocation if the HMC version is V7R3.3.0 (lower than Service Pack 2), and system firmware within the same release cannot be updated for multiple System p CECs in one invocation if the HMC version is V7R3.5.0. The rfwflash command may encounter unknown errors for some CECs or return an error like "HSCF0168E Side A Bulk Power Controller on 9A00-100*992003X is not ready to perform Licensed Internal Code update", and the firmware update will not succeed for all CECs.
- After the openssl-0.9.8 RPM file is installed on an AIX management server, FSP node hardware control does not work and the hdwr_svr daemon cannot be started.
Workaround:
Install the openssl RPM file openssl-0.9.7l-2.aix5.1.ppc.rpm and restart the hardware control daemon.
- The SFP events monitoring function for POWER5 and POWER6 servers does not work if the HMC version is lower than V7R3.3.0 Service Pack 2.
- The CSM management server cannot connect to the Baseboard Management Controller (BMC) of a System x3455 server while the x3455 is performing a PXE boot broadcast during power up. This is due to a limitation in the x3455 that prevents access to the LAN interface of the BMC while the PXE boot broadcast is running.
This blackout period can last anywhere from 30 seconds to one minute or more. During this time, the BMC will not respond to ping requests, and any power control or remote console requests will return a "Baseboard Management Controller is not responding" message. The behavior of a remote console session that was previously open when the PXE boot occurs depends on the setting of the csmconfig attribute "BMCConsoleKeepAlive".
If the attribute is set to 0 (the default setting), there will be no indication that the session is lost, other than the console being non-responsive. The console session must be closed and then restarted when the BMC is available.
If the attribute is set to 1, a message will be written to the console approximately 50 seconds after the IPMI session is lost, and the session will be closed. The session must be restarted when the BMC is available.
- The rpower -b option is not working for System x 3455 nodes. The "-b" option to allow selection of the boot device to be used on the next power on or reboot does not work on System x 3455 servers. Any boot device specified will be ignored and the server will boot according to the boot order set in BIOS.
- Selected Linux Update Packages (see note 1) that initiate DOS-based updates do not run under Red Hat EL 4 or SUSE LINUX Enterprise Server 9. These packages can usually be identified by their size, which is typically 2 MB or higher, compared to the standard lflash-based package, which is usually under 1 MB.
- Selected Linux Update Packages (see note 1) that initiate DOS-based updates cannot be run in a pre-Operating System environment. These packages modify the Master Boot Record (MBR) of the target server to run DOS. In the pre-Operating System environment, the MBR is not available. These packages can usually be identified by their size, which is typically 2 MB or higher, compared to the standard lflash-based package, which is usually under 1 MB.
- When commands that run rconsole, such as getadapters, netboot, and installnode, are run for an IVM-managed LPAR, other open consoles to that LPAR are forced closed. This is due to a limitation in the IVM firmware.
- reventlog -a or reventlog -e will hang when retrieving large event logs of 65K or greater. A log of approximately 800 entries is enough to reproduce this problem, although this number depends on the length of each entry, since the real limiting factor is retrieving 65K or more of data.
As a workaround to running reventlog -a, run reventlog -e <#>, where <#> is a number (for example 500) that will not exceed the 65K limit. Since the log is returned in LIFO order, the last (most current) 500 entries will be returned.
Workaround (power status event registration failure on CECs with many LPARs):
Increase the socket timeout value.
To increase the socket timeout value to 10 minutes (600000 milliseconds):
Step 1: stopsrc -s IBM.HWCTRLRM
Step 2: startsrc -s IBM.HWCTRLRM -e "HC_SOCKET_TIMEOUT=600000"
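The two steps above can be parameterized when a timeout other than 10 minutes is needed. The sketch below assumes, from the 10-minute example, that HC_SOCKET_TIMEOUT is given in milliseconds. The function names, the run wrapper, and the DRY_RUN variable are hypothetical.

```shell
#!/bin/sh
# DRY_RUN=1 (the default in this sketch) only prints each command.
DRY_RUN=${DRY_RUN:-1}

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Convert whole minutes to milliseconds (10 minutes gives 600000).
minutes_to_ms() {
    echo $(( $1 * 60 * 1000 ))
}

# Restart IBM.HWCTRLRM with the socket timeout given in minutes.
restart_hwctrlrm_with_timeout() {
    timeout_ms=$(minutes_to_ms "$1") || return 1
    run stopsrc -s IBM.HWCTRLRM &&
    run startsrc -s IBM.HWCTRLRM -e "HC_SOCKET_TIMEOUT=$timeout_ms"
}
```

With the default dry-run setting, restart_hwctrlrm_with_timeout 10 prints the same two commands as Step 1 and Step 2.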
Workaround (SLP discovery requires Multicast):
Multicast support is enabled by default on some Ethernet switches, such as Cisco Catalyst 2960 Series Switches. If it is not enabled by default, refer to your switch documentation to enable Multicast.
Workaround (SELinux disabled on Red Hat EL 5.3):
If required, enable SELinux manually after the CSM full installation completes.
Workaround (dsh launched through DCEM):
Before running the command, clear the option "Before running commands, verify that targets are responding" in the Options tab.
Workaround ("Duplicate IP" error):
Rebooting the HMC clears the "Duplicate IP" error.
Workaround (Java 5 communication to HMC Version 7):
Use Java 1.4 (or lower), or set the environment variable HC_JAVA_PATH=/usr/java14.
Workaround (CRHS redundancy with POWER6 575/595): N/A
Workaround (copycsmpkgs for SLES 10 SP2 nodes): N/A
Workaround (rfwflash power code update):
Work directly with the HMC GUI to update the power code for POWER6 575 and POWER6 595.
Workaround (frame -l frame IDs):
After the Frame_ID is set using frame -i, use chrhws to change the Frame_ID to the correct value. If the Frame_BPA_MTMS is also missing, use chrhws to link the Element_IP_A to the Frame_BPA_MTMS.
Workaround (rfwflash with multiple CECs):
Work directly with the HMC GUI to update multiple CECs in the same frame. Alternatively, run rfwflash -n with only one CEC for each frame, and place multiple rfwflash -n invocations in a script if you want to run rfwflash from the CSM Management Server.
Or upgrade HMC version V7R3.3.0 to Service Pack 2 or higher, or HMC version V7R3.5.0 to Service Pack 1 or higher.
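The scripted variant of the rfwflash workaround above could look like the following sketch, which issues one rfwflash -n invocation per CEC so that no single invocation names more than one CEC. The function name, the run wrapper, and the DRY_RUN and RFWFLASH_ARGS variables are hypothetical. Any further rfwflash options a site needs are not covered by this document and are left to the caller through RFWFLASH_ARGS.

```shell
#!/bin/sh
# DRY_RUN=1 (the default in this sketch) only prints each command.
DRY_RUN=${DRY_RUN:-1}

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Run rfwflash once per CEC name given as an argument.
# Returns non-zero if any invocation failed, but still tries the others.
flash_cecs_one_by_one() {
    rc=0
    for cec in "$@"; do
        # RFWFLASH_ARGS is intentionally unquoted so it splits into words.
        run rfwflash -n "$cec" $RFWFLASH_ARGS || {
            echo "rfwflash failed for $cec" >&2
            rc=1
        }
    done
    return "$rc"
}
```

Running flash_cecs_one_by_one with the CEC node names in dry-run mode first shows exactly which invocations would be issued.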
Additional information
- Description
CSM 1.7.1.7 is a service update level package for Release 1.7.1 Systems Management Software for AIX. This package contains a full image for CSM software to support Maintenance Level (ML) 5300-07/ ML 6100-00 or later, and can be obtained from Fix Central via the following PTFs:
This package requires the following RSCT fixes, which can be obtained by these PTFs:
These PTFs can be obtained by going to the IBM Support Fix Central site and searching on the PTFs referenced above, or by selecting an individual PTF listed above to go directly to the Package Download site.
Note: The "conserver" update is not part of the PTFs listed above. See the Package contents below.
For complete CSM installation information, see the CSM for AIX and Linux documentation.
This release contains a full image of the CSM product. To download this image, you must accept the IBM License Agreement (ILA). A full CSM license must already have been purchased, or be purchased, for IBM software offering 5765-E88.
What's New?
- CSM 1.7.1.7 provides support for IBM Power 770 (Model 9117-MMB) and Power 780 (Model 9179-MHB).
- CSM 1.7.1.7 provides support for AIX 6.1 TL5 and AIX 5.3 TL12.
- CSM 1.7.1.7 provides support for GFW 711.
- Package contents
csm.aixREADME
CSM for AIX images:
/installp/ppc/U812753.csm.hpsnm.bff
/installp/ppc/U834202.csm.client.bff
/installp/ppc/U835515.csm.core.bff
/installp/ppc/U835513.csm.dsh.bff
/installp/ppc/U835516.csm.deploy.bff
/installp/ppc/U835517.csm.server.bff
/installp/ppc/U835514.csm.gui.dcem.bff
/installp/ppc/csm.msg.*
sam_2.3.0.3_aix5.3/sam.core
sam_2.3.5.3_aix6.1/sam.core
RPMs:
RPMS/ppc/conserver-8.1.aix5.2.ppc.rpm
- Changelog
Problems fixed in CSM 1.7.1.7 [April 29, 2010]
- lppchk error for csm.deploy inside wpar.
- rfwflash hangs for JS blade.
- rpower status stuck with HMC V7R7.
- This update addresses the following APARs: IZ72980 IZ73089
Problems fixed in CSM 1.7.1.6 [March 15, 2010]
- CRHS problem starting hdwr_svr with openssl installp package.
- This update addresses the following APARs: IZ66739 IZ69943
Problems fixed in CSM 1.7.1.5 [January 21, 2010]
- rpower query of CEC status fails to work because of a CIM event error.
- dsh returns errors when running some commands to Qlogic IB switches.
- cfmupdatenode -F option does not always work.
- This update addresses the following APARs: IZ67540 IZ58284
Problems fixed in CSM 1.7.1.4 [October 26, 2009]
- IBM.CSMAgentRM reports 2610-602 A session could not be established.
- IBM.HWCTRLRM socket connections remain in a CLOSE_WAIT state.
- This update addresses the following APARs: IZ55597 IZ57962 IZ58359
Problems fixed in CSM 1.7.1.3 [August 13, 2009]
- IBM.CSMAgentRM has disabled polling when running on a WPAR.
- Installing RedHat 5.3 from a POWER4 Install Server failing with an LED E143.
- This update addresses the following APARs: IZ54837 IZ54838 IZ54839
Problems fixed in CSM 1.7.1.2 [June 25, 2009]
- cfmupdatenode is not recognizing the "MinManaged-Installing" status
- This update addresses the following APARs: IZ50520 IZ52840
Problems fixed in CSM 1.7.1.1 [May 18, 2009]
- CSM Try-and-Buy license key is not allowed.
- healthCheck does not show unknown speed info on JS blade.
- HWSVRRMD core due to unreserved is being called.
- This update addresses the following APARs: IZ48411 IZ49810
Problems fixed in CSM 1.7.0.19 [April 16, 2009]
- checkpoint of WPAR is failing.
- csm.server prereq issue in multibos.
- This update addresses the following APARs: IZ42536 IZ47932 IZ47935
Problems fixed in CSM 1.7.0.18 [March 5, 2009]
- nodegrp issue when removing nodes.
- cfmupdatenode failing for filename containing ":"
- ibAdapterConfig updated to configure ml0.
- This update addresses the following APARs: IZ41004 IZ42902 IZ43362 IZ43614
Problems fixed in CSM 1.7.0.17 [January 26, 2009]
- dsh fails in non English locale.
- Performance enhancements for CSMAgentRM.
- csmsnap enhancements.
- cfmupdatenode coexistence update.
- This update addresses the following APARs: IZ38648 IZ39065 IZ39066 IZ39067
Problems fixed in CSM 1.7.0.16 [November 13, 2008]
- POWER6 FSP default IP addresses cause Version Mismatch issue in CRHS environment.
- getadapters with -D flag on POWER6 9117-MMA fails with "no adapters found".
- rpower reports wrong status when IBM.HWCTRLRM loses its connection to an HMC.
- dsh -v performance issue on large clusters.
- FSP Proxy Provider connect failures to POWER6 CECs.
- CSMAgentRM not setting certain IBM.ManagementServer attributes.
- This update addresses the following APARs: IZ31725 IZ33415 IZ34163 IZ35702 IZ34290
Problems fixed in CSM 1.7.0.15 [October 7, 2008]
- DLPAR function is unavailable in Power4 environment
- This update addresses the following APARs: IZ29205 IZ33416
Problems fixed in CSM 1.7.0.14 [September 19, 2008]
- Fix for netboot of CSP nodes.
- Update for Hardware Control Java daemon issue on large clusters
- Corrected a dsh issue where the hostname was appearing at the end of some lines.
- This update addresses the following APARs: IZ28845 IZ29741 IZ29195 IZ29199
Problems fixed in CSM 1.7.0.13 [July 24, 2008]
- Update to cfmupdatenode to correctly substitute meta variables.
- Update to cfmupdatenode to prevent files from being copied to incorrect nodes.
- Corrected a dsh issue where the hostname was appearing at the end of some lines.
- Fix for dsh -v reporting the node as not responding when LC_ALL is not C.
- SFP events monitoring updates for POWER5 and POWER6.
- This update addresses the following APARs: IZ20906 IZ21571 IZ23343 IZ23836 IZ23934 IZ26101 IZ26895
Problems fixed in CSM 1.7.0.12 [May 29, 2008]
- Update to dsh for noderanges.
- Fix for dlpar on POWER4 HMCs.
- IBM.HWSVRRM will now process updates to Element_Frame_ID and Element_BPA_MTMS in one request.
- Cluster-Ready Hardware Server updates.
- This update addresses the following APARs: IZ21571 IZ20709 IZ22352
Problems fixed in CSM 1.7.0.11 [May 2, 2008]
- Update IBM Tivoli System Automation for Multiplatforms (TSA) package to 2.3 FP3 to make the HA MS work properly.
- This update addresses the following APARs: IZ16405 IZ16865 IZ18109 IZ18204 IZ18791 IZ18792 IZ19460
Problems fixed in CSM 1.7.0.4 [February 7, 2008]
- Update to prevent hung child processes in dsh.
- RAS enhancement for cfmupdatenode.
- Correction to cfmupdatenode to maintain correct ownership of files.
- Change to certain predefined conditions to make the monitoring of the conditions more efficient.
- Fix for hwsdagent core dump.
- This update addresses the following APARs: IZ11547 IZ11548 IZ11549 IZ12233 IZ12748 IZ14050 IZ14370 IZ14374
Problems fixed in CSM 1.7.0.3 [December 6, 2007]
- Enhanced support for communication between the CSM MS and the 7.0 HMC using SSL.
- Update to predefined conditions that are shipped with CSM for more efficient monitoring.
- Corrected a condition where the ManagedNode Status attribute remains 127.
- This update addresses the following APARs: IZ08386 IZ09468 IZ10063
Problems fixed in CSM 1.7.0.1 [November 12, 2007]
- Support RHEL 5.1 GA.
- Improve updatenode command's performance.
- syslog monitoring performance enhancement.
Document Information
Modified date:
10 August 2010
UID
isg400000028