IBM Support

pureScale alerts are not clearing

Question & Answer


Question

How can I manually clear alerts in a DB2 pureScale system?

Cause

When a host alert, or an alert that names a specific host, is raised, it must be cleared manually from the host in question.

Answer

Take for example the following "db2instance -list" output:


$ db2instance -list

ID   TYPE    STATE    HOME_HOST  CURRENT_HOST  ALERT  PARTITION_NUMBER  LOGICAL_PORT  NETNAME
--   ----    -----    ---------  ------------  -----  ----------------  ------------  -------
0    MEMBER  STARTED  mhost23    mhost23       NO     0                 0             mhost23-ib0
1    MEMBER  STARTED  mhost20    mhost20       YES    0                 0             mhost20-ib0
2    MEMBER  STARTED  mhost21    mhost21       NO     0                 0             mhost21-ib0
3    MEMBER  STARTED  mhost22    mhost22       YES    0                 0             mhost22-ib0
128  CF      PRIMARY  cfhost1    cfhost1       NO     -                 0             cfhost1-ib0,cfhost1-ib1
129  CF      PEER     cfhost2    cfhost2       NO     -                 0             cfhost2-ib0,cfhost2-ib1

HOSTNAME  STATE   INSTANCE_STOPPED  ALERT
--------  -----   ----------------  -----
cf002     ACTIVE  NO                NO
cf001     ACTIVE  NO                NO
host22    ACTIVE  NO                YES
host21    ACTIVE  NO                YES
host20    ACTIVE  NO                NO
host23    ACTIVE  NO                NO


There is currently an alert for a member, CF, or host in the
data-sharing instance. For more information on the alert, its impact,
and how to clear it, run the following command: 'db2cluster -cm -list -alert'.
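
On a larger instance it can help to filter a saved copy of the "db2instance -list" output for rows whose ALERT column is YES. The following is only a sketch: the function name alerted_rows and the variable sample_members are hypothetical, the sample rows are copied from the output above, and the column positions are assumed to match that output.

```shell
# Sample rows copied from the member section of the output above; in practice
# the text would come from a saved copy of the `db2instance -list` output.
sample_members='ID TYPE STATE HOME_HOST CURRENT_HOST ALERT PARTITION_NUMBER LOGICAL_PORT NETNAME
0 MEMBER STARTED mhost23 mhost23 NO 0 0 mhost23-ib0
1 MEMBER STARTED mhost20 mhost20 YES 0 0 mhost20-ib0
2 MEMBER STARTED mhost21 mhost21 NO 0 0 mhost21-ib0
3 MEMBER STARTED mhost22 mhost22 YES 0 0 mhost22-ib0
128 CF PRIMARY cfhost1 cfhost1 NO - 0 cfhost1-ib0,cfhost1-ib1
129 CF PEER cfhost2 cfhost2 NO - 0 cfhost2-ib0,cfhost2-ib1'

# Hypothetical helper: print "ID TYPE HOME_HOST" for every MEMBER or CF row
# whose ALERT column (assumed to be field 6) is YES. Header rows and host
# rows are skipped because their second field is neither MEMBER nor CF.
alerted_rows() {
  awk '($2 == "MEMBER" || $2 == "CF") && $6 == "YES" { print $1, $2, $4 }'
}

printf '%s\n' "$sample_members" | alerted_rows
```

For the sample rows this lists members 1 and 3. The ALERT column only shows that an alert exists; the "db2cluster -cm -list -alert" output is still needed to see which host each alert refers to.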

Running the "db2cluster -cm -list -alert" command displays the following three alerts:


$ db2cluster -cm -list -alert
1.
Alert: The DB2 member '3' could not be started in restart light mode on
host 'mhost21'. Check the db2diag.log for messages concerning a
restart light or database crash recovery failure on the indicated host
for DB2 member '3'.

Action: Check the cluster caching facility cfdiag log files for
messages about CF failures on the host. If there are alerts about
network adapters not responding, this alert cannot be cleared manually.
It will be cleared when a network adapter becomes available. If it is
not a problem with network adapters, this alert needs to be manually
cleared after other alerts are handled. To clear this alert run the
following command: 'db2cluster -cm -clear -alert -member 3'. For more
information, see the 'Troubleshooting options for the db2cluster
command' topic in the DB2 Information Center.

Impact: DB2 member '3' will not be able to restart light on host
'mhost21' until this alert has been cleared.

------------------------------------------------------------------------
2.
Alert: The DB2 member '1' could not be started in restart light mode on
host 'mhost22'. Check the db2diag.log for messages concerning a
restart light or database crash recovery failure on the indicated host
for DB2 member '1'.

Action: Check the cluster caching facility cfdiag log files for
messages about CF failures on the host. If there are alerts about
network adapters not responding, this alert cannot be cleared manually.
It will be cleared when a network adapter becomes available. If it is
not a problem with network adapters, this alert needs to be manually
cleared after other alerts are handled. To clear this alert run the
following command: 'db2cluster -cm -clear -alert -member 1'. For more
information, see the 'Troubleshooting options for the db2cluster
command' topic in the DB2 Information Center.

Impact: DB2 member '1' will not be able to restart light on host
'mhost22' until this alert has been cleared.

------------------------------------------------------------------------
3.
Alert: The DB2 member '1' could not be started in restart light mode on
host 'mhost21'. Check the db2diag.log for messages concerning a
restart light or database crash recovery failure on the indicated host
for DB2 member '1'.

Action: Check the cluster caching facility cfdiag log files for
messages about CF failures on the host. If there are alerts about
network adapters not responding, this alert cannot be cleared manually.
It will be cleared when a network adapter becomes available. If it is
not a problem with network adapters, this alert needs to be manually
cleared after other alerts are handled. To clear this alert run the
following command: 'db2cluster -cm -clear -alert -member 1'. For more
information, see the 'Troubleshooting options for the db2cluster
command' topic in the DB2 Information Center.

Impact: DB2 member '1' will not be able to restart light on host
'mhost21' until this alert has been cleared.
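
Each alert above names both a member and the host on which restart light failed, and that pairing decides where the clear command is run later. As a rough sketch, the pairs can be pulled out of a saved copy of the alert text with awk. The function name alert_pairs and the variable sample_alerts are hypothetical, and the pattern assumes the alert wording is exactly as shown above.

```shell
# First two lines of each alert above, as they would appear in a saved copy
# of the `db2cluster -cm -list -alert` output.
sample_alerts=$(cat <<'EOF'
1.
Alert: The DB2 member '3' could not be started in restart light mode on
host 'mhost21'. Check the db2diag.log for messages concerning a
2.
Alert: The DB2 member '1' could not be started in restart light mode on
host 'mhost22'. Check the db2diag.log for messages concerning a
3.
Alert: The DB2 member '1' could not be started in restart light mode on
host 'mhost21'. Check the db2diag.log for messages concerning a
EOF
)

# Hypothetical helper: print one "member host" pair per restart light alert.
# The quote character is passed in as q so the awk program itself contains
# no single quotes.
alert_pairs() {
  awk -v q="'" '
    { text = text " " $0 }            # join wrapped lines into one string
    END {
      gsub(/[ \t]+/, " ", text)       # collapse runs of blanks
      pat = "member " q "[0-9]+" q " could not be started in restart light mode on host " q "[^" q "]+" q
      while (match(text, pat)) {
        hit = substr(text, RSTART, RLENGTH)
        split(hit, parts, q)          # parts[2] is the member, parts[4] the host
        print parts[2], parts[4]
        text = substr(text, RSTART + RLENGTH)
      }
    }'
}

printf '%s\n' "$sample_alerts" | alert_pairs
```

For the three alerts above, the pairs are member 3 on mhost21, member 1 on mhost22, and member 1 on mhost21.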

If there are no problems with any of the network adapters and the lssam output looks normal apart from the following "Failed Offline" member resources:


Online IBM.ResourceGroup:db2_db2dsg01_1-rg Control=MemberInProblemState Nominal=Online
        '- Online IBM.Application:db2_db2dsg01_1-rs Control=MemberInProblemState
                |- Online IBM.Application:db2_db2dsg01_1-rs:mhost20
                |- Failed offline IBM.Application:db2_db2dsg01_1-rs:mhost21
                |- Failed offline IBM.Application:db2_db2dsg01_1-rs:mhost22
                '- Offline IBM.Application:db2_db2dsg01_1-rs:mhost23
(...)
Online IBM.ResourceGroup:db2_db2dsg01_3-rg Control=MemberInProblemState Nominal=Online
        '- Online IBM.Application:db2_db2dsg01_3-rs Control=MemberInProblemState
                |- Offline IBM.Application:db2_db2dsg01_3-rs:mhost20
                |- Failed offline IBM.Application:db2_db2dsg01_3-rs:mhost21
                |- Online IBM.Application:db2_db2dsg01_3-rs:mhost22
                '- Offline IBM.Application:db2_db2dsg01_3-rs:mhost23

then the alerts must be cleared manually by running the following commands as the instance owner:

Issue the "db2cluster -cm -clear -alert -member 3" command from mhost21
Issue the "db2cluster -cm -clear -alert -member 1" command from mhost22
Issue the "db2cluster -cm -clear -alert -member 1" command from mhost21

This is only one example, but as a general rule, run the "db2cluster -cm -clear -alert -member X" command from the host that is named in the alert.
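
The host and member for each command can also be read from the "Failed offline" lines of the lssam output. The sketch below turns those lines into a per-host list of commands to review. It only prints text and does not run anything. The function name clear_commands and the variable sample_lssam are hypothetical, and the resource name layout (db2_<instance>_<member>-rs:<host>) is an assumption inferred from the sample output above.

```shell
# Member resource lines copied from the lssam output above.
sample_lssam=$(cat <<'EOF'
                |- Online IBM.Application:db2_db2dsg01_1-rs:mhost20
                |- Failed offline IBM.Application:db2_db2dsg01_1-rs:mhost21
                |- Failed offline IBM.Application:db2_db2dsg01_1-rs:mhost22
                '- Offline IBM.Application:db2_db2dsg01_1-rs:mhost23
                |- Offline IBM.Application:db2_db2dsg01_3-rs:mhost20
                |- Failed offline IBM.Application:db2_db2dsg01_3-rs:mhost21
                |- Online IBM.Application:db2_db2dsg01_3-rs:mhost22
                '- Offline IBM.Application:db2_db2dsg01_3-rs:mhost23
EOF
)

# Hypothetical helper: for each "Failed offline" member resource, print
# "<host>: <command to run on that host>". Lines in any other state are
# ignored, as are resources that do not follow the assumed naming layout.
clear_commands() {
  sed -n 's/.*Failed offline IBM\.Application:db2_[^:]*_\([0-9][0-9]*\)-rs:\([^ ]*\).*/\2: db2cluster -cm -clear -alert -member \1/p'
}

printf '%s\n' "$sample_lssam" | clear_commands
```

For the sample lines this yields the same three host and command pairs as listed above, in lssam order. Treat the result as a checklist only: confirm each entry against the "db2cluster -cm -list -alert" output before running the command as the instance owner on the named host.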

This technote does not apply to similar alerts on a CF.

Product: Db2 for Linux, UNIX and Windows
Component: High Availability - PureScale
Platform: AIX, Linux
Version: 10.1, 10.5, 9.8

Document Information

Modified date:
16 June 2018

UID

swg21688089