IBM Support

Core files and email alerts via the ODM & errpt system

How To


Summary

Core files part 2 - this time with the ODM and getting stack trace information.


Steps

Following on from my previous AIXpert blog, "Core files filling important file systems? Want email alerts about each core dump?", AIX guru Mathew Accapadi pointed out to me that for many years it has been possible to use the ODM and the AIX error reporting framework to send emails whenever a core file is generated.

See the previous blog for how to force the core files into a particular directory, so that they cannot fill up important system directories. The alternative reporting method is described in the following steps.

Check out the following man page for more information: Error Notification
Note: using this method, you can also trap and send alerts for many of the other errors reported by the errpt command.
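To trap a different error, you need the label to put in en_label. The following is a minimal sketch, assuming that "errpt -t" lists the error templates with a header line and the label in the second column. The find_label helper is a hypothetical name used only for this illustration.

```shell
#!/bin/sh
# Sketch: look up error labels that could be used as en_label values.
# Assumption: "errpt -t" prints one header line, then rows whose second
# column is the label. find_label is a hypothetical helper name.

find_label() {
    # Reads a template listing on stdin and prints every label
    # (column 2) that matches the pattern given as the first argument.
    awk -v pat="$1" 'NR > 1 && $2 ~ pat { print $2 }'
}

# Only query the template repository where errpt exists (that is, on AIX).
if command -v errpt >/dev/null 2>&1; then
    errpt -t | find_label CORE
fi
```

Any label printed this way can be tried in place of "CORE_DUMP" in the entry created in step 1.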

1) Create an ODM statement in a file (as the root user):

# cat /tmp/odmtext

errnotify:
    en_name = "corenotify"
    en_label = "CORE_DUMP"
    en_method = "/usr/lbin/email_core_warning $1 $2 $3 $4 $5 $6 $7 $8 $9"
#

2) Add the file content to the ODM and then check the entry was accepted (as root user):

# odmadd /tmp/odmtext


# odmget -q "en_name = corenotify"  errnotify

errnotify:
        en_pid = 0
        en_name = "corenotify"
        en_persistenceflg = 0
        en_label = "CORE_DUMP"
        en_crcid = 0
        en_class = ""
        en_type = ""
        en_alertflg = ""
        en_resource = ""
        en_rtype = ""
        en_rclass = ""
        en_symptom = ""
        en_err64 = ""
        en_dup = ""
        en_method = "/usr/lbin/email_core_warning $1 $2 $3 $4 $5 $6 $7 $8 $9"

#
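Note that the odmget output shows en_persistenceflg = 0. As I understand the errnotify documentation, an entry with this value is removed when the system restarts, whereas a value of 1 keeps it. The following sketch, under that assumption, writes a persistent variant of the same stanza and replaces the earlier entry. The STANZA_FILE variable and the write_stanza helper are names made up for this example.

```shell
#!/bin/sh
# Sketch: create a persistent variant of the corenotify entry.
# Assumption: en_persistenceflg = 1 keeps the entry across a restart.
# STANZA_FILE and write_stanza are hypothetical names.

STANZA_FILE="${STANZA_FILE:-/tmp/odmtext}"

write_stanza() {
    # The quoted 'EOF' stops the shell from expanding $1 .. $9, so they
    # reach the ODM literally and are filled in by the error daemon.
    cat > "$1" <<'EOF'
errnotify:
    en_name = "corenotify"
    en_persistenceflg = 1
    en_label = "CORE_DUMP"
    en_method = "/usr/lbin/email_core_warning $1 $2 $3 $4 $5 $6 $7 $8 $9"
EOF
}

write_stanza "$STANZA_FILE"

if command -v odmadd >/dev/null 2>&1; then
    # Remove any earlier copy first so the method does not run twice.
    odmdelete -o errnotify -q "en_name = corenotify"
    odmadd "$STANZA_FILE"
    odmget -q "en_name = corenotify" errnotify
else
    echo "odmadd not found: stanza written to $STANZA_FILE but not loaded"
fi
```

Run it as the root user. The same odmdelete command on its own removes the notification if you no longer want the emails.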

3) This entry runs a script called "/usr/lbin/email_core_warning" when an event with the label "CORE_DUMP" is generated, and passes it the details of the error as parameters. Here is the content of my script:

# cat /usr/lbin/email_core_warning
#!/bin/ksh
# This part writes the information to a local log file and was good for testing
# You might leave this bit out
date     >>/tmp/core_log
hostname >>/tmp/core_log
echo "$*" >>/tmp/core_log
echo "----" >>/tmp/core_log

# Send the information in an email
mailx -s "Core dump on $(hostname)" nigelgriffiths@blue.ibm.com <<EOF
1 Seqno = $1
2 ErrorId = $2
3 Class = $3
4 Type = $4
5 Flags = $5
6 Resource = $6
7 rType = $7
8 rClass = $8
9 Label = $9
EOF

4) Make this script executable: 

chmod u+x /usr/lbin/email_core_warning

5) Now I run my program to force a core dump and in the test log file I get:

# cat /tmp/core_log 
....
Tue Jul  9 16:42:36 BST 2013
gold6.uk.ibm.com
91 0xa924a5fc S PERM FALSE SYSPROC NONE NONE CORE_DUMP
----
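My test program is not shown here. If you do not have a crashing program to hand, one way to trigger a core dump is to send SIGSEGV to a throwaway process. This is a sketch only: it assumes that the core file size limit allows a dump and that AIX logs a CORE_DUMP error for a process killed this way. The force_core_dump name is hypothetical.

```shell
#!/bin/sh
# Sketch: crash a throwaway child process to exercise the notification.
# Assumption: a process killed by SIGSEGV dumps core and is logged with
# the CORE_DUMP label. force_core_dump is a hypothetical helper name.

force_core_dump() {
    # Allow core files for this shell and its children
    # (this can fail if the hard limit is lower; carry on regardless).
    ulimit -c unlimited 2>/dev/null

    sleep 60 &
    victim=$!

    # Terminate the child with a segmentation violation signal.
    kill -SEGV "$victim"
    wait "$victim"
    status=$?

    echo "child $victim ended with status $status"
    # A status above 128 means the child was terminated by a signal.
    [ "$status" -gt 128 ]
}
```

Call force_core_dump from a scratch directory, then check /tmp/core_log and your mailbox.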

6) And here is the resulting email, as displayed by mailx on AIX:

Message 32:
From root Tue Jul  9 16:42:36 2013
Date: Tue, 9 Jul 2013 16:42:36 +0100
From: root
To: nigelgriffiths
Subject: Core dump on gold6.uk.ibm.com

1 Seqno = 91
2 ErrorId = 0xa924a5fc
3 Class = S
4 Type = PERM
5 Flags = FALSE
6 Resource = SYSPROC
7 rType = NONE
8 rClass = NONE
9 Label = CORE_DUMP

The email has no details of the core file name and no stack trace, but it arrives almost instantly.

The errpt -a command output has this information and more, including:

  • The core file name.
  • The name of the crashed program.
  • A stack trace.  
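To get those details into the email, the notification method can run errpt itself, using the sequence number it receives as its first parameter. The following is a minimal sketch of such a variant of the script, not a tested replacement for it. The build_report helper and the RECIPIENT variable are hypothetical names, and uname -n is used in place of hostname.

```shell
#!/bin/sh
# Sketch: a notification method that mails the detailed error report.
# Assumption: $1 is the error log sequence number, as in the script above.
# build_report and RECIPIENT are hypothetical names.

RECIPIENT="${RECIPIENT:-root}"

build_report() {
    seqno="$1"
    echo "Core dump on $(uname -n), error log sequence number $seqno"
    echo
    if command -v errpt >/dev/null 2>&1; then
        # Detailed report for this one entry: core file name,
        # program name, and stack trace.
        errpt -a -l "$seqno"
    else
        echo "(errpt is not available on this system)"
    fi
}

# Only send mail when a sequence number was passed and mailx exists.
if [ "$#" -ge 1 ] && command -v mailx >/dev/null 2>&1; then
    build_report "$1" | mailx -s "Core dump on $(uname -n)" "$RECIPIENT"
fi
```

The trade-off is a slightly slower and longer email in exchange for not having to log in to read the error report.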

- - - The End - - -

UPDATE from Russell Adams (his website is http://adamssystems.nl/)

I use a similar entry to email every errpt item. With a small number of servers, this does not generate a large number of emails. By changing the method line, you can email yourself the output of the 'errpt -a -l' command for that entry. The following sample is the complete "email every item" entry I use.

errnotify: 
        en_pid = 0 
        en_name = "mail_err" 
        en_persistenceflg = 1 
        en_method = "/usr/bin/errpt -a -l $1 | /usr/bin/mail -s \"Errpt $1 $4 $3 $9\" root"
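If emailing every item turns out to be too noisy, the en_class and en_type fields seen in the odmget output in step 2 could be used to narrow the match. The sketch below assumes that en_class = "H" selects hardware errors and en_type = "PERM" selects permanent errors. The entry name mail_hw_perm and the file name are made up for this illustration.

```shell
#!/bin/sh
# Sketch: a narrower variant of the "email every item" entry.
# Assumptions: en_class = "H" matches hardware errors and
# en_type = "PERM" matches permanent errors.
# mail_hw_perm and FILTER_FILE are hypothetical names.

FILTER_FILE="${FILTER_FILE:-/tmp/odm_mail_hw_perm}"

# The quoted 'EOF' keeps $1, $3, $4, $9 and the \" escapes literal.
cat > "$FILTER_FILE" <<'EOF'
errnotify:
        en_name = "mail_hw_perm"
        en_persistenceflg = 1
        en_class = "H"
        en_type = "PERM"
        en_method = "/usr/bin/errpt -a -l $1 | /usr/bin/mail -s \"Errpt $1 $4 $3 $9\" root"
EOF

# Show the stanza so it can be reviewed before it is loaded.
cat "$FILTER_FILE"
```

Load the file with odmadd and check it with odmget, as in step 2.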

Additional Information


Other places to find Nigel Griffiths IBM (retired)

Document Location

Worldwide


Document Information

Modified date:
12 June 2023

UID

ibm11165438