Question & Answer
Question
This document describes:
- Various scenarios that prevent a core dump from being generated when a process terminates abnormally
- How to avoid these problems by using the chcore and syscorepath commands
Answer
Introduction
The default core dump facility on AIX normally creates a file named core in the current working directory for the process that terminated abnormally. If a core file is created successfully, a CORE_DUMP entry is written into the error report. Sometimes a core file is not created, and a CORE_DUMP_FAILED error might be added to the error report to log the failure. This error contains a reason code that can be used to help determine why the core file was not created. The reason code is an errno code, a system error code that is used to report errors from library functions. errno codes are listed in the AIX header file /usr/include/sys/errno.h.
Some of the causes for core dump failure can be avoided by configuring the core dump facility with the chcore command or the older syscorepath command. These commands enable a user to set up a directory where all core files will be written. If the chcore -n on option is used, the syscorepath and chcore commands will create unique core file names with the following format:
core.pid.ddhhmmss
pid: Process ID
dd: Day of the month
hh: Hour in 24-hour format
mm: Minutes
ss: Seconds
See the man pages for chcore and syscorepath for details, and the AIX Core Dump Facility technical note.
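As an illustration of the naming pattern only, the following sketch assembles a name of the form core.pid.ddhhmmss. The helper name core_file_name is hypothetical; the system generates the real file name when unique naming is enabled.

```shell
# Hypothetical helper: build a name in the core.pid.ddhhmmss pattern.
# It only illustrates the format; it does not create a core file.
core_file_name() {
    pid=$1
    # Optional second argument: a ddhhmmss stamp. Defaults to the current time.
    stamp=${2:-$(date +%d%H%M%S)}
    printf 'core.%s.%s\n' "$pid" "$stamp"
}

core_file_name 57812 17141543   # prints core.57812.17141543
```

Because the name carries both the process ID and a timestamp, two processes that dump core into the same directory do not compete for a single file named core.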
CORE_DUMP_FAILED Error
The following output is an example CORE_DUMP_FAILED error. Note the REASON CODE field near the bottom of the entry.
LABEL: CORE_DUMP_FAILED
IDENTIFIER: 45C7A35B
Date/Time: Mon Jan 17 14:15:43 MST
Sequence Number: 39603
Machine Id: 0008ADAA4C00
Node Id: p620
Class: S
Type: PERM
Resource Name: SYSPROC
Description
SOFTWARE PROGRAM ABNORMALLY TERMINATED
Probable Causes
INTERNAL SOFTWARE ERROR
User Causes
USER GENERATED SIGNAL
Failure Causes
CORE DUMP FAILED - SEE A REASON CODE BELOW
Recommended Actions
RERUN THE APPLICATION PROGRAM
IF PROBLEM PERSISTS THEN DO THE FOLLOWING
CONTACT APPROPRIATE SERVICE REPRESENTATIVE
Detail Data
SIGNAL NUMBER
11
USER'S PROCESS ID:
57812
REASON CODE
11
USER ID
232
PROCESSOR ID
0
CORE FILE NAME
/u1/GA.PROD/core
PROGRAM NAME
uvsh
The SIGNAL NUMBER section contains the signal that caused the program to terminate. These signals can be listed by running the command kill -l. The CORE FILE NAME section contains the location and name of the core file that would have been written if there was no failure. The PROGRAM NAME section contains the name of the program that terminated. The REASON CODE section contains an errno constant that can be used to diagnose the cause of the core dump failure. The errno constants can be viewed in the file /usr/include/sys/errno.h. Only some of the errno codes are used as reason codes.
Note: On some older versions of AIX, the Probable Causes section contains the line "SYSTEM RUNNING OUT OF PAGING SPACE", and the Recommended Actions section contains the line "DEFINE ADDITIONAL PAGING SPACE". These messages are misleading and can be ignored.
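When many entries have to be checked, the fields can be pulled out of the report text with a small filter. This is a minimal sketch, assuming that each Detail Data label starts a line and that its value is on the following line, as in the example entry above. The function name errpt_detail_field is hypothetical.

```shell
# Hypothetical helper: print the value that follows a Detail Data label.
# Assumes the label starts a line and the value is on the next line.
errpt_detail_field() {
    awk -v label="$1" '
        index($0, label) == 1 {
            if ((getline value) > 0) {
                sub(/^[ \t]+/, "", value)   # drop any leading indentation
                print value
            }
        }
    '
}

# Usage on AIX, reading the detailed report for CORE_DUMP_FAILED entries:
#   errpt -aJ CORE_DUMP_FAILED | errpt_detail_field "REASON CODE"
#   errpt -aJ CORE_DUMP_FAILED | errpt_detail_field "CORE FILE NAME"
```

Matching only at the start of a line keeps the filter from reacting to the Failure Causes text, which also contains the words REASON CODE.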
errno Codes
Here are some of the errno codes that could be listed in a CORE_DUMP_FAILED error. The most common codes are in bold text.
#define EPERM 1 /* Operation not permitted */
#define EIO 5 /* I/O error */
#define EAGAIN 11 /* Resource temporarily unavailable */
#define EACCES 13 /* Permission denied */
#define EBUSY 16 /* Resource busy */
#define EEXIST 17 /* File exists */
#define ENFILE 23 /* Too many open files in system */
#define EMFILE 24 /* Too many open files */
#define EFBIG 27 /* File too large */
#define ENOSPC 28 /* No space left on device */
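For quick reference at the command line, the codes listed above can be wrapped in a lookup function. This is a sketch that covers only the values shown in this document; the function name reason_code_name is hypothetical, and any other value should be checked against /usr/include/sys/errno.h.

```shell
# Hypothetical helper: translate a REASON CODE value into its errno name.
# Only the codes listed in this document are covered.
reason_code_name() {
    case "$1" in
        1)  echo "EPERM (Operation not permitted)" ;;
        5)  echo "EIO (I/O error)" ;;
        11) echo "EAGAIN (Resource temporarily unavailable)" ;;
        13) echo "EACCES (Permission denied)" ;;
        16) echo "EBUSY (Resource busy)" ;;
        17) echo "EEXIST (File exists)" ;;
        23) echo "ENFILE (Too many open files in system)" ;;
        24) echo "EMFILE (Too many open files)" ;;
        27) echo "EFBIG (File too large)" ;;
        28) echo "ENOSPC (No space left on device)" ;;
        *)  echo "unlisted reason code $1 (check /usr/include/sys/errno.h)" ;;
    esac
}

reason_code_name 11   # prints EAGAIN (Resource temporarily unavailable)
```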
Failure Scenarios
The following table contains various scenarios that can keep a core file from being created when a process terminates abnormally. For each scenario, information is provided about the CORE_DUMP_FAILED error if one is added to the error report.
Scenario | CORE_DUMP_FAILED |
There is not enough space in the file system to write the core file. | REASON CODE: ENOSPC 28 |
The ulimit for core is set to 0 in the account where the program is running. This disables core file creation. | REASON CODE: EPERM 1; CORE FILE NAME: blank |
The process sets a current working directory where it does not have write permissions. Since the core file is written into the current working directory, the core file cannot be written. Note: Use the chcore or syscorepath command to avoid this failure. | REASON CODE: EACCES 13; CORE FILE NAME: path (path to where the system attempted to write the core file) |
By default, all core files that are generated on an AIX system will have the name core. If a process is core dumping and the core file is being written, and another process terminates and attempts to write a core file in the same directory, the file core will be busy and the second process will not be able to write to the file. Note: Use the chcore or syscorepath command and unique core file naming to avoid this failure. | REASON CODE: EAGAIN 11 or EACCES 13 |
The process has set the SA_NODUMP flag in the call to sigaction(). You would need the source code for the program to verify that this is the reason for the core dump failure. Any program can prevent a core dump by setting this flag in a sigaction request. | REASON CODE: EPERM 1 |
If the suid or sgid bit is set on the executable, then it is possible that a core file will not be created. This can happen if the real user or group id is not identical to the effective user or group id. Note: See Example 1. | REASON CODE: EPERM 1; CORE FILE NAME: blank |
A process attempts to write a core file into a directory where a core file already exists and the ownership and permissions on the file do not allow it to be overwritten. Notes: See Example 2. Use the chcore or syscorepath command to avoid this failure. | REASON CODE: EACCES 13; CORE FILE NAME: path (path to where the system attempted to write the core file) |
A process attempts to write a core file into a directory where a core file already exists. This core file is owned by another user but has write permissions enabled on either group or other. The attempt to write the new core file results in the core file being zeroed out. Notes: See Example 3. Use the chcore or syscorepath command to avoid this failure. | REASON CODE: EPERM 1; CORE FILE NAME: path (path to where the system attempted to write the core file). Note: Some versions of AIX might not add the CORE_DUMP_FAILED entry to the error report. |
A process traps the signal whose default action is to create a core file but does not call the abort() function to actually create the core file. | None |
A process ignores a signal that would, by default, generate a core file. Note: See Example 4. | None |
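Several of the scenarios in the table can be checked before a failure occurs. The following sketch tests three of them: a core ulimit of 0, a directory that is not writable, and an existing core file that cannot be overwritten. The function name core_dump_precheck is hypothetical, and the check is meaningful only when it is run in the same account and environment as the program, because limits and permissions are inherited from there.

```shell
# Hypothetical pre-flight check for three scenarios from the table.
# Run it as the same user, and with the same limits, as the program.
core_dump_precheck() {
    dir=${1:-.}
    rc=0
    if [ "$(ulimit -c)" = "0" ]; then
        echo "core ulimit is 0: core file creation is disabled"
        rc=1
    fi
    if [ ! -w "$dir" ]; then
        echo "$dir is not writable: a core file cannot be written there"
        rc=1
    fi
    if [ -e "$dir/core" ] && [ ! -w "$dir/core" ]; then
        echo "$dir/core exists and cannot be overwritten"
        rc=1
    fi
    return $rc
}

core_dump_precheck "$PWD" || echo "a core file might not be written from this environment"
```

The sketch does not cover free space in the file system, the SA_NODUMP flag, or suid and sgid executables; those scenarios need separate checks.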
Example 1
If the suid or sgid bit is set on the executable, then a core file may not be created. This can happen if the real user or group id is not identical to the effective user or group id. According to the man pages for core, a core dump is not created if the saved user id and the effective user id are not the same, or if the saved group id and the effective group id are not the same.
chmod +s program.exe
This command turns on both suid and sgid. This prevents creation of a core file.
chmod u+s program.exe
This command will turn on only suid.
If sgid is turned on, then the core file is not created, because the real group id and the effective group id are not the same.
- Example A
Permissions of program.exe are root:fnusr, 0755
chmod +s program.exe
Permissions of program.exe are root:fnusr, 6755
From root, execute program.exe:
Real/Saved user id : root
Effective user id : root
Real/Saved group id : system
Effective group id : fnusr
Note: The saved group id is not the same as the effective group id, so no core file is created.
- Example B
Permissions of program.exe are root:fnusr, 0755
chmod u+s program.exe
Permissions of program.exe are root:fnusr, 4755
From root, execute program.exe:
Real/Saved user id : root
Effective user id : root
Real/Saved group id : system
Effective group id : system
Note: The saved and effective user ids are the same, and the saved and effective group ids are the same, so a core file is created.
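To see whether an executable falls into this category, its mode bits can be tested from the shell. The sketch below uses the standard -u and -g file tests; the function name setid_bits is hypothetical. The id commands at the end show the real and effective ids of the current shell only, not those of a running program, and the saved ids are not visible this way.

```shell
# Hypothetical helper: report which of the suid and sgid bits are set on a file.
setid_bits() {
    bits=""
    if [ -u "$1" ]; then
        bits="suid"
    fi
    if [ -g "$1" ]; then
        bits="${bits:+$bits }sgid"
    fi
    echo "${bits:-none}"
}

# Real and effective ids of the current shell, for comparison.
echo "real uid=$(id -u -r) effective uid=$(id -u)"
echo "real gid=$(id -g -r) effective gid=$(id -g)"
```

For the program.exe file in Example A, setid_bits would be expected to report both bits after chmod +s, and only suid after the chmod u+s in Example B.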
Example 2
A process attempts to write a core file into a directory where a core file already exists, and the ownership and permissions on the file do not allow it to be overwritten.
$ ls -l core
-rw-r--r-- 1 rej staff 769727 Oct 04 08:59 core
$ id
uid=709(chris) gid=1(staff)
$ sleep 100 &
[1] 352458
$ kill -6 352458
$
[1] + IOT/Abort trap sleep 100 &
$ ls -l core
-rw-r--r-- 1 rej staff 769727 Oct 04 08:59 core
$ errpt -aJ CORE_DUMP_FAILED
---------------------------------------------------------------------------
LABEL: CORE_DUMP_FAILED
IDENTIFIER: FAA1D46F
Date/Time: Tue Oct 4 09:04:01 CDT 2005
Sequence Number: 543
Machine Id: 000870664C00
Node Id: vegas
Class: S
Type: PERM
Resource Name: SYSPROC
Description
SOFTWARE PROGRAM ABNORMALLY TERMINATED
Probable Causes
INTERNAL SOFTWARE ERROR
User Causes
USER GENERATED SIGNAL
Failure Causes
CORE DUMP FAILED - SEE A REASON CODE BELOW
Recommended Actions
RERUN THE APPLICATION PROGRAM
IF PROBLEM PERSISTS THEN DO THE FOLLOWING
CONTACT APPROPRIATE SERVICE REPRESENTATIVE
Detail Data
SIGNAL NUMBER
6
USER'S PROCESS ID:
352458
REASON CODE
13
USER ID
709
PROCESSOR ID
-1
CORE FILE NAME
/home/chris/core
PROGRAM NAME
sleep
Example 3
A process attempts to write a core file into a directory where a core file already exists. This core file is owned by another user but has write permissions enabled on either group or other. The attempt to write the new core file results in the core file being zeroed out.
$ ls -l core
-rw-rw-r-- 1 rej staff 769727 Oct 04 08:49 core
$ id
uid=709(chris) gid=1(staff)
$ sleep 100 &
[1] 237786
$ kill -6 237786
$
[1] + IOT/Abort trap sleep 100 &
$ ls -l core
-rw-rw-r-- 1 rej staff 0 Oct 04 08:52 core
$ errpt -aJ CORE_DUMP_FAILED
---------------------------------------------------------------------------
LABEL: CORE_DUMP_FAILED
IDENTIFIER: FAA1D46F
Date/Time: Tue Oct 4 08:52:36 CDT 2005
Sequence Number: 541
Machine Id: 000870664C00
Node Id: vegas
Class: S
Type: PERM
Resource Name: SYSPROC
Description
SOFTWARE PROGRAM ABNORMALLY TERMINATED
Probable Causes
INTERNAL SOFTWARE ERROR
User Causes
USER GENERATED SIGNAL
Failure Causes
CORE DUMP FAILED - SEE A REASON CODE BELOW
Recommended Actions
RERUN THE APPLICATION PROGRAM
IF PROBLEM PERSISTS THEN DO THE FOLLOWING
CONTACT APPROPRIATE SERVICE REPRESENTATIVE
Detail Data
SIGNAL NUMBER
6
USER'S PROCESS ID:
237786
REASON CODE
1
USER ID
709
PROCESSOR ID
-1
CORE FILE NAME
/home/chris/core
PROGRAM NAME
sleep
Example 4
A process ignores a signal that would, by default, generate a core file. We can determine if a signal is ignored by using the procsig command.
This command will list all signal actions defined for process 237786:
procsig 237786
The output of this command might look like this:
HUP caught
INT caught
QUIT caught
ILL caught
TRAP caught
ABRT caught
EMT caught
FPE caught
KILL default RESTART
BUS caught
SEGV default
SYS caught
PIPE caught
ALRM caught
TERM ignored
URG default
STOP default
TSTP ignored
CONT default
...
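The listing can be narrowed to the signals that matter for core files. This is a sketch under two assumptions: the output has the two-column layout shown above, and the signals whose default action produces a core file are the ones named in the script (QUIT, ILL, TRAP, ABRT, EMT, FPE, BUS, SEGV, SYS). The function name non_default_core_signals is hypothetical.

```shell
# Hypothetical filter: from procsig-style output, print the core-generating
# signals whose action is no longer "default" (caught or ignored).
non_default_core_signals() {
    awk '
        BEGIN {
            # Assumed list of signals whose default action writes a core file.
            n = split("QUIT ILL TRAP ABRT EMT FPE BUS SEGV SYS", names, " ")
            for (i = 1; i <= n; i++) dumps[names[i]] = 1
        }
        ($1 in dumps) && $2 != "default" { print $1, $2 }
    '
}

# Usage on AIX, with the process ID from the example:
#   procsig 237786 | non_default_core_signals
```

A signal reported as ignored will not produce a core file. A signal reported as caught produces one only if the handler goes on to call abort().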
chcore and syscorepath
To avoid some of the problems that can prevent a core file from being generated, use the chcore or syscorepath command to direct core files into a user-specified directory. In this example, core files are written to the directory /tmp/corefiles.
chcore -p on -n on -l /tmp/corefiles -d
The older syscorepath command can also be used to direct core files to a central location. Unlike chcore, syscorepath can be used to generate core files from suid and sgid executable files.
syscorepath -p /tmp/corefiles
See the man pages for these commands for more details, and the AIX Core Dump Facility technical note.
Conclusion
Normally a core file is written when a process terminates abnormally. The core file can be analyzed to help determine why the process failed. However, a number of scenarios can prevent a core file from being created. In some of these cases, a CORE_DUMP_FAILED entry is written into the error report. The REASON CODE section in this entry can be used to determine why the core file was not created. For cases where a CORE_DUMP_FAILED entry is not written into the error report, the running process, the process executable file, or the process source code must be investigated to determine why a core file was not generated.
Document Information
Modified date:
06 December 2019
UID
isg3T1011240