lsb.events
The LSF batch event log file lsb.events is used to display LSF batch event history and for mbatchd failure recovery.
Whenever a host, job, or queue changes status, a record is appended to the event log file. The file is located in LSB_SHAREDIR/cluster_name/logdir, where LSB_SHAREDIR must be defined in the lsf.conf file, and cluster_name is the name of the LSF cluster, as returned by the lsid command.
The bhist command searches the most current lsb.events file for its output.
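The location rule above can be sketched in Python. This is a minimal illustration, assuming that lsf.conf holds plain KEY=value lines; the helper names parse_lsf_conf and event_log_path, the sample directory, and the cluster name are hypothetical and are not part of LSF.

```python
from pathlib import PurePosixPath


def parse_lsf_conf(text):
    """Collect KEY=value pairs from lsf.conf-style text, skipping comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip().strip('"')
    return settings


def event_log_path(lsf_conf_text, cluster_name):
    """Build LSB_SHAREDIR/cluster_name/logdir/lsb.events."""
    sharedir = parse_lsf_conf(lsf_conf_text)["LSB_SHAREDIR"]
    return PurePosixPath(sharedir) / cluster_name / "logdir" / "lsb.events"


# Hypothetical values, for illustration only.
sample_conf = "# sample lsf.conf fragment\nLSB_SHAREDIR=/usr/share/lsf/work\n"
print(event_log_path(sample_conf, "cluster1"))
```

In practice the cluster name would come from the lsid command rather than being hard-coded.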
lsb.events structure
The event log file is an ASCII file with one record per line. For the lsb.events file, the first line has the format # history_seek_position, which indicates the file position of the first history event after log switch. For the lsb.events.# file, the first line has the format # timestamp_most_recent_event, which gives the timestamp of the most recent event in the file.
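As a rough sketch, the first line of either file could be read as shown here. The function name parse_header is hypothetical, and the code assumes that the value after the # character is a decimal integer, which the description above implies but does not state.

```python
def parse_header(first_line):
    """Return the integer that follows the leading '#' on the first line.

    For lsb.events this is the history seek position; for lsb.events.n it
    is the timestamp of the most recent event. Assumes a decimal integer.
    """
    text = first_line.strip()
    if not text.startswith("#"):
        raise ValueError("first line does not start with '#'")
    return int(text[1:].split()[0])


# Hypothetical header line, for illustration only.
print(parse_header("# 2048\n"))
```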
Limiting the size of lsb.events
Use the MAX_JOB_NUM parameter in the lsb.params file to set the maximum number of finished jobs whose events are to be stored in the lsb.events log file.
Once the limit is reached, mbatchd starts a new event log file. The old event log file is saved as lsb.events.n, with subsequent sequence number suffixes incremented by 1 each time a new log file is started. Event logging continues in the new lsb.events file.
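A tool that reads the whole event history has to collect the current file together with the saved lsb.events.n files. The following sketch orders file names by their numeric suffix. The function name order_event_logs is hypothetical, and the code makes no assumption about which suffix holds the oldest events.

```python
import re


def order_event_logs(names):
    """Return lsb.events first, then lsb.events.n sorted numerically by n."""
    pattern = re.compile(r"^lsb\.events(?:\.(\d+))?$")
    found = []
    for name in names:
        match = pattern.match(name)
        if match:
            # The current file has no suffix; treat it as 0 for sorting.
            suffix = int(match.group(1)) if match.group(1) else 0
            found.append((suffix, name))
    return [name for _, name in sorted(found)]


# Hypothetical directory listing, for illustration only.
print(order_event_logs(["lsb.events.10", "lsb.events", "lsb.events.2", "lsb.acct"]))
```

Sorting numerically rather than as strings keeps lsb.events.10 after lsb.events.2.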
Records and fields
The fields of a record are separated by blanks. The first string of an event record indicates its type. The following types of events are recorded:
- JOB_NEW
- JOB_FORWARD
- JOB_ACCEPT
- JOB_ACCEPTACK
- JOB_CHKPNT
- JOB_START
- JOB_START_ACCEPT
- JOB_STATUS
- JOB_SWITCH
- JOB_SWITCH2
- JOB_MOVE
- QUEUE_CTRL
- HOST_CTRL
- MBD_START
- MBD_DIE
- UNFULFILL
- LOAD_INDEX
- JOB_SIGACT
- MIG
- JOB_MODIFY2
- JOB_SIGNAL
- JOB_EXECUTE
- JOB_REQUEUE
- JOB_CLEAN
- JOB_EXCEPTION
- JOB_EXT_MSG
- JOB_ATTA_DATA
- JOB_CHUNK
- SBD_UNREPORTED_STATUS
- PRE_EXEC_START
- JOB_FORCE
- GRP_ADD
- GRP_MOD
- LOG_SWITCH
- JOB_RESIZE_NOTIFY_START
- JOB_RESIZE_NOTIFY_ACCEPT
- JOB_RESIZE_NOTIFY_DONE
- JOB_RESIZE_RELEASE
- JOB_RESIZE_CANCEL
- HOST_POWER_STATUS
- JOB_PROV_HOST
- (As of Fix Pack 10) HOST_CLOSURE_LOCK_ID_CTRL
- (As of Fix Pack 10) ATTR_CREATE
- (As of Fix Pack 10) ATTR_DELETE
- (As of Fix Pack 10) ATTR_INFO
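A record can be split into its type and its fields as sketched below. This is an illustration only: it assumes that string fields are enclosed in double quotes, which this section does not state, and the sample line is invented (it follows the JOB_REQUEUE layout of version, event time, jobId, idx). The function name split_record is hypothetical.

```python
import shlex


def split_record(line):
    """Split a blank-separated record into (event_type, remaining_fields).

    Assumes string fields are double-quoted; shlex removes the quotes.
    """
    tokens = shlex.split(line)
    if not tokens:
        raise ValueError("empty record")
    return tokens[0], tokens[1:]


# Invented sample record, for illustration only.
sample = '"JOB_REQUEUE" "10.1" 1700000000 101 0'
event_type, fields = split_record(sample)
print(event_type, fields)
```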
JOB_NEW
- Version number (%s)
- The version number
- Event time (%d)
- The time of the event
- jobId (%d)
- Job ID
- userId (%d)
- UNIX user ID of the submitter
- options (%d)
- Bit flags for job processing
- numProcessors (%d)
- Number of processors requested for execution
- submitTime (%d)
- Job submission time
- beginTime (%d)
- Start time – the job should be started on or after this time
- termTime (%d)
- Termination deadline – the job should be terminated by this time
- sigValue (%d)
- Signal value
- chkpntPeriod (%d)
- Checkpointing period
- restartPid (%d)
- Restart process ID
- userName (%s)
- User name
- rLimits
- Soft CPU time limit (%d), see getrlimit(2)
- rLimits
- Soft file size limit (%d), see getrlimit(2)
- rLimits
- Soft data segment size limit (%d), see getrlimit(2)
- rLimits
- Soft stack segment size limit (%d), see getrlimit(2)
- rLimits
- Soft core file size limit (%d), see getrlimit(2)
- rLimits
- Soft memory size limit (%d), see getrlimit(2)
- rLimits
- Reserved (%d)
- rLimits
- Reserved (%d)
- rLimits
- Reserved (%d)
- rLimits
- Soft run time limit (%d), see getrlimit(2)
- rLimits
- Reserved (%d)
- hostSpec (%s)
- Model or host name for normalizing CPU time and run time
- hostFactor (%f)
- CPU factor of the above host
- umask (%d)
- File creation mask for this job
- queue (%s)
- Name of job queue to which the job was submitted
- resReq (%s)
- Resource requirements
- fromHost (%s)
- Submission host name
- cwd (%s)
- Current working directory (up to 4094 characters for UNIX or 255 characters for Windows)
- chkpntDir (%s)
- Checkpoint directory
- inFile (%s)
- Input file name (up to 4094 characters for UNIX or 255 characters for Windows)
- outFile (%s)
- Output file name (up to 4094 characters for UNIX or 255 characters for Windows)
- errFile (%s)
- Error output file name (up to 4094 characters for UNIX or 255 characters for Windows)
- subHomeDir (%s)
- Submitter’s home directory
- jobFile (%s)
- Job file name
- numAskedHosts (%d)
- Number of candidate host names
- askedHosts (%s)
- List of names of candidate hosts for job dispatching
- dependCond (%s)
- Job dependency condition
- preExecCmd (%s)
- Job pre-execution command
- jobName (%s)
- Job name (up to 4094 characters)
- command (%s)
- Job command (up to 4094 characters for UNIX or 255 characters for Windows)
- nxf (%d)
- Number of files to transfer
- xf (%s)
- List of file transfer specifications
- mailUser (%s)
- Mail user name
- projectName (%s)
- Project name
- niosPort (%d)
- Callback port if batch interactive job
- maxNumProcessors (%d)
- Maximum number of processors
- schedHostType (%s)
- Execution host type
- loginShell (%s)
- Login shell
- timeEvent (%d)
- Time Event, for job dependency condition; specifies when time event ended
- userGroup (%s)
- User group
- exceptList (%s)
- Exception handlers for the job
- options2 (%d)
- Bit flags for job processing
- idx (%d)
- Job array index
- inFileSpool (%s)
- Spool input file (up to 4094 characters for UNIX or 255 characters for Windows)
- commandSpool (%s)
- Spool command file (up to 4094 characters for UNIX or 255 characters for Windows)
- jobSpoolDir (%s)
- Job spool directory (up to 4094 characters for UNIX or 255 characters for Windows)
- userPriority (%d)
- User priority
- rsvId (%s)
- Advance reservation ID; for example, "user2#0"
- jobGroup (%s)
- The job group under which the job runs
- sla (%s)
- SLA service class name under which the job runs
- rLimits
- Thread number limit
- extsched (%s)
- External scheduling options
- warningAction (%s)
- Job warning action
- warningTimePeriod (%d)
- Job warning time period in seconds
- SLArunLimit (%d)
- Absolute run time limit of the job for SLA service classes
- licenseProject (%s)
- IBM® Spectrum LSF License Scheduler project name
- options3 (%d)
- Bit flags for job processing
- app (%s)
- Application profile name
- postExecCmd (%s)
- Post-execution command to run on the execution host after the job finishes
- runtimeEstimation (%d)
- Estimated run time for the job
- requeueEValues (%s)
- Job exit values for automatic job requeue
- resizeNotifyCmd (%s)
- Resize notification command to run on the first execution host to inform job of a resize event.
- jobDescription (%s)
- Job description (up to 4094 characters).
- submitEXT
- Submission extension field, reserved for internal use.
- Num (%d)
- Number of elements (key-value pairs) in the structure.
- key (%s)
- Reserved for internal use.
- value (%s)
- Reserved for internal use.
- srcJobId (%d)
- The submission cluster job ID
- srcCluster (%s)
- The name of the submission cluster
- dstJobId (%d)
- The execution cluster job ID
- dstCluster (%s)
- The name of the execution cluster
- jobaffReq (%s)
- The host-level attribute affinity request or job affinity request.
- network (%s)
- Network requirements for IBM Parallel Environment (PE) jobs.
- cpu_frequency(%d)
- CPU frequency at which the job runs.
- options4 (%d)
- Bit flags for job processing
- nStinFile (%d)
- (LSF Data Manager) The number of requested input files
- stinFiles
- (LSF Data Manager) List of input data requirement files requested. The list has the following elements:
- options (%d)
- Bit field that identifies whether the data requirement is an input file or a tag.
- host (%s)
- Source host of the input file. This field is empty if the data requirement is a tag.
- name(%s)
- Full path to the input data requirement file on the host. This field is empty if the data requirement is a tag.
- hash (%s)
- Hash key computed for the data requirement file at job submission time. This field is empty if the data requirement is a tag.
- size (%lld)
- Size of the data requirement file at job submission time in bytes.
- modifyTime (%d)
- Last modified time of the data requirement file at job submission time.
- pendTimeLimit (%d)
- Job-level pending time limit of the job, in seconds.
- eligiblePendTimeLimit (%d)
- Job-level eligible pending time limit of the job, in seconds.
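Because the fields are positional, the first JOB_NEW fields can be mapped to names in the order listed above. The sketch below covers only the fields from the version number through userName and takes the tokens that follow the event type. The names JOB_NEW_LEADING and leading_job_new_fields are hypothetical, and the sample values are invented.

```python
# (name, converter) pairs in the order documented for JOB_NEW.
JOB_NEW_LEADING = [
    ("version", str), ("eventTime", int), ("jobId", int), ("userId", int),
    ("options", int), ("numProcessors", int), ("submitTime", int),
    ("beginTime", int), ("termTime", int), ("sigValue", int),
    ("chkpntPeriod", int), ("restartPid", int), ("userName", str),
]


def leading_job_new_fields(tokens):
    """Map the tokens that follow the event type to the leading field names."""
    if len(tokens) < len(JOB_NEW_LEADING):
        raise ValueError("record is shorter than the leading JOB_NEW fields")
    return {name: conv(tok) for (name, conv), tok in zip(JOB_NEW_LEADING, tokens)}


# Invented sample values, for illustration only.
sample_tokens = ["10.1", "1700000000", "101", "1001", "0", "4",
                 "1700000000", "0", "0", "0", "0", "0", "user1"]
print(leading_job_new_fields(sample_tokens)["jobId"])
```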
JOB_FORWARD
A job has been forwarded to a remote cluster (IBM Spectrum LSF multicluster capability only).
If LSF_HPC_EXTENSIONS="SHORT_EVENTFILE" is specified in lsf.conf, older daemons and commands (pre-LSF Version 6.0) cannot recognize the lsb.events file format.
- Version number (%s)
- The version number
- Event time (%d)
- The time of the event
- jobId (%d)
- Job ID
- numReserHosts (%d)
- Number of reserved hosts in the remote cluster
If LSF_HPC_EXTENSIONS="SHORT_EVENTFILE" is specified in lsf.conf, the value of this field is the number of hosts listed in the reserHosts field.
- cluster (%s)
- Remote cluster name
- reserHosts (%s)
- List of names of the reserved hosts in the remote cluster
If LSF_HPC_EXTENSIONS="SHORT_EVENTFILE" is specified in lsf.conf, the value of this field is logged in a shortened format.
- idx (%d)
- Job array index
- srcJobId (%d)
- The submission cluster job ID
- srcCluster (%s)
- The name of the submission cluster
- dstJobId (%d)
- The execution cluster job ID
- dstCluster (%s)
- The name of the execution cluster
- effectiveResReq (%s)
- The runtime resource requirements used for the job.
- ineligiblePendTime(%d)
- Time in seconds that the job has been in the ineligible pending state.
JOB_ACCEPT
- Version number (%s)
-
The version number
- Event time (%d)
-
The time of the event
- jobId (%d)
-
Job ID at the accepting cluster
- remoteJid (%d)
-
Job ID at the submission cluster
- cluster (%s)
-
Job submission cluster name
- idx (%d)
-
Job array index
- srcJobId (%d)
-
The submission cluster job ID
- srcCluster (%s)
-
The name of the submission cluster
- dstJobId (%d)
-
The execution cluster job ID
- dstCluster (%s)
-
The name of the execution cluster
JOB_ACCEPTACK
- Version number (%s)
-
The version number
- Event time (%d)
-
The time of the event
- jobId (%d)
-
The ID number of the job at the execution cluster
- idx (%d)
-
The job array index
- jobRmtAttr (%d)
-
Remote job attributes from:
-
Remote batch job on the submission side
-
Lease job on the submission side
-
Remote batch job on the execution side
-
Lease job on the execution side
-
Lease job re-synchronization during restart
-
Remote batch job re-running on the execution cluster
-
- srcCluster (%s)
-
The name of the submission cluster
- srcJobId (%d)
-
The submission cluster job ID
- dstCluster (%s)
-
The name of the execution cluster
- dstJobId (%d)
-
The execution cluster job ID
JOB_CHKPNT
- Version number (%s)
-
The version number
- Event time (%d)
-
The time of the event
- jobId (%d)
-
The ID number of the job at the execution cluster
- period (%d)
-
The new checkpointing period
- jobPid (%d)
-
The process ID of the checkpointing process, which is a child sbatchd
- ok (%d)
-
- 0 means the checkpoint started
- 1 means the checkpoint succeeded
- flags (%d)
-
Checkpoint flags, see <lsf/lsbatch.h>:
LSB_CHKPNT_KILL: Kill the process if checkpoint is successful
LSB_CHKPNT_FORCE: Force checkpoint even if non-checkpointable conditions exist
LSB_CHKPNT_MIG: Checkpoint for the purpose of migration
- idx (%d)
-
Job array index (must be 0 in JOB_NEW)
- srcJobId (%d)
-
The submission cluster job ID
- srcCluster (%s)
-
The name of the submission cluster
- dstJobId (%d)
-
The execution cluster job ID
- dstCluster (%s)
-
The name of the execution cluster
JOB_START
A job has been dispatched.
If LSF_HPC_EXTENSIONS="SHORT_EVENTFILE" is specified in lsf.conf, older daemons and commands (pre-LSF Version 6.0) cannot recognize the lsb.events file format.
- Version number (%s)
The version number
- Event time (%d)
The time of the event
- jobId (%d)
Job ID
- jStatus (%d)
Job status (4, indicating the RUN status of the job)
- jobPid (%d)
Job process ID
- jobPGid (%d)
Job process group ID
- hostFactor (%f)
CPU factor of the first execution host
- numExHosts (%d)
Number of processors used for execution
If LSF_HPC_EXTENSIONS="SHORT_EVENTFILE" is specified in lsf.conf, the value of this field is the number of hosts listed in the execHosts field.
- execHosts (%s)
List of execution host names
If LSF_HPC_EXTENSIONS="SHORT_EVENTFILE" is specified in lsf.conf, the value of this field is logged in a shortened format.
- queuePreCmd (%s)
Pre-execution command
- queuePostCmd (%s)
Post-execution command
- jFlags (%d)
Job processing flags
- userGroup (%s)
User group name
- idx (%d)
Job array index
- additionalInfo (%s)
Placement information of HPC jobs
- preemptBackfill (%d)
- How long a backfilled job can run. Used for preemption backfill jobs.
- jFlags2 (%d)
- Job flags
- srcJobId (%d)
The submission cluster job ID
- srcCluster (%s)
The name of the submission cluster
- dstJobId (%d)
The execution cluster job ID
- dstCluster (%s)
The name of the execution cluster
- effectiveResReq (%s)
The runtime resource requirements used for the job.
- num_network (%d)
The number of allocated networks for IBM Parallel Environment (PE) jobs.
- networkID (%s)
Network ID of the allocated network for IBM Parallel Environment (PE) jobs.
- num_window (%d)
Number of allocated windows for IBM Parallel Environment (PE) jobs.
- cpu_frequency(%d)
CPU frequency at which the job runs.
- numAllocSlots(%d)
Number of allocated slots.
- allocSlots(%s)
List of execution host names where the slots are allocated.
- ineligiblePendTime(%d)
-
Time in seconds that the job has been in the ineligible pending state.
- SCHEDULING_OVERHEAD(%f)
-
The scheduler overhead for a job, in milliseconds. This is the total time that is taken by the scheduler to dispatch the job and the time that is taken by the scheduler to reallocate resources to a new job.
JOB_START_ACCEPT
- Version number (%s)
-
The version number
- Event time (%d)
-
The time of the event
- jobId (%d)
-
Job ID
- jobPid (%d)
-
Job process ID
- jobPGid (%d)
-
Job process group ID
- idx (%d)
-
Job array index
- srcJobId (%d)
-
The submission cluster job ID
- srcCluster (%s)
-
The name of the submission cluster
- dstJobId (%d)
-
The execution cluster job ID
- dstCluster (%s)
-
The name of the execution cluster
JOB_STATUS
- Version number (%s)
-
The version number
- Event time (%d)
-
The time of the event
- jobId (%d)
-
Job ID
- jStatus (%d)
-
New status, see <lsf/lsbatch.h>
For JOB_STAT_EXIT (32) and JOB_STAT_DONE (64), host-based resource usage information is appended to the JOB_STATUS record in the fields numHostRusage and hostRusage.
- reason (%d)
-
Pending or suspended reason code, see <lsf/lsbatch.h>
- subreasons (%d)
-
Pending or suspended subreason code, see <lsf/lsbatch.h>
- cpuTime (%f)
-
CPU time consumed so far
- endTime (%d)
-
Job completion time
- ru (%d)
-
Resource usage flag
- lsfRusage (%s)
-
Resource usage statistics, see <lsf/lsf.h>
- exitStatus (%d)
-
Exit status of the job, see <lsf/lsbatch.h>
- idx (%d)
-
Job array index
- exitInfo (%d)
-
Job termination reason, see <lsf/lsbatch.h>
- duration4PreemptBackfill
-
How long a backfilled job can run. Used for preemption backfill jobs
- numHostRusage(%d)
-
For a jStatus of JOB_STAT_EXIT (32) or JOB_STAT_DONE (64), this field contains the number of host-based resource usage entries (hostRusage) that follow. 0 unless LSF_HPC_EXTENSIONS="HOST_RUSAGE" is set in lsf.conf.
- hostRusage
-
For a jStatus of JOB_STAT_EXIT (32) or JOB_STAT_DONE (64), these fields contain host-based resource usage information for the job for parallel jobs when LSF_HPC_EXTENSIONS="HOST_RUSAGE" is set in lsf.conf.
- hostname (%s)
-
Name of the host.
- mem(%d)
-
Total resident memory usage of all processes in the job running on this host.
- swap(%d)
-
Total virtual memory usage of all processes in the job running on this host.
- utime(%d)
-
User time used on this host.
- stime(%d)
-
System time used on this host.
- hHostExtendInfo(%d)
-
Number of following key-value pairs containing extended host information (PGIDs and PIDs). Set to 0 in lsb.events, lsb.acct, and lsb.stream files.
- maxMem
-
Peak memory usage (in Mbytes)
- avgMem
-
Average memory usage (in Mbytes)
- srcJobId (%d)
-
The submission cluster job ID
- srcCluster (%s)
-
The name of the submission cluster
- dstJobId (%d)
-
The execution cluster job ID
- dstCluster (%s)
-
The name of the execution cluster
- ineligiblePendTime(%d)
-
Time in seconds that the job has been in the ineligible pending state. This is only recorded when the job is finished (DONE or EXIT) and not for any other changes in job status.
- indexRangeCnt (%d)
-
The number of element ranges indicating successful signals
- indexRangeStart1 (%d)
-
The start of the first index range.
- indexRangeEnd1 (%d)
-
The end of the first index range.
- indexRangeStep1 (%d)
-
The step of the first index range.
- indexRangeStartN (%d)
-
The start of the last index range.
- indexRangeEndN (%d)
-
The end of the last index range.
- indexRangeStepN (%d)
-
The step of the last index range.
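The index range fields form a count followed by that many (start, end, step) triples. The sketch below expands them into individual job array indices. It assumes that the end of each range is inclusive, which this section does not state; the function name expand_index_ranges and the sample tokens are hypothetical.

```python
def expand_index_ranges(tokens):
    """Expand [indexRangeCnt, start1, end1, step1, ..., startN, endN, stepN].

    Assumes each range includes its end value.
    """
    count = int(tokens[0])
    if len(tokens) < 1 + 3 * count:
        raise ValueError("fewer range fields than indexRangeCnt announces")
    indices = []
    for i in range(count):
        start, end, step = (int(t) for t in tokens[1 + 3 * i: 4 + 3 * i])
        if step <= 0:
            raise ValueError("step must be positive")
        indices.extend(range(start, end + 1, step))
    return indices


# Invented sample: two ranges, 1-5 step 2 and 10-12 step 1.
print(expand_index_ranges(["2", "1", "5", "2", "10", "12", "1"]))
```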
JOB_SWITCH
- Version number (%s)
-
The version number
- Event time (%d)
-
The time of the event
- userId (%d)
-
UNIX user ID of the user invoking the command
- jobId (%d)
-
Job ID
- queue (%s)
-
Target queue name
- idx (%d)
-
Job array index. If it is -1, the indexRangeCnt takes effect.
- userName (%s)
-
Name of the job submitter
- srcJobId (%d)
-
The submission cluster job ID
- srcCluster (%s)
-
The name of the submission cluster
- dstJobId (%d)
-
The target execution cluster job ID
- dstCluster (%s)
-
The name of the execution cluster
- rmtJobCtrlStage (%d)
-
The stage of remote job switch.
- numRmtCtrlResult (%d)
- The number of remote job switch records.
- rmtCtrlResult
-
The record of each remote job switch session.
- indexRangeCnt (%d)
-
The number of element ranges indicating successful signals
- indexRangeStart1 (%d)
-
The start of the first index range.
- indexRangeEnd1 (%d)
-
The end of the first index range.
- indexRangeStep1 (%d)
-
The step of the first index range.
- indexRangeStartN (%d)
-
The start of the last index range.
- indexRangeEndN (%d)
-
The end of the last index range.
- indexRangeStepN (%d)
-
The step of the last index range.
JOB_SWITCH2
- Version number (%s)
The version number
- Event time (%d)
The time of the event
- userId (%d)
UNIX user ID of the user invoking the command
- jobId (%d)
Job ID
- queue (%s)
Target queue name
- userName (%s)
Name of the job submitter
- indexRangeCnt (%d)
The number of element ranges indicating successful signals
- indexRangeStart1 (%d)
The start of the first index range
- indexRangeEnd1 (%d)
The end of the first index range
- indexRangeStep1 (%d)
The step of the first index range
- indexRangeStart2 (%d)
The start of the second index range
- indexRangeEnd2 (%d)
The end of the second index range
- indexRangeStep2 (%d)
The step of the second index range
- indexRangeStartN (%d)
The start of the last index range
- indexRangeEndN (%d)
The end of the last index range
- indexRangeStepN (%d)
The step of the last index range
- srcJobId (%d)
The submission cluster job ID
- srcCluster (%s)
The name of the submission cluster
- dstJobId (%d)
The execution cluster job ID
- rmtCluster (%d)
The destination cluster to which the remote jobs belong
- rmtJobCtrlId (%d)
Unique identifier for the remote job control session in the MultiCluster.
- numSuccJobId (%d)
The number of jobs that were successful during this remote control operation.
- succJobIdArray (%d)
Contains IDs for all the jobs that were successful during this remote control operation.
- numFailJobId (%d)
The number of jobs which failed during this remote control session.
- failJobIdArray (%d)
Contains IDs for all the jobs that failed during this remote control operation.
- failReason (%d)
Contains the failure code and reason for each failed job in the failJobIdArray.
To prevent JOB_SWITCH2 from getting too long, the number of index ranges is limited to 500 per JOB_SWITCH2 event log. Therefore, if switching a large job array, several JOB_SWITCH2 events may be generated.
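The arithmetic behind the 500-range limit can be illustrated as follows. This sketch only shows how a list of ranges divides into groups of at most 500; how mbatchd actually distributes ranges across events is not described here, and the names MAX_RANGES_PER_EVENT and chunk_ranges are hypothetical.

```python
MAX_RANGES_PER_EVENT = 500  # limit stated for each JOB_SWITCH2 event


def chunk_ranges(ranges, limit=MAX_RANGES_PER_EVENT):
    """Split a list of (start, end, step) ranges into groups of at most limit."""
    return [ranges[i:i + limit] for i in range(0, len(ranges), limit)]


# Invented example: 1201 single-element ranges need three events.
ranges = [(i, i, 1) for i in range(1, 1202)]
print([len(group) for group in chunk_ranges(ranges)])
```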
JOB_MOVE
- Version number (%s)
The version number
- Event time (%d)
The time of the event
- userId (%d)
UNIX user ID of the user invoking the command
- jobId (%d)
Job ID
- position (%d)
Position number
- base (%d)
Operation code, (TO_TOP or TO_BOTTOM), see <lsf/lsbatch.h>
- idx (%d)
Job array index
- userName (%s)
Name of the job submitter
- rmtJobCtrlStage (%d)
- The stage of remote job move handling process.
- numRmtCtrlResult (%d)
- The number of records for remote job move handling.
- rmtJobCtrlRecord
- Remote job move result.
- jobArrayIndex
- Job array index.
- numRmtCtrlResult2 (%d)
- The number of records for remote job move handling.
- rmtJobCtrlRecord2
- Remote job move result.
QUEUE_CTRL
- Version number (%s)
The version number
- Event time (%d)
The time of the event
- opCode (%d)
Operation code, see <lsf/lsbatch.h>
- queue (%s)
Queue name
- userId (%d)
UNIX user ID of the user invoking the command
- userName (%s)
Name of the user
- ctrlComments (%s)
Administrator comment text from the -C option of badmin queue control commands qclose, qopen, qact, and qinact
HOST_CTRL
- Version number (%s)
The version number
- Event time (%d)
The time of the event
- opCode (%d)
Operation code, see <lsf/lsbatch.h>
- host (%s)
Host name
- userId (%d)
UNIX user ID of the user invoking the command
- userName (%s)
Name of the user
- ctrlComments (%s)
Administrator comment text from the -C option of badmin host control commands hclose and hopen
MBD_START
- Version number (%s)
The version number
- Event time (%d)
The time of the event
- master (%s)
Management host name
- cluster (%s)
Cluster name
- numHosts (%d)
Number of hosts in the cluster
- numQueues (%d)
Number of queues in the cluster
MBD_DIE
- Version number (%s)
The version number
- Event time (%d)
The time of the event
- master (%s)
Management host name
- numRemoveJobs (%d)
Number of finished jobs that have been removed from the system and logged in the current event file
- exitCode (%d)
Exit code from mbatchd
- ctrlComments (%s)
Administrator comment text from the -C option of badmin mbdrestart
UNFULFILL
- Version number (%s)
The version number
- Event time (%d)
The time of the event
- jobId (%d)
Job ID
- notSwitched (%d)
Not switched: the mbatchd has switched the job to a new queue, but the sbatchd has not been informed of the switch
- sig (%d)
Signal: this signal has not been sent to the job
- sig1 (%d)
Checkpoint signal: the job has not been sent this signal to checkpoint itself
- sig1Flags (%d)
Checkpoint flags, see <lsf/lsbatch.h>
- chkPeriod (%d)
New checkpoint period for job
- notModified (%s)
If set to true, then parameters for the job cannot be modified.
- idx (%d)
Job array index
LOAD_INDEX
- Version number (%s)
The version number
- Event time (%d)
The time of the event
- nIdx (%d)
Number of index names
- name (%s)
List of index names
JOB_SIGACT
- Version number (%s)
The version number
- Event time (%d)
The time of the event
- jobId (%d)
Job ID
- period (%d)
Action period
- pid (%d)
Process ID of the child sbatchd that initiated the action
- jstatus (%d)
Job status
- reasons (%d)
Job pending reasons
- flags (%d)
Action flags, see <lsf/lsbatch.h>
- actStatus (%d)
Action status:
1: Action started
2: One action preempted other actions
3: Action succeeded
4: Action failed
- signalSymbol (%s)
Action name, accompanied by actFlags
- idx (%d)
Job array index
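The actStatus codes listed above can be turned into readable text with a small lookup table. This is a sketch; the names ACT_STATUS and describe_act_status are hypothetical, and the fallback text for unlisted codes is an addition for illustration.

```python
# actStatus values as documented for JOB_SIGACT.
ACT_STATUS = {
    1: "Action started",
    2: "One action preempted other actions",
    3: "Action succeeded",
    4: "Action failed",
}


def describe_act_status(code):
    """Return the documented meaning of an actStatus value."""
    return ACT_STATUS.get(code, f"Unknown action status {code}")


print(describe_act_status(3))
```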
MIG
- Version number (%s)
The version number
- Event time (%d)
The time of the event
- jobId (%d)
Job ID
- numAskedHosts (%d)
Number of candidate hosts for migration
- askedHosts (%s)
List of names of candidate hosts
- userId (%d)
UNIX user ID of the user invoking the command
- idx (%d)
Job array index
- userName (%s)
Name of the job submitter
- srcJobId (%d)
The submission cluster job ID
- srcCluster (%s)
The name of the submission cluster
- dstJobId (%d)
The execution cluster job ID
- dstCluster (%s)
The name of the execution cluster
JOB_MODIFY2
- Version number (%s)
The version number
- Event time (%d)
The time of the event
- jobIdStr (%s)
Job ID
- options (%d)
Bit flags for job modification options processing
- options2 (%d)
Bit flags for job modification options processing
- delOptions (%d)
Delete options for the options field
- userId (%d)
UNIX user ID of the submitter
- userName (%s)
User name
- submitTime (%d)
Job submission time
- umask (%d)
File creation mask for this job
- numProcessors (%d)
Number of processors requested for execution. The value 2147483646 means the number of processors is undefined.
- beginTime (%d)
Start time – the job should be started on or after this time
- termTime (%d)
Termination deadline – the job should be terminated by this time
- sigValue (%d)
Signal value
- restartPid (%d)
Restart process ID for the original job
- jobName (%s)
Job name (up to 4094 characters)
- queue (%s)
Name of job queue to which the job was submitted
- numAskedHosts (%d)
Number of candidate host names
- askedHosts (%s)
List of names of candidate hosts for job dispatching; blank if the last field value is 0. If there is more than one host name, then each additional host name will be returned in its own field
- resReq (%s)
Resource requirements
- rLimits
Soft CPU time limit (%d), see getrlimit(2)
- rLimits
Soft file size limit (%d), see getrlimit(2)
- rLimits
Soft data segment size limit (%d), see getrlimit(2)
- rLimits
Soft stack segment size limit (%d), see getrlimit(2)
- rLimits
Soft core file size limit (%d), see getrlimit(2)
- rLimits
Soft memory size limit (%d), see getrlimit(2)
- rLimits
Reserved (%d)
- rLimits
Reserved (%d)
- rLimits
Reserved (%d)
- rLimits
Soft run time limit (%d), see getrlimit(2)
- rLimits
Reserved (%d)
- hostSpec (%s)
Model or host name for normalizing CPU time and run time
- dependCond (%s)
Job dependency condition
- timeEvent (%d)
Time Event, for job dependency condition; specifies when time event ended
- subHomeDir (%s)
Submitter’s home directory
- inFile (%s)
Input file name (up to 4094 characters for UNIX or 255 characters for Windows)
- outFile (%s)
Output file name (up to 4094 characters for UNIX or 255 characters for Windows)
- errFile (%s)
Error output file name (up to 4094 characters for UNIX or 255 characters for Windows)
- command (%s)
Job command (up to 4094 characters for UNIX or 255 characters for Windows)
- chkpntPeriod (%d)
Checkpointing period
- chkpntDir (%s)
Checkpoint directory
- nxf (%d)
Number of files to transfer
- xf (%s)
List of file transfer specifications
- jobFile (%s)
Job file name
- fromHost (%s)
Submission host name
- cwd (%s)
Current working directory (up to 4094 characters for UNIX or 255 characters for Windows)
- preExecCmd (%s)
Job pre-execution command
- mailUser (%s)
Mail user name
- projectName (%s)
Project name
- niosPort (%d)
Callback port if batch interactive job
- maxNumProcessors (%d)
Maximum number of processors. The value 2147483646 means the maximum number of processors is undefined.
- loginShell (%s)
Login shell
- schedHostType (%s)
Execution host type
- userGroup (%s)
User group
- exceptList (%s)
Exception handlers for the job
- delOptions2 (%d)
Delete options for the options2 field
- inFileSpool (%s)
Spool input file (up to 4094 characters for UNIX or 255 characters for Windows)
- commandSpool (%s)
Spool command file (up to 4094 characters for UNIX or 255 characters for Windows)
- userPriority (%d)
User priority
- rsvId (%s)
Advance reservation ID; for example, "user2#0"
- extsched (%s)
External scheduling options
- warningTimePeriod (%d)
Job warning time period in seconds
- warningAction (%s)
Job warning action
- jobGroup (%s)
The job group to which the job is attached
- sla (%s)
SLA service class name that the job is to be attached to
- licenseProject (%s)
IBM Spectrum LSF License Scheduler project name
- options3 (%d)
Bit flags for job processing
- delOptions3 (%d)
Delete options for the options3 field
- app (%s)
Application profile name
- apsString (%s)
Absolute priority scheduling (APS) value set by administrator
- postExecCmd (%s)
Post-execution command to run on the execution host after the job finishes
- runtimeEstimation (%d)
Estimated run time for the job
- requeueEValues (%s)
Job exit values for automatic job requeue
- resizeNotifyCmd (%s)
Resize notification command to run on the first execution host to inform job of a resize event.
- jobDescription (%s)
Job description (up to 4094 characters).
- submitEXT
- Submission extension field, reserved for internal use.
- Num (%d)
- Number of elements (key-value pairs) in the structure.
- key (%s)
- Reserved for internal use.
- value (%s)
- Reserved for internal use.
- srcJobId (%d)
- The submission cluster job ID
- srcCluster (%s)
- The name of the submission cluster
- dstJobId (%d)
- The execution cluster job ID
- dstCluster (%s)
- The name of the execution cluster
- jobaffReq (%s)
- The host-level attribute affinity request or job affinity request.
- srcJobId (%d)
The submission cluster job ID
- srcCluster (%s)
The name of the submission cluster
- dstJobId (%d)
The execution cluster job ID
- dstCluster (%s)
The name of the execution cluster
- network (%s)
Network requirements for IBM Parallel Environment (PE) jobs.
- cpu_frequency(%d)
CPU frequency at which the job runs.
- options4 (%d)
Bit flags for job processing
- nStinFile (%d)
-
(LSF Data Manager) The number of requested input files
- stinFiles
-
(LSF Data Manager) List of input data requirement files requested. The list has the following elements:
- options (%d)
-
Bit field that identifies whether the data requirement is an input file or a tag.
- host (%s)
-
Source host of the input file. This field is empty if the data requirement is a tag.
- name(%s)
-
Full path to the input data requirement file on the host. This field is empty if the data requirement is a tag.
- hash (%s)
- Hash key computed for the data requirement file at job submission time. This field is empty if the data requirement is a tag.
- size (%lld)
-
Size of the data requirement file at job submission time in bytes.
- modifyTime (%d)
-
Last modified time of the data requirement file at job submission time.
- pendTimeLimit (%d)
-
Job-level pending time limit of the job, in seconds.
- eligiblePendTimeLimit (%d)
-
Job-level eligible pending time limit of the job, in seconds.
- dataGrp (%s)
-
Data group name.
- numRmtCtrlResult2 (%s)
-
Number of remote job modification sessions that are generated by this modify request. JobIDs are recorded in an array index range format.
- rmtCtrlResult2 (%s)
-
Remote job modification records
- indexRangeCnt (%d)
-
The number of element ranges indicating successful signals
- indexRangeStart1 (%d)
-
The start of the first index range.
- indexRangeEnd1 (%d)
-
The end of the first index range.
- indexRangeStep1 (%d)
-
The step of the first index range.
- indexRangeStartN (%d)
-
The start of the last index range.
- indexRangeEndN (%d)
-
The end of the last index range.
- indexRangeStepN (%d)
-
The step of the last index range.
- jobaffReq (%s)
- The host-level attribute affinity request or job affinity request.
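For numProcessors and maxNumProcessors in JOB_MODIFY2, the value 2147483646 stands for "undefined". A reader of the log might map it to a missing value as sketched below; the names UNDEFINED_PROCESSORS and processors_or_none are hypothetical.

```python
UNDEFINED_PROCESSORS = 2147483646  # documented "undefined" marker


def processors_or_none(token):
    """Convert a processor-count field, returning None when it is undefined."""
    value = int(token)
    return None if value == UNDEFINED_PROCESSORS else value


print(processors_or_none("8"), processors_or_none("2147483646"))
```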
JOB_SIGNAL
- Version number (%s)
The version number
- Event time (%d)
The time of the event
- jobId (%d)
Job ID
- userId (%d)
UNIX user ID of the user invoking the command
- runCount (%d)
Number of runs
- signalSymbol (%s)
Signal name
- idx (%d)
Job array index
- userName (%s)
Name of the job submitter
- srcJobId (%d)
The submission cluster job ID
- srcCluster (%s)
The name of the submission cluster
- dstJobId (%d)
The execution cluster job ID
- dstCluster (%s)
The name of the execution cluster
- indexRangeCnt (%d)
-
The number of element ranges indicating successful signals
- indexRangeStart1 (%d)
-
The start of the first index range.
- indexRangeEnd1 (%d)
-
The end of the first index range.
- indexRangeStep1 (%d)
-
The step of the first index range.
- indexRangeStartN (%d)
-
The start of the last index range.
- indexRangeEndN (%d)
-
The end of the last index range.
- indexRangeStepN (%d)
-
The step of the last index range.
- jStatus (%d)
-
The job status.
JOB_EXECUTE
- Version number (%s)
The version number
- Event time (%d)
The time of the event
- jobId (%d)
Job ID
- execUid (%d)
Mapped UNIX user ID on execution host
- jobPGid (%d)
Job process group ID
- execCwd (%s)
Current working directory job used on execution host (up to 4094 characters for UNIX or 255 characters for Windows)
- execHome (%s)
Home directory job used on execution host
- execUsername (%s)
Mapped user name on execution host
- jobPid (%d)
Job process ID
- idx (%d)
Job array index
- additionalInfo (%s)
Placement information of HPC jobs
- SLAscaledRunLimit (%d)
Run time limit for the job scaled by the execution host
- execRusage
An internal field used by LSF.
- Position
An internal field used by LSF.
- duration4PreemptBackfill
How long a backfilled job can run; used for preemption backfill jobs
- srcJobId (%d)
The submission cluster job ID
- srcCluster (%s)
The name of the submission cluster
- dstJobId (%d)
The execution cluster job ID
- dstCluster (%s)
The name of the execution cluster
JOB_REQUEUE
- Version number (%s)
The version number
- Event time (%d)
The time of the event
- jobId (%d)
Job ID
- idx (%d)
Job array index
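Because a record is a single line of blank-separated fields that begins with the event type, a short record such as JOB_REQUEUE can be split positionally. The following is a sketch only. It assumes that %s fields are enclosed in double quotes and that Event time is in seconds since the UNIX epoch; the sample line is made up for illustration and the function name parse_job_requeue is hypothetical.

```python
import shlex
from datetime import datetime, timezone

# Field names in the order listed above, after the leading event type.
JOB_REQUEUE_FIELDS = ("version", "event_time", "job_id", "idx")


def parse_job_requeue(line):
    """Split one JOB_REQUEUE record into a dict.

    Assumptions: %s fields are double-quoted, Event time is epoch seconds.
    """
    tokens = shlex.split(line)  # splits on blanks, honoring double quotes
    if not tokens or tokens[0] != "JOB_REQUEUE":
        raise ValueError("not a JOB_REQUEUE record")
    if len(tokens) < 1 + len(JOB_REQUEUE_FIELDS):
        raise ValueError("record has too few fields")
    record = dict(zip(JOB_REQUEUE_FIELDS, tokens[1:]))
    for name in ("event_time", "job_id", "idx"):
        record[name] = int(record[name])
    record["event_datetime"] = datetime.fromtimestamp(
        record["event_time"], tz=timezone.utc
    )
    return record


# Made-up sample line, for illustration only.
sample = '"JOB_REQUEUE" "10.1" 1700000000 1234 0'
print(parse_job_requeue(sample))
```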
JOB_CLEAN
- Version number (%s)
The version number
- Event time (%d)
The time of the event
- jobId (%d)
Job ID
- idx (%d)
Job array index
- indexRangeCnt (%d)
The number of element ranges indicating successful signals.
- indexRangeStart1 (%d)
The start of the first index range.
- indexRangeEnd1 (%d)
The end of the first index range.
- indexRangeStep1 (%d)
The step of the first index range.
- indexRangeStartN (%d)
The start of the last index range.
- indexRangeEndN (%d)
The end of the last index range.
- indexRangeStepN (%d)
The step of the last index range.
JOB_EXCEPTION
- Version number (%s)
The version number
- Event time (%d)
The time of the event
- jobId (%d)
Job ID
- exceptMask (%d)
Exception Id
0x01: missched
0x02: overrun
0x04: underrun
0x08: abend
0x10: cantrun
0x20: hostfail
0x40: startfail
0x100: runtime_est_exceeded
- actMask (%d)
Action Id
0x01: kill
0x02: alarm
0x04: rerun
0x08: setexcept
- timeEvent (%d)
Time event. For the missched exception, specifies when the time event ended.
- exceptInfo (%d)
Exception information: the pending reason for the missched or cantrun exception, the exit code of the job for the abend exception, otherwise 0.
- idx (%d)
Job array index
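The exceptMask and actMask values listed above can be decoded into names with a bitwise test. The following is a minimal sketch that assumes several bits can be set in one value, as the word "mask" suggests; the table and function names are hypothetical.

```python
# Bit values copied from the exceptMask and actMask lists above.
EXCEPT_BITS = {
    0x01: "missched",
    0x02: "overrun",
    0x04: "underrun",
    0x08: "abend",
    0x10: "cantrun",
    0x20: "hostfail",
    0x40: "startfail",
    0x100: "runtime_est_exceeded",
}
ACT_BITS = {
    0x01: "kill",
    0x02: "alarm",
    0x04: "rerun",
    0x08: "setexcept",
}


def decode_mask(mask, table):
    """Return the names of all bits in `table` that are set in `mask`."""
    return [name for bit, name in table.items() if mask & bit]


# 0x0A has the overrun (0x02) and abend (0x08) bits set.
print(decode_mask(0x0A, EXCEPT_BITS))
print(decode_mask(0x01, ACT_BITS))
```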
JOB_EXT_MSG
- Version number (%s)
The version number
- Event time (%d)
The time of the event
- jobId (%d)
Job ID
- idx (%d)
Job array index
- msgIdx (%d)
Index in the list
- userId (%d)
Unique user ID of the user invoking the command
- dataSize (%ld)
Size of the data if it has any, otherwise 0
- postTime (%ld)
Message sending time
- dataStatus (%d)
Status of the attached data
- desc (%s)
Text description of the message
- userName (%s)
Name of the author of the message
- Flags (%d)
Used for internal flow control
JOB_ATTA_DATA
- Version number (%s)
The version number
- Event time (%d)
The time of the event
- jobId (%d)
Job ID
- idx (%d)
Job array index
- msgIdx (%d)
Index in the list
- dataSize (%ld)
Size of the data if it has any, otherwise 0
- dataStatus (%d)
Status of the attached data
- fileName (%s)
File name of the attached data
JOB_CHUNK
This is created when a job is inserted into a chunk.
If LSF_HPC_EXTENSIONS="SHORT_EVENTFILE" is specified in lsf.conf, older daemons and commands (pre-LSF Version 6.0) cannot recognize the lsb.events file format.
- Version number (%s)
The version number
- Event time (%d)
The time of the event
- membSize (%ld)
Size of array membJobId
- membJobId (%ld)
Job IDs of jobs in the chunk
- numExHosts (%ld)
Number of execution hosts
If LSF_HPC_EXTENSIONS="SHORT_EVENTFILE" is specified in lsf.conf, the value of this field is the number of hosts listed in the execHosts field.
- execHosts (%s)
Execution host name array
If LSF_HPC_EXTENSIONS="SHORT_EVENTFILE" is specified in lsf.conf, the value of this field is logged in a shortened format.
SBD_UNREPORTED_STATUS
- Version number (%s)
The version number
- Event time (%d)
The time of the event
- jobId (%d)
Job ID
- actPid (%d)
Acting process ID
- jobPid (%d)
Job process ID
- jobPGid (%d)
Job process group ID
- newStatus (%d)
New status of the job
- reason (%d)
Pending or suspending reason code, see <lsf/lsbatch.h>
- suspreason (%d)
Pending or suspending subreason code, see <lsf/lsbatch.h>
- lsfRusage
The following fields contain resource usage information for the job (see getrusage(2)). If the value of a field is unavailable (because the job exited or because of differences among operating systems), -1 is logged. Times are measured in seconds, and sizes are measured in KB.
- ru_utime (%f)
User time used
- ru_stime (%f)
System time used
- ru_maxrss (%f)
Maximum shared text size
- ru_ixrss (%f)
Integral of the shared text size over time (in KB seconds)
- ru_ismrss (%f)
Integral of the shared memory size over time (valid only on Ultrix)
- ru_idrss (%f)
Integral of the unshared data size over time
- ru_isrss (%f)
Integral of the unshared stack size over time
- ru_minflt (%f)
Number of page reclaims
- ru_majflt (%f)
Number of page faults
- ru_nswap (%f)
Number of times the process was swapped out
- ru_inblock (%f)
Number of block input operations
- ru_oublock (%f)
Number of block output operations
- ru_ioch (%f)
Number of characters read and written (valid only on HP-UX)
- ru_msgsnd (%f)
Number of System V IPC messages sent
- ru_msgrcv (%f)
Number of messages received
- ru_nsignals (%f)
Number of signals received
- ru_nvcsw (%f)
Number of voluntary context switches
- ru_nivcsw (%f)
Number of involuntary context switches
- ru_exutime (%f)
Exact user time used (valid only on ConvexOS)
- exitStatus (%d)
Exit status of the job, see <lsf/lsbatch.h>
- execCwd (%s)
Current working directory job used on execution host (up to 4094 characters for UNIX or 255 characters for Windows)
- execHome (%s)
Home directory job used on execution host
- execUsername (%s)
Mapped user name on execution host
- msgId (%d)
ID of the message
- actStatus (%d)
Action status
1: Action started
2: One action preempted other actions
3: Action succeeded
4: Action failed
- sigValue (%d)
Signal value
- seq (%d)
Sequence status of the job
- idx (%d)
Job array index
- jRusage
The following fields contain resource usage information for the job. If the value of a field is unavailable (because the job exited or because of differences among operating systems), -1 is logged. Times are measured in seconds, and sizes are measured in KB.
- mem (%d)
Total resident memory usage in KB of all currently running processes in a given process group
- swap (%d)
Total virtual memory usage in KB of all currently running processes in the given process groups
- utime (%d)
Cumulative total user time in seconds
- stime (%d)
Cumulative total system time in seconds
- npids (%d)
Number of currently active processes in the given process groups. This entry has four sub-fields:
- pid (%d)
Process ID of the child sbatchd that initiated the action
- ppid (%d)
Parent process ID
- pgid (%d)
Process group ID
- jobId (%d)
Process Job ID
- npgids (%d)
Number of currently active process groups
- exitInfo (%d)
Job termination reason, see <lsf/lsbatch.h>
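The lsfRusage fields above use -1 to mark a value that is unavailable. The following is a minimal sketch of mapping an already-split run of lsfRusage tokens to named values, with -1 converted to None so that it is not mistaken for a measurement. The field order follows the list above; the names LSF_RUSAGE_FIELDS and read_lsf_rusage are hypothetical, and locating the tokens within a full record is left out.

```python
# Field names in the order listed under lsfRusage above.
LSF_RUSAGE_FIELDS = (
    "ru_utime", "ru_stime", "ru_maxrss", "ru_ixrss", "ru_ismrss",
    "ru_idrss", "ru_isrss", "ru_minflt", "ru_majflt", "ru_nswap",
    "ru_inblock", "ru_oublock", "ru_ioch", "ru_msgsnd", "ru_msgrcv",
    "ru_nsignals", "ru_nvcsw", "ru_nivcsw", "ru_exutime",
)


def read_lsf_rusage(tokens):
    """Map lsfRusage tokens (%f values) to a dict, turning -1 into None."""
    if len(tokens) < len(LSF_RUSAGE_FIELDS):
        raise ValueError("expected %d rusage values" % len(LSF_RUSAGE_FIELDS))
    usage = {}
    for name, token in zip(LSF_RUSAGE_FIELDS, tokens):
        value = float(token)
        usage[name] = None if value == -1 else value
    return usage


# Illustrative values only: user and system time known, the rest unavailable.
tokens = ["0.5", "0.1"] + ["-1"] * 17
usage = read_lsf_rusage(tokens)
print(usage["ru_utime"], usage["ru_stime"], usage["ru_maxrss"])
```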
PRE_EXEC_START
A pre-execution command has been started.
- Version number (%s)
The version number
- Event time (%d)
The time of the event
- jobId (%d)
Job ID
- jStatus (%d)
Job status (4, indicating the RUN status of the job)
- jobPid (%d)
Job process ID
- jobPGid (%d)
Job process group ID
- hostFactor (%f)
CPU factor of the first execution host
- numExHosts (%d)
Number of processors used for execution
- execHosts (%s)
List of execution host names
- queuePreCmd (%s)
Pre-execution command
- queuePostCmd (%s)
Post-execution command
- jFlags (%d)
Job processing flags
- userGroup (%s)
User group name
- idx (%d)
Job array index
- additionalInfo (%s)
Placement information of HPC jobs
- effectiveResReq (%s)
The runtime resource requirements used for the job.
JOB_FORCE
- Version number (%s)
The version number
- Event time (%d)
The time of the event
- jobId (%d)
Job ID
- userId (%d)
UNIX user ID of the user invoking the command
- idx (%d)
Job array index
- options (%d)
Bit flags for job processing
- numExecHosts (%ld)
Number of execution hosts
If LSF_HPC_EXTENSIONS="SHORT_EVENTFILE" is specified in lsf.conf, the value of this field is the number of hosts listed in the execHosts field.
- execHosts (%s)
Execution host name array
If LSF_HPC_EXTENSIONS="SHORT_EVENTFILE" is specified in lsf.conf, the value of this field is logged in a shortened format.
- userName (%s)
Name of the user
- queue (%s)
Name of queue if a remote brun job ran; otherwise, this field is empty. For MultiCluster this is the name of the receive queue at the execution cluster.
GRP_ADD
- Version number (%s)
The version number
- Event time (%d)
The time of the event
- userId (%d)
UNIX user ID of the job group owner
- submitTime (%d)
Job submission time
- userName (%s)
User name of the job group owner
- depCond (%s)
Job dependency condition
- timeEvent (%d)
Time event for the job dependency condition; specifies when the time event ended
- groupSpec (%s)
Job group name
- delOptions (%d)
Delete options for the options field
- delOptions2 (%d)
Delete options for the options2 field
- sla (%s)
SLA service class name that the job group is to be attached to
- maxJLimit (%d)
Job group limit set by bgadd -L
- groupType (%d)
- Job group creation method:
- 0x01 - job group was created explicitly
- 0x02 - job group was created implicitly
GRP_MOD
- Version number (%s)
The version number
- Event time (%d)
The time of the event
- userId (%d)
UNIX user ID of the job group owner
- submitTime (%d)
Job submission time
- userName (%s)
User name of the job group owner
- depCond (%s)
Job dependency condition
- timeEvent (%d)
Time event for the job dependency condition; specifies when the time event ended
- groupSpec (%s)
Job group name
- delOptions (%d)
Delete options for the options field
- delOptions2 (%d)
Delete options for the options2 field
- sla (%s)
SLA service class name that the job group is to be attached to
- maxJLimit (%d)
Job group limit set by bgmod -L
LOG_SWITCH
This is created when switching the event file lsb.events. The fields in order of occurrence are:
- Version number (%s)
The version number
- Event time (%d)
The time of the event
- jobId (%d)
Job ID
JOB_RESIZE_NOTIFY_START
- Version number (%s)
The version number.
- Event time (%d)
The time of the event.
- jobId (%d)
The job ID.
- idx (%d)
Job array index.
- notifyId (%d)
Identifier or handle for notification.
- numResizeHosts (%d)
Number of processors used for execution. If LSF_HPC_EXTENSIONS="SHORT_EVENTFILE" is specified in lsf.conf, the value of this field is the number of hosts listed in short format.
- resizeHosts (%s)
List of execution host names. If LSF_HPC_EXTENSIONS="SHORT_EVENTFILE" is specified in lsf.conf, the value of this field is logged in a shortened format.
- numResizeSlots (%d)
Number of allocated slots for executing resize.
- resizeSlots (%s)
List of execution host names where slots are allocated for resizing.
- GPU_ALLOC_COMPAT (%s)
The string to describe resized portions of the GPU allocation.
- GPU_MEM_RSV (%s)
The GPU memory reserved by resized tasks on execution hosts.
JOB_RESIZE_NOTIFY_ACCEPT
- Version number (%s)
The version number.
- Event time (%d)
The time of the event.
- jobId (%d)
The job ID.
- idx (%d)
Job array index.
- notifyId (%d)
Identifier or handle for notification.
- resizeNotifyCmdPid (%d)
Resize notification executable process ID. If no resize notification executable is defined, this field will be set to 0.
- resizeNotifyCmdPGid (%d)
Resize notification executable process group ID. If no resize notification executable is defined, this field will be set to 0.
- status (%d)
Status field used to indicate possible errors: 0 for success, 1 for failure.
JOB_RESIZE_NOTIFY_DONE
- Version number (%s)
The version number.
- Event time (%d)
The time of the event.
- jobId (%d)
The job ID.
- idx (%d)
Job array index.
- notifyId (%d)
Identifier or handle for notification.
- status (%d)
Resize notification exit value: 0 for success, 1 for failure, 2 for failure but cancel request.
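The three exit values of the JOB_RESIZE_NOTIFY_DONE status field can be given symbolic names when a record is processed. The following is a small sketch; the enum and its member names are hypothetical labels for the values described above, not identifiers from <lsf/lsbatch.h>.

```python
from enum import IntEnum


class ResizeNotifyDoneStatus(IntEnum):
    """Hypothetical names for the JOB_RESIZE_NOTIFY_DONE status values."""
    SUCCESS = 0
    FAILURE = 1
    FAILURE_CANCEL_REQUEST = 2


def describe_status(raw):
    """Convert the logged %d token into a named status."""
    return ResizeNotifyDoneStatus(int(raw))


print(describe_status("2").name)
```

A value outside 0 to 2 raises ValueError, which makes unexpected records visible instead of silently accepted.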
JOB_RESIZE_RELEASE
- Version number (%s)
The version number.
- Event time (%d)
The time of the event.
- jobId (%d)
The job ID.
- idx (%d)
Job array index.
- reqid (%d)
Request Identifier or handle.
- options (%d)
Release options.
- userId (%d)
UNIX user ID of the user invoking the command.
- userName (%s)
User name of the submitter.
- resizeNotifyCmd (%s)
Resize notification command to run on the first execution host to inform job of a resize event.
- numResizeHosts (%d)
Number of processors used for execution during resize. If LSF_HPC_EXTENSIONS="SHORT_EVENTFILE" is specified in lsf.conf, the value of this field is the number of hosts listed in short format.
- resizeHosts (%s)
List of execution host names during resize. If LSF_HPC_EXTENSIONS="SHORT_EVENTFILE" is specified in lsf.conf, the value of this field is logged in a shortened format.
- numResizeSlots (%d)
Number of allocated slots for executing resize.
- resizeSlots (%s)
List of execution host names where slots are allocated for resizing.
JOB_RESIZE_CANCEL
- Version number (%s)
The version number.
- Event time (%d)
The time of the event.
- jobId (%d)
The job ID.
- idx (%d)
Job array index.
- userId (%d)
UNIX user ID of the user invoking the command.
- userName (%s)
User name of the submitter.
HOST_POWER_STATUS
- Version number (%s)
The version number.
- Event time (%d)
The time of the event.
- Request Id (%d)
The power operation request ID to identify a power operation.
- Op Code (%d)
Power operation type.
- Trigger (%d)
The power operation trigger: power policy, job, or badmin hpower.
- Status (%d)
The power operation status.
- Trigger Name (%s)
If the operation is triggered by power policy, this is the power policy name. If the operation is triggered by an administrator, this is the administrator user name.
- Number (%d)
Number of hosts on which the power operation occurred.
- Hosts (%s)
The hosts on which the power operation occurred.
JOB_PROV_HOST
- Version number (%s)
The version number.
- Event time (%d)
The time of the event.
- jobId (%d)
The job ID.
- idx (%d)
Job array index.
- status (%d)
Indicates whether the provision has started, is done, or has failed.
- num (%d)
Number of hosts that need to be provisioned.
- hostNameList (%d)
Names of hosts that need to be provisioned.
- hostStatusList (%d)
Host status for provisioning result.
HOST_CLOSURE_LOCK_ID_CTRL
- Version number (%s)
The version number.
- Event time (%d)
The time of the event.
- host (%s)
Host name.
- opCode (%d)
Operation code, see <lsf/lsbatch.h>
- numLockIds (%d)
The number of host closure lock IDs.
- lockIds (%s)
Host closure lock IDs.
- userId (%d)
UNIX user ID of the user invoking the command.
- userName (%s)
Name of the user.
- message (%s)
Administrator comment text from the -C option of badmin host control commands hclose and hopen.
ATTR_CREATE
- Version (%s)
The version number.
- time (%d)
The time of the event.
- user (%s)
Name of the user that created the attributes.
- attrNum (%d)
The number of attributes that are created.
- attributeList (%s)
The list of attributes that are created.
- hostNum (%d)
The number of hosts in which to create the attributes.
- hostList (%s)
The list of hosts in which to create the attributes.
- desc (%s)
A description of the attributes.
ATTR_DELETE
- Version (%s)
The version number.
- time (%d)
The time of the event.
- user (%s)
Name of the user that deleted the attributes.
- attrNum (%d)
The number of attributes that are deleted.
- attributeList (%s)
The list of attributes that are deleted.
- hostNum (%d)
The number of hosts from which to delete the attributes.
- hostList (%s)
The list of hosts from which to delete the attributes.
- desc (%s)
A description of the attributes.