IBM Spectrum LSF quick reference
Quick reference to LSF commands, daemons, configuration files, log files, and important cluster configuration parameters.
Sample UNIX and Linux installation directories
Daemon error log files
Daemon error log files are stored in the directory that is defined by LSF_LOGDIR in the lsf.conf file.
LSF base system daemon log files | LSF batch system daemon log files |
---|---|
pim.log.host_name | mbatchd.log.host_name |
res.log.host_name | sbatchd.log.host_name |
lim.log.host_name | mbschd.log.host_name |
If the EGO_LOGDIR parameter is defined in the ego.conf file, the lim.log.host_name file is stored in the directory that is defined by the EGO_LOGDIR parameter.
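The naming rule above can be sketched in a few lines of Python. This is only an illustration: the `daemon_log_path` helper and the example directories are hypothetical and not part of LSF. The daemon names and the `daemon.log.host_name` pattern come from the table.

```python
import posixpath

# Daemon names taken from the log file table above.
BASE_DAEMONS = ("pim", "res", "lim")
BATCH_DAEMONS = ("mbatchd", "sbatchd", "mbschd")


def daemon_log_path(daemon, host_name, lsf_logdir, ego_logdir=None):
    """Return the expected error log path for one daemon on one host.

    Hypothetical helper: it only applies the naming rule from the text,
    it does not read lsf.conf or ego.conf.
    """
    if daemon not in BASE_DAEMONS + BATCH_DAEMONS:
        raise ValueError(f"unknown daemon: {daemon}")
    # lim.log.host_name is stored in EGO_LOGDIR when that parameter is defined.
    if daemon == "lim" and ego_logdir:
        log_dir = ego_logdir
    else:
        log_dir = lsf_logdir
    return posixpath.join(log_dir, f"{daemon}.log.{host_name}")


if __name__ == "__main__":
    # Example directories and host name are made up for illustration.
    print(daemon_log_path("mbatchd", "hostA", "/var/log/lsf"))
    print(daemon_log_path("lim", "hostA", "/var/log/lsf", "/var/log/ego"))
```

Only the lim log is affected by EGO_LOGDIR in this sketch; every other daemon log stays under LSF_LOGDIR.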
Configuration files
The lsf.conf, lsf.shared, and lsf.cluster.cluster_name files are located in the directory that is specified by the LSF_CONFDIR parameter in the lsf.conf file.
The lsb.params, lsb.queues, lsb.modules, and lsb.resources files are located in the LSB_CONFDIR/cluster_name/configdir/ directory.
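As a rough sketch of the two locations described above, the following Python function maps file names to their expected paths. The `config_paths` helper, the cluster name, and the directories are assumptions made for illustration. The `LSF_CONFDIR/lsbatch` fallback is the LSB_CONFDIR default listed in the parameter table in this reference.

```python
import posixpath

# Files that live directly in LSF_CONFDIR.
LSF_CONFDIR_FILES = ("lsf.conf", "lsf.shared", "lsf.cluster.{cluster}")
# Files that live in LSB_CONFDIR/cluster_name/configdir/.
LSB_CONFIGDIR_FILES = ("lsb.params", "lsb.queues", "lsb.modules", "lsb.resources")


def config_paths(cluster, lsf_confdir, lsb_confdir=None):
    """Map each configuration file name to its expected path (hypothetical helper)."""
    if lsb_confdir is None:
        # Default taken from the parameter table: LSF_CONFDIR/lsbatch.
        lsb_confdir = posixpath.join(lsf_confdir, "lsbatch")
    paths = {}
    for template in LSF_CONFDIR_FILES:
        name = template.format(cluster=cluster)
        paths[name] = posixpath.join(lsf_confdir, name)
    configdir = posixpath.join(lsb_confdir, cluster, "configdir")
    for name in LSB_CONFIGDIR_FILES:
        paths[name] = posixpath.join(configdir, name)
    return paths


if __name__ == "__main__":
    # "cluster1" and /opt/lsf/conf are example values only.
    for name, path in config_paths("cluster1", "/opt/lsf/conf").items():
        print(f"{name:25} {path}")
```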
File | Description |
---|---|
install.config | Options for LSF installation and configuration |
lsf.conf | Generic environment configuration file that describes the configuration and operation of the cluster |
lsf.shared | Definition file that is shared by all clusters. Used to define cluster name, host types, host models, and site-defined resources |
lsf.cluster.cluster_name | Cluster configuration files that are used to define hosts, administrators, and locality of site-defined shared resources |
lsb.applications | Defines application profiles to define common parameters for the same types of jobs |
lsb.params | Configures LSF batch parameters |
lsb.queues | Batch queue configuration file |
lsb.resources | Configures resource allocation limits, exports, and resource usage limits |
lsb.serviceclasses | Defines service-level agreements (SLAs) in an LSF cluster as service classes, which define the properties of the SLA |
lsb.users | Configures user groups, hierarchical fair share for users and user groups, and job slot limits for users and user groups |
Cluster configuration parameters in the lsf.conf file
Parameter | Description | UNIX Default |
---|---|---|
LSF_BINDIR | Directory containing LSF user commands, which are shared by all hosts of the same type | LSF_TOP/version/OStype/bin |
LSF_CONFDIR | Directory for all LSF configuration files | LSF_TOP/conf |
LSF_ENVDIR | Directory containing the lsf.conf file. Must be owned by root. | /etc (if LSF_CONFDIR is not defined) |
LSF_INCLUDEDIR | Directory containing LSF API header files lsf.h and lsbatch.h | LSF_TOP/version/include |
LSF_LIBDIR | LSF libraries, which are shared by all hosts of the same type | LSF_TOP/version/OStype/lib |
LSF_LOGDIR | (Optional) Directory for LSF daemon logs. Must be owned by root. | /tmp |
LSF_LOG_MASK | Logging level of error messages from LSF commands | LOG_WARNING |
LSF_MANDIR | Directory containing LSF man pages | LSF_TOP/version/man |
LSF_MISC | Sample C programs and shell scripts, and a template for an external LIM (elim) | LSF_TOP/version/misc |
LSF_SERVERDIR | Directory for all server binary files and shell scripts, and external executable files that are started by LSF daemons. Must be owned by root and shared by all hosts of the same type. | LSF_TOP/version/OStype/etc |
LSF_TOP | Top-level installation directory. The path to LSF_TOP must be shared and accessible to all hosts in the cluster. It cannot be the root directory (/). | Not defined. Required for installation. |
LSB_CONFDIR | Directory for LSF Batch configuration directories, containing user and host lists, operation parameters, and batch queues | LSF_CONFDIR/lsbatch |
LSF_LIVE_CONFDIR | Directory for LSF live reconfiguration directories that are written by the bconf command. | LSB_SHAREDIR/cluster_name/live_confdir |
LSB_SHAREDIR | Directory for LSF batch job history and accounting log files for each cluster. Must be owned by the primary LSF administrator. | LSF_TOP/work |
LSF_LIM_PORT | TCP service port that is used for communication with the lim daemon | 7879 |
LSF_RES_PORT | TCP service port that is used for communication with the res daemon | 6878 |
LSB_MBD_PORT | TCP service port that is used for communication with the mbatchd daemon | 6881 |
LSB_SBD_PORT | TCP service port that is used for communication with the sbatchd daemon | 6882 |
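To show how these parameters might be read in practice, here is a minimal Python sketch that parses an lsf.conf fragment and falls back to defaults from the table. It assumes the file contains simple `NAME=VALUE` lines with `#` comments. The `parse_lsf_conf` and `effective_value` helpers and the sample values are hypothetical, and only two of the table defaults are included.

```python
def parse_lsf_conf(text):
    """Parse NAME=VALUE lines, skipping blank lines and # comments.

    Assumption: one parameter per line, optionally wrapped in double quotes.
    """
    params = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue
        name, sep, value = line.partition("=")
        if not sep:
            continue  # not a NAME=VALUE line; ignore it in this sketch
        params[name.strip()] = value.strip().strip('"')
    return params


# A small subset of the UNIX defaults listed in the table above.
TABLE_DEFAULTS = {
    "LSF_LOGDIR": "/tmp",
    "LSF_LOG_MASK": "LOG_WARNING",
}


def effective_value(params, name):
    """Return the configured value, else the table default, else None."""
    return params.get(name, TABLE_DEFAULTS.get(name))


if __name__ == "__main__":
    # Sample fragment with made-up paths.
    sample = '''
    # hypothetical lsf.conf fragment
    LSF_TOP=/opt/lsf
    LSF_CONFDIR=/opt/lsf/conf
    LSF_LOGDIR="/var/log/lsf"
    '''
    params = parse_lsf_conf(sample)
    for name in ("LSF_TOP", "LSF_LOGDIR", "LSF_LOG_MASK"):
        print(name, "=", effective_value(params, name))
```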
Administration and accounting commands
Only LSF administrators and root users can use these commands.
Command | Description |
---|---|
lsadmin | LSF administrator tool to control the operation of the LIM and RES daemons in an LSF cluster. The lsadmin help command shows all subcommands. |
lsfinstall | Install LSF with the install.config input file |
lsfrestart | Restart the LSF daemons on all hosts in the local cluster |
lsfshutdown | Shut down the LSF daemons on all hosts in the local cluster |
lsfstartup | Start the LSF daemons on all hosts in the local cluster |
badmin | LSF administrative tool to control the operation of the LSF batch system (sbatchd, mbatchd, hosts, and queues). The badmin help command shows all subcommands. |
bconf | Changes LSF configuration in active memory |
Daemons
Daemon Name | Description |
---|---|
lim | Load Information Manager (LIM): collects load and resource information about all server hosts in the cluster and provides host selection services to applications through LSLIB. LIM maintains information on static system resources and dynamic load indexes |
mbatchd | Management Batch Daemon (MBD): accepts and holds all batch jobs. MBD periodically checks load indexes on all server hosts by contacting the management host LIM. |
mbschd | Management Batch Scheduler Daemon: performs the scheduling functions of LSF and sends job scheduling decisions to MBD for dispatch. Runs on the LSF management host |
sbatchd | Server Batch Daemon (SBD): accepts job execution requests from MBD, and monitors the progress of jobs. Controls job execution, enforces batch policies, reports job status to MBD, and starts MBD. |
pim | Process Information Manager (PIM): monitors resources that are used by submitted jobs while they are running. PIM is used to enforce resource limits and load thresholds, and for fair share scheduling |
res | Remote Execution Server (RES): accepts remote execution requests from all load-sharing applications and handles I/O on the remote host for load sharing processes. |
User commands
Viewing information about your cluster
Command | Description |
---|---|
bhosts | Displays hosts and their static and dynamic resources |
blimits | Displays information about resource allocation limits of running jobs |
bparams | Displays information about tunable batch system parameters |
bqueues | Displays information about batch queues |
busers | Displays information about users and user groups |
lshosts | Displays hosts and their static resource information |
lsid | Displays the current LSF version number, cluster name, and management host name |
lsinfo | Displays load-sharing configuration information |
lsload | Displays dynamic load indexes for hosts |
Monitoring jobs and tasks
Command | Description |
---|---|
bacct | Reports accounting statistics on completed LSF jobs |
bapp | Displays information about jobs that are attached to application profiles |
bhist | Displays historical information about jobs |
bjobs | Displays information about jobs |
bpeek | Displays stdout and stderr of unfinished jobs |
bsla | Displays information about service class configuration for goal-oriented service-level agreement scheduling |
bstatus | Reads or sets external job status messages and data files |
Submitting and controlling jobs
Command | Description |
---|---|
bbot | Moves a pending job relative to the last job in the queue |
bchkpnt | Checkpoints a checkpoint-able job |
bkill | Sends a signal to a job |
bmig | Migrates a checkpoint-able or re-runnable job |
bmod | Modifies job submission options |
brequeue | Kills and re-queues a job |
bresize | Releases slots and cancels pending job resize allocation requests |
brestart | Restarts a checkpointed job |
bresume | Resumes a suspended job |
bstop | Suspends a job |
bsub | Submits a job |
bswitch | Moves unfinished jobs from one queue to another |
btop | Moves a pending job relative to the first job in the queue |
bsub command
Selected options for the bsub [options] command [arguments] command
Option | Description |
---|---|
-ar | Specifies the job is auto-resizable |
-H | Holds the job in the PSUSP state at submission |
-I \| -Ip \| -Is | Submits a batch interactive job. -Ip creates a pseudo-terminal. -Is creates a pseudo-terminal in shell mode. |
-K | Submits a job and waits for the job to finish |
-r | Makes a job re-runnable |
-x | Exclusive execution |
-app application_profile_name | Submits the job to the specified application profile |
-b begin_time | Dispatches the job on or after the specified date and time in the form [[month:]day:]hour:minute |
-C core_limit | Sets a per-process (soft) core file size limit (KB) for all the processes that belong to this job |
-c cpu_time[/host_name \| /host_model] | Limits the total CPU time the job can use. CPU time is in the form [hour:]minutes |
-cwd "current_working_directory" | Specifies the current working directory for the job |
-D data_limit | Sets the per-process (soft) data segment size limit (KB) for each process that belongs to the job |
-E "pre_exec_command [arguments]" | Runs the specified pre-exec command on the execution host before the job runs |
-Ep "post_exec_command [arguments]" | Runs the specified post-exec command on the execution host after the job finishes |
-e error_file | Appends the standard error output to a file |
-eo error_file | Overwrites the standard error output of the job to the specified file |
-F file_limit | Sets per-process (soft) file size limit (KB) for each process that belongs to the job |
-f "local_file op[remote_file]" ... | Copies a file between the local (submission) host and remote (execution) host. op is one of >, <, <<, ><, <> |
-i input_file \| -is input_file | Gets the standard input for the job from the specified file |
-J "job_name[index_list]%job_slot_limit" | Assigns the specified name to the job. Job array index_list has the form start[-end[:step]], and %job_slot_limit is the maximum number of jobs that can run at the same time. |
-k "chkpnt_dir [chkpnt_period][method=method_name]" | Makes a job checkpoint-able and specifies the checkpoint directory, period in minutes, and method |
-M mem_limit | Sets the per-process (soft) memory limit (KB) |
-m "host_name[@cluster_name][[!] \| +[pref_level]] \| host_group[[!] \| +[pref_level]] \| compute_unit[[!] \| +[pref_level]] ..." | Runs the job on one of the specified hosts. A plus sign (+) after the name of a host or group indicates a preference. Optionally, a positive integer indicates a preference level. Higher numbers indicate a greater preference. |
-n min_proc[,max_proc] | Specifies the minimum and maximum numbers of processors that are required for a parallel job |
-o output_file | Appends the standard output to a file |
-oo output_file | Overwrites the standard output of the job to the specified file |
-p process_limit | Limits the number of processes for the whole job |
-q "queue_name ..." | Submits job to one of the specified queues |
-R "res_req" [-R "res_req" ...] | Specifies host resource requirements |
-S stack_limit | Sets a per-process (soft) stack segment size limit (KB) for each process that belongs to the job |
-sla service_class_name | Specifies the service class where the job is to run |
-T thread_limit | Sets the limit of the number of concurrent threads for the whole job |
-t term_time | Specifies the job termination deadline in the form [[month:]day:]hour:minute |
-v swap_limit | Sets the total process virtual memory limit (KB) for the whole job |
-W run_time[/host_name \| /host_model] | Sets the runtime limit of the job in the form [hour:]minute |
-h | Prints command usage to stderr and exits |
-V | Prints LSF release version to stderr and exits |
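A few of the options above can be combined into a bsub command line. The following Python sketch assembles an argument list using only the -q, -n, -W, -o, -J, and -R options from the table. The `build_bsub` and `format_run_time` helpers, the queue name `normal`, and the job command are hypothetical. Nothing is run here; the argument list is only built and printed.

```python
import shlex


def format_run_time(total_minutes):
    """Format a duration in minutes as [hour:]minute, the form used by -W."""
    hours, minutes = divmod(total_minutes, 60)
    return f"{hours}:{minutes:02d}" if hours else str(minutes)


def build_bsub(command, queue=None, min_proc=None, max_proc=None,
               run_minutes=None, output_file=None, job_name=None, res_req=None):
    """Build an argv list for bsub [options] command [arguments]."""
    argv = ["bsub"]
    if queue is not None:
        argv += ["-q", queue]
    if min_proc is not None:
        # -n min_proc[,max_proc]
        procs = str(min_proc) if max_proc is None else f"{min_proc},{max_proc}"
        argv += ["-n", procs]
    if run_minutes is not None:
        argv += ["-W", format_run_time(run_minutes)]
    if output_file is not None:
        argv += ["-o", output_file]
    if job_name is not None:
        argv += ["-J", job_name]
    if res_req is not None:
        argv += ["-R", res_req]
    # The job command and its arguments always come after the options.
    return argv + list(command)


if __name__ == "__main__":
    argv = build_bsub(["./sim", "--steps", "100"], queue="normal",
                      min_proc=4, max_proc=8, run_minutes=90,
                      output_file="sim.out", job_name="sim")
    # shlex.join quotes each argument so the line can be pasted into a shell.
    print(shlex.join(argv))
```

Building a list rather than a single string keeps each option value as one argument, so a value that contains spaces, such as a resource requirement string, stays intact without manual quoting.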