lsf.shared
The lsf.shared file contains common definitions that are shared by
all load sharing clusters defined by
lsf.cluster.cluster_name files. This includes lists of
cluster names, host types, host models, the special resources available, and external load indices,
including indices required to submit jobs using JSDL files. This file is installed by default in the
directory defined by LSF_CONFDIR.
Changing lsf.shared configuration
After you change lsf.shared, run the following commands:
- lsadmin reconfig to reconfigure the LIM daemon
- badmin mbdrestart to restart the mbatchd daemon
#INCLUDE
Syntax
#INCLUDE "path-to-file"
Description
Inserts configuration settings from another file at the current location. Use this directive to delegate control of a portion of the configuration to other users or user groups by giving specific users or user groups write access to the included file, and to keep configuration file settings consistent across clusters (if you are using the LSF multicluster capability).
For more information, see Shared configuration file content.
All #INCLUDE lines must be inserted at the beginning of the local configuration file. If placed within or after any other sections, LSF reports an error.
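The placement rule above can be checked before reconfiguring the cluster. The following is a minimal sketch, not an LSF tool: it assumes that a section starts with a line of the form "Begin <name>" (as in the examples in this file), and the function name and the included path are hypothetical.

```python
import re

def misplaced_includes(config_text):
    """Return 1-based line numbers of #INCLUDE lines that follow a section start."""
    seen_section = False
    misplaced = []
    for lineno, line in enumerate(config_text.splitlines(), start=1):
        stripped = line.strip()
        if re.match(r"Begin\s+\S+", stripped):
            # Assumption: any "Begin <name>" line opens a section.
            seen_section = True
        elif stripped.startswith("#INCLUDE") and seen_section:
            misplaced.append(lineno)
    return misplaced

# Hypothetical file contents; the included path is a placeholder.
good = '''#INCLUDE "/path/to/common_resources"
Begin Cluster
ClusterName
cluster1
End Cluster
'''

bad = '''Begin Cluster
ClusterName
cluster1
End Cluster
#INCLUDE "/path/to/common_resources"
'''

print(misplaced_includes(good))
print(misplaced_includes(bad))
```

An empty result means every #INCLUDE line precedes the first section. Any reported line number points at a directive that LSF would be expected to reject.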
Cluster section
(Required) Lists the cluster names recognized by the LSF system.
Cluster section structure
The first line must contain the mandatory keyword ClusterName. The other keywords are optional, except when you use the LSF multicluster capability: in that case, the first line must contain both ClusterName and the keyword Servers.
Each subsequent line defines one cluster.
Example Cluster section
Begin Cluster
ClusterName Servers
cluster1 (hostA hostB)
cluster2 (hostD)
End Cluster
ClusterName
Defines all cluster names recognized by the LSF system.
All cluster names referenced anywhere in the LSF system must be defined here. The file names of cluster-specific configuration files must end with the associated cluster name.
By default, if the LSF multicluster capability is installed, all clusters listed in this section participate in the same multicluster environment. However, individual clusters can restrict their multicluster participation by specifying a subset of clusters at the cluster level (lsf.cluster.cluster_name RemoteClusters section).
Servers
LSF multicluster capability only.
List of hosts in this cluster that the instances of LIM in remote clusters can connect to and from which they can obtain information.
For other clusters to work with this cluster, one of these hosts must be running the mbatchd daemon.
HostType section
(Required) Lists the valid host types in the cluster. All hosts that can run the same binary executable are in the same host type.
CAUTION: If you remove NTX86, NTX64, or NTIA64 from the HostType section, the functionality of lspasswd.exe is affected. The lspasswd command registers a password for a Windows user account.
HostType section structure
The first line consists of the mandatory keyword TYPENAME.
Subsequent lines name valid host types.
Example HostType section
Begin HostType
TYPENAME
SOL64
SOLSPARC
LINUX86
LINUXPPC
LINUX64
NTX86
NTX64
NTIA64
End HostType
TYPENAME
Host type names are usually based on a combination of the hardware name and operating system. If your site already has a system for naming host types, you can use the same names for LSF.
HostModel section
(Required) Lists models of machines and gives the relative CPU scaling factor for each model. All hosts of the same relative speed are assigned the same host model.
LSF uses the relative CPU scaling factor to normalize the CPU load indices so that jobs are more likely to be sent to faster hosts. The CPU factor affects the calculation of job execution time limits and accounting. Using large or inaccurate values for the CPU factor can cause confusing results when CPU time limits or accounting are used.
HostModel section structure
The first line consists of the mandatory keywords MODELNAME, CPUFACTOR, and ARCHITECTURE.
Subsequent lines define a model and its CPU factor.
Example HostModel section
Begin HostModel
MODELNAME CPUFACTOR ARCHITECTURE
PC400 13.0 (i86pc_400 i686_400)
PC450 13.2 (i86pc_450 i686_450)
Sparc5F 3.0 (SUNWSPARCstation5_170_sparc)
Sparc20 4.7 (SUNWSPARCstation20_151_sparc)
Ultra5S 10.3 (SUNWUltra5_270_sparcv9 SUNWUltra510_270_sparcv9)
End HostModel
ARCHITECTURE
(Reserved for system use only) Indicates automatically detected host models that correspond to the model names.
CPUFACTOR
Though it is not required, you would typically assign a CPU factor of 1.0 to the slowest machine model in your system and higher numbers for the others. For example, for a machine model that executes at twice the speed of your slowest model, a factor of 2.0 should be assigned.
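As a worked illustration of this convention, the sketch below derives relative CPU factors from benchmark run times, fixing the slowest model at 1.0. The model names and timings are hypothetical, and using a single benchmark's run time as the measure of speed is an assumption made for the example, not something LSF prescribes.

```python
def relative_cpu_factors(benchmark_seconds):
    """Map model name -> CPU factor, with the slowest model fixed at 1.0."""
    slowest = max(benchmark_seconds.values())
    # A model that finishes in half the time of the slowest gets a factor of 2.0.
    return {model: round(slowest / secs, 1)
            for model, secs in benchmark_seconds.items()}

# Hypothetical run times (in seconds) of the same benchmark on three models.
timings = {"ModelA": 120.0, "ModelB": 60.0, "ModelC": 40.0}
factors = relative_cpu_factors(timings)
print(factors)
```

The resulting values are the numbers you would enter in the CPUFACTOR column for each model.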
MODELNAME
Generally, you first identify the distinct host types in your system, such as MIPS and SPARC, and then the machine models within each, such as SparcIPC, Sparc1, Sparc2, and Sparc10.
Automatically detected host models and types
When you first install LSF, you do not necessarily need to assign models and types to hosts in lsf.cluster.cluster_name. If you do not assign models and types to hosts in lsf.cluster.cluster_name, LIM automatically detects the model and type for the host.
If you have versions earlier than LSF 4.0, you may have host models and types already assigned to hosts. You can still take advantage of automatic detection of host model and type.
Automatic detection of host model and type is useful because you no longer need to make changes in the configuration files when you upgrade the operating system or hardware of a host and reconfigure the cluster. LSF will automatically detect the change.
Mapping to CPU factors
Automatically detected models are mapped to the short model names in lsf.shared in the ARCHITECTURE column. Model strings in the ARCHITECTURE column are only used for mapping to the short model names.
Begin HostModel
MODELNAME CPUFACTOR ARCHITECTURE
SparcU5 5.0 (SUNWUltra510_270_sparcv9)
PC486 2.0 (i486_33 i486_66)
PowerPC 3.0 (PowerPC12 PowerPC16 PowerPC31)
End HostModel
If an automatically detected host model cannot be matched with the short model name, it is matched to the best partial match and a warning message is generated.
If a host model cannot be detected or is not supported, it is assigned the DEFAULT model name and an error message is generated.
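The mapping from detected model strings to short model names can be sketched as a lookup table built from the ARCHITECTURE column. This is an illustration only, not how LIM is implemented: it handles exact matches, and it does not model the best-partial-match step or the DEFAULT fallback, because the matching algorithm is not described here. The function name and regular expression are assumptions.

```python
import re

HOST_MODEL_SECTION = """\
Begin HostModel
MODELNAME CPUFACTOR ARCHITECTURE
SparcU5 5.0 (SUNWUltra510_270_sparcv9)
PC486 2.0 (i486_33 i486_66)
PowerPC 3.0 (PowerPC12 PowerPC16 PowerPC31)
End HostModel
"""

# A data row: model name, numeric CPU factor, parenthesized architecture list.
ROW = re.compile(r"^(\S+)\s+([0-9.]+)\s+\(([^)]*)\)\s*$")

def architecture_map(section_text):
    """Map each architecture string to (model name, CPU factor)."""
    mapping = {}
    for line in section_text.splitlines():
        match = ROW.match(line.strip())
        if match:  # Begin/End lines and the keyword line do not match
            model, factor, archs = match.groups()
            for arch in archs.split():
                mapping[arch] = (model, float(factor))
    return mapping

mapping = architecture_map(HOST_MODEL_SECTION)
print(mapping.get("i486_66"))        # exact match
print(mapping.get("i486_100"))       # no exact match: None in this sketch
```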
Naming convention
Models that are automatically detected are named according to the following convention:
hardware_platform [_processor_speed[_processor_type]]
where:
- hardware_platform is the only mandatory component
- processor_speed is the optional clock speed and is used to differentiate computers within a single platform
- processor_type is the optional processor manufacturer used to differentiate processors with the same speed
- Underscores (_) between hardware_platform, processor_speed, and processor_type are mandatory.
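The convention can be illustrated by splitting a detected model string into its components. This is a minimal sketch that assumes hardware_platform itself contains no underscores, which the convention does not state explicitly. The function name is hypothetical.

```python
def split_model_string(detected):
    """Split 'platform[_speed[_type]]' into (platform, speed, type)."""
    # Split on at most two underscores; missing components become None.
    parts = detected.split("_", 2)
    platform = parts[0]
    speed = parts[1] if len(parts) > 1 else None
    proc_type = parts[2] if len(parts) > 2 else None
    return platform, speed, proc_type

for name in ("SUNWUltra5_270_sparcv9", "i686_400", "PowerPC12"):
    print(name, "->", split_model_string(name))
```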
Resource section
Optional. Defines resources (must be completed by the LSF administrator).
Resource section structure
The first line consists of the keywords. RESOURCENAME and DESCRIPTION are mandatory. The other keywords are optional. Subsequent lines define resources.
Example Resource section
Begin Resource
RESOURCENAME TYPE INTERVAL INCREASING CONSUMABLE DESCRIPTION # keywords
patchrev Numeric () Y () (Patch revision)
specman Numeric () N () (Specman)
switch Numeric () Y N (Network Switch)
rack String () () () (Server room rack)
owner String () () () (Owner of the host)
elimres Numeric 10 Y () (elim generated index)
ostype String () () () (Operating system and version)
lmhostid String () () () (FlexLM's lmhostid)
limversion String () () () (Version of LIM binary)
End Resource
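To make the column layout concrete, the sketch below reads a Resource section into one record per resource, keyed by the keywords on the first line. It is an illustration under stated assumptions, not an LSF parser: it treats "#" as starting a comment and "()" as an unspecified value, both inferred from the example above, and the function name is hypothetical.

```python
import re

RESOURCE_SECTION = """\
Begin Resource
RESOURCENAME TYPE INTERVAL INCREASING CONSUMABLE DESCRIPTION # keywords
switch Numeric () Y N (Network Switch)
rack String () () () (Server room rack)
elimres Numeric 10 Y () (elim generated index)
End Resource
"""

# A field is either a parenthesized group or a run of non-blank characters.
FIELD = re.compile(r"\([^)]*\)|\S+")

def parse_resources(section_text):
    """Return a list of dicts, one per resource line, keyed by keyword."""
    lines = [ln.split("#", 1)[0].strip() for ln in section_text.splitlines()]
    body = [ln for ln in lines if ln and ln.split()[0] not in ("Begin", "End")]
    keywords = body[0].split()
    records = []
    for line in body[1:]:
        fields = []
        for field in FIELD.findall(line):
            if field.startswith("("):
                field = field[1:-1].strip()  # drop the parentheses
            fields.append(field or None)     # "()" becomes None
        records.append(dict(zip(keywords, fields)))
    return records

records = parse_resources(RESOURCE_SECTION)
for record in records:
    print(record["RESOURCENAME"], record["TYPE"], record["INTERVAL"])
```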
RESOURCENAME
- A resource name cannot begin with a number.
- A resource name cannot contain any of the following special characters:
: . ( ) [ + - * / ! & | < > @ = ,
- A resource name cannot be any of the following reserved names:
cpu cpuf io logins ls idle maxmem maxswp maxtmp type model status it mem ncpus define_ncpus_cores define_ncpus_procs define_ncpus_threads ndisks pg r15m r15s r1m swap swp tmp ut
- To avoid conflict with inf and nan keywords in third-party libraries, resource names should not begin with inf or nan (upper case or lower case). Resource requirement strings such as -R "infra" or -R "nano" cause an error. Use -R "defined(infxx)" or -R "defined(nanxx)" to specify these resource names.
- Resource names are case sensitive.
- Resource names can be up to 39 characters in length.
- For Solaris machines, the keyword int is reserved and cannot be used.
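The naming rules above can be collected into a simple pre-check. The following is a minimal sketch that encodes only the rules listed in this section. The function name and message texts are hypothetical, and LSF itself may apply further checks.

```python
RESERVED = set("""
cpu cpuf io logins ls idle maxmem maxswp maxtmp type model status it mem
ncpus define_ncpus_cores define_ncpus_procs define_ncpus_threads ndisks pg
r15m r15s r1m swap swp tmp ut
""".split())
SPECIAL = set(":.()[+-*/!&|<>@=,")  # characters exactly as listed above
MAX_LEN = 39

def resource_name_problems(name, solaris=False):
    """Return a list of rule violations; an empty list means none were found."""
    problems = []
    if name[:1].isdigit():
        problems.append("begins with a number")
    if any(ch in SPECIAL for ch in name):
        problems.append("contains a special character")
    if name in RESERVED:  # names are case sensitive, so compare as-is
        problems.append("is a reserved name")
    if name.lower().startswith(("inf", "nan")):
        problems.append("begins with inf or nan")
    if len(name) > MAX_LEN:
        problems.append("is longer than 39 characters")
    if solaris and name == "int":
        problems.append("int is reserved on Solaris")
    return problems

for candidate in ("patchrev", "2fast", "mem", "infra", "a.b"):
    print(candidate, resource_name_problems(candidate))
```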
TYPE
- Boolean: Resources that have a value of one on hosts that have the resource and zero otherwise.
- Numeric: Resources that take numerical values, such as all the load indices, number of processors on a host, or host CPU factor.
- String: Resources that take string values, such as host type, host model, host status.
Default
If TYPE is not given, the default type is Boolean.
INTERVAL
Optional. Applies to dynamic resources only.
Defines the time interval (in seconds) at which the resource is sampled by the ELIM.
If INTERVAL is defined for a numeric resource, it becomes an external load index.
Default
If INTERVAL is not given, the resource is considered static.
INCREASING
Applies to numeric resources only.
If a larger value means greater load, INCREASING should be defined as Y. If a smaller value means greater load, INCREASING should be defined as N.
CONSUMABLE
Explicitly controls whether a resource is consumable. Applies to static or dynamic numeric resources.
CONSUMABLE is optional. The defaults for the consumable attribute are:
- Built-in indexes:
  - The following are consumable: r15s, r1m, r15m, ut, pg, io, ls, it, tmp, swp, mem.
  - All other built-in static resources are not consumable (for example: ncpus, ndisks, maxmem, maxswp, maxtmp, cpuf, type, model, status, rexpri, server, hname).
- External shared resources:
  - All numeric resources are consumable.
  - String and boolean resources are not consumable.
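The defaults listed above can be summarized as a small decision function. This is a sketch for illustration only: the list of non-consumable built-ins is given above as examples and may not be exhaustive, any name outside the two built-in sets is assumed to be an external shared resource, and the function and resource names used in the calls are hypothetical.

```python
CONSUMABLE_BUILTINS = {
    "r15s", "r1m", "r15m", "ut", "pg", "io", "ls", "it", "tmp", "swp", "mem",
}
NON_CONSUMABLE_BUILTINS = {
    "ncpus", "ndisks", "maxmem", "maxswp", "maxtmp", "cpuf",
    "type", "model", "status", "rexpri", "server", "hname",
}

def default_consumable(name, resource_type="Boolean"):
    """Return the default consumable attribute when CONSUMABLE is not set."""
    if name in CONSUMABLE_BUILTINS:
        return True
    if name in NON_CONSUMABLE_BUILTINS:
        return False
    # Assumption: anything else is an external shared resource, where only
    # numeric resources are consumable by default.
    return resource_type == "Numeric"

print(default_consumable("mem"))                    # built-in load index
print(default_consumable("ncpus", "Numeric"))       # built-in static resource
print(default_consumable("licenses", "Numeric"))    # hypothetical external resource
print(default_consumable("rack", "String"))
```

Setting CONSUMABLE explicitly to Y or N in the Resource section overrides these defaults.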
You should only specify consumable resources in the rusage section of a resource requirement string. Non-consumable resources are ignored in rusage sections.
A non-consumable resource should not be releasable. A non-consumable numeric resource can be used in the order, select, and same sections of a resource requirement string.
LSF rejects resource requirement strings where an rusage section contains a non-consumable resource.
DESCRIPTION
A brief description of the resource.
The information defined here will be returned by the ls_info() API call or printed out by the lsinfo command as an explanation of the meaning of the resource.
RELEASE
Applies to numeric shared resources only.
Controls whether LSF releases the resource when a job using the resource is suspended. When a job using a shared resource is suspended, the resource is held or released by the job depending on the configuration of this parameter.
Specify N to hold the resource, or specify Y to release the resource.
Default
Y