ibmcloudgen2_templates.json

The ibmcloudgen2_templates.json file defines the mapping between LSF resource demand requests and Cloud VPC Gen 2 instances.

The template represents a set of hosts that share some attributes such as the number of CPUs, the amount of available memory, the installed software stack, operating system, and other attributes.

The default location for the file is <LSF_TOP>/conf/resource_connector/ibmcloudgen2/conf/ibmcloudgen2_templates.json.

Description

LSF requests resources from the resource connector by specifying the number of instances of a particular template that it requires to satisfy its demand. The resource connector uses the definitions in this file to map this demand into a set of allocation requests in Cloud VPC Gen 2.

Important: When you define templates, you must make sure that the attribute definitions that are presented to LSF exactly match the definitions that are provided by Cloud VPC Gen 2. If, for example, the attribute definition specifies hosts with ncpus=4, but the actual hosts that are returned by Cloud VPC Gen 2 report ncpus=2, the demand calculation in LSF is not accurate.

Parameters

The file contains a JSON-defined list that is named templates. Each template in the list is an object that contains the following parameters:
templateId
The unique template name.
maxNumber
The maximum number of available instances of this template. LSF never requests more than this number of instances for this template. Set this parameter to an appropriate value according to the instance quota of the project or the maximum capacity of the Cloud VPC Gen 2 infrastructure.
attributes
A list of VM attributes that represent the hosts in the template that are used by LSF for scheduling. LSF attempts to place its pending workload on hosts that match these attributes to calculate how many instances of each template to request.
Each attribute string in the list is a JSON array of two elements. The attribute string has the following format:
"attribute_name": ["attribute_type", "attribute_value"]
attribute_name
An LSF resource name, for example, type or ncores.

The attribute name must be either a built-in resource (such as r15s or type) or a resource that is defined in the Resource section of the lsf.shared file on the LSF management host.

attribute_type
Can be Boolean, String, or Numeric, and must match the corresponding resource definition in the lsf.shared file.
attribute_value
The value of the resource that is provided by hosts. For Boolean resources, use 1 to define the presence of the resource and 0 to define its absence. For Numeric resources, specify a range that uses [min:max].

Depending on your cloud provider, various attributes are supported in the template.

The following attributes have default values if they are not defined:
type
The default value is the value of the LSB_RC_DEFAULT_HOST_TYPE parameter in the lsf.conf file. The default value of LSB_RC_DEFAULT_HOST_TYPE is X86_64.
ncpus
The number of CPU cores per CPU in machine types described by the template. The default value is 1.
Take note of these attributes:
gpuextend
Optional. A string that represents the GPU topology on the template host.

This attribute value is in the following format:

"key1=value1;key2=value2;..."
The following keys are supported in this attribute:
ngpus
Total number of GPUs. This must be defined either as a key in gpuextend or defined as a separate attribute. If it is defined in both places, the key value in gpuextend takes precedence.
nnumas
Total number of NUMA nodes. The default value is 1.
gbrand
The GPU brand. This value is case sensitive, and supports NVIDIA GPUs. For a list of GPU brands and models, run the nvidia-smi -L command.

For example, for Tesla K80, the GPU brand is Tesla.

gmodel
The GPU model. This value is case sensitive, and supports NVIDIA GPUs. For a list of GPU brands and models, run the nvidia-smi -L command.

For example, for Tesla K80, the GPU model is K80.

gmem
The total GPU memory, in MB.
nvlink
Specifies whether the GPU supports NVLink. Valid values (case insensitive) are y, n, yes, no.
imageId
The image name of the Cloud VPC Gen 2 image that LSF is installed on. The image name is used to launch virtual machine instances of this template.
subnetId
Optional. The unique VPC subnet name. This attribute is used together with region when you use a VPC where the subnet creation mode is Custom. It is not needed if you use a VPC with Automatic subnet creation mode. Use the subnet through which the instance can communicate with the LSF cluster.
vpcId
Optional. The unique name of the virtual private cloud (VPC) that the instances use. If not specified, the default VPC is used. The default VPC is created automatically when you create a new project.
resourceGroupId
Optional. The unique ID of the resource group in which the instances are created. If not specified, the default resource group is used. To get the resource group ID or to create a new resource group, go to the IBM Cloud website and navigate to Manage > Account > Account resources > Resource groups (https://cloud.ibm.com/account/resource-groups).
vmType

The unique name of the Cloud VPC Gen 2 machine type.

securityGroupIds
Optional. A list of strings for Cloud VPC Gen 2 security groups that are applied to instances. If you don't specify securityGroupIds, Cloud VPC Gen 2 uses the default group.
sshkey_id
The ID of the RSA SSH key.
region
Optional. The unique name of the region. This attribute is used together with subnetId when you use a VPC where the subnet creation mode is Custom. It is not needed if you use a VPC with Automatic subnet creation mode.
zone
The unique name of the zone that the instances are created in. For example, us-south-1.
priority
By default, LSF sorts candidate template hosts by template name. To make LSF favor one template over another, an administrator can set the priority attribute. LSF uses higher priority templates first (for example, assign higher priorities to less expensive templates).
The default value of priority is 0, which means the lowest priority. If template hosts have the same priority, LSF sorts them by template name.
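
The ordering that is described for priority can be illustrated with a short Python sketch. It is not LSF code: it assumes that priority is a numeric key at the top level of each template object and that sorting by template name means a plain string comparison of templateId. The template names in the sample are hypothetical.

```python
# Illustrative only: mimics the documented ordering, not LSF's implementation.
def order_templates(templates):
    """Sort by priority (highest first), then by templateId for ties."""
    return sorted(
        templates,
        # A missing priority is treated as the documented default of 0.
        key=lambda t: (-t.get("priority", 0), t["templateId"]),
    )

# Hypothetical templates; only the keys needed for ordering are shown.
templates = [
    {"templateId": "template-b", "priority": 5},
    {"templateId": "template-a"},
    {"templateId": "template-c", "priority": 5},
    {"templateId": "template-d", "priority": 10},
]

ordered = [t["templateId"] for t in order_templates(templates)]
print(ordered)  # ['template-d', 'template-b', 'template-c', 'template-a']
```

In this sketch, template-a sorts last because it has no priority and therefore falls back to 0, and template-b precedes template-c because the two share a priority and are ordered by name.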

Example ibmcloudgen2_templates.json file

{
"templates": [
{
    "templateId": "CENTOS-Template-NGVM-1",
    "maxNumber": 100,
    "attributes": {
        "type": ["String", "X86_64"],
        "ncores": ["Numeric", "4"],
        "ncpus": ["Numeric", "1"],
        "mem": ["Numeric", "8192"],
        "icgen2host": ["Boolean", "1"],
        "pricing": ["String", "ondemand"],
        "computeUnit": ["String", "encl_3"]
    },
    "imageId": "r006-931515d2-fcc3-11e9-896d-3baa2797200f",
    "subnetId": "0717-a16bb490-c5ab-4fa3-bf18-0298694df8c6",
    "vpcId": "r006-e5c03fe9-42db-4995-bff9-7b7cfbbec7e9",
    "resourceGroupId": "0d4b98578d5a474489b928cae03767d0",
    "vmType": "bx2-8x32",
    "securityGroupIds": ["r006-e2a0df77-5efc-4da3-932f-d30aa9c1cdd2"],
    "sshkey_id": "r006-73ace4ad-7fc2-4ee9-b00e-066f563e4833",
    "region": "us-south",
    "zone": "us-south-1"
}
]
}
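
The attribute format from the Parameters section can be checked before a file like this is deployed. The following Python sketch is one possible approach, not part of LSF. The function names check_attributes and parse_gpuextend are hypothetical, the gpuextend values in the sample are made up for illustration, and declaring gpuextend with the String type is an assumption.

```python
import json

VALID_TYPES = {"Boolean", "String", "Numeric"}

def check_attributes(attributes):
    """Return a list of problems found in a template's attributes object."""
    problems = []
    for name, spec in attributes.items():
        # Each attribute must be a two-element array: [type, value].
        if not (isinstance(spec, list) and len(spec) == 2):
            problems.append(f"{name}: expected [attribute_type, attribute_value]")
            continue
        attr_type, value = spec
        if attr_type not in VALID_TYPES:
            problems.append(f"{name}: unknown type {attr_type!r}")
        elif attr_type == "Boolean" and str(value) not in ("0", "1"):
            problems.append(f"{name}: Boolean value must be 0 or 1")
    return problems

def parse_gpuextend(text):
    """Split a 'key1=value1;key2=value2' string into a dictionary."""
    result = {}
    for item in text.split(";"):
        if not item.strip():
            continue  # tolerate a trailing semicolon
        key, _, value = item.partition("=")
        result[key.strip()] = value.strip()
    return result

# Hypothetical attributes object; the gpuextend values are illustrative only.
sample = json.loads("""
{
    "type": ["String", "X86_64"],
    "ncpus": ["Numeric", "1"],
    "icgen2host": ["Boolean", "1"],
    "gpuextend": ["String", "ngpus=2;nnumas=1;gbrand=Tesla;gmodel=K80;gmem=12000;nvlink=no"]
}
""")

print(check_attributes(sample))
print(parse_gpuextend(sample["gpuextend"][1]))
```

Because json.loads rejects malformed input, running a check like this also catches syntax slips such as a missing comma between attributes.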

The example defines a template named CENTOS-Template-NGVM-1. LSF attempts to place any pending workload on Cloud VPC Gen 2 instances that match the template, that is, hosts of type X86_64 with the required ncpus, mem, and other resources. Instances use the VPC that is specified by vpcId, and all instances are created in the us-south-1 zone. For example, if some pending workload can be placed on N hosts, LSF requests N instances of CENTOS-Template-NGVM-1 from the resource connector. If demand is generated for this template, the connector attempts to launch virtual machines, up to the defined maximum of 100. If the launch command succeeds for any of the instances (even if there are fewer than requested), the connector informs LSF that it can use the allocated hosts.
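
The effect of maxNumber on a request can be pictured with a small Python sketch. It only illustrates the cap that is described above and is not how LSF or the resource connector is implemented. The function name instances_to_request is hypothetical, and counting instances that are already allocated against maxNumber is an assumption.

```python
# Illustrative only: a simple model of the maxNumber cap, not LSF code.
def instances_to_request(hosts_needed, max_number, already_allocated=0):
    """Cap a demand of hosts_needed instances at the template's maxNumber."""
    # Assumption: instances that already exist count against the maximum.
    remaining = max(0, max_number - already_allocated)
    return min(hosts_needed, remaining)

print(instances_to_request(30, 100))      # 30: the demand fits under the cap
print(instances_to_request(250, 100))     # 100: the demand is capped at maxNumber
print(instances_to_request(30, 100, 90))  # 10: only 10 instances remain
```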

Important: When you define templates, you must ensure that the attribute definitions presented to LSF accurately match those provided by Cloud VPC Gen 2. If, for example, the attribute definition specifies hosts with ncpus=4, but LIM detects ncpus=2 on the actual hosts that are returned, the demand calculation in LSF might not be accurate.