Follow these steps to configure Microsoft Azure CycleCloud as a resource provider for the
LSF resource connector, which makes allocation requests to Azure CycleCloud on behalf of LSF. The
instances that Azure CycleCloud launches join the LSF
cluster. If instances become idle, the LSF
resource connector terminates them.
Before you begin
The Microsoft Azure CycleCloud provider requires you to apply IBM®
Spectrum LSF,
Version 10.1, Fix Pack 9, or later. While applying the Fix Pack, follow the instructions for the
patchinstall command and make sure to manually move the required configuration
files to the appropriate directory under
LSF_TOP/conf/resource_connector/<provider_name>/conf,
and change the ownership of those new files and directories to the cluster administrator.
For more information about applying Fix Packs to LSF
resource connector, see Use the LSF patch installer to update resource connector.
- You must have root access to the LSF management
host.
- The LSF management
host and the compute nodes (Azure instances) must be able to communicate with each other.
- You must be able to restart the LSF
cluster.
- You must be familiar with and have the ability to perform Microsoft Azure and Azure CycleCloud
administrative operations.
- The virtual network to be used by Microsoft Azure virtual instances must be configured so that
they can communicate with LSF
hosts.
Note: This is not required if you are running a simple test application.
- You must have the ability to configure the DNS or NIS server and use short host names on
Azure.
Note: The default DNS server on Azure does not work for LSF
because the Azure instance's full host name, with domain name, usually exceeds 60 characters, which
is too long for LSF.
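The host name length constraint in the note above can be checked before you build an image. The following is a minimal sketch: check_hostname_length is a hypothetical helper, and the 60-character default limit is taken from the note, not from an LSF interface.

```shell
# Hypothetical helper: report whether a host name fits within a length limit.
# The default limit of 60 comes from the note above; adjust it if your site differs.
check_hostname_length() {
    name=$1
    limit=${2:-60}
    len=${#name}
    if [ "$len" -gt "$limit" ]; then
        echo "too long ($len > $limit): $name"
        return 1
    fi
    echo "ok ($len): $name"
}

# Check the fully qualified and short names of the current machine.
check_hostname_length "$( { hostname -f || hostname; } 2>/dev/null )" || true
check_hostname_length "$( { hostname -s || hostname; } 2>/dev/null )" || true
```

If the fully qualified name is reported as too long, configure a DNS or NIS server that serves short host names, as described in the prerequisite.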
About this task
LSF
resource connector has been tested on the following systems:
- Linux x86 Kernel 2.6, glibc 2.11 RHEL 6.x
- Linux x86 Kernel 3.10, glibc 2.17 RHEL 7.x
- Linux x86 Kernel 3.10, glibc 2.17 CentOS 7.x
- LSF
10.1.0 Fix Pack 8, or
later
In the following steps, you must perform all operations as the Azure administrator unless
otherwise stated.
For full details on installing LSF, see Installing IBM Spectrum LSF on UNIX and Linux.
For a job that a user submits to run on an instance, that user account must exist on the
instance, or LSF user account mapping must be configured. For more information about user
groups and user account mapping, see Managing Users and User Groups and Between-Host User
Account Mapping.
Procedure
-
Install and set up Microsoft Azure CycleCloud.
-
Create an Azure CycleCloud administrator account for the application server.
-
If you are using a version of Azure CycleCloud that is earlier than 7.9.0, import the LSF
cluster template into Azure CycleCloud.
You do not need to import the LSF
cluster type if you are using Azure CycleCloud, Version 7.9.0, or later, because these versions
already provide the LSF
cluster type.
For more details, see "Import the LSF
cluster template" on the following website: https://github.com/Azure/cyclecloud-lsf
-
Create and configure the LSF
cluster in the Azure Portal.
-
Create an LSF
network security group in the Azure resource group for LSF.
A network security group (NSG) contains a list of security rules that allow or deny network
traffic to resources that are connected to VNets. Add customized rules to open all LSF
listening ports to the security group that launches instances. The ports must match those from the
existing LSF
cluster.
The following are the default port number values:
- LSF_LIM_PORT=7869 (TCP and UDP)
- LSF_RES_PORT=6878 (TCP)
- LSB_SBD_PORT=6882 (TCP)
You can also accept all traffic from the LSF management host by adding the IP address of the LSF management
host, or you can accept all traffic across the VNet.
Note: If you allow the traffic only from the default ports, some NIOS commands might not work since
the ports are configured for them dynamically and are different from the LSF
ports.
Add the network security group to the Azure resource group that you created for LSF.
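The security rules for the default LSF ports can be sketched with the Azure CLI as follows. The resource group name (lsf-rg), security group name (lsf-nsg), rule names, and priorities are hypothetical placeholders; the ports are the default values listed above. By default the sketch only prints the az commands (DRY_RUN=1) so that you can review them first.

```shell
# Sketch only: RG, NSG, rule names, and priorities are hypothetical placeholders.
RG=${RG:-lsf-rg}
NSG=${NSG:-lsf-nsg}
DRY_RUN=${DRY_RUN:-1}   # 1 = print the commands instead of running them

run() {
    if [ "$DRY_RUN" = 1 ]; then echo "$*"; else "$@"; fi
}

open_lsf_ports() {
    run az network nsg create --resource-group "$RG" --name "$NSG"
    # LIM (7869), RES (6878), and SBD (6882) over TCP.
    run az network nsg rule create --resource-group "$RG" --nsg-name "$NSG" \
        --name lsf-tcp --priority 100 --direction Inbound --access Allow \
        --protocol Tcp --destination-port-ranges 7869 6878 6882
    # LIM also listens on UDP.
    run az network nsg rule create --resource-group "$RG" --nsg-name "$NSG" \
        --name lsf-lim-udp --priority 110 --direction Inbound --access Allow \
        --protocol Udp --destination-port-ranges 7869
}

open_lsf_ports
```

Set DRY_RUN=0 to run the commands. If your cluster uses non-default ports, replace the port numbers with the values from the existing LSF cluster.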
-
Create an instance and install the LSF management
host.
The LSF
cluster administrator must manually launch an Azure instance and install LSF on the
management host.
Enabling dynamic hosts is sufficient for this configuration. Set
ENABLE_DYNAMIC_HOSTS="Y" in the install.config file if
you are installing a new LSF management
host. If you have an existing LSF management
host, manually configure the LSF_DYNAMIC_HOST_WAIT_TIME parameter in the
lsf.conf file and the LSF_HOST_ADDR_RANGE parameter in the
lsf.cluster.clustername file.
Tip: You do not have to configure the resource connector parameters at this time. You
can configure these parameters after you successfully build the LSF cloud
image.
For example, specify the following parameters in the install.config
file:
LSF_TOP="/opt/lsf"
LSF_ADMINS="lsfadmin"
LSF_CLUSTER_NAME="cluster1"
LSF_MASTER_LIST="lsfmanagement1 lsfmanagement2"
LSF_ENTITLEMENT_FILE="/root/platform_lsf_std_entitlement.dat"
LSF_TARDIR="/root/"
ENABLE_DYNAMIC_HOSTS="Y"
LSF_DYNAMIC_HOST_WAIT_TIME="2"
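Before you run the installer, you might want to confirm that the dynamic host settings are present in the file. The following is a minimal sketch; conf_get and check_dynamic_hosts are hypothetical helpers that read simple KEY="VALUE" lines and are not part of LSF.

```shell
# Print the value of key $2 from the KEY=VALUE file $1, without surrounding double quotes.
conf_get() {
    sed -n "s/^[[:space:]]*$2[[:space:]]*=[[:space:]]*//p" "$1" \
        | tail -n 1 | sed 's/^"\(.*\)"$/\1/'
}

# Report whether dynamic hosts are enabled in the given configuration file.
check_dynamic_hosts() {
    if [ "$(conf_get "$1" ENABLE_DYNAMIC_HOSTS)" != "Y" ]; then
        echo "ENABLE_DYNAMIC_HOSTS is not Y in $1"
        return 1
    fi
    wait_time=$(conf_get "$1" LSF_DYNAMIC_HOST_WAIT_TIME)
    echo "dynamic hosts enabled, wait time: ${wait_time:-unset}"
}

# Example: check install.config in the current directory, if it exists.
if [ -f install.config ]; then
    check_dynamic_hosts install.config
fi
```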
-
Build the LSF cloud
image.
To create an Azure instance image for an LSF cloud
compute host, manually launch an Azure instance and install LSF on
that instance.
-
Create the Azure instance and select the managed disk option when you launch the instance.
-
Use the ssh command to log in to the Azure instance and use the Azure Java
SDK authentication file that you created.
-
Copy the following LSF
packages to the Azure instance:
- lsf10.1_linux2.6-glibc2.3-x86_64.tar.Z
- lsf10.1_linux2.6-glibc2.3-x86_64-442293.tar.Z (LSF 10.1,
Fix Pack 2)
- lsf10.1_lsfinstall.tar.Z
- lsf_std_entitlement.dat
-
Check that the ed.x86_64 software is installed.
If this package is not installed, use the yum install command to install
it.
yum install ed
-
Install LSF as a
server host on the Azure instance.
For example, edit the server.config file with
the installation options you need:
LSF_TOP="/opt/lsf"
LSF_ADMINS="lsfadmin"
LSF_TARDIR="/opt/install/"
LSF_LICENSE="/opt/install/lsf_std_entitlement.dat"
LSF_SERVER_HOSTS="management1 management2"
LSF_LOCAL_RESOURCES="[resource cyclecloudhost]"
LSF_LIM_PORT="7869"
Run the ./lsfinstall -s -f server.config command to
install the LSF server
host.
After installation, make sure that the
LSF_TOP/conf/lsf.conf file contains the
cyclecloudhost resource.
LSF_CONFDIR=/opt/lsf/conf
LSF_LIM_PORT=7869
LSF_SERVER_HOSTS="management.myserver.com"
LSF_VERSION=10.1
LSF_LOCAL_RESOURCES="[resource cyclecloudhost]"
LSF_TOP=/opt/lsf/
LSF_LOGDIR=/opt/lsf/log
LSF_LOG_MASK=LOG_WARNING
LSF_ENABLE_EGO=N
LSB_ENABLE_HPC_ALLOCATION=Y
LSF_EGO_DAEMON_CONTROL=N
- LSF_LIM_PORT
- The port number must be the same as the one that is defined on the LSF management host.
- LSF_LOCAL_RESOURCES
- The new resource name cyclecloudhost is used by LSF to
identify Azure instances. Use the bhosts -a command to see instances that are
used by LSF.
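The two settings described above can be verified on the instance after installation. The following is a minimal sketch; check_server_conf is a hypothetical helper, and the /opt/lsf path and port 7869 are the example values used in this topic.

```shell
# Verify that lsf.conf ($1) defines the cyclecloudhost resource and that
# LSF_LIM_PORT matches the port used on the LSF management host ($2).
check_server_conf() {
    conf=$1
    expected_port=$2
    if ! grep -Eq '^[[:space:]]*LSF_LOCAL_RESOURCES=.*\[resource[[:space:]]+cyclecloudhost\]' "$conf"; then
        echo "missing cyclecloudhost resource"
        return 1
    fi
    port=$(sed -n 's/^[[:space:]]*LSF_LIM_PORT=//p' "$conf" | tr -d '"' | tail -n 1)
    if [ "$port" != "$expected_port" ]; then
        echo "LSF_LIM_PORT is ${port:-unset}, expected $expected_port"
        return 1
    fi
    echo "ok"
}

# Example: check the file from the example installation, if it exists.
LSF_CONF=${LSF_CONF:-/opt/lsf/conf/lsf.conf}
if [ -f "$LSF_CONF" ]; then
    check_server_conf "$LSF_CONF" 7869
fi
```

A commented-out LSF_LOCAL_RESOURCES line is reported as missing, which is the expected result while the parameter is temporarily disabled for testing.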
- Optional:
If required, add the management host name to the
/etc/hosts file on the Azure instance, or configure the DNS/NIS client on the
instance if DNS or NIS is used.
You must use the short host name when specifying the management host. The host names of the new
instances that the LSF resource connector creates on Azure follow the
"host-a-b-c-d" format, where "a-b-c-d"
corresponds to the instance's IPv4 address (a.b.c.d). Add these entries
into your custom DNS/NIS server or /etc/hosts file.
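The mapping from an IPv4 address to the host name format described above can be sketched as follows. The helper names are hypothetical, and the sketch assumes that the prefix is the literal string "host", as the format shown above suggests.

```shell
# Convert an IPv4 address (a.b.c.d) to the host-a-b-c-d name format.
ip_to_host() {
    echo "host-$(echo "$1" | tr '.' '-')"
}

# Print an /etc/hosts style line (address, tab, short host name).
hosts_entry() {
    printf '%s\t%s\n' "$1" "$(ip_to_host "$1")"
}

# Example: generate entries for a few addresses in the subnet.
for ip in 10.0.1.4 10.0.1.5 10.0.1.6; do
    hosts_entry "$ip"
done
```

You can append the generated lines to the /etc/hosts file, or use them as the source for records in your custom DNS or NIS server.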
-
Manually start the LSF
daemons on the instance and make sure that the instance can join the LSF
cluster as a dynamic host.
Note: Disable LSF_LOCAL_RESOURCES="[resource cyclecloudhost]" in the
lsf.conf file before you test, because LSF ignores hosts that have an LSF
resource connector resource ("cyclecloudhost") unless the LSF
resource connector created them dynamically. You must re-enable this parameter before you
create the image.
If the instance cannot join the LSF
cluster as a dynamic host, check the VPN, firewall, or security group settings. Check whether the
management host can ping the instance (and whether the instance can ping the management host) using its
private IP address. If the management host can ping the instance using its IP address but not using the
host name, configure the host name resolution properly.
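The troubleshooting order described above (first by IP address, then by host name) can be sketched as follows. diagnose_host is a hypothetical helper; the default probe assumes the Linux ping command, where -c 1 sends one packet and -W 2 waits up to two seconds. The probe command is kept in the PING_CMD variable so that it can be replaced.

```shell
# Probe command; assumed Linux ping options. Override PING_CMD to use another probe.
PING_CMD=${PING_CMD:-ping -c 1 -W 2}

# Check a host by private IP address ($1) first, then by host name ($2).
diagnose_host() {
    ip=$1
    name=$2
    # PING_CMD is intentionally unquoted so that its options are split into words.
    if ! $PING_CMD "$ip" >/dev/null 2>&1; then
        echo "unreachable by IP: check VPN, firewall, or security group settings"
        return 1
    fi
    if ! $PING_CMD "$name" >/dev/null 2>&1; then
        echo "reachable by IP but not by name: fix host name resolution"
        return 2
    fi
    echo "reachable by IP and by name"
}

# Example (run on the management host, then repeat from the instance):
# diagnose_host 10.0.1.5 host-10-0-1-5
```

Run the check in both directions, because the management host and the instance must each be able to reach the other.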
-
Shut down the LSF
daemons and log out from the instance.
-
Log in to the Azure instance.
-
Deprovision and shut down the instance.
sudo waagent -deprovision+user -force && halt
-
Capture the image.
Run the following Azure commands:
az vm deallocate -g "resource_group_name" -n "instance_name"
az vm generalize -g "resource_group_name" -n "instance_name"
az image create -n "image_name" -g "resource_group_name" --os-type Linux --source "instance_name"
The name of the image (imageId) must be specified in the
azureccprov_templates.json file. LSF uses this file to
decide when to borrow instances and which image to launch them from.
For more details on capturing an image, refer to the Microsoft Azure documentation: https://docs.microsoft.com/en-us/azure/virtual-machines/linux/capture-image
-
Add the image to the Azure resource group that you created for LSF.
After you capture the image, the generalized Azure instance cannot be started or used again. If you need
to re-create the image, create a new instance from the image that you created, or create a new
LSF cloud image.
-
Install the LSF
resource connector for Azure.
-
Log in to the LSF management
host as root.
-
Install Java, Version 1.8, or later.
-
Source the LSF
environment.
- For csh or tcsh: source LSF_TOP/conf/cshrc.lsf
- For sh, ksh, or bash: . LSF_TOP/conf/profile.lsf
-
Copy the LSF
resource connector package to the LSF management
host and extract the package.
-
From the extracted package directory, copy the contents of the
LSF_VERSION/resource_connector/cyclecloud directory to the
LSF_TOP/LSF_VERSION/resource_connector
directory.
-
Create the cyclecloud subdirectory in the
LSF_TOP/conf/resource_connector directory.
-
From the extracted package directory, copy the contents of the
LSF_VERSION/resource_connector/cyclecloud/conf directory to
the LSF_TOP/conf/resource_connector/cyclecloud
directory.
-
Edit the
LSF_TOP/conf/resource_connector/hostProviders.json file and
add an entry for the Azure CycleCloud provider.
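A provider entry might look like the output of the following sketch. All field values here are assumptions for illustration: the provider name cyclecloud and the paths mirror the directory names used in the previous steps, and the type value azureccProv is inferred from the azureccprov_templates.json file name. Compare them with the hostProviders.json file that is shipped in the resource connector package before you use them.

```shell
# Print an illustrative hostProviders.json body. Every value is an assumption;
# verify it against the file that ships with your resource connector package.
print_provider_entry() {
    cat <<'EOF'
{
    "providers": [
        {
            "name": "cyclecloud",
            "type": "azureccProv",
            "confPath": "resource_connector/cyclecloud",
            "scriptPath": "resource_connector/cyclecloud"
        }
    ]
}
EOF
}

print_provider_entry
```

If the file already lists other providers, add only the inner object to the existing providers array instead of replacing the file.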
-
Replace the ebrokerd executable file in the LSF_SERVERDIR
directory with the ebrokerd executable file in the extracted package.
-
Run the badmin mbdrestart command to apply the changes.
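The copy, replace, and restart steps above can be sketched as one script. Several values are assumptions: PKG_DIR is a hypothetical extraction directory, the LSF_SERVERDIR fallback is a typical path that is normally set when you source the LSF environment, and EBROKERD_SRC must point to the ebrokerd file in your package because its location is not stated in this topic. The sketch follows the step wording literally ("copy the contents of"), so verify the resulting directory layout against your package. It also keeps a backup of the original ebrokerd file, which is an addition to the documented steps. By default it only prints the commands (DRY_RUN=1).

```shell
# Sketch only: default values below are assumptions, not documented locations.
LSF_TOP=${LSF_TOP:-/opt/lsf}
LSF_VERSION=${LSF_VERSION:-10.1}
PKG_DIR=${PKG_DIR:-/tmp/rc_package}              # hypothetical extraction directory
LSF_SERVERDIR=${LSF_SERVERDIR:-$LSF_TOP/$LSF_VERSION/linux2.6-glibc2.3-x86_64/etc}
EBROKERD_SRC=${EBROKERD_SRC:-$PKG_DIR/ebrokerd}  # set this to the real path in your package
DRY_RUN=${DRY_RUN:-1}                            # 1 = print the commands instead of running them

run() {
    if [ "$DRY_RUN" = 1 ]; then echo "$*"; else "$@"; fi
}

install_rc() {
    src=$PKG_DIR/$LSF_VERSION/resource_connector/cyclecloud
    # Copy the provider files and the provider configuration files.
    run cp -r "$src/." "$LSF_TOP/$LSF_VERSION/resource_connector/"
    run mkdir -p "$LSF_TOP/conf/resource_connector/cyclecloud"
    run cp -r "$src/conf/." "$LSF_TOP/conf/resource_connector/cyclecloud/"
    # Edit hostProviders.json by hand at this point (not automated here).
    # Back up the current ebrokerd, then replace it.
    run cp "$LSF_SERVERDIR/ebrokerd" "$LSF_SERVERDIR/ebrokerd.bak"
    run cp "$EBROKERD_SRC" "$LSF_SERVERDIR/ebrokerd"
    run badmin mbdrestart
}

install_rc
```

Review the printed commands, then run the script as root with DRY_RUN=0 after you source the LSF environment.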