Configuring Microsoft Azure CycleCloud for LSF resource connector

Follow these steps to configure Microsoft Azure CycleCloud so that LSF resource connector can make allocation requests and create instances on behalf of LSF. The instances launched from Azure CycleCloud join the LSF cluster. If instances become idle, LSF resource connector terminates them.

Before you begin

The Microsoft Azure CycleCloud provider requires IBM® Spectrum LSF, Version 10.1, Fix Pack 9, or later. When you apply the Fix Pack, follow the instructions for the patchinstall command. Manually move the required configuration files to the appropriate directory under LSF_TOP/conf/resource_connector/<provider_name>/conf, and change the ownership of the new files and directories to the cluster administrator.

For more information about applying Fix Packs to LSF resource connector, see Use the LSF patch installer to update resource connector.

  • You must have root access to the LSF management host.
  • The LSF management host and the compute nodes (Azure instances) must be able to communicate with each other.
  • You must be able to restart the LSF cluster.
  • You must be familiar with and have the ability to perform Microsoft Azure and Azure CycleCloud administrative operations.
  • The virtual network to be used by Microsoft Azure virtual instances must be configured so that they can communicate with LSF hosts.
    Note: This is not required if you are running a simple test application.
  • You must have the ability to configure the DNS or NIS server and use short host names on Azure.
    Note: The default DNS server on Azure does not work for LSF because the Azure instance's full host name, with domain name, usually exceeds 60 characters, which is too long for LSF.
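
The length concern in the previous note can be checked on an instance before LSF is started. The following is a minimal sketch, not part of LSF: the function name is hypothetical, and the default limit of 60 characters is taken from the note rather than from a documented LSF constant.

```shell
# Hypothetical helper: report whether a host name is short enough for LSF.
# The default limit of 60 characters is an assumption based on the note
# above; pass a different limit as the second argument if needed.
check_hostname_length() {
    chl_name=$1
    chl_limit=${2:-60}
    chl_len=${#chl_name}
    if [ "$chl_len" -gt "$chl_limit" ]; then
        echo "too long ($chl_len > $chl_limit): $chl_name"
        return 1
    fi
    echo "ok ($chl_len): $chl_name"
}

# Example: compare the fully qualified name with the short name.
# check_hostname_length "$(hostname -f)"
# check_hostname_length "$(hostname -s)"
```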

About this task

LSF resource connector has been tested on the following systems:
  • Linux x86 Kernel 2.6, glibc 2.11 RHEL 6.x
  • Linux x86 Kernel 3.10, glibc 2.17 RHEL 7.x
  • Linux x86 Kernel 3.10, glibc 2.17 CentOS 7.x
  • LSF 10.1.0 Fix Pack 8, or later

In the following steps, you must perform all operations as the Azure administrator unless otherwise stated.

For full details on installing LSF, see Installing IBM Spectrum LSF on UNIX and Linux.

To get the job submitted by a user to run on the instance, the instance must have this user prepared or LSF user mapping configured. For more information about user groups and user account mapping, see Managing Users and User Groups and Between-Host User Account Mapping.

Procedure

  1. Install and set up Microsoft Azure CycleCloud.

    For more information on installing Azure CycleCloud, refer to Install and Setup Azure CycleCloud on the following website: https://docs.microsoft.com/en-us/azure/cyclecloud/quickstart-install-cyclecloud.

    The virtual network and SSH Keypair must be ready and available for LSF.

  2. Create an Azure CycleCloud administrator account for the application server.

    For more information on creating an Azure CycleCloud administrator account, refer to Log into the CycleCloud Application Server in the following website: https://docs.microsoft.com/en-us/azure/cyclecloud/quickstart-install-cyclecloud.

    The administrator account must have permission to access the CycleCloud RESTful API.

  3. If you are using a version of Azure CycleCloud that is earlier than 7.9.0, import the LSF cluster template into Azure CycleCloud.

    You do not need to import the LSF cluster type if you are using Azure CycleCloud, Version 7.9.0, or later, because these versions already provide the LSF cluster type.

    For more details, refer to Import the LSF cluster template in the following website: https://github.com/Azure/cyclecloud-lsf

  4. Create and configure the LSF cluster in the Azure Portal.

    For more details, refer to Configure the cluster in the UI in the following website: https://github.com/Azure/cyclecloud-lsf

    When creating the new LSF cluster, complete the Required Settings and Advanced Settings pages as described in the following website: https://docs.microsoft.com/en-us/azure/cyclecloud/quickstart-create-and-run-cluster

  5. Create an LSF network security group in the Azure resource group for LSF.

    A network security group (NSG) contains a list of security rules that allow or deny network traffic to resources that are connected to VNets. Add customized rules to open all LSF listening ports to the security group that launches instances. The ports must match those from the existing LSF cluster.

    The following are the default port number values:
    • LSF_LIM_PORT=7869 (TCP and UDP)
    • LSF_RES_PORT=6878 (TCP)
    • LSB_SBD_PORT=6882 (TCP)

    You can also accept all traffic from the LSF management host by adding the IP address of the LSF management host, or you can accept all traffic across the VNet.

    Note: If you allow the traffic only from the default ports, some NIOS commands might not work since the ports are configured for them dynamically and are different from the LSF ports.

    Add the network security group to the Azure resource group that you created for LSF.
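
    The port rules in this step can be sketched as Azure CLI commands. The function below only prints the commands so that they can be reviewed first. The function name, rule names, and priorities are hypothetical, and the ports are the defaults listed in this step; replace them if the existing LSF cluster uses different values.

    ```shell
    # Print (do not run) az commands that open the default LSF ports.
    # Rule names and priorities are hypothetical placeholders.
    print_lsf_nsg_rules() {
        nsg_rg=$1      # Azure resource group that holds the NSG
        nsg_name=$2    # name of the network security group
        nsg_prio=200   # starting priority, assumed to be unused in the NSG
        # name:port:protocol, where "*" covers both TCP and UDP
        for nsg_rule in "lsf-lim:7869:*" "lsf-res:6878:Tcp" "lsf-sbd:6882:Tcp"; do
            nsg_rule_name=${nsg_rule%%:*}
            nsg_rest=${nsg_rule#*:}
            nsg_port=${nsg_rest%%:*}
            nsg_proto=${nsg_rest#*:}
            echo "az network nsg rule create -g $nsg_rg --nsg-name $nsg_name" \
                "-n $nsg_rule_name --priority $nsg_prio --direction Inbound" \
                "--access Allow --protocol '$nsg_proto'" \
                "--destination-port-ranges $nsg_port"
            nsg_prio=$((nsg_prio + 10))
        done
    }

    # Example: print_lsf_nsg_rules my-lsf-rg my-lsf-nsg
    ```

    Printing instead of executing keeps the sketch safe to run anywhere; pipe the output to sh only after checking the names and priorities against your environment.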

  6. Create an instance and install the LSF management host.

    The LSF cluster administrator must manually launch an Azure instance and install LSF on the management host.

    Enabling dynamic hosts is sufficient for this configuration. Set ENABLE_DYNAMIC_HOSTS="Y" in the install.config file if you are installing a new LSF management host. If you have an existing LSF management host, manually configure the LSF_DYNAMIC_HOST_WAIT_TIME parameter in the lsf.conf file and the LSF_HOST_ADDR_RANGE parameter in the lsf.cluster.clustername file.

    Tip: You do not have to configure the resource connector parameters at this time. You can configure these parameters after you successfully build the LSF cloud image.

    For example, specify the following parameters in the install.config file:

    LSF_TOP="/opt/lsf"
    LSF_ADMINS="lsfadmin"
    LSF_CLUSTER_NAME="cluster1"
    LSF_MASTER_LIST="lsfmanagement1 lsfmanagement2"
    LSF_ENTITLEMENT_FILE="/root/platform_lsf_std_entitlement.dat"
    LSF_TARDIR="/root/"
    ENABLE_DYNAMIC_HOSTS="Y"
    LSF_DYNAMIC_HOST_WAIT_TIME="2"

  7. Build the LSF cloud image.

    To create an Azure instance image for an LSF cloud compute host, manually launch an Azure instance and install LSF on that instance.

    1. Create the Azure instance and choose the manage disk when you launch the instance.

      For more information on how to create a Linux virtual machine in the Azure portal, refer to the Microsoft Azure documentation: https://docs.microsoft.com/en-us/azure/virtual-machines/linux/quick-create-portal.

    2. Use the ssh command to log in to the Azure instance and use the Azure Java SDK authentication file that you created.
    3. Copy the LSF packages to the Azure instance:
      • lsf10.1_linux2.6-glibc2.3-x86_64.tar.Z
      • lsf10.1_linux2.6-glibc2.3-x86_64-442293.tar.Z (LSF 10.1, Fix Pack 2)
      • lsf10.1_lsfinstall.tar.Z
      • lsf_std_entitlement.dat
    4. Check that the ed.x86_64 software is installed.

      If this package is not installed, use the yum install command to install it.

      yum install ed
    5. Install LSF as a server host on the Azure instance.

      For example, edit the server.config file with the installation options you need:

      LSF_TOP="/opt/lsf"
      LSF_ADMINS="lsfadmin"
      LSF_TARDIR="/opt/install/"
      LSF_LICENSE="/opt/install/lsf_std_entitlement.dat"
      LSF_SERVER_HOSTS="management1 management2"
      LSF_LOCAL_RESOURCES="[resource cyclecloudhost]"
      LSF_LIM_PORT="7869"

      Run the ./lsfinstall -s -f server.config command to install the LSF server host.

      After installation, make sure that the LSF_TOP/conf/lsf.conf file contains the cyclecloudhost resource.
      LSF_CONFDIR=/opt/lsf/conf
      LSF_LIM_PORT=7869
      LSF_SERVER_HOSTS="management.myserver.com"
      LSF_VERSION=10.1
      LSF_LOCAL_RESOURCES="[resource cyclecloudhost]"
      LSF_TOP=/opt/lsf/
      LSF_LOGDIR=/opt/lsf/log
      LSF_LOG_MASK=LOG_WARNING
      LSF_ENABLE_EGO=N
      LSB_ENABLE_HPC_ALLOCATION=Y
      LSF_EGO_DAEMON_CONTROL=N
      
      LSF_LIM_PORT
      The port number must be the same as the one that is defined on the LSF management host.
      LSF_LOCAL_RESOURCES
      The new resource name cyclecloudhost is used by LSF to identify Azure instances. Use the bhosts -a command to see instances that are used by LSF.
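
      The two settings described above can be checked with a short script after installation. This is a minimal sketch: both function names are hypothetical, and the parsing assumes simple NAME=value lines such as those in the example lsf.conf file.

      ```shell
      # Hypothetical helper: print the value of a parameter from an LSF
      # configuration file, with surrounding double quotes removed.
      # $1 = file, $2 = parameter name. If the parameter appears more than
      # once, the last occurrence is printed.
      lsf_conf_value() {
          sed -n "s/^[[:space:]]*$2=//p" "$1" | tail -n 1 | sed 's/^"//; s/"$//'
      }

      # Check the compute instance settings.
      # $1 = lsf.conf path, $2 = LSF_LIM_PORT value of the management host
      check_cloud_host_conf() {
          cch_port=$(lsf_conf_value "$1" LSF_LIM_PORT)
          cch_res=$(lsf_conf_value "$1" LSF_LOCAL_RESOURCES)
          cch_status=0
          if [ "$cch_port" != "$2" ]; then
              echo "LSF_LIM_PORT is '$cch_port', expected '$2'"
              cch_status=1
          fi
          case $cch_res in
              *"[resource cyclecloudhost]"*) ;;
              *) echo "LSF_LOCAL_RESOURCES lacks [resource cyclecloudhost]"
                 cch_status=1 ;;
          esac
          if [ "$cch_status" -eq 0 ]; then
              echo "lsf.conf looks consistent"
          fi
          return "$cch_status"
      }

      # Example: check_cloud_host_conf /opt/lsf/conf/lsf.conf 7869
      ```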
    6. Optional: If required, add the management host name to the /etc/hosts file on the Azure instance, or configure the DNS/NIS client on the instance if DNS or NIS is used.

      You must use the short host name when specifying the management host. The host names of the new instances that the LSF resource connector creates on Azure follow the "host-a-b-c-d" format, where "a-b-c-d" corresponds to the instance's IPv4 address (a.b.c.d). Add these entries into your custom DNS/NIS server or /etc/hosts file.
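
      The host name format can be derived from an address with a few lines of shell. This is a minimal sketch with hypothetical function names; it assumes a well-formed IPv4 address and does no validation.

      ```shell
      # Derive the host name of an instance from its private IPv4 address,
      # following the host-a-b-c-d format described above.
      ip_to_lsf_hostname() {
          echo "host-$(echo "$1" | tr '.' '-')"
      }

      # Print a tab-separated /etc/hosts line for the instance.
      hosts_entry() {
          printf '%s\t%s\n' "$1" "$(ip_to_lsf_hostname "$1")"
      }

      # Example (appending to /etc/hosts requires root):
      # hosts_entry 10.0.1.5 >> /etc/hosts
      ```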

    7. Manually start the LSF daemons on the instance and make sure that the instance can join the LSF cluster as a dynamic host.
      Note: Disable LSF_LOCAL_RESOURCES="[resource cyclecloudhost]" in the lsf.conf file before your test, because LSF omits hosts that carry the resource connector flagged resource ("cyclecloudhost") if they were not dynamically created by the LSF resource connector. You must enable this parameter again before creating the image.

      If the instance cannot join the LSF cluster as a dynamic host, check the VPN, firewall, or security group settings. Check whether the management host can ping the instance (and whether the instance can ping the management host) using its private IP address. If the management host can ping the instance using its IP address but not using the host name, configure the host name resolution properly.
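
      The checks in the previous paragraph can be sketched as one function that separates a network problem from a name resolution problem. The function name is hypothetical, and the -c and -W options assume the Linux (iputils) ping command. Run it on the management host with the instance's name and private IP address, and on the instance with the management host's name and address.

      ```shell
      # Hypothetical helper: $1 = short host name of the peer,
      # $2 = private IP address of the peer.
      diagnose_lsf_link() {
          # One echo request, two-second timeout (Linux iputils options).
          if ! ping -c 1 -W 2 "$2" >/dev/null 2>&1; then
              echo "unreachable: $2 does not answer; check the VPN, firewall, or security group settings"
              return 1
          fi
          if ! ping -c 1 -W 2 "$1" >/dev/null 2>&1; then
              echo "name-resolution: $2 answers but $1 does not; check DNS/NIS or /etc/hosts"
              return 2
          fi
          echo "ok: $1 ($2) is reachable by address and by name"
      }

      # Example: diagnose_lsf_link host-10-0-1-5 10.0.1.5
      ```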

    8. Shut down the LSF daemons and log out from the instance.
    9. Log in to the Azure instance.
    10. Deprovision and shut down the instance.
      sudo waagent -deprovision+user -force && halt
    11. Capture the image.

      Run the following Azure commands:

      az vm deallocate -g "resource_group_name" -n "instance_name"
      az vm generalize -g "resource_group_name" -n "instance_name"
      az image create -n "image_name" -g "resource_group_name" --os-type Linux --source "instance_name"

      The name of the image (imageId) is required in the azureccprov_templates.json file, which LSF uses to decide when borrowing happens and which image the instances are launched from.

      For more details on capturing an image, refer to the Microsoft Azure documentation: https://docs.microsoft.com/en-us/azure/virtual-machines/linux/capture-image
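
      The capture commands in this step depend on each other, so a wrapper that stops at the first failure can be useful. This is a minimal sketch with a hypothetical function name; it assumes that the Azure CLI is installed and logged in. The final az image show call is an addition that prints the resource ID of the new image for reference.

      ```shell
      # Hypothetical wrapper: $1 = resource group, $2 = instance name,
      # $3 = image name. Each command runs only if the previous one succeeded.
      capture_lsf_image() {
          az vm deallocate -g "$1" -n "$2" &&
          az vm generalize -g "$1" -n "$2" &&
          az image create -n "$3" -g "$1" --os-type Linux --source "$2" &&
          # Print the resource ID of the captured image.
          az image show -g "$1" -n "$3" --query id -o tsv
      }

      # Example: capture_lsf_image my-lsf-rg my-lsf-vm my-lsf-image
      ```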

    12. Add the image to the Azure resource group that you created for LSF.

      After capturing the image, the Azure instance cannot start and use this image again. If you need to re-create the image, create a new instance using the image that you created, or create a new LSF cloud image.

  8. Install the LSF resource connector for Azure.
    1. Log in to the LSF management host as root.
    2. Install Java, Version 1.8, or later.
    3. Source the LSF environment.
      • For csh or tcsh: source LSF_TOP/conf/cshrc.lsf
      • For sh, ksh, or bash: . LSF_TOP/conf/profile.lsf
    4. Copy the LSF resource connector package to the LSF management host and extract the package.
    5. From the extracted package directory, copy the contents of the LSF_VERSION/resource_connector/cyclecloud directory to the LSF_TOP/LSF_VERSION/resource_connector directory.
    6. Create the cyclecloud subdirectory in the LSF_TOP/conf/resource_connector directory.
    7. From the extracted package directory, copy the contents of the LSF_VERSION/resource_connector/cyclecloud/conf directory to the LSF_TOP/conf/resource_connector/cyclecloud directory.
    8. Edit the LSF_TOP/conf/resource_connector/hostProviders.json file and add an entry for the Azure CycleCloud provider.
    9. Replace the ebrokerd executable file in the LSF_SERVERDIR directory with the ebrokerd executable file in the extracted package.
    10. Run the badmin mbdrestart command to apply the changes.