About IBM Spectrum LSF Data Manager

When computations require large amounts of data, your applications should be able to access that data without being hindered by where it resides relative to the application execution environment. LSF Data Manager solves this data locality problem by staging the required data as close as possible to the site of the application.

Applications in many domains require large amounts of data: fluid dynamics models for industrial manufacturing, seismic sensor data for oil and gas exploration, and gene sequences for life sciences, among others. Locating these large data sets as close as possible to the application runtime environment is crucial to maintaining optimal utilization of compute resources.

Whether you're running these data-intensive applications in a single cluster or you want to share data and compute resources across geographically separated clusters, LSF Data Manager provides the following key features; a brief usage sketch follows the list.

  • Input data can be staged from an external source storage repository to a cache that is accessible to the cluster execution hosts.
  • Output data is staged out from the cache asynchronously after the job completes, independent of the job itself, so the job does not have to wait for the transfer to finish.
  • Data transfers run separately from the job's resource allocation, so jobs that are waiting for large data transfers do not consume compute resources, and more jobs can request data concurrently.
  • Remote execution cluster selection and cluster affinity are based on data availability in an IBM® Spectrum LSF multicluster capability environment. LSF Data Manager transfers the required data to the cluster that the job was forwarded to.
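
For example, in a single cluster, a job can declare its input file at submission time so that LSF Data Manager stages the file into the cache before the job is dispatched, and the job script can then copy the file in and request asynchronous stage-out of its results. The following is a minimal sketch: the host name datahost, the file paths, the queue name, and the script run_model.sh are illustrative, and the exact requirement-string syntax depends on your configuration.

    # Declare the input file as a data requirement at submission time.
    # LSF Data Manager stages datahost:/proj/input.dat into the cache
    # before the job is dispatched.
    bsub -q normal -data "datahost:/proj/input.dat" ./run_model.sh

The job script (run_model.sh in this sketch) then moves data through the cache:

    #!/bin/sh
    # Copy the staged input from the cache into the job's working directory.
    bstage in -src "datahost:/proj/input.dat"
    # Run the computation on the local copy (model is an illustrative binary).
    ./model input.dat > results.out
    # Request stage-out of the results; the transfer runs asynchronously
    # after the job completes, so the job's allocation is released immediately.
    bstage out -src results.out -dst "datahost:/proj/results/"

You can check which files are currently staged with the bdata cache command.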