Enabling adaptive scheduling
Optionally, enable adaptive scheduling to use mixed resources (CPU and GPU) in your cluster efficiently. With adaptive scheduling, tasks run first on the available GPU resources in the cluster; when no GPU resources remain available, the remaining tasks run on CPU resources.
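For illustration only, the fallback behavior resembles the following sketch. This is conceptual Python, not the product's scheduler implementation; free_gpu_slots and run_task are hypothetical names invented for this example:

# Conceptual sketch of the adaptive fallback policy described above.
# NOT the product implementation; the slot count is a made-up stand-in.
free_gpu_slots = 2  # hypothetical number of free GPU slots in the cluster

def run_task(task):
    global free_gpu_slots
    if free_gpu_slots > 0:
        free_gpu_slots -= 1
        return ("GPU", task)   # task runs on a GPU host while slots remain
    return ("CPU", task)       # GPU slots exhausted: fall back to CPU hosts

print([run_task(t) for t in ["t1", "t2", "t3"]])
# [('GPU', 't1'), ('GPU', 't2'), ('CPU', 't3')]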
Before you begin
- Adaptive scheduling is supported only with certain Spark versions; it is not supported with Spark 1.5.2.
- You require separate resource groups for CPU and GPU hosts. See Using resource groups with GPU hosts.
- Your instance group must be enabled to use GPU resources for applications. See Enabling GPUs.
About this task
In many applications, especially traditional machine learning applications and deep learning frameworks, tasks are convertible between CPU and GPU processing. With adaptive scheduling, when your cluster is short on GPU resources, CPU resources can help speed up large-scale machine learning applications and improve overall resource usage.
Procedure
Follow these steps to enable adaptive scheduling for an instance group. This task calls out only the steps to configure adaptive scheduling when you create an instance group. For more information on how to create an instance group, see Creating instance groups.
Results
What to do next
- Deploy the instance group, and then start it. See Starting instance groups.
- Submit a Spark application with GPU to the instance group. See either Submitting a Spark application with GPU RDD or Submitting a Spark application without GPU RDD. Note: For adaptive scheduling, the SPARK_EGO_WORKLOAD_TYPE environment variable is set internally when each task runs to indicate its workload type (either GPU or CPU). You can use it to define different logic for GPU and CPU processing in your application's task logic. For example:
import os

def feature_extractor(path):
    # SPARK_EGO_WORKLOAD_TYPE is set internally to GPU or CPU at run time
    if os.environ.get("SPARK_EGO_WORKLOAD_TYPE") == "GPU":
        feature = runGPULogical()
    else:
        feature = runCPULogical()
    return feature

sc.parallelize(...).gpu().map(lambda path: feature_extractor(path)).collect()
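Because the branch is chosen from the environment variable at run time, the same application code runs unchanged on both GPU and CPU hosts; only the processing path taken differs per task.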
After the application is submitted, drill down from the Spark master web UI to monitor task details. Additionally, you can use the Workload Type column in the task list to check whether tasks are running on GPU or CPU hosts.