High Performance Spark: Best Practices For Scaling and Optimizing Apache Spark
Apache Spark Monitoring and Performance Management

Apache Spark is one of the largest open source data processing projects, providing a fast engine for big data processing and deep analytics. Instana’s Apache Spark Monitoring covers Spark deployed through AWS EMR as well as Spark running under the Standalone Cluster Manager. Spark performance monitoring revolves around the Spark Driver instance, and Instana’s Spark Monitoring Sensor supports both Driver deployment methods.


Spark Performance and Health Monitoring

The data collected and used for monitoring depends on how the Spark application is deployed (EMR or Standalone).

Spark Performance and Configuration Monitoring

For Spark instances running on AWS EMR, install the Instana agent on the Amazon EC2 instances within the EMR cluster. For automated deployment of the Spark monitoring sensor, the Instana agent must be installed on all nodes in the EMR cluster.

Instana’s Spark Monitoring includes an automatically built summary dashboard that centers around application KPIs – including response time and load. The dashboard also includes key infrastructure configuration and performance metrics, as well as specific Spark processing data metrics. The dashboard allows DevOps and IT Ops to see all relevant Spark data on one screen, making it easy to understand the state of their Spark instances.

Monitoring the health and performance of Apache Spark instances requires both an understanding of Spark itself and the ability to see the interactions and dependencies between clustered Spark instances and other microservices (both upstream and downstream). Instana’s Spark monitoring sensor automatically identifies and collects the relevant metrics.

Spark Monitoring Data

 

Batch Applications

  • Jobs
  • Stages
  • Longest Completed Steps
  • Executors
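
As a rough illustration of the kind of batch data listed above, the sketch below ranks completed stages by wall-clock duration. It is not Instana's implementation. It assumes stage records shaped like those returned by Spark's own monitoring REST API (the `stages` endpoint under `/api/v1/applications/[app-id]`), with `status`, `stageId`, `name`, `submissionTime` and `completionTime` fields; the timestamp format, the helper names and the sample records are assumptions made for this example.

```python
from datetime import datetime

# Assumed timestamp layout for stage records, e.g. "2024-01-01T10:00:12.500GMT".
TIME_FORMAT = "%Y-%m-%dT%H:%M:%S.%fGMT"


def stage_duration_seconds(stage):
    """Wall-clock duration of one stage record, in seconds."""
    start = datetime.strptime(stage["submissionTime"], TIME_FORMAT)
    end = datetime.strptime(stage["completionTime"], TIME_FORMAT)
    return (end - start).total_seconds()


def longest_completed_stages(stages, top_n=3):
    """Return (stageId, name, seconds) for the slowest completed stages."""
    completed = [
        s for s in stages
        if s.get("status") == "COMPLETE"
        and "submissionTime" in s
        and "completionTime" in s
    ]
    ranked = sorted(completed, key=stage_duration_seconds, reverse=True)
    return [
        (s["stageId"], s["name"], stage_duration_seconds(s))
        for s in ranked[:top_n]
    ]


# Hypothetical sample data, not taken from a real cluster.
SAMPLE_STAGES = [
    {"stageId": 0, "name": "map at job.py:12", "status": "COMPLETE",
     "submissionTime": "2024-01-01T10:00:00.000GMT",
     "completionTime": "2024-01-01T10:00:12.500GMT"},
    {"stageId": 1, "name": "reduceByKey at job.py:20", "status": "COMPLETE",
     "submissionTime": "2024-01-01T10:00:12.600GMT",
     "completionTime": "2024-01-01T10:00:42.600GMT"},
    {"stageId": 2, "name": "collect at job.py:31", "status": "ACTIVE",
     "submissionTime": "2024-01-01T10:00:42.700GMT"},
]

if __name__ == "__main__":
    for stage_id, name, seconds in longest_completed_stages(SAMPLE_STAGES):
        print(f"stage {stage_id} ({name}): {seconds:.1f}s")
```

Stages that are still running are skipped because they have no completion time yet.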

Streaming Applications

  • Batching
  • Scheduling Delay
  • Total Delay
  • Processing Time
  • Output Operations
  • Input Records
  • Receivers
  • Executors
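
To make the relationship between the three timing metrics above concrete, here is a minimal sketch. In Spark Streaming, a batch's total delay is its scheduling delay plus its processing time, and an application keeps up only while processing time stays within the batch interval. The `BatchTiming` class, its field names and `is_keeping_up` are hypothetical names chosen for this example, not part of Spark or Instana.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BatchTiming:
    """Timestamps for one streaming batch, in milliseconds (hypothetical type)."""
    submitted_ms: int   # batch handed to the scheduler
    started_ms: int     # processing actually began
    finished_ms: int    # processing completed

    @property
    def scheduling_delay_ms(self) -> int:
        # Time the batch waited in the queue before processing started.
        return self.started_ms - self.submitted_ms

    @property
    def processing_time_ms(self) -> int:
        # Time spent processing the batch.
        return self.finished_ms - self.started_ms

    @property
    def total_delay_ms(self) -> int:
        # Scheduling delay plus processing time.
        return self.finished_ms - self.submitted_ms


def is_keeping_up(batch: BatchTiming, batch_interval_ms: int) -> bool:
    """True if the batch was processed within one batch interval."""
    return batch.processing_time_ms <= batch_interval_ms


if __name__ == "__main__":
    batch = BatchTiming(submitted_ms=1000, started_ms=1200, finished_ms=1900)
    print(batch.scheduling_delay_ms, batch.processing_time_ms, batch.total_delay_ms)
    print(is_keeping_up(batch, batch_interval_ms=1000))
```

A scheduling delay that grows from batch to batch is the usual sign that processing time is exceeding the batch interval.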

Configuration

  • Host
  • Port
  • Rest URI
  • Version
  • Status

Metrics

  • Alive Workers
  • Dead Workers
  • Decommissioned Workers
  • Workers in Unknown State
  • Used Memory
  • Total Memory
  • Used Cores
  • Total Cores
  • Data and Metrics per Worker
  • Most Recent Apps
  • Most Recent Drivers
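
The worker and resource metrics above can be pictured as a simple roll-up over per-worker records. The sketch below is an illustration only, not Instana's sensor logic. It assumes a payload shaped like the JSON state a Spark Standalone master exposes on its web UI, with a `workers` list whose entries carry `state`, `cores`, `coresused`, `memory` and `memoryused` (memory in MB); treat those field names, the `summarize_workers` helper and the sample payload as assumptions to verify against your Spark version.

```python
from collections import Counter


def summarize_workers(master_state):
    """Roll per-worker records up into cluster-level counts and totals."""
    workers = master_state.get("workers", [])
    # Count workers per state; a missing state is treated as UNKNOWN.
    states = Counter(w.get("state", "UNKNOWN") for w in workers)
    # Only alive workers contribute usable capacity.
    alive = [w for w in workers if w.get("state") == "ALIVE"]
    return {
        "alive_workers": states["ALIVE"],
        "dead_workers": states["DEAD"],
        "decommissioned_workers": states["DECOMMISSIONED"],
        "unknown_workers": states["UNKNOWN"],
        "used_cores": sum(w.get("coresused", 0) for w in alive),
        "total_cores": sum(w.get("cores", 0) for w in alive),
        "used_memory_mb": sum(w.get("memoryused", 0) for w in alive),
        "total_memory_mb": sum(w.get("memory", 0) for w in alive),
    }


# Hypothetical sample payload, not taken from a real cluster.
SAMPLE_MASTER_STATE = {
    "status": "ALIVE",
    "workers": [
        {"state": "ALIVE", "cores": 8, "coresused": 4,
         "memory": 16384, "memoryused": 8192},
        {"state": "ALIVE", "cores": 8, "coresused": 2,
         "memory": 16384, "memoryused": 4096},
        {"state": "DEAD", "cores": 8, "coresused": 0,
         "memory": 16384, "memoryused": 0},
    ],
}

if __name__ == "__main__":
    for key, value in summarize_workers(SAMPLE_MASTER_STATE).items():
        print(f"{key}: {value}")
```

Counting only alive workers toward totals is a design choice for this example; a dashboard could equally report capacity lost to dead workers as a separate figure.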

Spark Monitoring Sensor Installation: Getting Started

Ready to start monitoring Spark? Begin by signing up for a free Instana trial. Once you have an account, see the Spark Management Documentation for details on how to configure the different Spark driver and deployment types.
