April 20, 2022 By Powell Quiring 3 min read

How to start monitoring your infrastructure using IBM Cloud Monitoring.

The foundation of continuous software improvement is measurement. Businesses have found that additional latency affects customer satisfaction and sales. What are the latencies observed by your customers? Are servers overloaded? Underloaded? How do you know if you do not measure? A proven way to improve the services you provide is to choose the metrics that affect your business, then measure, observe and improve them over time.

IBM Cloud has great out-of-the-box support for observing account resources in real time. As an example, enable platform metrics in the same region as an existing VPC. Then open a VPC Virtual Server Instance and look at the Monitoring preview:

Click on Launch monitoring to see a comprehensive metrics view focused on the instance:

Metrics can also be collected in application source code. A monitoring agent process running on the same VPC instance as the application will push the metrics to the IBM Cloud Monitoring instance. The following example was implemented on a Linux host:

In the diagram, notice the following:

  1. The program example.py is running on a VPC instance.
  2. The program sends StatsD metrics to the Monitoring agent.
  3. The Monitoring agent scrapes Prometheus metrics from the program.
  4. The Monitoring agent sends the metrics to the Monitoring instance.
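
Step 2 can be pictured at the wire level. The sketch below is not the code from example.py, which uses a client library. It assumes the agent's StatsD server listens on the conventional UDP port 8125 on localhost (check your agent configuration), and the helper names `format_timing` and `send_metric` are hypothetical.

```python
import socket


def format_timing(name: str, millis: float) -> bytes:
    """Encode one timer sample in the plain-text StatsD format: name:value|ms."""
    return f"{name}:{millis:.3f}|ms".encode("ascii")


def send_metric(payload: bytes, host: str = "127.0.0.1", port: int = 8125) -> None:
    """Send a single StatsD datagram over UDP without waiting for a reply."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, (host, port))


if __name__ == "__main__":
    # Report that one call took 12.5 milliseconds.
    send_metric(format_timing("custom_timing", 12.5))
```

Because StatsD uses UDP, the program does not block or fail if the agent is temporarily unavailable; the sample is simply lost.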

You can find a companion repository that contains the source code for Python examples and instructions for creating the resources.

The Monitoring agent is a StatsD server and can be configured as a Prometheus forwarder. Modern programming languages have open-source libraries for both the StatsD client and Prometheus client exporter. For Python, check out the following:

Most programs will use either Prometheus or StatsD, but example.py uses both:

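import prometheus_client
from statsd import StatsClient

# Histogram bucket upper bounds in seconds (illustrative values, not taken from example.py)
buckets = (0.1, 0.5, 1.0, 5.0)
h = prometheus_client.Histogram('custom_histogram', 'application prometheus example', buckets=buckets)
statsd = StatsClient()  # defaults to localhost:8125

@h.time()
def prometheus_example(i):
  ...  # do something

@statsd.timer("custom_timing")
def statsd_example(i):
  ...  # do something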

The Python decorators @h.time() and @statsd.timer("custom_timing") can be placed above a function definition to capture its execution time as a metric. The metrics can then be visualized in the Monitoring instance. Here is a snapshot from the example:
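
To see how a decorator can capture execution time, here is a minimal standard-library sketch. It is not how prometheus_client or statsd implement their decorators. The names `timed`, `durations` and `work` are hypothetical, and a plain list stands in for a metrics client.

```python
import functools
import time


def timed(record):
    """Return a decorator that passes each call's duration in seconds to record()."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return func(*args, **kwargs)
            finally:
                # This sketch also records calls that raise an exception.
                record(time.perf_counter() - start)
        return wrapper
    return decorator


durations = []  # stand-in for a metrics client


@timed(durations.append)
def work(i):
    time.sleep(0.01)  # simulate some work
    return i * 2


if __name__ == "__main__":
    print(work(21), durations)
```

The wrapped function keeps its own return value and name, so adding or removing the decorator does not change how callers use it.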

Prometheus exporters

There are open-source Prometheus exporters that can be installed on the VPC instance to gather metrics directly from the environment. Once installed, they can be configured to be scraped and forwarded by the Monitoring agent (dragent):

The node_exporter can capture additional metrics directly from the VPC instance operating system (NFS metrics, for example).

The statsd_exporter will capture additional timing quantiles and acceptable error metrics. 
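
What gets scraped from an exporter is plain text in the Prometheus exposition format. As a rough sketch of what a histogram such as custom_histogram looks like in that format, the hypothetical function `render_histogram` below builds the text by hand. The bucket bounds and observations are made-up values, and a real program would let a client library produce this output.

```python
def render_histogram(name, help_text, buckets, observations):
    """Render observations as a Prometheus histogram in the text exposition format."""
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} histogram"]
    for upper in sorted(buckets):
        # Buckets are cumulative: each one counts every observation <= its upper bound.
        count = sum(1 for value in observations if value <= upper)
        lines.append(f'{name}_bucket{{le="{upper}"}} {count}')
    # The +Inf bucket always equals the total number of observations.
    lines.append(f'{name}_bucket{{le="+Inf"}} {len(observations)}')
    lines.append(f"{name}_sum {sum(observations)}")
    lines.append(f"{name}_count {len(observations)}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_histogram(
        "custom_histogram",
        "application prometheus example",
        buckets=[0.5, 1.0],
        observations=[0.25, 0.75, 2.0],  # seconds, made-up samples
    ))
```

With these sample values, one observation falls in the 0.5 bucket, two in the 1.0 bucket and all three in +Inf, which shows why bucket counts never decrease as the bound grows.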

Application dashboard

I put together a dashboard to monitor my application:

Alerts

The App latency outlier in my dashboard looks like a problem that I need to look into. The Monitoring service supports alerts to notify my team of these anomalies:

The repository explains how to create the alert.

Try it yourself

Start monitoring your infrastructure using the IBM Cloud Monitoring service. Create custom metrics to get visibility into your software. Be alerted when systems fall outside your defined parameters. Continuously improve outcomes by observing metrics over time and driving change.

The source code for this blog post can be found here, along with instructions.
