April 20, 2022 By Powell Quiring 3 min read

How to start monitoring your infrastructure using IBM Cloud Monitoring.

The foundation of continuous software improvement is measurement. Businesses have found that additional latency will affect customer satisfaction and sales. What are the latencies observed by your customers? Are servers overloaded? Underloaded? How do you know if you do not measure? A proven way to improve the services you provide is to choose metrics that are affecting your business and measure, observe and improve over time.

IBM Cloud has great out-of-the-box support for observing account resources in real-time. As an example, enable platform metrics in the same region as an existing VPC. Open a VPC Virtual Server Instance and look at the Monitoring preview:

Click on Launch monitoring to see a comprehensive metrics view focused on the instance:

Metrics can also be collected in application source code. A monitoring agent process running on the same VPC instance as the application will push the metrics to the IBM Cloud Monitoring instance. The following example was implemented on a Linux host:

In the diagram, notice the following:

  1. Program example.py is running on a VPC instance.
  2. Program sends StatsD metrics to the agent.
  3. Agent scrapes Prometheus metrics from the program.
  4. Monitoring agent sends metrics to the Monitoring instance.

You can find a companion repository that contains the source code for Python examples and instructions for creating the resources.

The Monitoring agent is a StatsD server and can be configured as a Prometheus forwarder. Modern programming languages have open-source libraries for both the StatsD client and Prometheus client exporter. For Python, check out the following:

Most programs will use either Prometheus or StatsD, but the code example.py has both:

h = prometheus_client.Histogram('custom_histogram', 'application prometheus example', buckets=buckets)
statsd = statsd.StatsClient()

@ h.time()
def prometheus_example(i):
  ... # do somethng

@statsd.timer("custom_timing")
def statsd_example(i):
  ... # do somethng

The Python annotations @h.time() and @statsd.timer("custom_timing") can be prepended to a function and will capture the execution time as a metric. The metrics can then be visualized in the Monitoring instance. Here is a snapshot from the example:

Prometheus exporters

There are open-source Prometheus exporters that can be installed on the VPC instance to gather metrics directly from the environment. These can be installed on the instance and configured to be scraped/forwarded by the dragent:

The node_exporter can capture some addition metrics directly from the VPC instance operating system (NFS metrics, for example).

The statsd_exporter will capture additional timing quantiles and acceptable error metrics. 

Application dashboard

I put together the dashboard to monitor my application:

Alerts

The App latency outlier in my dashboard looks like a problem that I need to look into. The Monitoring service supports alerts to notify my team of these anomalies:

The repository explains how to create the alert.

Try it yourself

Start monitoring your infrastructure using the IBM Cloud Monitoring service. Create custom metrics to get visibility into your software. Be alerted when systems are not in your defined parameters. Continuously improve outcomes by observing metrics over time and driving change.

The source code for this blog post can be found here, along with instructions.

Was this article helpful?
YesNo

More from Cloud

How a US bank modernized its mainframe applications with IBM Consulting and Microsoft Azure

9 min read - As organizations strive to stay ahead of the curve in today's fast-paced digital landscape, mainframe application modernization has emerged as a critical component of any digital transformation strategy. In this blog, we'll discuss the example of a US bank which embarked on a journey to modernize its mainframe applications. This strategic project has helped it to transform into a more modern, flexible and agile business. In looking at the ways in which it approached the problem, you’ll gain insights into…

The power of the mainframe and cloud-native applications 

4 min read - Mainframe modernization refers to the process of transforming legacy mainframe systems, applications and infrastructure to align with modern technology and business standards. This process unlocks the power of mainframe systems, enabling organizations to use their existing investments in mainframe technology and capitalize on the benefits of modernization. By modernizing mainframe systems, organizations can improve agility, increase efficiency, reduce costs, and enhance customer experience.  Mainframe modernization empowers organizations to harness the latest technologies and tools, such as cloud computing, artificial intelligence,…

Modernize your mainframe applications with Azure

4 min read - Mainframes continue to play a vital role in many businesses' core operations. According to new research from IBM's Institute for Business Value, a significant 7 out of 10 IT executives believe that mainframe-based applications are crucial to their business and technology strategies. However, the rapid pace of digital transformation is forcing companies to modernize across their IT landscape, and as the pace of innovation continuously accelerates, organizations must react and adapt to these changes or risk being left behind. Mainframe…

IBM Newsletters

Get our newsletters and topic updates that deliver the latest thought leadership and insights on emerging trends.
Subscribe now More newsletters