IBM Support

FAQ2: Analyzing Large Volumes of nmon Data

How To


Summary

Frequently Asked Questions from 2013 and still asked today.

Objective


Steps

I regularly get asked a question like: "I have 4 months of data from 25 machines and have to develop a Capacity Planning model to size these LPARs onto new machines, but I am having problems handling so much data. What can you recommend?" We need graphs of:

  • CPU use compared to Entitlement
  • Physical CPU use
  • Maximum real memory use
  • Network MB/s
  • Disk MB/s
  • Disk IOPS

Sometimes this data is needed as input into the Workload Estimator tool or Server Consolidation tools.

My standard response is: You now understand that Performance Monitoring and Tuning level data is NOT what you really need if you are doing Capacity Planning!

Followed by: Have you also collected the nmon -x data? This collects data at a 15-minute sample rate - just 96 samples a day - and is exactly what you need for Capacity Planning. Note: below I assume this reduced data is not available.
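As a reminder of what that reduced collection looks like - a sketch, and the exact expansion of -x can vary between nmon versions, so check your man page:

```shell
# nmon -x is the Capacity Planning shorthand: one snapshot every 15 minutes
# (900 seconds) for 96 snapshots, which is exactly one day of data:
echo $(( 900 * 96 ))    # 86400 seconds = 24 hours

# Run on the target LPAR (typically from cron, as root):
#   nmon -x
# which is broadly equivalent to spelling it out:
#   nmon -f -t -s 900 -c 96
# -f = record to a file, -t = include top processes,
# -s = seconds between samples, -c = number of samples
```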

You have roughly 3,000 nmon files of (let us guess here) 10 MB each = 30 GB of raw data, or something like 300 GB once converted to Excel data. Microsoft Excel would explode with just a handful of these files. The nmon consolidator has limits on the number of files, the size of files, and in particular the number of snapshots in the files.

There are also hidden traps:

  1. If you average out the CPU, network, RAM, and disk statistics, you will dilute the peaks into meaningless low averages - and it's the peaks you really want.
  2. If you take just the peaks, then you will find every LPAR has tiny periods of peaks at 100% CPU and RAM, and you will not see whether the peaks happen at the same time of day across machines or not.
  3. I do NOT like the use of anything relative to Entitlement - please use Physical CPU used - otherwise you can't add up the CPU requirement.

So what can I recommend? Well, it is a complex question and there is no simple answer. If there was, I would take out a patent and retire!

Some approaches are (I attempt to make some of these humorous):

  1. Just do one day - Ask the users for the busiest day in the past 4 months and just look at that - silently ditch the rest of the data.
  2. Just do them all - By hand, look at each day's worth of nmon files (25 of them) and work through all 120 days. If you work hard, that is roughly 180 hours = roughly 5 weeks of work, plus two weeks off sick due to headaches.
  3. Fix the world - Fix the nmon Consolidator and Microsoft Excel while you are there, and run it on a PC with 300 GB of RAM. Your fellow nmon specialists will thank you for a long time to come - just don't expect any cash from them!
  4. Script to pick out the data you need - If the data is highly consistent (same OS level, similar numbers of CPUs and disk drives), then you could attempt awk, sed, and grep scripts to reduce the data set, so you can aggregate the 25 machines into data files - one per statistic for CPU, one for disks, and so on - then load the CSV data into Excel or a similar tool for graphing.
  5. Use nmon2web - Pour the data into nmon2web (if you have not used it before, this would take some time to set up - it creates rrdtool databases on a web server and displays the results in a browser) and get the nmon2web front end to aggregate the days and machines as you want. This is a workable solution but has some "up front" costs in setup and assumes you have a web server and hands-on skills.
  6. Make it someone else's problem - Hire IBM or IBM Business Partner Services and make it their problem!
  7. Do one week and cross-check - Ask the users for a busy week, then look at just the 175 graphs and create a spreadsheet of the core numbers you want from them. Then sanity-check that the peaks are roughly the same in other weeks. For example, check that the online peak is normally Friday at 2 pm, that the other machines are busy on Wednesday mornings, that the heavy batch jobs run on specific servers between 10 pm and 1 am, and that the network backup is always done by 7 am.
  8. RDBMS - Stuff all the nmon data into a RDBMS and use RDBMS tools to extract the data in the format you need - thankfully not my (Mr nmon's) problem.
  9. Go for 3rd party tools - There are third-party performance monitoring tools (for example, Midrange Performance Group, whose default collector for AIX, VIOS and Linux is nmon) and capacity planning tools that will take nmon data. That is a very nice option if you already have the tools available, but most come at a cost due to the marvelous data handling and modeling functions they supply.
  10. Use rrdtool to consolidate the data - Lastly, the nmon2rrd tool could be used to extract the data from each machine into longer-term rrdtool databases, and the graphs generated from there. This would require some rrdtool and scripting knowledge.
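To make approach 4 concrete, here is a minimal sketch. It assumes nmon's CSV layout, where physical CPU rows look like PCPU_ALL,Tnnnn,user,sys,wait,idle,... (the toy file below is purely illustrative - real files have thousands of rows, and you should check the header row of your own files for the exact column order):

```shell
#!/bin/sh
# Sketch of approach 4: reduce many nmon CSV files to one small CSV per
# statistic, tagging each row with the hostname taken from the file name.
# Tiny illustrative nmon fragment (assumed layout, two snapshots only):
cat > hostA.nmon <<'EOF'
PCPU_ALL,CPU Total hostA,User,Sys,Wait,Idle,Entitled Capacity
PCPU_ALL,T0001,1.50,0.30,0.10,2.10,4.00
PCPU_ALL,T0002,2.20,0.40,0.05,1.35,4.00
EOF

# For every nmon file in the directory: keep only PCPU_ALL data rows
# (snapshot labels look like T0001), skip the header, and emit
# host,snapshot,user,sys - ready to aggregate or load into Excel.
for f in *.nmon
do
    awk -F, -v h="${f%.nmon}" \
        '$1 == "PCPU_ALL" && $2 ~ /^T[0-9]+$/ { print h "," $2 "," $3 "," $4 }' "$f"
done > pcpu_all_machines.csv

cat pcpu_all_machines.csv
# hostA,T0001,1.50,0.30
# hostA,T0002,2.20,0.40
```

The same pattern, swapping PCPU_ALL for MEM, NET, or DISKREAD/DISKWRITE, gives one consolidated CSV per statistic across all 25 machines.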

I don't think any of these is a simple solution, and I am not missing a trick - this is a genuinely large problem faced by many.

Final thoughts:

  1. If you have some magic bullet for fixing this problem that I have missed, then please let us have it.
  2. Perhaps you already have a script to extract the key global stats (as seen at the top of this blog entry) from nmon data.
  3. Or a simple way to produce, say, 15-minute stats from 10-second nmon data.
  4. An nmon user group supported project for method 10 would be good. Anyone got 4 months of data from one machine? My machines are pretty boring - they are not production machines and not regularly busy.
  5. I don't want answers like: install my favourite performance monitoring tool and wait 4 months for it to collect data.
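On point 3, one hedged sketch of how that reduction could work with nothing but awk: group consecutive 10-second samples into 15-minute windows (900 s / 10 s = 90 samples) and keep both the average AND the peak per window, since averages alone dilute the peaks, as noted above. Assumes one numeric value per input line:

```shell
# Toy demo with n=3 samples per window so the output is easy to check;
# for real 10-second data use n=90 (a full day is 8,640 lines).
printf '%s\n' 1 2 3 4 5 6 |
awk -v n=3 '
{
    sum += $1                            # running total for the window
    if ($1 > max) max = $1               # running peak for the window
    if (NR % n == 0) {                   # window complete: emit both stats
        printf "avg=%.2f peak=%.2f\n", sum / n, max
        sum = 0; max = 0
    }
}'
# avg=2.00 peak=3.00
# avg=5.00 peak=6.00
```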
UPDATE for 2020
 
Seven years later we now have better answers:
  1. nsum is a few Korn shell scripts for analysing lots of nmon files - take a look at the article.
  2. nmon data can be converted to JSON and injected into a time-series database like InfluxDB and graphed with Grafana, or into Elastic (Elasticsearch/ELK) or Splunk.
    • Find out more at the njmon website (njmon outputs JSON data). See these two websites for more information: nmon2json and njmon

Additional Information


Other places to find content from Nigel Griffiths IBM (retired)

Document Location

Worldwide


Document Information

Modified date:
14 June 2023

UID

ibm11117587