Windows: Monitoring with the Guardium Agent Monitor

The Guardium Agent Monitor (GAM) process monitors Guardium agent performance and responsiveness. Use GAM for detailed analysis during troubleshooting.

Note: The GAM service is off by default because it requires configuration that is specific to the environment in which it is installed. Improper configuration can cause serious operational issues. GAM is a troubleshooting aid and is not otherwise required.

Monitoring covers the following metrics:

  • CPU usage
  • Memory
  • Handles
  • Number of threads
  • Alive
If a monitored agent exceeds a configured threshold, or if it does not respond to the console request, the following actions can be taken, in any combination:
  • Automatically run diag.bat (or the name of your diagnostics application file).
  • Automatically stop or restart the service.
  • Automatically perform a core dump.

Guardium Agent Monitor is installed when S-TAP® is installed but is not enabled by default. When S-TAP is uninstalled, GAM is uninstalled.

Note: Like S-TAP, GAM requires administrative privileges. When you install GAM, use "Run as Administrator" as an administrative user. If you run it as a non-admin user, GAM returns an Access Denied error.

The default installation location for GAM is the parent folder of S-TAP (C:\Program Files\IBM\Guardium Agent Monitor\).

The default location for GAM output is the \Bin\ subfolder.

After you enable GAM, make sure that the process (resmon.exe) is running on the database server.

Guardium Agent Monitor (GAM) is listed in Services as IBM Security Guardium Resource Monitor Service, which has a Service Name property of Guardium Resource Monitor.
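
One way to confirm that the service is running is to query the Windows Service Control Manager. The following is a minimal Python sketch and is not part of GAM. It assumes the Service Name property given above, that the standard Windows sc query command is available, and that its output is in English (a line such as "STATE : 4 RUNNING"). The function names parse_sc_state and gam_is_running are hypothetical.

```python
import subprocess
from typing import Optional

# Service Name property as given in the text above.
SERVICE_NAME = "Guardium Resource Monitor"


def parse_sc_state(sc_output: str) -> Optional[str]:
    """Return the state word (for example 'RUNNING') from sc query output.

    Assumes English output with a line like 'STATE : 4  RUNNING'.
    Returns None if no such line is found.
    """
    for line in sc_output.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "STATE":
            parts = value.split()
            # parts[0] is the numeric state code, parts[1] the state name.
            return parts[1] if len(parts) > 1 else None
    return None


def gam_is_running(service_name: str = SERVICE_NAME) -> bool:
    """Run 'sc query <name>' (Windows only) and report whether it is RUNNING."""
    result = subprocess.run(
        ["sc", "query", service_name],
        capture_output=True,
        text=True,
        check=False,
    )
    return parse_sc_state(result.stdout) == "RUNNING"
```

On the database server, calling gam_is_running() from an administrative prompt returns True only when the service reports the RUNNING state. Checking for resmon.exe in Task Manager is an equivalent manual check.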

For an example of how GAM works, see Resource monitoring example.

Global level configuration

The following parameters pertain to the GAM service process and are defined in the [Global] section.
resmon.ini Default value Description
NUMBER_OF_SERVICES 1 Number of services being monitored. The minimum is 0; there is no maximum.
UPDATE_INTERVAL 1 The length, in seconds, of the interval between polling metrics.
DEBUG 1 Deprecated.
NUMBER_BYTES_IN_LOG 200 Maximum number of KB for the GAM log. There is no maximum size.
ACTION 1 Determines whether to generate a dump when your system exceeds a threshold such as THREAD_COUNT_LIMIT or MEM_USAGE_LIMIT. Valid values:
  • 0: Do not generate a dump.
  • 1: Generate a dump.
FULLDUMP 0 Valid values:
  • 0: Generate a mini-dump.
  • 1: Generate a full dump when a dump is generated.
Note: A full dump takes more time.
CPUAVE 1 Defines the way to calculate the average CPU time.
  • 0: Percentage of one core.
  • 1: Average percentage of all cores in system.
MDTIMEOUT 1000 Timeout, in milliseconds, for generating a dump. A dump is not generated if the time is exceeded.
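
To make the [Global] parameters concrete, the following Python sketch reads a [Global] section, fills in the defaults from the table above, and rejects values outside the documented ranges. It is an illustration only: it assumes that resmon.ini uses plain KEY=VALUE lines that Python's configparser can read, and the function name load_global is hypothetical. The deprecated DEBUG parameter is left out.

```python
import configparser

# Defaults taken from the [Global] table above.
GLOBAL_DEFAULTS = {
    "NUMBER_OF_SERVICES": 1,
    "UPDATE_INTERVAL": 1,
    "NUMBER_BYTES_IN_LOG": 200,
    "ACTION": 1,
    "FULLDUMP": 0,
    "CPUAVE": 1,
    "MDTIMEOUT": 1000,
}

# Parameters whose only documented values are 0 and 1.
FLAG_KEYS = ("ACTION", "FULLDUMP", "CPUAVE")


def load_global(ini_text: str) -> dict:
    """Parse the [Global] section and return validated integer settings."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)

    settings = dict(GLOBAL_DEFAULTS)
    if parser.has_section("Global"):
        for key in GLOBAL_DEFAULTS:
            # configparser matches option names case-insensitively by default.
            if parser.has_option("Global", key):
                settings[key] = parser.getint("Global", key)

    if settings["NUMBER_OF_SERVICES"] < 0:
        raise ValueError("NUMBER_OF_SERVICES must be 0 or greater")
    for key in FLAG_KEYS:
        if settings[key] not in (0, 1):
            raise ValueError(f"{key} must be 0 or 1")
    return settings


SAMPLE = """\
[Global]
NUMBER_OF_SERVICES=1
UPDATE_INTERVAL=1
ACTION=1
FULLDUMP=0
"""

if __name__ == "__main__":
    print(load_global(SAMPLE))
```

Validating a copy of the file this way before enabling the service is one inexpensive guard against the configuration mistakes that the note at the top of this topic warns about.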

Service level configuration

The following parameters apply to each service and are defined in the [Service_N] section. The section name can be anything except [Global]. For [Service1], Name=GUARDIUM_STAP is defined by default.

resmon.ini Default value Description
Name GUARDIUM_STAP The name of the Windows Service for GAM to monitor.
NAMEDPIPE_INTERVAL 30 For supported Windows S-TAPs only. The interval, in seconds, at which to check aliveness. Set to 0 to disable.

For more information about named pipes, see Protocols 7 and 8 Inspection engine parameters.

DIAGACTION 0 Run diagnostic on action.
  • 0: Do not run diagnostics.
  • 1: Run diagnostics (usually diag.bat) when the monitored service exceeds a limit for the specified number of intervals.
DIAGNAME diag.bat Diagnostic file name. When set to diag.bat, GAM calls the application from the same directory as the service process.
DIAG_PARAMETER (none) Diagnostic parameters. If the parameter has spaces, the parameter must be enclosed with quotation marks (").
CPU Threshold Configuration
resmon.ini Default value Description
CPU_LOAD_LIMIT 10 Percentage CPU threshold at which either action is taken, or UPDATE_INTERVAL starts counting occurrences of reaching threshold.

The minimum is 1; the maximum is 100.

CPU_INTERVALS_ALLOWED 10 Number of intervals the CPU can be above the threshold before it triggers an action (used with UPDATE_INTERVAL to set a time limit).
UPDATE_INTERVAL 1 Valid values:
  • 0: Take action when CPU reaches its load limit.
  • 1: Take action when CPU reaches its load limit the number of times that are specified by CPU_INTERVALS_ALLOWED.
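
The CPUAVE setting in the [Global] section changes the scale on which CPU_LOAD_LIMIT is evaluated. The Python sketch below shows one plausible reading of the two modes. It is an assumption, not GAM's documented algorithm: it treats CPU use as process CPU time divided by elapsed time, expressed as a percentage of one core (CPUAVE=0) or divided again by the number of cores (CPUAVE=1). The function names are hypothetical.

```python
def cpu_percent(cpu_seconds: float, wall_seconds: float,
                core_count: int, cpuave: int) -> float:
    """Return CPU usage for one polling interval as a percentage.

    cpu_seconds  -- CPU time the process consumed during the interval
    wall_seconds -- length of the interval (UPDATE_INTERVAL)
    cpuave       -- 0: percentage of one core; 1: average over all cores
    """
    if wall_seconds <= 0 or core_count < 1:
        raise ValueError("wall_seconds must be > 0 and core_count >= 1")
    one_core = 100.0 * cpu_seconds / wall_seconds
    # Assumption: the all-core average is the one-core figure spread
    # evenly across every core in the system.
    return one_core / core_count if cpuave == 1 else one_core


def reaches_cpu_limit(percent: float, cpu_load_limit: int = 10) -> bool:
    """True when the measured percentage reaches CPU_LOAD_LIMIT."""
    return percent >= cpu_load_limit


if __name__ == "__main__":
    # Half a CPU-second used in a one-second interval on a 4-core server.
    for mode in (0, 1):
        pct = cpu_percent(0.5, 1.0, core_count=4, cpuave=mode)
        print(mode, pct, reaches_cpu_limit(pct))
```

Under this reading, the same workload sits much closer to the threshold with CPUAVE=0 than with CPUAVE=1 on a multi-core server, so CPU_LOAD_LIMIT should be chosen with the CPUAVE mode in mind.
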
Memory Usage, Handle Count and Thread Count Thresholds Configuration
resmon.ini Default value Description
MEM_USAGE_LIMIT 150000 Lower-level threshold in KB. An action is triggered if this limit is exceeded for more intervals than MEM_USAGE_INTERVALS_ALLOWED.
MEM_USAGE_INTERVALS_ALLOWED 30 Number of intervals allowed for the lower limit threshold before an action is triggered (used with UPDATE_INTERVAL for time limit).
MEM_USAGE_PEAK_LIMIT 200000 Upper-level threshold in KB. An action is triggered if this threshold is exceeded once.
HANDLE_COUNT_LIMIT 500 Lower-level threshold. An action is triggered if this limit is exceeded for more intervals than HANDLE_COUNT_INTERVALS_ALLOWED.
HANDLE_COUNT_INTERVALS_ALLOWED 20 Number of intervals allowed for the lower limit threshold before an action is triggered (used with UPDATE_INTERVAL for time limit).
HANDLE_COUNT_PEAK_LIMIT 1000 Upper-level threshold. An action is triggered if this threshold is exceeded once.
THREAD_COUNT_LIMIT 200 Lower-level threshold. An action is triggered if this limit is exceeded for more intervals than THREAD_COUNT_INTERVALS_ALLOWED.
THREAD_COUNT_INTERVALS_ALLOWED 20 Number of intervals allowed for the lower limit threshold before an action is triggered (used with UPDATE_INTERVAL for time limit).
THREAD_COUNT_PEAK_LIMIT 300 Upper-level threshold. An action is triggered if this threshold is exceeded once.
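
Memory, handle, and thread monitoring all follow the same two-tier pattern: a lower limit that must be exceeded for more than a set number of intervals, and a peak limit that triggers at once. The Python sketch below models that pattern for a single metric. It is an illustration, not GAM's implementation. The class name ThresholdTracker is hypothetical, and two details are assumptions: that only consecutive intervals are counted, and that the counter resets after a trigger.

```python
from dataclasses import dataclass


@dataclass
class ThresholdTracker:
    """Two-tier threshold check for one metric (memory, handles, or threads)."""
    limit: int              # for example MEM_USAGE_LIMIT
    intervals_allowed: int  # for example MEM_USAGE_INTERVALS_ALLOWED
    peak_limit: int         # for example MEM_USAGE_PEAK_LIMIT
    over_count: int = 0     # intervals spent above the lower limit

    def sample(self, value: int) -> bool:
        """Feed one polled value; return True if an action should trigger."""
        if value > self.peak_limit:
            # The upper-level threshold triggers on a single occurrence.
            self.over_count = 0
            return True
        if value > self.limit:
            self.over_count += 1
        else:
            # Assumption: a reading at or below the limit restarts the count.
            self.over_count = 0
        if self.over_count > self.intervals_allowed:
            self.over_count = 0
            return True
        return False


if __name__ == "__main__":
    memory = ThresholdTracker(limit=150000, intervals_allowed=30,
                              peak_limit=200000)
    # One value per UPDATE_INTERVAL: sustained usage above the lower limit.
    for second in range(1, 40):
        if memory.sample(160000):
            print("action triggered at second", second)
            break
```

With UPDATE_INTERVAL=1, the intervals-allowed value is effectively a time limit in seconds, which is what the table means by "used with UPDATE_INTERVAL for time limit".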

Action Configuration

The actions that can be triggered are described under Core Dump Configuration and Diagnostic Configuration. The second and third actions are initiated only if they are triggered within ACTION_RESET_INTERVALS seconds of the previous action. If that time has elapsed with no new triggers, the next trigger starts a new cycle with FIRST_ACTION.
resmon.ini Default value Description
FIRST_ACTION 1 Valid values:
  • 0: No action.
  • 1: Take action, then restart the service.
  • 2: Take action, then stop the service without restarting.
SECOND_ACTION 1 Valid values:
  • 0: No action.
  • 1: Take action, then restart the service.
  • 2: Take action, then stop the service without restarting.
THIRD_ACTION 2 Valid values:
  • 0: No action.
  • 1: Take action, then restart the service.
  • 2: Take action, then stop the service without restarting.
ACTION_RESET_INTERVALS 60 Number of seconds before resetting the action count. For example, if an action is triggered more than 60 seconds after the previous action, FIRST_ACTION is applied.
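
The escalation from FIRST_ACTION to THIRD_ACTION can be pictured as a small state machine that resets when triggers are far enough apart. The Python sketch below uses the default values from the table. It is an illustration under stated assumptions, not GAM's code: the class name ActionEscalator is hypothetical, and the behavior for a fourth trigger inside the reset window (staying at THIRD_ACTION) is an assumption because the text does not describe it.

```python
from typing import Optional

# Action codes from the table: 0 = no action, 1 = restart, 2 = stop.
ACTION_LABELS = {0: "no action", 1: "restart service", 2: "stop service"}


class ActionEscalator:
    """Choose FIRST, SECOND, or THIRD action based on time between triggers."""

    def __init__(self, first: int = 1, second: int = 1, third: int = 2,
                 reset_seconds: int = 60) -> None:
        self.actions = (first, second, third)
        self.reset_seconds = reset_seconds  # ACTION_RESET_INTERVALS
        self.step = 0
        self.last_trigger: Optional[float] = None

    def on_trigger(self, now: float) -> int:
        """Return the action code to apply for a trigger at time now (seconds)."""
        if (self.last_trigger is None
                or now - self.last_trigger > self.reset_seconds):
            # More than the reset interval has passed: start a new cycle.
            self.step = 0
        # Assumption: further triggers after the third stay at THIRD_ACTION.
        action = self.actions[min(self.step, 2)]
        self.step += 1
        self.last_trigger = now
        return action


if __name__ == "__main__":
    escalator = ActionEscalator()
    for t in (0, 30, 50, 200):
        print(t, ACTION_LABELS[escalator.on_trigger(t)])
```

In this run, the triggers at 30 and 50 seconds fall inside the 60-second window and escalate, while the trigger at 200 seconds starts a new cycle with FIRST_ACTION.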

Resource monitoring example

This example shows how the Guardium Agent Monitor (GAM) settings interact to provide meaningful information. This example tracks memory usage settings. However, the same pattern applies to other resources.

Let's say that you have the following settings:
  • NAME=GUARDIUM_STAP
  • UPDATE_INTERVAL=1
  • MEM_USAGE_LIMIT=150000
  • MEM_USAGE_INTERVALS_ALLOWED=30
  • MEM_USAGE_PEAK_LIMIT=200000
  • ACTION=1
  • FULLDUMP=0
  • DIAGACTION=1
  • DIAGNAME=diag.bat
  • FIRST_ACTION=1
  • ACTION_RESET_INTERVALS=60
  • SECOND_ACTION=1
  • THIRD_ACTION=2
GAM monitors the memory usage of the GUARDIUM_STAP service (as specified in NAME) every second (UPDATE_INTERVAL). If memory usage exceeds MEM_USAGE_LIMIT (150000 KB, about 150 MB) 30 consecutive times (MEM_USAGE_INTERVALS_ALLOWED), or exceeds MEM_USAGE_PEAK_LIMIT (200000 KB, about 200 MB) once, GAM takes the following actions:
  • Generate a mini-dump (ACTION and FULLDUMP).
  • Run diag.bat (DIAGACTION and DIAGNAME).
  • Restart the service (FIRST_ACTION).
  • If the same symptoms occur within 60 seconds (ACTION_RESET_INTERVALS), GAM takes the same actions (SECOND_ACTION).
  • If the same symptoms occur again within 60 seconds, GAM generates a mini-dump, runs diag.bat, and stops the service without restarting (THIRD_ACTION).