IBM Support

How to troubleshoot Guardium missing DB User problems

How To


Summary

Guardium data activity monitor captures DB user information for monitored sessions.
Reports with sessions missing the DB user should be investigated by the Guardium administrator.
A new dashboard has been created to show important information for missing DB user problems.
Use the dashboard and guide in this technote to troubleshoot and resolve missing DB user problems.

Objective

1. Understand the possible causes of missing DB users

2. Import the Missing DB User Dashboard into your appliance and configure

3. Use the dashboard to identify the root cause of missing DB users in your environment

4. Take actions to resolve missing DB users

Steps

Recommended viewing
Review the video in this course on the Security Learning Academy:

1. Understand the possible causes of missing DB users

Missing DB user is a symptom that can have many possible causes. Depending on the exact circumstances, the 'missing' DB user may appear in reports as blank, as '?', or as an incorrect string of random characters. Three main groups of problems can cause a missing DB user.

i) Performance problems

The DB user is contained in the login packets at the start of a database session. Guardium must capture and log these packets to see the DB user. If they are missed, there is no way to get them later in the session and the DB user will be missing. If Guardium components are not performing well, packets may be dropped, and the dropped packets will likely include some login packets.

The main characteristic of performance problems is that the missing DB User sessions are random. There may be some general trends, but it is not possible to predict exactly when and from what source they will come. Often with performance problems, other information that comes at the start of the session is missing as well, e.g. source program.

Sniffer performance - The sniffer queues packets for analysis and logging. If performance is bad, these packets may be dropped. Users will be randomly missing across all sessions on the collector.

S-TAP performance - The S-TAP and KTAP queue packets for sending to the collector. If performance is bad, these packets may be dropped. Users will be randomly missing across all sessions from the S-TAP.

Network performance - Packets may be dropped in the network between S-TAP and collector or SPAN port / network tap and collector. For SPAN or network tap, users will be randomly missing from those specific servers. Other network performance problems may cause random missing users across one or more collectors. Network performance problems are less common than sniffer and S-TAP.

ii) Specific protocol problems

In rarer cases, a specific type of connection may be a problem for Guardium to log. The login packets are captured, but the sniffer cannot process them as expected. There are some known conditions where a missing DB user is expected. If it is not expected, a new sniffer patch is often required to resolve the problem.

The main characteristic of specific protocol problems is that they are predictable. Every time the same kind of session is initiated, the DB user is always missing.

iii) DB connection established before S-TAP started

If the database connection was established before the S-TAP started, the DB user will not be captured. Long-running sessions, for example from an application with a connection pool, are a common cause of this type of missing user. It is also possible that after a server reboot the database starts before the S-TAP, and the first sessions may have a missing DB user. S-TAP is designed to avoid the problem after reboot, but it may happen in rare cases.

2. Import and configure the Missing DB User Dashboard

The dashboard is included by default in v11 and above. If you have a new v11 or above appliance, go to step 3.

  1. Download the below dashboard definition export. This dashboard can be imported into v10.1.3 and above. For lower versions it is recommended to upgrade to latest GPU. Dashboards are not supported in v9.
    Missing_DB_User_Dashboard_v10.sql
  2. On a Central Manager or standalone appliance, import the definitions from Manage > Data Management > Definitions Import.
  3. On a collector with missing DB users open My Dashboards > My Custom Dashboards > Missing DB User Dashboard.
  4. When imported, the run time parameters are set by default to query from 'now -3 hours' to 'now'. It is recommended to change each report to use 'now -1 week' to 'now'. The parameters can be changed to check further back if required. Use 'Edit mode' on the dashboard to change them. Reports 'Latest Sessions Missing DB User' and 'Latest Exceptions Missing DB User' may take a long time to run if there is a lot of data on the appliance. If they run slowly, reduce the time period as required, e.g. to 'now -1 day' to 'now'.

3. Use the dashboard to identify the root cause of missing users in your environment

The dashboard contains 12 reports used to troubleshoot. It should be used to answer the following questions about sessions missing DB user on a collector:

  • What percentage of the sessions are missing DB user?
  • How many S-TAPs are the sessions coming in from?
  • Are there indicators of performance problems in the sniffer?
  • Are there indicators of performance problems in the S-TAPs?
  • Are all the sessions coming in of the same specific type?
  • Are there specific exceptions correlated to missing DB users?
  • Are the sessions starting before S-TAP was running?

With answers to these questions the problem can be categorized as one or more of sniffer performance, S-TAP performance, network performance, specific protocol problem or DB connection established before S-TAP started.
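
As a rough illustration of how the answers map to categories, the following Python sketch encodes the indicators described in this technote. The Findings class, its field names and the categorize function are hypothetical and are not part of Guardium. Treating network performance as the remaining candidate when no other indicator is present is an assumption made for this sketch.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Findings:
    """Hypothetical summary of what the dashboard reports showed."""
    flat_log_requests_increasing: bool = False
    frequent_sniffer_restarts: bool = False
    stap_buffer_overflow: bool = False
    ktap_dropped_packets_increasing: bool = False
    same_session_type_always_missing: bool = False
    sessions_started_before_stap: bool = False


def categorize(f: Findings) -> List[str]:
    """Return every category consistent with the findings (can be several)."""
    categories = []
    if f.flat_log_requests_increasing or f.frequent_sniffer_restarts:
        categories.append("sniffer performance")
    if f.stap_buffer_overflow or f.ktap_dropped_packets_increasing:
        categories.append("S-TAP performance")
    if f.same_session_type_always_missing:
        categories.append("specific protocol problem")
    if f.sessions_started_before_stap:
        categories.append("DB connection established before S-TAP started")
    if not categories:
        # Assumption: with no sniffer, S-TAP, protocol or start-order
        # indicator, network performance is the candidate left to check.
        categories.append("network performance")
    return categories


print(categorize(Findings(flat_log_requests_increasing=True,
                          ktap_dropped_packets_increasing=True)))
```

More than one category can apply at the same time, which is why the function returns a list rather than a single value.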

Report(s) Analysis

Sessions Missing DB User Per Day

Sessions Per Day

Percentage of sessions missing DB user should inform the rest of the troubleshooting and clarify the severity of the issue. You can compare missing user sessions to total sessions each day. If 0.01% of sessions are missing the user it will be a different problem than if 50% are missing.

'Sessions Missing DB User Per Day' shows if the problem is consistent or intermittent. It shows if the problem has been resolved after a certain date.

Continue to monitor the frequency of the problem with these reports, especially after changes are made in the environment.
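
The daily comparison can be sketched in Python as below. It assumes the counts from the two reports have been copied into dictionaries keyed by day; the function name and the sample numbers are made up for illustration and chosen to reproduce the 0.01% and 50% cases mentioned above.

```python
def missing_percentage(total_by_day, missing_by_day):
    """Percentage of sessions missing DB user for each day that had sessions."""
    result = {}
    for day, total in total_by_day.items():
        if total == 0:
            continue  # no sessions logged that day, so no percentage
        missing = missing_by_day.get(day, 0)
        result[day] = 100.0 * missing / total
    return result


# Illustrative counts only, not real report output.
totals = {"day 1": 20000, "day 2": 18000}
missing = {"day 1": 2, "day 2": 9000}
for day, pct in missing_percentage(totals, missing).items():
    print(f"{day}: {pct:.2f}% of sessions missing DB user")
```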

Local Sessions Missing User

Remote Sessions Missing User

Identify if the problem is confined to local or remote sessions and how many S-TAPs or network taps are impacted. This answers whether the missing user sessions are from many S-TAPs or just one. It gives more information to understand if only specific kinds of sessions are missing user.

Session Types Missing User

Identify if there are patterns in what sessions are missing the user based on server type, server ip, client ip, source program and os user.

Specific session types always missing the user indicate a specific protocol problem. If there is no pattern to which types of sessions are missing the user, that indicates a performance problem.
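
The pattern check can be sketched in Python as follows, assuming session rows have been exported from the reports into dictionaries. The field names, the min_sessions threshold and the helper functions are hypothetical. Only blank and '?' users are detected here; a DB user logged as random characters would need a separate check.

```python
from collections import defaultdict


def is_missing(db_user):
    """True when the DB user is blank or '?' (random strings are not detected)."""
    return db_user is None or db_user.strip() in ("", "?")


def missing_rate_by_type(sessions, keys=("server_type", "source_program")):
    """Group sessions by the chosen fields and return (missing, total) per group."""
    counts = defaultdict(lambda: [0, 0])
    for session in sessions:
        group = tuple(session[k] for k in keys)
        counts[group][1] += 1
        if is_missing(session["db_user"]):
            counts[group][0] += 1
    return {group: tuple(c) for group, c in counts.items()}


def always_missing(rates, min_sessions=5):
    """Session types where every session is missing the DB user."""
    return [g for g, (missing, total) in rates.items()
            if total >= min_sessions and missing == total]
```

A non-empty result from always_missing points toward a specific protocol problem. Missing sessions spread thinly over many groups are more consistent with a performance problem.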

Latest Sessions Missing User

Identify the session start times and details of the most recent sessions missing user. If a slon capture is required, this report can be used to verify that a missing user session was captured while slon was running.

In 'Edit Mode' use run time parameters to narrow down on specific conditions.

Exception Types Missing User

Identify what (if any) exceptions are associated with missing user. Certain exception types are common:

LOGIN_MISSED - Indicates that the sniffer did not get the login packets for the session. This could be due to packet loss from performance problem or because the DB connection was established before S-TAP started.

SQL_ERROR - Oracle errors ORA-12505 and ORA-12514 are a specific protocol problem expected to result in missing users. See the actions below for more details.

Latest Exceptions Missing User

Identify the exact times of exceptions. If there are sniffer or S-TAP performance problems, the timing can be correlated to this report.

Flat Log Requests

Flat log requests are the key indicator of an analyzer queue overflow problem in the sniffer. When the sniffer's analyzer buffer reaches its limit, packets are dropped and logged as flat log requests. Any increase in flat log requests is likely to cause missing users.

The report shows all cases when flat log requests were more than 0. Note that only an increase in flat log requests indicates a problem. Restarting sniffer resets flat log requests to 0.

Increasing flat log requests indicate a sniffer performance problem.
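
Because only an increase matters and a sniffer restart resets the counter to 0, a simple way to read the report is to compare each value with the previous one. The Python sketch below assumes the report rows have been exported as (timestamp, flat log requests) pairs in chronological order; the function name and the input format are hypothetical.

```python
def flat_log_increases(samples):
    """Return (timestamp, increase) for every sample where the counter grew.

    A value lower than the previous one is treated as a counter reset after
    a sniffer restart, so the new value itself is counted as the increase.
    The first sample has nothing to compare against and is skipped.
    """
    increases = []
    previous = None
    for timestamp, value in samples:
        if previous is not None:
            delta = value if value < previous else value - previous
            if delta > 0:
                increases.append((timestamp, delta))
        previous = value
    return increases
```

A counter that stays at the same non-zero value produces no entries, which matches the note above that only an increase indicates a problem.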

Sniffer Restarts

Sniffer restarts may indicate a logger queue overflow problem in the sniffer or a sniffer crash problem. The sniffer may also restart without indicating a problem, e.g. after a patch install or a manual restart from the CLI.

The report shows the time of all sniffer restarts. Further investigation based on the above links is required to identify the exact cause.

A high level of sniffer restarts indicates a sniffer performance problem.

S-TAP Events

Identify if S-TAP buffer is overflowing by searching in run time parameters for events with 'buffer' in the message. Identify if the S-TAP is restarting unexpectedly by searching for events like 'connected to primary server'. S-TAP buffer overflow and/or frequent S-TAP restarts indicate S-TAP performance problem.

Identify when the S-TAP started. For Windows S-TAP the message is 'Guardium_STAP Started'. For UNIX S-TAP the message is 'Guardium TAP starting'.

If there are no S-TAP events, then SPAN or network tap may be used. Check in S-TAP control page to see if any S-TAPs are connected.
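
The message searches above, and the check for sessions that started before the S-TAP, can be sketched in Python as below. The search strings are the ones quoted in this technote. The function names, the event labels and the sample timestamps are hypothetical, and the sketch assumes event messages and session start times have been copied out of the reports.

```python
from datetime import datetime

# Start messages quoted in this technote (Windows and UNIX S-TAP), lowercased.
START_MESSAGES = ("guardium_stap started", "guardium tap starting")


def classify_event(message):
    """Label an S-TAP event message as 'buffer', 'connect', 'start' or 'other'."""
    text = message.lower()
    if "buffer" in text:
        return "buffer"
    if "connected to primary server" in text:
        return "connect"
    if any(start in text for start in START_MESSAGES):
        return "start"
    return "other"


def started_before(session_starts, stap_start):
    """Return the session start times that precede the given S-TAP start time."""
    return [t for t in session_starts if t < stap_start]


# Illustrative values only.
stap_start = datetime(2021, 2, 1, 8, 0)
sessions = [datetime(2021, 1, 30, 23, 15), datetime(2021, 2, 1, 9, 30)]
print(classify_event("Guardium TAP starting"))
print(started_before(sessions, stap_start))
```

Sessions returned by started_before are candidates for the 'DB connection established before S-TAP started' category and are worth checking against the 'Latest Sessions Missing User' report.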

KTAP Dropped Packets

For UNIX S-TAP only, identify if KTAP is dropping packets. This report is only populated if the stap_statistic parameter is set in guard_tap.ini; by default it is not. Update the S-TAP configuration to activate this feature.

An increase in KTAP dropped packets indicates an S-TAP performance problem.

KTAP stats can be checked and reset on the db server by running <guardium install dir>/KTAP/current/guard_ktap_stats <get | reset>. It may be useful to reset the stats before investigating so it starts from 0 bytes dropped.

4. Take actions to resolve missing DB users

All problems

  • Guardium development is continuously working to improve sniffer and S-TAP performance and resolve defects. Upgrading to the latest GPU, sniffer patch and S-TAP version is always recommended.
  • If you need to contact Guardium support for a missing DB user issue, in all cases provide the output of the dashboard reports, your analysis so far and a sniffer must gather. Additional specific actions for each kind of problem are detailed below.


Performance problems

  • The V10.1.4 S-TAP and sniffer contain a new feature called 'priority queue'. It prioritizes the first packets in a database session to try to avoid dropping them due to a performance problem. This does not guarantee that a performance problem will not cause a missing user, but it does help. It is highly recommended to upgrade both S-TAP and sniffer to V10.1.4 or above.
  • Long running sessions can cause difficulty for troubleshooting. There may be no performance problem now, but there was one days or weeks ago when sessions started. Where possible it is recommended to reset long running connections to the database after confirming there is currently no performance problem.

Sniffer performance

S-TAP performance

  • Review "Factors that affect S-TAP performance" section in Guardium Redbook and configure S-TAP to improve performance. Although the book was written for v9, the same advice applies to v10.
  • For UNIX S-TAP only, S-TAP throughput can be increased by increasing the number of S-TAP and KTAP threads. This is recommended if there are KTAP dropped packets but no S-TAP buffer overflows and no sniffer performance problem.
  • If S-TAP performance problems cannot be resolved by the above, Guardium support can assist. Before contacting support collect:

Network performance

  • In case of SPAN or network tap, if packets were dropped before reaching Guardium, the problem must be resolved in the network. Guardium support cannot provide network troubleshooting services if the problem is not on the appliance. Support can help to prove that the packets are not reaching the appliance. Before contacting support collect:
    • Tcpdump filtered on the eth1 interface, taken with the support store tcpdump CLI command. Use the 'Latest Sessions Missing User' report to confirm that a session missing DB user was logged while tcpdump was running
    • Slon capture, run at the same time as tcpdump
    • Sniffer must gather taken after slon and tcpdump were captured
    • Output of dashboard reports after slon and tcpdump were captured

Specific protocol problems

  • If local Oracle sqlplus sessions are missing user it may be expected. Check on the database for Oracle OS User Authentication.
  • If ORA-12505 and ORA-12514 exceptions are seen, confirm if the missing users are expected due to known Oracle errors.
  • If other specific sessions are always missing the DB user, Guardium support can assist. Before contacting support collect:
    • Capture a slon containing the session start of a problem session. Use the 'Latest Sessions Missing User' report to confirm the problem was reproduced when slon was running
    • As much detail about the session login as possible, preferably the exact login string used by the client
    • Sniffer must gather taken after slon was captured
    • Output of dashboard reports after slon was captured

DB connection established before S-TAP started

  • For long running sessions or connection pools, reset the connection to the database if possible.
  • If sessions are missing soon after server reboot, the boot order on the server could be the cause. Guardium support can assist if needed. Before contacting support collect:
    • For UNIX S-TAP collect guard_diag
    • For Windows S-TAP collect 'diag' files
    • Analysis from the server administrator indicating any potential boot order problem


Document Information

Modified date:
03 February 2021

UID

ibm10719941