Question & Answer
Question
Cause
KTAP may still be forwarding TCP packets to STAP, and restarting CA at that moment creates orphaned handles.
When restarting CA is really necessary, reboot the server instead. A reboot starts the server from a clean state, and services and processes are started via /etc/init, which avoids the problem.
Answer
The order in which CA and STAP hook into the kernel is important.
We recommend the following startup sequence. The reverse sequence will lead to a server crash.
- START CA
- START STAP
When stopping services, stop STAP first, then stop CA.
- STOP STAP
- STOP CA
Apply this first-in, last-out principle in the OS startup and shutdown scripts.
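The ordering above can be sketched as a small wrapper for startup and shutdown scripts. This is only an illustration: the four command variables are hypothetical placeholders (they default to harmless `echo` calls) and must be replaced with the actual CA and STAP start/stop commands used on your server.

```shell
#!/bin/sh
# Hypothetical placeholders: substitute the real start/stop commands for your site.
START_CA_CMD=${START_CA_CMD:-"echo start CA"}
START_STAP_CMD=${START_STAP_CMD:-"echo start STAP"}
STOP_STAP_CMD=${STOP_STAP_CMD:-"echo stop STAP"}
STOP_CA_CMD=${STOP_CA_CMD:-"echo stop CA"}

# Start CA first, then STAP. If CA fails to start, STAP is not started,
# so STAP never hooks into the kernel ahead of CA.
start_all() {
    sh -c "$START_CA_CMD" || return 1
    sh -c "$START_STAP_CMD"
}

# Stop in the reverse order: STAP first, then CA.
stop_all() {
    sh -c "$STOP_STAP_CMD" || return 1
    sh -c "$STOP_CA_CMD"
}
```

Calling `start_all` from the startup script and `stop_all` from the shutdown script keeps the first-in, last-out order in one place.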
HOW TO CHECK THE SEQUENCE
Run the following command to check whether the sequence is correct:
AIX: genkex | grep -E "ktap|SEOS"
Solaris: modinfo | grep -E "ktap|SEOS"
If you run the above command soon after starting CA and STAP, the KTAP and SEOS modules should appear in the output.
** Note **
The most recently loaded module appears at the top of the output. Since the correct sequence is START CA then START STAP, the KTAP module should appear at the top of the list, with SEOS further down.
SEOS hooks the system call at load time, whereas KTAP hooks the system call only when STAP starts.
So as long as STAP starts after CA and stops before CA, both modules can work together.
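The order check described in the note can be automated with a small filter. The sketch below assumes, as the note states, that the most recently loaded module is listed first, and that the module names contain `ktap` and `SEOS` exactly as in the grep pattern above. The function name `check_module_order` is illustrative only.

```shell
# Reads a kernel module listing on stdin (for example, the output of
# genkex on AIX or modinfo on Solaris) and compares the positions of the
# first "ktap" line and the first "SEOS" line.
# Exit status: 0 = ktap above SEOS, 1 = wrong order, 2 = a module is missing.
# Note: on Solaris, nawk or /usr/xpg4/bin/awk may be needed instead of awk.
check_module_order() {
    awk '
        /ktap/ && !k { k = NR }   # remember the first line mentioning ktap
        /SEOS/ && !s { s = NR }   # remember the first line mentioning SEOS
        END {
            if (!k || !s) { print "missing: ktap or SEOS not loaded"; exit 2 }
            if (k < s)    { print "ok: ktap listed above SEOS"; exit 0 }
            print "wrong order: SEOS listed above ktap"; exit 1
        }'
}
```

Usage: `genkex | check_module_order` on AIX, or `modinfo | check_module_order` on Solaris.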
This principle also applies to Trend Micro.
(In early 2020 it was noted that Guardium STAP and Trend Micro together can cause similar problems, e.g. with the Trend Micro Deep Security Agent.)
UPGRADE ACTIONS
At some point you may need to upgrade STAP or CA, or install OS patches. The recommended actions are:
1) Stop STAP first.
2) Then stop CA.
3) Upgrade the OS, CA, or STAP.
4) Finally, reboot the server.
HOW TO UNLOAD PREVIOUSLY LOADED KERNEL MODULES
Stopping STAP does not unload previously loaded kernel modules. The recommended action is to reboot the DB server to start clean.
An RFE has been raised to enable removal of kernel modules after stopping STAP. In some cases the DB server is administered by a separate team, and a reboot requires several levels of managerial approval. This is often why administrators attempt to restart CA instead.
WHEN SERVER CRASH PERSISTS
If you applied the recommended resolution and still see crashes, open a support ticket and include the following information in the PMR.
1. OS crash dump
When and how to force a system dump in AIX ?
Managing System Dump Devices
2. stap guard diag
IBM MustGather: Collecting data for Guardium STAP
3. central_logger.log: this file is on the STAP host, and you can use the find command to locate it
4. The inittab file
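For item 3, a minimal sketch of the find invocation is shown below. The function name `find_central_logger` is illustrative. The search root defaults to `/`; if the STAP installation directory is known, pass it as the argument to narrow the search.

```shell
# Search a directory tree (default: /) for central_logger.log.
# Errors such as "permission denied" are discarded.
find_central_logger() {
    find "${1:-/}" -type f -name central_logger.log 2>/dev/null
}
```

Usage: `find_central_logger` to search the whole host, or `find_central_logger /some/dir` to search one tree.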
Related information
No traffic in Guardium report after OS and CA upgrade
Document Information
Modified date:
28 February 2020
UID
swg22001446