Technical Blog Post
Abstract
Unix OS Agent in VIOS unexpectedly hangs
Body
Virtual I/O Server (VIOS) is software that runs in a System p logical partition and is used to configure and share physical I/O resources among all the other logical partitions
of the server.
This means it has visibility of all the physical and logical storage resources defined on the machine.
On the VIOS, you can run a pre-installed ITM agent called VIOS Agent, but you can also run Unix OS Agent, as on any other AIX LPAR.
It can happen that Unix OS Agent, once started, shows no data on TEP even though it is correctly registered and online in TEMS/TEPS.
When this happens to a Unix OS Agent running in a VIOS, the likely cause is an unhandled exception in the aixdp_daemon process.
If you look at the agent logs (files named like <hostname>_ux_<epochtime>.log, for example myvios_ux_1495095330.log), you may notice that aixdp_daemon generates a stack trace like this one:
**** Fatal Error (11) Detected in kuxagent or a helper binary ****
**** Stacktrace in standard logs. Enable KBB_SIG1=-dumpoff in ini for core dumps ****
+++PARALLEL TOOLS CONSORTIUM LIGHTWEIGHT COREFILE FORMAT version 1.0
+++LCB 1.0 Thu May 18 10:30:50 2017 Generated by IBM AIX 6.1
#
+++ID Node 0 Process 19988686 Thread 1
***FAULT "SIGSEGV - Segmentation violation"
+++STACK
leftmost : 0x0000000c
malloc_y : 0x00000534
malloc_common@AF104_86 : 0x00000028
get_cu_vtargets : 0x00000228
init_odm : 0x0000016c
create_cu_hashtbl : 0x000001a4
get_diskstats : 0x000004d8
adp_get_diskstats : 0x00000108
dt_CollectDiskData : 0x0000010c
dt_CollectData : 0x000002b4
ux_CollectData : 0x00000110
reply_disks_request__FiPv : 1361 # in file <aixdp_daemon.cpp>
reply_data_request__FiPv : 349 # in file <aixdp_daemon.cpp>
process_data__FiPv : 298 # in file <aixdp_daemon.cpp>
main : 192 # in file <aixdp_daemon.cpp>
In this condition, the aixdp_daemon process does not terminate after the fault, and this causes the remaining subprocesses and agent threads to hang.
If you set KBB_SIG1=-dumpoff, a core file is generated and the aixdp_daemon process is terminated, freeing the remaining Unix OS Agent processes and allowing
them to work and return data.
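To check quickly whether an agent log contains this signature, a short scan can help. The following Python sketch is not part of the product; it assumes the log text looks like the excerpt above (the "Fatal Error" banner, a ***FAULT line, then a +++STACK section with one frame per line). The function names find_fault and scan_logs are illustrative, and the log directory passed to scan_logs is whatever directory holds your <hostname>_ux_<epochtime>.log files.

```python
import re
from pathlib import Path

# Patterns modelled on the log excerpt shown in this article (assumed format).
FATAL_RE = re.compile(
    r"\*{4} Fatal Error \((\d+)\) Detected in kuxagent or a helper binary \*{4}"
)
FAULT_RE = re.compile(r'\*{3}FAULT "([^"]+)"')
FRAME_RE = re.compile(r"^\s*(\S+) : (?:0x[0-9a-fA-F]+|\d+)")


def find_fault(log_text):
    """Return details of the first fatal error block in log_text, or None."""
    lines = log_text.splitlines()
    for i, line in enumerate(lines):
        banner = FATAL_RE.search(line)
        if not banner:
            continue
        result = {"signal": int(banner.group(1)), "fault": None, "frames": []}
        in_stack = False
        for follow in lines[i + 1:]:
            fault = FAULT_RE.search(follow)
            if fault:
                result["fault"] = fault.group(1)
            elif follow.startswith("+++STACK"):
                in_stack = True
            elif in_stack:
                frame = FRAME_RE.match(follow)
                if not frame:
                    break  # first non-frame line ends the stack section
                result["frames"].append(frame.group(1))
        return result
    return None


def scan_logs(log_dir):
    """Map each *_ux_*.log file under log_dir to its fault details (if any)."""
    hits = {}
    for path in sorted(Path(log_dir).glob("*_ux_*.log")):
        found = find_fault(path.read_text(errors="replace"))
        if found:
            hits[path.name] = found
    return hits
```

If the returned frames include get_cu_vtargets and get_diskstats, the log matches the case described in this article.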
Continuing with the analysis, if we look at the last line written by the crashing process, we can see:
(592FC9B1.07A2-1:dkstats.c,1984,"get_cu_vtargets") Entry
Since the code was trying to allocate more memory, there is a good chance that the problem is caused by the large number of disk entries.
Because this is a VIOS, the aixdp_daemon discovers a large number of I/O resources, and this can hit system resource and process limits,
which then surface as the segmentation violation shown above.
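The trace line quoted above can be split into its fields with a small helper, which makes it easier to line it up with the time of the hang. This is only a sketch: it assumes the usual layout of ITM trace lines, where the first field inside the parentheses is the epoch time in hexadecimal seconds, followed by a sequence number, a thread identifier, the source file, the source line and the function name. The name parse_trace_line is illustrative.

```python
import re
from datetime import datetime, timezone

# Assumed layout: (HEXTIME.SEQ-THREAD:file,line,"function") message
TRACE_RE = re.compile(
    r'^\((?P<ts>[0-9A-Fa-f]{8})\.(?P<seq>[0-9A-Fa-f]{4})-(?P<thread>[0-9A-Fa-f]+):'
    r'(?P<file>[^,]+),(?P<line>\d+),"(?P<func>[^"]+)"\)\s*(?P<msg>.*)$'
)


def parse_trace_line(line):
    """Return the fields of one trace line as a dict, or None if it does not match."""
    m = TRACE_RE.match(line.strip())
    if m is None:
        return None
    epoch = int(m.group("ts"), 16)  # hex seconds since 1970-01-01 UTC (assumption)
    return {
        "epoch": epoch,
        "time_utc": datetime.fromtimestamp(epoch, tz=timezone.utc),
        "source_file": m.group("file"),
        "source_line": int(m.group("line")),
        "function": m.group("func"),
        "message": m.group("msg"),
    }


if __name__ == "__main__":
    info = parse_trace_line('(592FC9B1.07A2-1:dkstats.c,1984,"get_cu_vtargets") Entry')
    print(info["time_utc"].isoformat(), info["function"], info["message"])
```

An "Entry" message with no later line for the same function suggests that the process stopped inside that function, which is consistent with get_cu_vtargets appearing in the stack trace.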
Considering that we are dealing with a VIOS, there is likely a VIOS agent already running on the same system.
The simplest fix is to turn off the AIX data collection in the ux agent, since it collects redundant data that the VIOS agent (va) is already
collecting. The Unix OS Agent (ux) and VIOS agent (va) share a common code base for AIX called aixDataProvider.
The va agent runs this code, just as the AIX Premium agent does on AIX LPARs that are not VIOS, but the va agent also collects additional data specific to VIOS.
So you do not really need to collect this data with both the va and ux agents, and the va agent is specifically designed for VIOS.
It is unusual for users to run both the va and ux agents on the VIOS, but if you need the ux agent on VIOS for any reason, you can disable the aixdp_daemon process if you experience the problem described in this blog article.
You can do it by editing the /opt/IBM/ITM/config/ux.ini file and changing:
KUX_AIXDP=true
to:
KUX_AIXDP=false
Then restart the ux agent.
In this way, Unix OS Agent is initialized without the aixdp_daemon process and is then able to collect and show all the other metrics in TEP.
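If the same change has to be applied on several VIOS partitions, the edit to ux.ini can be scripted. The following Python sketch is one possible way to do it, not an IBM-provided tool: the helper name set_ini_key is illustrative, it only rewrites a key that already exists in the file, and it keeps a copy of the original file with a .bak suffix.

```python
import re
import shutil
from pathlib import Path


def set_ini_key(ini_path, key, value):
    """Set an existing key=value line in an ini-style file.

    Keeps a backup next to the file (<name>.bak) and returns True if the
    file was changed, False if the key is missing or already has that value.
    """
    path = Path(ini_path)
    original = path.read_text()
    # Match "KEY=anything" on its own line; spaces and tabs around "=" are tolerated.
    pattern = re.compile(
        rf"^([ \t]*{re.escape(key)}[ \t]*=[ \t]*).*$", re.MULTILINE
    )
    updated = pattern.sub(lambda m: m.group(1) + value, original)
    if updated == original:
        return False
    shutil.copy2(path, path.with_name(path.name + ".bak"))
    path.write_text(updated)
    return True


# Example call, using the path given in this article:
# set_ini_key("/opt/IBM/ITM/config/ux.ini", "KUX_AIXDP", "false")
```

The ux agent still has to be restarted after the change, for example with itmcmd agent stop ux followed by itmcmd agent start ux if that is how the agent is managed on your system.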
Thanks for reading.
UID
ibm11277068