Technical Blog Post
Abstract
APM v8.1.2: No data in "Slow SQL Statements " for UD agent
Body
Hello everyone.
I have recently worked on an a problem regarding UD agent data collection where the DB2 agent was connected to an APM v8.1.2 infrastructure.
This was one of my first experience on APM v8 but I noticed something that I liked very much if compared with the ITM v6 world : the agent logs are really helpful even without activating specific diagnostic traces and this resulted in a quick and effective problem determination.
For example, in this case, after having installed the UD agent on the target server, we were able to see data for all the expected widgets, but one:
the "Slow SQL Statements - Top 5" was returning no data, just an empty box with message "No items to display".
For this kind of scenarios, the first component that needs to be investigated is the agent itself, to verify whether it is able to collect monitoring data or not.
First of all, let's have a look at the asfActivity log file.
It is a file having name like ud_db2_asfActivity_<date>_<time>-0x.log
In this file we can find rows containing data that is sent by the agent to the APM servers.
The rows look like:
1160530095418000 SEND: <ENVELOPE><SUBSCRIBER>KUD_defaultSubscription</SUBSCRIBER><REPORTDATA><WRITETIME>1160602095418000</WRITETIME><TMZDIFF>-10800</TMZDIFF><SQLTABLE><TABLENAME>KUDDBASE00</TABLENAME><COLUMNS><NAME>ORIGINNODE</NAME><NAME>ACSRT</NAME><NAME>APCCN</NAME><NAME>AVLWT</NAME><NAME>AVST</NAME><NAME>CCHRAT</NAME><NAME>DBCPUP</NAME><NAME>DBMEMP</NAME><NAME>DBNM</NAME><NAME>PRTNNO</NAME><NAME>DBSTAT</NAME><NAME>DDLK</NAME><NAME>DIRRD</NAME><NAME>DWRIT</NAME><NAME>FSTM</NAME><NAME>INAME</NAME><NAME>LESC</NAME><NAME>LTIO</NAME><NAME>LWTI</NAME><NAME>LKWT</NAME><NAME>LHLD</NAME><NAME>LWTNG</NAME><NAME>LNGLCKWT</NAME><NAME>MAXCON</NAME><NAME>PCHRAT</NAME><NAME>PLHR</NAME><NAME>PLIPR</NAME><NAME>PLIW</NAME><NAME>PLTLR</NAME><NAME>PLTLW</NAME><NAME>SSTIME</NAME><NAME>SHALLC</NAME><NAME>SOFL</NAME><NAME>SOFP</NAME><NAME>STMFP</NAME><NAME>SMTRLP</NAME><NAME>TOTLUPCT</NAME><NAME>TLSTI</NAME><NAME>TLSRT</NAME><NAME>TRANPERMIN</NAME></COLUMNS><ROWCOUNT>1</ROWCOUNT><ROW><![CDATA[0500018db2:DB2NODE1:UD000014000
What should we look for into this file ?
For example, we can verify whether the agent tried to send data for the attribute group used by the failing widget or if instead it sent no rows.
In our case, the data set that is used in the widget is "DB2 Slow SQL Stmts", and the internal name for this dataset is KUDSLSQL00.
So we could make a find for KUDSLSQL00 into the asfActivity log file.
In my case, I found that the SEND row related to table KUDSLSQL00 was actually indicating 0 rows sent:
1160530095418000 SEND: <ENVELOPE><SUBSCRIBER>KUD_defaultSubscription</SUBSCRIBER><REPORTDATA><WRITETIME>1160530095418000</WRITETIME><TMZDIFF>-10800</TMZDIFF><SQLTABLE><TABLENAME>KUDSLSQL00</TABLENAME><COLUMNS><NAME>ORIGINNODE</NAME><NAME>ACTSTAT</NAME><NAME>DBNM</NAME><NAME>PRTNNO</NAME><NAME>SQLDUR</NAME><NAME>EXECUTEID</NAME><NAME>LCKWAIT</NAME><NAME>STIME</NAME><NAME>STMTTXT</NAME><NAME>SQLTYPE</NAME></COLUMNS><ROWCOUNT>0</ROWCOUNT></SQLTABLE></REPORTDATA></ENVELOPE>
The rowcount is 0.
So the agent is not sending this information at all. The problem is with the agent.
What to do next ?
We can now look at the Agent log, the one having name like:
<hostname>_ud_db2_kudcma_<hex_timestamp>-0x.log
If the agent is failing in collecting a specific metric, there is a chance we can find in this file useful information about the root cause.
In this case, the following errors can explain the reason why the data set is not collected:
(573B16C6.0000-1E00:globalsql.cpp,433,"ExecuteStatementMultiRow") ERROR
returned from kud00_DB2LIB_Lib::pMyDB2LIB_SQLExecDirect(): rc = -1;
_hDbc 1; SQL Stmt 'SELECT T.STMT_TEXT,T.SECTION_TYPE,
T.INSERT_TIMESTAMP, T.STMT_EXEC_TIME,
T.TOTAL_CPU_TIME/T.NUM_EXEC_WITH_METRICS as
AVG_CPU_TIME,T.TOTAL_CPU_TIME, T.EXECUTABLE_ID, T.LOCK_WAITS FROM
TABLE(MON_GET_PKG_CACHE_STMT (NULL, NULL, NULL, -1)) as T WHERE
T.NUM_EXEC_WITH_METRICS <> 0 ORDER BY T.TOTAL_CPU_TIME DESC FETCH FIRST
5 ROWS ONLY'
and
(573B747A.0002-1E00:globalsql.cpp,323,"getSQLConnection") ERROR
returned from kud00_DB2LIB_Lib::pMyDB2LIB_SQLExecDirect(): rc = -1;
_hDbc 1; SQL Stmt 'select MAX(LOCK_WAIT_TIME) as LONG_WAIT_TIME from
TABLE(MON_GET_PKG_CACHE_STMT(NULL,NULL,NULL,-1))'
---------------
This error is caused by missing authorization.
As described in this link:
in order to collect the KPI included in the DB2 Slow SQL Stmts data set, we must grant the agent UserID the EXECUTE privilege on the
SYSPROC.MON_GET_PKG_CACHE_STMT function.
So in order to resolve the error message, we must run:
GRANT EXECUTE ON FUNCTION SYSPROC.MON_GET_PKG_CACHE_STMT(CHAR(),VARCHAR(), CLOB(),INTEGER) TO
USER <user>
and restart the agent.
After this step, also dashboard widget "Slow SQL Statements - Top 5Slow SQL Statements - Top 5" has been populated with expected data.
Best Regards
Subscribe and follow us for all the latest information directly on your social feeds:
|
|
|
Check out all our other posts and updates: | |
Academy Blogs: | https://goo.gl/U7cYYY |
Academy Videos: | https://goo.gl/FE7F59 |
Academy Google+: | https://goo.gl/Kj2mvZ |
Academy Twitter : | https://goo.gl/GsVecH |
UID
ibm11277164