Event and real-time database and system monitoring in a Db2 pureScale environment
Db2 9.7 introduced a number of enhancements to the monitoring infrastructure for the Db2 product. One of these enhancements was a set of table functions that provide access to hundreds of in-memory monitor elements that you can use to query the state of your database environment at a specific point in time. Other enhancements included improved event monitors for capturing information about events such as locking, units of work, and activities as they occur.
The ability to monitor CFs in addition to Db2 members
CFs play a different role than members in a Db2 pureScale environment, and so introduce additional monitoring needs. For example, in Db2 instances other than Db2 pureScale instances, you might be interested in monitoring buffer pool hit ratios, which represent the number of pages that are found in memory as compared to the number of pages that must be read from disk. Higher buffer pool hit ratios generally reflect better performance, because less I/O is involved in bringing needed pages into memory. In a Db2 pureScale environment, all physical page reads from disk are performed by the members, but only after they first check with the CF to see whether the group buffer pool has a record of any other member with a valid page that they can use. Thus, whereas you might be accustomed to tuning only local buffer pools in a Db2 environment other than a Db2 pureScale environment, monitoring buffer pool hit ratios in the group buffer pool of the CF is also important in a Db2 pureScale environment. The more often pages can be found in either a local buffer pool or the group buffer pool (GBP), the less often they must be read in from disk.
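As a rough illustration of the hit ratios described above, the following sketch computes local and group buffer pool hit ratios for data pages on each member. It assumes the MON_GET_BUFFERPOOL table function and the POOL_DATA_LBP_PAGES_FOUND, POOL_ASYNC_DATA_LBP_PAGES_FOUND, POOL_DATA_GBP_L_READS, and POOL_DATA_GBP_P_READS monitor elements are available at your Db2 level. The column aliases are illustrative, and index, XML, and temporary pages are left out for brevity.

```sql
-- Sketch only: data-page hit ratios per buffer pool and member.
-- NULL selects all buffer pools; -2 selects all members.
SELECT VARCHAR(BP_NAME, 20) AS BP_NAME,
       MEMBER,
       -- Local buffer pool: pages found locally (excluding prefetcher
       -- finds) as a percentage of logical reads.
       CASE WHEN POOL_DATA_L_READS > 0
            THEN DEC((POOL_DATA_LBP_PAGES_FOUND
                      - POOL_ASYNC_DATA_LBP_PAGES_FOUND) * 100.0
                     / POOL_DATA_L_READS, 5, 2)
       END AS DATA_LBP_HIT_PCT,
       -- Group buffer pool: GBP requests that did not end in a
       -- physical read from disk, as a percentage of GBP requests.
       CASE WHEN POOL_DATA_GBP_L_READS > 0
            THEN DEC((POOL_DATA_GBP_L_READS
                      - POOL_DATA_GBP_P_READS) * 100.0
                     / POOL_DATA_GBP_L_READS, 5, 2)
       END AS DATA_GBP_HIT_PCT
FROM TABLE(MON_GET_BUFFERPOOL(NULL, -2)) AS T
ORDER BY 1, 2
```

The CASE expressions return NULL instead of raising a division error when a buffer pool has not yet served any reads.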
In addition to the GBP, the global lock manager (GLM) is another component of the CF that you can monitor. The GLM manages locking of objects across all the members in a Db2 pureScale instance. The Db2 pureScale Feature adds monitor elements that you can use to monitor locking between members.
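The following sketch shows one way to compare overall lock waits with the lock waits that involved another member. It assumes that MON_GET_WORKLOAD reports the LOCK_WAITS_GLOBAL and LOCK_WAIT_TIME_GLOBAL monitor elements at your Db2 level; check the monitor element reference for the table functions that return them in your release.

```sql
-- Sketch only: lock waits per workload and member, with the portion
-- that the global lock manager attributes to contention between members.
-- NULL selects all workloads; -2 selects all members.
SELECT VARCHAR(WORKLOAD_NAME, 30) AS WORKLOAD_NAME,
       MEMBER,
       LOCK_WAITS,
       LOCK_WAITS_GLOBAL,
       LOCK_WAIT_TIME,
       LOCK_WAIT_TIME_GLOBAL
FROM TABLE(MON_GET_WORKLOAD(NULL, -2)) AS T
ORDER BY 1, 2
```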
How monitor elements in a Db2 pureScale instance are reported
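Table functions that take a member argument can report monitor elements for each member separately. For example, the following query passes -2 as the member argument to return data for the USERSPACE1 table space from all members:
SELECT VARCHAR(TBSP_NAME, 30) AS TBSP_NAME,
       MEMBER,
       POOL_DATA_L_READS,
       TBSP_TOTAL_PAGES
FROM TABLE(MON_GET_TABLESPACE('USERSPACE1', -2))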
The results of this query look like the following example:
TBSP_NAME                      MEMBER POOL_DATA_L_READS    TBSP_TOTAL_PAGES
------------------------------ ------ -------------------- --------------------
USERSPACE1                          1                    0                 4096
USERSPACE1                          2                    0                 4096
USERSPACE1                          3                    0                 4096
USERSPACE1                          0                   36                 4096

  4 record(s) selected.
In this example, the number of logical reads from the local buffer pool for each member is different because each member performs its reads independently of other members; however, the total pages for the table space is the same across all members, because all members are working from the same instance of USERSPACE1.
Effects of component failure on monitor element reporting
If a host, member or CF in a Db2 pureScale environment fails, unless the entire Db2 pureScale instance is taken down, you can still retrieve monitor elements from the instance. However, the components that fail do not generate statistics. This fact is apparent if you are running a query such as the first example shown in How monitor elements in a Db2 pureScale instance are reported, where data from each member is shown individually. If you use a query that aggregates information across members, though, you might not notice that data from a member is missing.
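One way to make a missing member visible in an aggregated result is to return the number of members that contributed rows alongside the totals. The following is a minimal sketch that reuses the table function from the earlier example; the column aliases MEMBERS_REPORTING and TOTAL_DATA_L_READS are illustrative names, not part of the product.

```sql
-- Sketch only: aggregate across members, but also count how many
-- members actually returned a row for the table space.
SELECT VARCHAR(TBSP_NAME, 30)    AS TBSP_NAME,
       COUNT(DISTINCT MEMBER)    AS MEMBERS_REPORTING,
       SUM(POOL_DATA_L_READS)    AS TOTAL_DATA_L_READS
FROM TABLE(MON_GET_TABLESPACE('USERSPACE1', -2)) AS T
GROUP BY TBSP_NAME
```

If MEMBERS_REPORTING is lower than the number of members in the instance, the total does not include data from every member.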
If a member fails while monitor element data collection is taking place, the data collection process pauses until the communications problem with the failed member has been detected, or until the TCP/IP timeout period has passed. In this situation, the data is still reported, but it contains no information from the failed member.
Finally, keep in mind that if a member fails, all the statistics accumulated in the monitor elements are reset to 0.