Running ProbeVue
Dynamic tracing is only allowed for users with privileges or for the superuser.
Authorizations and privileges
This is unlike the static tracing facilities in AIX®, which enforce relatively limited privilege checking. There is a reason for requiring privileges to run the probevue command. A Vue script can potentially have a more severe impact on system performance than a static tracing facility like AIX system trace. This is because probe points for system trace are predefined and restricted. ProbeVue can potentially support many more probe points, and the probe locations can be defined almost anywhere. Further, ProbeVue trace actions at a probe point can take much longer to run than the system trace actions at a probe point, since those are limited to explicit data capture.
In addition, ProbeVue allows you to trace processes and read kernel global variables, both of which need to be controlled to prevent security exposures. A ProbeVue session can also consume a lot of pinned memory and restricting usage of ProbeVue to users with privilege reduces the risk of denial of service attacks. ProbeVue also allows administrators to control the memory usage of ProbeVue sessions through the SMIT interface.
Privileges for dynamic tracing are obtained differently depending upon whether role-based access control (RBAC) is enabled or not. Please refer to the AIX man pages for more information about enabling and disabling RBAC.
Note that in legacy or RBAC-disabled mode, there are no authorizations. Regular users cannot acquire privileges to run the probevue command to start a dynamic tracing session or run the probevctrl command to administer ProbeVue. Only the superuser can have privileges for both these functions. Do not disable RBAC when using ProbeVue unless you prefer to restrict this facility to root users only.
RBAC-enabled mode
Privileges in an RBAC system are obtained through authorizations. An authorization is a text string associated with security-related functions or commands. Authorizations provide the mechanism to grant rights to you to perform privileged actions. Only a user with sufficient authorization can issue the probevue command and start a dynamic tracing session.
- aix.ras.probevue.trace.user.self
- This authorization allows you to trace your own applications in user space. The user ID of the process to be traced must be equal to the real user ID of the user invoking the probevue command. This authorization allows you to enable probe points provided by the uft probe manager for your processes. However, the effective, real and saved user IDs of the process to be traced must be equal. Thus, you cannot trace setuid programs with just this authorization.
- aix.ras.probevue.trace.user
- This authorization allows you to trace any application in user space including setuid programs and applications started by the superuser. Be careful when handing out this authorization. This authorization allows you to issue the probevue command and enable probe points provided by the uft probe manager for any process on the system.
- aix.ras.probevue.trace.syscall.self
- This authorization allows you to trace system calls made by your own applications. The effective, real and saved user IDs of the process making the system call must be the same and equal to the real user ID of the user invoking the probevue command. This authorization allows you to enable probe points provided by the syscall probe manager for your processes. The second field of the probe specification must indicate the process ID for a process started by you.
- aix.ras.probevue.trace.syscall
- This authorization allows you to trace system calls made by any application on the system including setuid programs and applications started by the superuser. Be careful when handing out this authorization. This authorization allows you to issue the probevue command and enable probe points provided by the syscall probe manager for any process. The second field of the probe specification can either be set to a process ID to probe a specific process or to * to probe all processes.
- aix.ras.probevue.trace
- This authorization allows you to trace the entire system and includes all the authorizations defined in the preceding sections. You can also access and read kernel variables when running the probevue command, and trace system trace events by using the systrace probe manager. Be careful when handing out this authorization.
- aix.ras.probevue.manage
- This authorization allows you to administer ProbeVue. This includes changing the values of the different ProbeVue parameters, starting or stopping ProbeVue and viewing details of dynamic tracing sessions of all users when running the probevctrl command. Without this authorization, you can use the probevctrl command to view session data for dynamic tracing sessions started by you or view the current values for ProbeVue parameters.
- aix.ras.probevue.rase
- This authorization allows you access to a highly privileged set of "RAS events" Vue functions, which can produce system and LMT trace records, create live dumps, and even cause a system abend. This privilege must be very carefully controlled.
- aix.ras.probevue
- This authorization grants all dynamic tracing privileges and is equivalent to all the preceding authorizations combined.
The superuser (or root) has all these authorizations assigned by default. Other users will need to have authorizations assigned to them by first creating a role with a set of authorizations and assigning the role to the user. The user will also need to switch roles to a role that has the required authorizations defined for dynamic tracing before invoking the probevue command. The following script is an example of how to provide user "joe" authorization to enable user space and system call probes for processes started by "joe".
mkrole authorizations="aix.ras.probevue.trace.user.self,aix.ras.probevue.trace.syscall.self" apptrace
chuser roles=apptrace joe
setkst -t role # Copy roles to kernel (Or wait until system reboots)
User "joe" can be set up to always have all roles acquired by default when logging in or can switch to the role as needed using the following command:
swrole apptrace
ProbeVue privileges
The privileges that are available for ProbeVue are listed in the following table. A description of each privilege and the authorizations that map to that privilege is provided. Privileges form a hierarchy where the parent privilege contains all of the rights that are associated with the privileges of its children, but it can include additional privileges also.
Privilege | Description | Authorizations | Associated command |
---|---|---|---|
PV_PROBEVUE_TRC_USER_SELF | Allows a process to enable dynamic user space probe points on another process with the same real user ID. | aix.ras.probevue.trace.user.self aix.ras.probevue.trace.user aix.ras.probevue.trace aix.ras.probevue | probevue |
PV_PROBEVUE_TRC_USER | Allows a process to enable dynamic user space probe points. Includes the PV_PROBEVUE_TRC_USER_SELF privilege. | aix.ras.probevue.trace.user aix.ras.probevue.trace aix.ras.probevue | probevue |
PV_PROBEVUE_TRC_SYSCALL_SELF | Allows a process to enable dynamic system call probe points on another process with the same real user ID. | aix.ras.probevue.trace.syscall.self aix.ras.probevue.trace.syscall aix.ras.probevue.trace aix.ras.probevue | probevue |
PV_PROBEVUE_TRC_SYSCALL | Allows a process to enable dynamic system call probe points. Includes the PV_PROBEVUE_TRC_SYSCALL_SELF privilege. | aix.ras.probevue.trace.syscall aix.ras.probevue.trace aix.ras.probevue | probevue |
PV_PROBEVUE_TRC_KERNEL | Allows a process to access kernel data when dynamic tracing. | aix.ras.probevue.trace aix.ras.probevue | probevue |
PV_PROBEVUE_MANAGE | Allows a process to administer ProbeVue. | aix.ras.probevue.manage aix.ras.probevue | probevctrl |
PV_PROBEVUE_RASE | Authorizes the use of the restricted "RAS events" functions. | aix.ras.probevue.rase aix.ras.probevue | probevue |
PV_PROBEVUE_ | Equivalent to all the preceding privileges (PV_PROBEVUE_*) combined. | aix.ras.probevue | probevue probevctrl |
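The privilege hierarchy in the table can be checked mechanically. The following Python sketch is illustrative only and is not part of AIX: it transcribes the privilege-to-authorization columns of the table into a dictionary and computes which privileges a given set of authorizations would map to. The names `PRIV_TO_AUTHS` and `privileges_for` are hypothetical helpers introduced for this example.

```python
# Privilege -> authorizations that map to it, transcribed from the table.
# PRIV_TO_AUTHS and privileges_for are illustrative names, not AIX interfaces.
PRIV_TO_AUTHS = {
    "PV_PROBEVUE_TRC_USER_SELF": [
        "aix.ras.probevue.trace.user.self", "aix.ras.probevue.trace.user",
        "aix.ras.probevue.trace", "aix.ras.probevue"],
    "PV_PROBEVUE_TRC_USER": [
        "aix.ras.probevue.trace.user", "aix.ras.probevue.trace",
        "aix.ras.probevue"],
    "PV_PROBEVUE_TRC_SYSCALL_SELF": [
        "aix.ras.probevue.trace.syscall.self", "aix.ras.probevue.trace.syscall",
        "aix.ras.probevue.trace", "aix.ras.probevue"],
    "PV_PROBEVUE_TRC_SYSCALL": [
        "aix.ras.probevue.trace.syscall", "aix.ras.probevue.trace",
        "aix.ras.probevue"],
    "PV_PROBEVUE_TRC_KERNEL": ["aix.ras.probevue.trace", "aix.ras.probevue"],
    "PV_PROBEVUE_MANAGE": ["aix.ras.probevue.manage", "aix.ras.probevue"],
    "PV_PROBEVUE_RASE": ["aix.ras.probevue.rase", "aix.ras.probevue"],
    "PV_PROBEVUE_": ["aix.ras.probevue"],
}


def privileges_for(authorizations):
    """Return the sorted privileges that the given authorizations map to."""
    held = set(authorizations)
    return sorted(priv for priv, auths in PRIV_TO_AUTHS.items()
                  if held.intersection(auths))


# The "apptrace" role from the earlier example carries two authorizations.
print(privileges_for(["aix.ras.probevue.trace.user.self",
                      "aix.ras.probevue.trace.syscall.self"]))
```

Under this reading of the table, the two `.self` authorizations map only to the two `_SELF` privileges, while `aix.ras.probevue` maps to every privilege in the list.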
ProbeVue parameters
All ProbeVue parameters can be modified through the SMIT interface (use the "smit probevue" fast path) or directly through the probevctrl command. ProbeVue can be stopped if there are no active dynamic tracing sessions and it can be restarted after stopping it without requiring a reboot. ProbeVue can fail to stop if any sessions that used thread-local variables had been previously active.
The following table summarizes the parameters defined for dynamic tracing sessions. In the description, a privileged user refers to the superuser or a user with the aix.ras.probevue.trace authorization and a non-privileged user is one who does not have this authorization.
Description as in SMIT | Maximum value | Initial high configuration value | Initial low configuration value | Minimum value | Description |
---|---|---|---|---|---|
MAX pinned memory for ProbeVue framework | 64 GB | 10% of available memory or the maximum value, whichever is smaller. | 16 MB | 3 MB | Maximum pinned memory in MB that is allocated for ProbeVue data structures, including per-CPU stacks and per-CPU local table regions, and by all dynamic tracing sessions. It does not include any memory allocated by Probe Managers. Note: Although this parameter can be modified at any time, the value takes effect only the next time ProbeVue is started. |
Default per-CPU trace buffer size | 256 MB | 128 KB | 8 KB | 4 KB | Default size in KB of the per-CPU trace buffer. ProbeVue allocates two trace buffers per CPU for each dynamic tracing session: one active, used by the writer (the Vue program) when it captures trace data, and one inactive, used by the reader (the trace consumer). For example, on an 8-way system with the per-CPU trace buffer size set to 16 KB, the total memory consumed by the trace buffers for a ProbeVue session is 256 KB (2 buffers x 8 CPUs x 16 KB). You can specify a different buffer size (larger or smaller) when you start the probevue command, as long as it is within the session memory limits. |
MAX pinned memory for regular user sessions | 64 GB | 2 MB | 2 MB | 0 MB | Maximum pinned memory allocated for a non-privileged user ProbeVue session including memory for the per-CPU trace buffers. A value of 0 effectively disables all non-privileged users. Privileged users have no limits on the memory used by their ProbeVue sessions. However, they are still limited by the maximum pinned memory allowed for the ProbeVue framework. |
MIN trace buffer read rate for regular user | 5000 ms | 100 ms | 100 ms | 10 ms | The minimum period, in milliseconds, that a non-privileged user can request the trace consumer to check for trace data. This value is internally rounded to the next highest multiple of 10 milliseconds. Privileged users are not limited by this parameter, but the fastest read rate that they can specify is 10 milliseconds. |
Default trace buffer read rate | 5000 ms | 100 ms | 100 ms | 10 ms | The default period, in milliseconds, at which the trace consumer checks the in-memory trace buffers for trace data. You can specify a different read rate (larger or smaller) when starting the probevue command, as long as it is not below the minimum buffer read rate. |
MAX concurrent sessions for regular user | 8 | 1 | 1 | 0 | Number of concurrent ProbeVue sessions allowed for a non-privileged user. A value of zero effectively disables all non-privileged users. |
Size of per-CPU computation stack | 256 KB | 20 KB | 12 KB | 8 KB | The size of the per-CPU computation stack used by ProbeVue when running the Vue script. The value is rounded to the next highest multiple of 8 KB. ProbeVue allocates a single stack per CPU for all ProbeVue sessions. The memory consumed for the stacks is not included in the per-session limits. Note: Although this parameter can be modified at any time, the value takes effect only after the AIX kernel boot image is rebuilt and the system is rebooted. You must configure the ProbeVue stack to use 96 KB of virtual memory to get the current directory listing. |
Size of per-CPU local table size | 256 KB | 32 KB | 4 KB | 4 KB | The size of the per-CPU local table used by ProbeVue for saving variables of automatic class and for saving temporary variables. ProbeVue uses half of this area for automatic variables and the remaining half for temporary variables. The value is always rounded to the next highest multiple of 4 KB. ProbeVue allocates a single local table and a single temporary table per CPU, used by all ProbeVue sessions. The memory consumed for the local tables is not included in the per-session limits. Note: Although this parameter can be modified at any time, the value takes effect only the next time ProbeVue is started. |
MIN interval allowed in an interval probe | N/A | 1 | 1 | | Minimum timer interval, in milliseconds, allowed for global root user in interval probes. |
Number of threads to be traced | N/A | 32 | 32 | 1 | Maximum number of threads that a ProbeVue session can support when it has thread-local variables. The ProbeVue framework allocates the thread-local variables to the maximum number of threads that are specified with this attribute, at the start of the session. If more than the specified number of threads hit the probe that has a thread-local variable, the ProbeVue session is abruptly stopped. |
Number of page faults to be handled | 1024 | 0 | 0 | 0 | Number of page fault contexts for handling page faults for the entire framework. A page fault context includes a stack and a local table for saving automatic class variables and temporary variables. A page fault context is required to access paged-out data. If no page fault context is free at the time of a page fault, ProbeVue does not fetch the paged-out data. |
Maximum probe execution time for systrace probes when fired in interrupt context | N/A | 0 | 0 | 0 | This number limits the maximum time, in milliseconds, that a systrace probe executing in interrupt context can take. By default, the value is zero, which means the probe can take any amount of time. |
Maximum probe execution time for io probes when fired in interrupt context | N/A | 0 | 0 | 0 | This number limits the maximum time, in milliseconds, that an io probe executing in interrupt context can take. By default, the value is zero, which means the probe can take any amount of time. |
Maximum probe execution time for sysproc probes when fired in interrupt context | N/A | 0 | 0 | 0 | This number limits the maximum time, in milliseconds, that a sysproc probe executing in interrupt context can take. By default, the value is zero, which means the probe can take any amount of time. |
Maximum probe execution time for network probes when fired in interrupt context | N/A | 0 | 0 | 0 | This number limits the maximum time, in milliseconds, that a network probe executing in interrupt context can take. By default, the value is zero, which means the probe can take any amount of time. |
Max network buffer size | 64 KB | 64 bytes | 96 bytes | 96 bytes | The size, in bytes, of a pre-allocated buffer used by the network probe manager for bpf probe points. The buffer is allocated when the first bpf probe is enabled and is released when the last bpf probe is disabled. It is used to copy the data when packet data spans multiple packet buffers. |
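The sizing rules in the table reduce to simple arithmetic. The following Python sketch is only an illustration of that arithmetic, not ProbeVue code. The function names are hypothetical, and it assumes that "rounded to the next highest multiple" leaves a value that is already an exact multiple unchanged.

```python
def round_up(value, multiple):
    """Round value up to a multiple of `multiple`.

    Assumption: a value that is already an exact multiple is left unchanged.
    """
    return -(-value // multiple) * multiple


def session_trace_buffer_kb(ncpus, per_cpu_buffer_kb):
    """Trace buffer memory for one session: two buffers (active and
    inactive) per CPU."""
    return 2 * ncpus * per_cpu_buffer_kb


# 8-way system with 16 KB per-CPU buffers, as in the table's example.
print(session_trace_buffer_kb(8, 16))   # 256 (KB)

# Read rate requested as 25 ms, rounded to a multiple of 10 ms.
print(round_up(25, 10))                 # 30 (ms)

# Computation stack requested as 20 KB, rounded to a multiple of 8 KB.
print(round_up(20, 8))                  # 24 (KB)

# Local table requested as 6 KB, rounded to a multiple of 4 KB.
print(round_up(6, 4))                   # 8 (KB)
```

The trace buffer total counts against the per-session pinned memory limit for non-privileged users, whereas the stack and local table sizes are framework-wide and are not included in the per-session limits.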
Profiling ProbeVue Session
The ProbeVue framework provides a profiling facility that can be turned on or off to estimate the impact of enabled probes on the application. This facility accumulates the time taken by probe actions each time they run and reports it when requested or when the session ends.
The profiling report displays each probe string and the time taken by the action corresponding to that probe string. For each probe action, the report gives the total, minimum, maximum, and average time taken, together with the number of times that the probe action was timed. When one probe string covers multiple functions (by using a regular expression or * in place of the function name), the profiling data is accumulated across all such functions. Timing details are reported per probe action only, not separately for each function that the probe string matches.
The BEGIN and END probe actions are not profiled with this facility. These profiling details are session-specific. You can enable ProbeVue session profiling along with session start by using the probevue command, or by using the probevctrl command.
For more information, see the probevue and probevctrl commands.
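To make the shape of the profiling data concrete, the following Python sketch accumulates the same statistics that the report is described as containing (count, total, minimum, maximum, and average), keyed by probe string. It is a hypothetical model for illustration only: the class and function names are invented, the sample timings are made up, and it does not reproduce the actual report format.

```python
from dataclasses import dataclass


@dataclass
class ProbeProfile:
    """Accumulated timings for one probe string (illustrative model)."""
    count: int = 0
    total: int = 0
    minimum: int = 0
    maximum: int = 0

    def record(self, elapsed):
        # The first sample initializes both minimum and maximum.
        if self.count == 0 or elapsed < self.minimum:
            self.minimum = elapsed
        if self.count == 0 or elapsed > self.maximum:
            self.maximum = elapsed
        self.count += 1
        self.total += elapsed

    @property
    def average(self):
        return self.total / self.count if self.count else 0.0


profiles = {}


def record(probe_string, elapsed):
    """Add one timed run of the action attached to probe_string."""
    profiles.setdefault(probe_string, ProbeProfile()).record(elapsed)


# Made-up sample timings for one probe action that fired three times.
for elapsed in (4, 10, 7):
    record("syscall:*:read:entry", elapsed)

p = profiles["syscall:*:read:entry"]
print(p.count, p.total, p.minimum, p.maximum, p.average)
```

Because the key is the probe string, a specification that matches several functions yields a single accumulated entry, which mirrors the per-probe-action reporting described above.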
Sample programs
Example 1
The following canonical "Hello World" program prints "Hello World" into the trace buffer and exits:
#!/usr/bin/probevue
/* Hello World in probevue */
/* Program name: hello.e */
@@BEGIN
{
printf("Hello World\n");
exit();
}
Example 2
The following "Hello World" program prints "Hello World" when you type Ctrl-C on the keyboard:
#!/usr/bin/probevue
/* Hello World 2 in probevue */
/* Program name: hello2.e */
@@END
{
printf("Hello World\n");
}
Example 3
The following program shows how to use thread-local variables. This Vue script counts the number of bytes written to a particular file. It assumes that the processes are single-threaded or those threads that open files are the same ones that write to them. It also assumes that all write operations are successful. The script can be terminated at any time and you can obtain the current count of bytes written by typing Ctrl-C on the terminal.
#!/usr/bin/probevue
/* Program name: countbytes.e */
int open( char * Path, int OFlag, int mode );
int write( int fd, char * buf, int sz);
int done;
@@syscall:*:open:entry
when (done == 0)
{
if (get_userstring(__arg1, -1) == "/tmp/foo") {
thread:trace = 1;
done = 1;
}
}
@@syscall:*:open:exit
when (thread:trace)
{
thread:fd = __rv;
}
@@syscall:*:write:entry
when (thread:trace && __arg1 == thread:fd)
{
bytes += __arg3; /* number of bytes is third arg */
}
@@END
{
printf("Bytes written = %d\n", bytes);
}
Example 4
The following tentative tracing program shows how to trace the arguments passed to the read system call only if the read fails when reading the foo.data file:
#!/usr/bin/probevue
/* File: ttrace.e */
/* Example of tentative tracing */
/* Capture parameters to read system call only if read fails */
int open ( char* Path, int OFlag , int mode );
int read ( int fd, char * buf, int sz);
@@syscall:*:open:entry
{
filename = get_userstring(__arg1, -1);
if (filename == "foo.data") {
thread:open = 1;
start_tentative("read");
printf("File foo.data opened\n");
}
}
@@syscall:*:open:exit
when (thread:open == 1)
{
thread:fd = __rv;
start_tentative("read");
printf("fd = %d\n", thread:fd);
thread:open = 0;
}
@@syscall:*:read:entry
when (__arg1 == thread:fd)
{
start_tentative("read");
printf("Read fd = %d, input buffer = 0x%08x, bytes = %d,",
__arg1, __arg2, __arg3);
end_tentative("read");
thread:read = 1;
}
@@syscall:*:read:exit
when (thread:read == 1)
{
if (__rv < 0) {
/* The printf below, even though non-tentative, is only
* executed in error cases and merges with the
* previously printed tentative data
*/
printf(" errno = %d\n", __errno);
commit_tentative("read");
}
else
discard_tentative("read");
thread:read = 0;
}
If the read fails because a bad address (say 0x1000) was passed as the input buffer pointer, the output could look like the following:
# probevue ttrace.e
File foo.data opened
fd = 4
Read fd = 4, input buffer = 0x00001000, bytes = 256, errno = 14
Example 5
The following Vue script prints the values of some kernel variables and exits immediately. Pay attention to the exit function in the @@BEGIN probe:
/* File: kernel.e */
/* Example of accessing kernel variables */
/* System configuration structure from /usr/include/sys/systemcfg.h */
struct system_configuration {
int architecture; /* processor architecture */
int implementation; /* processor implementation */
int version; /* processor version */
int width; /* width (32 || 64) */
int ncpus; /* 1 = UP, n = n-way MP */
int cache_attrib; /* L1 cache attributes (bit flags) */
/* bit 0/1 meaning */
/* -------------------------------------*/
/* 31 no cache / cache present */
/* 30 separate I and D / combined */
int icache_size; /* size of L1 instruction cache */
int dcache_size; /* size of L1 data cache */
int icache_asc; /* L1 instruction cache associativity */
int dcache_asc; /* L1 data cache associativity */
int icache_block; /* L1 instruction cache block size */
int dcache_block; /* L1 data cache block size */
int icache_line; /* L1 instruction cache line size */
int dcache_line; /* L1 data cache line size */
int L2_cache_size; /* size of L2 cache, 0 = No L2 cache */
int L2_cache_asc; /* L2 cache associativity */
int tlb_attrib; /* TLB attributes (bit flags) */
/* bit 0/1 meaning */
/* -------------------------------------*/
/* 31 no TLB / TLB present */
/* 30 separate I and D / combined */
int itlb_size; /* entries in instruction TLB */
int dtlb_size; /* entries in data TLB */
int itlb_asc; /* instruction tlb associativity */
int dtlb_asc; /* data tlb associativity */
int resv_size; /* size of reservation */
int priv_lck_cnt; /* spin lock count in supervisor mode */
int prob_lck_cnt; /* spin lock count in problem state */
int rtc_type; /* RTC type */
int virt_alias; /* 1 if hardware aliasing is supported */
int cach_cong; /* number of page bits for cache synonym */
int model_arch; /* used by system for model determination */
int model_impl; /* used by system for model determination */
int Xint; /* used by system for time base conversion */
int Xfrac; /* used by system for time base conversion */
int kernel; /* kernel attributes */
/* bit 0/1 meaning */
/* -----------------------------------------*/
/* 31 32-bit kernel / 64-bit kernel */
/* 30 non-LPAR / LPAR */
/* 29 old 64bit ABI / 64bit Large ABI */
/* 28 non-NUMA / NUMA */
/* 27 UP / MP */
/* 26 no DR CPU add / DR CPU add support */
/* 25 no DR CPU rm / DR CPU rm support */
/* 24 no DR MEM add / DR MEM add support */
/* 23 no DR MEM rm / DR MEM rm support */
/* 22 kernel keys disabled / enabled */
/* 21 no recovery / recovery enabled */
/* 20 non-MLS / MLS enabled */
long long physmem; /* bytes of OS available memory */
int slb_attr; /* SLB attributes */
/* bit 0/1 meaning */
/* -----------------------------------------*/
/* 31 Software Managed */
int slb_size; /* size of slb (0 = no slb) */
int original_ncpus; /* original number of CPUs */
int max_ncpus; /* max cpus supported by this AIX image */
long long maxrealaddr; /* max supported real memory address +1 */
long long original_entitled_capacity;
/* configured entitled processor capacity */
/* at boot required by cross-partition LPAR */
/* tools. */
long long entitled_capacity; /* entitled processor capacity */
long long dispatch_wheel; /* Dispatch wheel time period (TB units) */
int capacity_increment; /* delta by which capacity can change */
int variable_capacity_weight; /* priority weight for idle capacity*/
/* distribution */
int splpar_status; /* State of SPLPAR enablement */
/* 0x1 => 1=SPLPAR capable; 0=not */
/* 0x2 => SPLPAR enabled 0=dedicated; */
/* 1=shared */
int smt_status; /* State of SMT enablement */
/* 0x1 = SMT Capable 0=no/1=yes */
/* 0x2 = SMT Enabled 0=no/1=yes */
/* 0x4 = SMT threads bound true 0=no/1=yes */
int smt_threads; /* Number of SMT Threads per Physical CPU */
int vmx_version; /* RPA defined VMX version, 0=none/disabled */
long long sys_lmbsize; /* Size of an LMB on this system. */
int num_xcpus; /* Number of exclusive cpus on line */
signed char errchecklevel;/* Kernel error checking level */
char pad[3]; /* pad to word boundary */
int dfp_version; /* RPA defined DFP version, 0=none/disabled */
/* if MSbit is set, DFP is emulated */
};
__kernel struct system_configuration _system_configuration;
@@BEGIN
{
String s[40];
int j;
__kernel int max_sdl; /* Atomic RAD system decomposition level */
__kernel long lbolt; /* Ticks since boot */
printf("No. of online CPUs\t\t= %d\n", _system_configuration.ncpus);
/* Print SMT status */
printf("SMT status\t\t\t=");
if (_system_configuration.smt_status == 0)
printf(" None");
else {
if (_system_configuration.smt_status & 0x01)
printf(" Capable");
if (_system_configuration.smt_status & 0x02)
printf(" Enabled");
if (_system_configuration.smt_status & 0x04)
printf(" BoundThreads");
}
printf("\n");
/* Print error checking level */
if (_system_configuration.errchecklevel == 1)
s = "Minimal";
else if (_system_configuration.errchecklevel == 3)
s = "Normal";
else if (_system_configuration.errchecklevel == 7)
s = "Detail";
else if (_system_configuration.errchecklevel == 9)
s = "Maximal";
printf("Error checking level\t\t= %s\n",s);
printf("Atomic RAD system detail level\t= %d\n", max_sdl);
/* Long in the kernel is 64-bit, so we use %lld below */
printf("Number of ticks since boot\t= %lld\n", lbolt);
exit();
}
The following is possible output when you run the preceding script on a Power 5 dedicated partition with default kernel attributes:
# probevue kernel.e
No. of online CPUs = 4
SMT status = Capable Enabled BoundThreads
Error checking level = Normal
Atomic RAD system detail level = 2
Number of ticks since boot = 34855934
Probe managers
The probe manager is not part of the basic ProbeVue framework, but is, nevertheless, an essential component of dynamic tracing. Probe Managers are the providers of the probe points that can be instrumented by ProbeVue.
Probe managers generally support a set of probe points that belong to some common domain and share some common feature or attribute that distinguishes them from other probe points. Probe points are useful at points where control flow changes significantly, at points of state change or at other points of significant interest. Probe managers are careful to select probe points only in locations that are safe to instrument.
Probe managers can choose to define their own distinct rules for the probe specifications within the common style that must be followed for all probe specifications.
ProbeVue supports the following probe managers:
- System call probe manager
- User function probe manager
- Interval probe manager
- System trace probe manager
- Extended System Call probe manager
- I/O probe manager
- Network probe manager
- Sysproc probe manager
System call probe manager
The syscall probe manager supports probes at the entry and exit of well-defined and documented base AIX system calls. These are the system calls that have the same interface at the libc.a (or C library) entry point and at the kernel entry point. Either the system call is a pass-through (the C library simply imports the symbol from the kernel and then exports it with no code in the library) or there is only trivial code for the interface inside the library.
The syscall probe manager accepts a 4-tuple probe specification in one of the following formats:
- syscall:*:<system_call_name>:entry
- syscall:*:<system_call_name>:exit
The syscall probe manager also accepts a 4-tuple probe specification in one of the following formats:
- syscall:<process_ID>:<system_call_name>:entry
- syscall:<process_ID>:<system_call_name>:exit
where a process ID can be specified as the second field of the probe specification to support probing of specific processes.
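The 4-tuple format can be illustrated with a small validator. The following Python sketch is an assumption-laden illustration, not the parser that ProbeVue uses: the function name `parse_syscall_spec` is hypothetical, it accepts an optional leading `@@`, and it handles only a literal process ID or `*` in the second field.

```python
def parse_syscall_spec(spec):
    """Split a syscall probe specification into (pid, name, location).

    pid is None when the second field is "*" (all processes).
    Illustrative only; this is not ProbeVue's own parser.
    """
    if spec.startswith("@@"):
        spec = spec[2:]
    fields = spec.split(":")
    if len(fields) != 4 or fields[0] != "syscall":
        raise ValueError("expected syscall:<pid|*>:<name>:<entry|exit>")
    _, pid_field, name, location = fields
    if not name:
        raise ValueError("missing system call name")
    if location not in ("entry", "exit"):
        raise ValueError("location must be entry or exit")
    if pid_field == "*":
        pid = None
    elif pid_field.isdigit():
        pid = int(pid_field)
    else:
        raise ValueError("second field must be a process ID or *")
    return pid, name, location


print(parse_syscall_spec("@@syscall:*:read:entry"))
print(parse_syscall_spec("syscall:1234:write:exit"))
```

Returning `None` for `*` keeps the "all processes" case distinct from any real process ID, which matters because probing all processes requires a broader authorization than probing your own.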
The system call names accepted by the syscall probe manager are the names of the libc.a interfaces and not the kernel's internal system call names. For example, the read subroutine is exported by libc.a, but the actual system call name or kernel entry point is kread. The syscall probe manager internally translates a libc interface to its kernel entry point and enables the probe at entry into the kread kernel routine. Because of this, if multiple C library interfaces invoke the kread routine, the probe point fires for those interfaces also. Generally, this is not a problem because for most of the system calls supported by the syscall probe manager, there is a 1-to-1 mapping between the libc interface and the kernel routine.
For each syscall probe, there is an equivalent probe point in the library code provided by the uft probe manager. The uft probe manager does support all library interfaces (unless it is a passthrough interface and there is no code for the call or references to it in the library at all) including those not supported by the syscall probe manager. However, the syscall probe manager has two advantages:
- The syscall probe manager can probe every process in the system by specifying asterisk as the second field.
- The syscall probe manager is more efficient than the uft probe manager because it does not need to switch from user mode to kernel mode and back to run the probe actions.
For more information about the full list of system calls supported by the syscall probe manager see ProbeVue.
UFT probe manager
The uft or the user function tracing probe manager supports probing user space functions that are visible in the XCOFF symbol table of a process. The uft probe manager supports probe points that are at entry and exit points of functions whose source is a C or FORTRAN language text file even though the symbol table can contain symbols whose sources are from a language other than C or FORTRAN.
Tracing of Java™ applications works in a way that is identical to the existing tracing mechanism from the user's point of view; the JVM performs most of the real tasks on behalf of ProbeVue.
For probing Java applications, see "Java Applications Probe Manager" below.
The uft probe manager accepts a 5-tuple probe specification in the following format:
uft:<processID>:*:<function_name>:<entry|exit>
When the third field is set to *, the UFT probe manager searches for the function in all of the modules loaded into the process address space, including the main executable and shared modules. This implies that if a program contains more than one C function with this name (for example, functions with static class that are contained in different object modules), then probes are applied to the entry point of every one of these functions.
If a function name in a specific module needs to be probed, the module name needs to be specified in the third field. The probe specification syntax to provide the library module name is illustrated below:
# Function foo in any module
@@uft:<pid>:*:foo:entry
# Function foo in any module in any archive named libc.a
@@uft:<pid>:libc.a:foo:entry
# Function foo in the shr.o module in any archive named libc.a
@@uft:<pid>:libc.a(shr.o):foo:entry
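The three forms of the third field shown above (`*`, an archive name, and an archive name with a member in parentheses) can be told apart with a short pattern. The following Python sketch is illustrative only: `parse_module_field` is a hypothetical helper, and the pattern covers only the three forms listed in this section.

```python
import re

# archive name, optionally followed by "(member)"; illustrative pattern only
_MODULE_RE = re.compile(r"^(?P<archive>[^()]+?)(?:\((?P<member>[^()]+)\))?$")


def parse_module_field(field):
    """Return (archive, member) for the third field of a uft specification.

    "*" yields (None, None), meaning any module.
    """
    if field == "*":
        return (None, None)
    match = _MODULE_RE.match(field)
    if match is None:
        raise ValueError("unrecognized module field: " + field)
    return (match.group("archive"), match.group("member"))


print(parse_module_field("*"))
print(parse_module_field("libc.a"))
print(parse_module_field("libc.a(shr.o)"))
```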
The function name in the fourth tuple can be specified as an Extended Regular Expression (ERE). The ERE should be enclosed between "/ and /" like "/<ERE>/".
/* Probe entry of all libc.a functions starting with "malloc" word */
@@uft:$__CPID:libc.a:"/^malloc.*/":entry
/* Probe exit of all functions in the executable a.out */
@@uft:$__CPID:a.out:"/.*/":exit
In entry probes where the function name is specified as a regular expression, individual arguments cannot be accessed. However, the ProbeVue function print_args can be used to print the function name and its arguments. The argument values are printed based on the argument type information available in the traceback table of the function.
In exit probes where the function name is specified as a regular expression, the return value cannot be accessed.
ProbeVue supports enabling probes in more than one process at the same time. However, you need privileges even for probing processes that belong to you.
ProbeVue enforces a restriction that prevents processes with user-space probes from being debugged by using the ptrace or procfs based APIs.
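The effect of an ERE in the function-name field can be previewed outside ProbeVue. The following Python sketch is a rough illustration under stated assumptions: it uses Python's `re` module, whose syntax is not identical to POSIX ERE but agrees with it for simple patterns such as the ones above; it assumes an unanchored match unless the pattern anchors itself; and the symbol list and the name `select_functions` are hypothetical.

```python
import re


def select_functions(name_field, symbols):
    """Return the symbols that a function-name field would select.

    A field of the form "/<ERE>/" (including the double quotes) is treated
    as a regular expression; anything else is an exact function name.
    Illustrative only; Python's re is used as a stand-in for POSIX ERE.
    """
    field = name_field.strip()
    if len(field) >= 2 and field[0] == '"' and field[-1] == '"':
        field = field[1:-1]
    if len(field) >= 2 and field[0] == "/" and field[-1] == "/":
        pattern = re.compile(field[1:-1])
        return [s for s in symbols if pattern.search(s)]
    return [s for s in symbols if s == field]


# Hypothetical symbol names standing in for a module's symbol table.
symbols = ["malloc", "malloc_y", "calloc", "free"]

print(select_functions('"/^malloc.*/"', symbols))
print(select_functions('"/.*/"', symbols))
print(select_functions("free", symbols))
```

Listing the matches this way before starting a session can help estimate how many probe points a broad pattern such as `"/.*/"` would enable.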
As indicated above, the uft probe manager supports probes in shared modules like shared library modules. The following script shows an example that traces mutex activity by enabling probes in the thread library's mutex lock and unlock subroutines.
/* pthreadlocks.e */
/* Trace pthread mutex activity for a given multithreaded process */
/* The following defines are from /usr/include/sys/types.h */
typedef long long pid_t;
typedef long long thread_t;
typedef struct {
int __pt_mutexattr_status;
int __pt_mutexattr_pshared;
int __pt_mutexattr_type;
} pthread_mutexattr_t;
typedef struct __thrq_elt thrq_elt_t;
struct __thrq_elt {
thrq_elt_t *__thrq_next;
thrq_elt_t *__thrq_prev;
};
typedef volatile unsigned char _simplelock_t;
typedef struct __lwp_mutex {
char __wanted;
_simplelock_t __lock;
} lwp_mutex_t;
typedef struct {
lwp_mutex_t __m_lmutex;
lwp_mutex_t __m_sync_lock;
int __m_type;
thrq_elt_t __m_sleepq;
int __filler[2];
} mutex_t;
typedef struct {
mutex_t __pt_mutex_mutex;
pid_t __pt_mutex_pid;
thread_t __pt_mutex_owner;
int __pt_mutex_depth;
pthread_mutexattr_t __pt_mutex_attr;
} pthread_mutex_t;
int pthread_mutex_lock(pthread_mutex_t *mutex);
int pthread_mutex_unlock(pthread_mutex_t *mutex);
@@uft:$__CPID:*:pthread_mutex_lock:entry
{
printf("thread %d: mutex 0x%08x locked\n", __tid, __arg1);
}
@@uft:$__CPID:*:pthread_mutex_unlock:entry
{
printf("thread %d: mutex 0x%08x unlocked\n", __tid, __arg1);
}
The probe specification, argument access, and ProbeVue function usage in probe actions for Fortran function probes are similar to other uft probes, with the following differences:
- The user must map the Fortran data types to ProbeVue data types and use the mapped types in the script. The mapping of Fortran basic data types to ProbeVue data types is listed in the following table.
Table 3. Fortran to ProbeVue data types mapping
Fortran data type | ProbeVue data type |
---|---|
INTEGER * 2 | short |
INTEGER * 4 | int/long |
INTEGER * 8 | long long |
REAL | float |
DOUBLE PRECISION | double |
COMPLEX | No equivalent basic data type. This needs to be mapped to a structure as shown: typedef struct complex { float a; float b; } COMPLEX; |
LOGICAL | int (The Fortran standard requires logical variables to be the same size as INTEGER/REAL variables) |
CHARACTER | char |
BYTE | signed char |
- Fortran passes IN scalar arguments of internal procedures by value, and other arguments by reference. Arguments passed by reference should be accessed with copy_userdata(). More information on argument association in Fortran can be found in the Argument association topic.
- Routine names in a Fortran program are case-insensitive, but they must be specified in lowercase in a ProbeVue script. The following sample script illustrates how to map Fortran data types to ProbeVue data types:
/* cmp_calc.e */
/* Trace fortran routines cmp_calc(COMPLEX, INTEGER) and cmplxd(void) */
typedef struct complex{
float a;
float b;
} COMPLEX;
typedef int INTEGER;
/* arguments are indicated to be of pointer type as they are passed by reference */
void cmp_calc(COMPLEX *, INTEGER *);
void cmplxd();
@@uft:$__CPID:*:cmplxd:entry
{
printf("In cmplxd entry \n");
}
@@uft:$__CPID:*:cmp_calc:entry
{
COMPLEX c;
int i;
copy_userdata(__arg1, c);
copy_userdata(__arg2, i);
printf("%10.7f+j%9.7f %d \n", c.a,c.b,i);
}
- Fortran stores arrays in column-major form, whereas ProbeVue stores them in row-major form. The following script shows how to retrieve the array elements:
/* array.e */
/* ProbeVue script to probe fortran program array.f */
void displayarray(int **, int, int);
@@uft:$__CPID:*:displayarray:entry
{
int a[5][4]; /* row and column sizes are interchanged */
copy_userdata(__arg1, a);
/* to print the first row */
printf("%d %d %d \n", a[0][0], a[1][0], a[2][0]);
/* to print the second row */
printf("%d %d %d\n", a[0][1], a[1][1], a[2][1]);
}
/* Fortran program array.f */
PROGRAM ARRAY_PGM
IMPLICIT NONE
INTEGER, DIMENSION(1:4,1:5) :: Array
INTEGER :: RowSize, ColumnSize
CALL ReadArray(Array, RowSize, ColumnSize)
CALL DisplayArray(Array, RowSize, ColumnSize)
CONTAINS
SUBROUTINE ReadArray(Array, Rows, Columns)
IMPLICIT NONE
INTEGER, DIMENSION(1:,1:), INTENT(OUT) :: Array
INTEGER, INTENT(OUT) :: Rows, Columns
INTEGER :: i, j
READ(*,*) Rows, Columns
DO i = 1, Rows
READ(*,*) (Array(i,j), j=1, Columns)
END DO
END SUBROUTINE ReadArray
SUBROUTINE DisplayArray(Array, Rows, Columns)
IMPLICIT NONE
INTEGER, DIMENSION(1:,1:), INTENT(IN) :: Array
INTEGER, INTENT(IN) :: Rows, Columns
INTEGER :: i, j
DO i = 1, Rows
WRITE(*,*) (Array(i,j), j=1, Columns )
END DO
END SUBROUTINE DisplayArray
END PROGRAM ARRAY_PGM
- Intrinsic or built-in functions cannot be probed with ProbeVue. All Fortran routines that are listed in the XCOFF symbol table of the executable or linked libraries can be probed. ProbeVue uses the XCOFF symbol table to identify the location of these routines. However, the user must provide the prototype for the routine, and ProbeVue tries to access the arguments according to the prototype provided. For routines where the compiler mangles the routine names, the mangled name must be provided. Since Vue is a C-style language, the user must ensure that the Fortran function or subroutine prototype is appropriately mapped to a C-style function prototype. Refer to the linkage conventions for argument passing and function return values in the Passing data from one language to another topic. The following example illustrates this:
/* Fortran program ext_op.f */
/* Operator "*" is extended for rational multiplication */
MODULE rational_arithmetic
IMPLICIT NONE
TYPE RATNUM
INTEGER :: num, den
END TYPE RATNUM
INTERFACE OPERATOR (*)
MODULE PROCEDURE rat_rat, int_rat, rat_int
END INTERFACE
CONTAINS
FUNCTION rat_rat(l,r) ! rat * rat
TYPE(RATNUM), INTENT(IN) :: l,r
TYPE(RATNUM) :: val,rat_rat
val.num=l.num*r.num
val.den=l.den*r.den
rat_rat=val
END FUNCTION rat_rat
FUNCTION int_rat(l,r) ! int * rat
INTEGER, INTENT(IN) :: l
TYPE(RATNUM), INTENT(IN) :: r
TYPE(RATNUM) :: val,int_rat
val.num=l*r.num
val.den=r.den
int_rat=val
END FUNCTION int_rat
FUNCTION rat_int(l,r) ! rat * int
TYPE(RATNUM), INTENT(IN) :: l
INTEGER, INTENT(IN) :: r
TYPE(RATNUM) :: val,rat_int
val.num=l.num*r
val.den=l.den
rat_int=val
END FUNCTION rat_int
END MODULE rational_arithmetic
PROGRAM Main1
Use rational_arithmetic
IMPLICIT NONE
TYPE(RATNUM) :: l,r,l1
l.num=10
l.den=11
r.num=3
r.den=4
L1=l*r
END PROGRAM Main1
/* ext_op.e */
/* ProbeVue script to probe routine that gets called when "*" is used to multiply rational numbers in ext_op.f */
struct rat
{
int num;
int den;
};
struct rat rat;
void __rational_arithmetic_NMOD_rat_rat(struct rat*, struct rat*,struct rat*);
/* Note that the mangled function name is provided. */
/* Also, the structure to be returned is sent in the buffer whose address is provided as the first argument. */
/* The first explicit parameter is in the second argument. */
@@BEGIN
{
struct rat* rat3;
}
@@uft:$__CPID:*:__rational_arithmetic_NMOD_rat_rat:entry
{
struct rat rat1,rat2;
copy_userdata((struct rat *)__arg2,rat1);
copy_userdata((struct rat *)__arg3,rat2);
rat3=__arg1;
/* The address of the buffer where the returned structure will be stored is saved at the function entry */
printf("Argument Passed rat_rat = %d:%d,%d:%d\n",rat1.num,rat1.den,rat2.num,rat2.den);
}
@@uft:$__CPID:*:__rational_arithmetic_NMOD_rat_rat:exit
{
struct rat rrat;
copy_userdata((struct rat *)rat3,rrat);
/* The saved buffer address is used to fetch the returned structure */
printf("Return from rat_rat = %d:%d\n",rrat.num,rrat.den);
exit();
}
- ProbeVue does not support direct inclusion of Fortran header files in a script. However, a mapping of Fortran data types to ProbeVue data types can be provided in a ProbeVue header file and included with the "-I" option.
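The column-major versus row-major point in the list above can be checked with a small model. The following Python sketch is only an illustration, not ProbeVue code: it lays out a Fortran array declared as DIMENSION(1:4,1:5) in column-major order, reinterprets the same flat memory as a row-major a[5][4], and confirms that Fortran Array(i,j) appears at a[j-1][i-1]. The array contents are hypothetical.

```python
ROWS, COLS = 4, 5  # Fortran declaration: INTEGER, DIMENSION(1:4,1:5) :: Array

def fortran_flat(array_value):
    """Lay out Array(i,j) in column-major order, as Fortran stores it."""
    return [array_value(i, j)
            for j in range(1, COLS + 1)
            for i in range(1, ROWS + 1)]

def c_view(flat, dim0, dim1):
    """Reinterpret the same flat memory as a row-major a[dim0][dim1]."""
    return [flat[r * dim1:(r + 1) * dim1] for r in range(dim0)]

# Hypothetical contents: encode the Fortran indices in the value itself.
flat = fortran_flat(lambda i, j: 10 * i + j)
a = c_view(flat, COLS, ROWS)  # int a[5][4]: row and column sizes interchanged

# Fortran Array(i,j) is found at a[j-1][i-1].
first_row = [a[0][0], a[1][0], a[2][0]]   # Array(1,1), Array(1,2), Array(1,3)
second_row = [a[0][1], a[1][1], a[2][1]]  # Array(2,1), Array(2,2), Array(2,3)
print(first_row, second_row)
```

This is why the sample script declares the local array with its dimensions swapped and reads a Fortran row by varying the first C index.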
C++ applications probe manager
The C++ probe manager supports probing of C++ applications in a way identical to the C probe managers. It supports uft-style entry/exit probes on any C++ function, including member, overloaded, operator, and template functions in the core executable. A function entry/exit probe in C++ must use the @@uftxlc++ probe manager.
All tuples in the @@uftxlc++ style probe specifications have the same usage and format as for the @@uft style probe strings, with the exception of the function name. Because C++ allows a single function name to be overloaded, the function name specified in the probe string may have to include the function's argument types to uniquely identify the function being probed.
For example:
@@uftxlc++:12345:*:"foobar(int, char *)":entry
@@uftxlc++:12345:*:void foobar<int>(int, char *):entry
When probing a class member function or a function defined in a namespace, the fully qualified function name must be used in the probe string. To avoid any ambiguity between the single colon (:) tuple separator in probe strings and the double colon (::) scope resolution operator in a fully qualified C++ name, the entire function name tuple in the probe string must be quoted.
@@uftxlc++:12345:*:"Foo::bar(int)":entry
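To see why the quoting matters, consider a simplified model of tuple splitting. The Python sketch below is not ProbeVue's actual parser; it is a hypothetical splitter that treats a single colon as a separator unless it appears inside double quotes, which is the behavior the quoting rule implies.

```python
def split_probe_spec(spec):
    """Split a probe specification on ':' separators, keeping quoted tuples whole."""
    fields, current, in_quotes = [], [], False
    for ch in spec:
        if ch == '"':
            in_quotes = not in_quotes   # quotes delimit a tuple and are dropped
        elif ch == ":" and not in_quotes:
            fields.append("".join(current))
            current = []
        else:
            current.append(ch)
    if in_quotes:
        raise ValueError("unterminated quote in probe specification")
    fields.append("".join(current))
    return fields

quoted = split_probe_spec('@@uftxlc++:12345:*:"Foo::bar(int)":entry')
# Without the quotes, each ':' of the '::' operator is read as a separator.
unquoted = split_probe_spec('@@uftxlc++:12345:*:Foo::bar(int):entry')
print(quoted)
print(unquoted)
```

With quotes the specification yields the expected five tuples; without them the scope resolution operator splits the function name into extra, meaningless fields.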
- Access to data fields that are inherited from a virtual base class is not supported.
- Template classes are not supported and must not be included in the C++ header.
- Pointers to members are not supported.
- To probe a class with the class definition, an object of the class is instantiated in the header file either as a global object or in a dummy function.
Example:
The following is the C++ application:
#include "header.cc"
main()
{
int i = 10;
incr_num(i);
float a = 3.14;
incr_num(a);
char ch = 'A';
incr_num(ch);
double d = 1.11;
incr_num(d);
}
Content of the "header.cc"
# cat header.cc
#include <iostream.h>
template <class T>
T incr_num( T a)
{
return (++a);
}
int dummy()
{
int i=10,j=20;
incr_num(i);
float a=3.14;
incr_num(a);
char ch ='A',dh='Z';
incr_num(ch);
double d=1.1,e=1.11;
incr_num(d);
return 0;
}
Content of the Vue script vue_cpp.e
##C++
#include "header.cc"
##Vue
@@uftxlc++:$__CPID:*:"incr_num<int>(int)":entry
{
printf("Hello1_%d\n",__arg1 );
}
@@uftxlc++:$__CPID:*:"incr_num < float > (float)" :entry
{
printf("Hello2_%f\n",__arg1 );
}
@@uftxlc++:$__CPID:*:"incr_num < char > ( char )":entry
{
printf("Hello3_%c\n",__arg1 );
}
@@uftxlc++:$__CPID:*:"incr_num < double > ( double )":entry
{
printf("Hello4_%lf\n",__arg1 );
exit();
}
Execution :
/usr/vacpp/bin/xlC app.c++
# probevue -X ./a.out vue_cpp.e
Hello1_10
Hello2_3.140000
Hello3_A
Hello4_1.110000
/* Probe entry of all the C++ functions in the executable a.out */
@@uftxlc++:$__CPID:a.out:"/.*/":entry
/* Probe exit of all the C++ functions with the word 'foo' in their names */
@@uftxlc++:$__CPID:*:"/foo/":exit
In entry probes where the function name is specified as a regular expression, individual arguments cannot be accessed. However, the ProbeVue function print_args() can be used to print the function name and its arguments. The argument values are printed based on the argument type information available in the traceback table of the function.
In exit probes where the function name is specified as a regular expression, the return value cannot be accessed.
Java applications probe manager
The Java Probe Manager (JPM) supports probing of Java applications in a way identical to the C and C++ probe managers. A single Vue script can trace multiple Java applications at the same time by using the different process IDs of the JVMs. The same script can be used to probe system calls or C/C++ applications along with Java applications, and can use other probe managers.
Like the uft (user function tracing) probe manager, the Java probe manager accepts a 5-tuple probe specification in the following format:
@@uftjava:<process_ID>:*:<_qualified_function_name>:entry
The second tuple is the process ID of the JVM process that corresponds to the Java application being traced.
The third field is reserved for future use.
The fourth field specifies the Java method.
This name is a fully qualified name as used in Java applications, such as Mypackage.Myclass.Mymethod.
The following restrictions apply:
- Only pure Java methods can be probed. Native code (shared library calls) and encrypted code are not traceable.
- Only entry probes are supported.
- Only JVM version 1.5 and later that support the JVMTI interface are supported.
- At any given point of time, no two Probevue sessions can probe the same Java application with @@uftjava.
- Polymorphic/Overloaded methods are not supported.
- Tracing or accessing external variables that have the same name as any ProbeVue keyword or built-in name is not supported. Such external symbols (Java application variable names) might need to be renamed.
- Accessing arrays of java applications is not supported in this release.
- The get_function() built-in is not supported for the Java language in this release.
Data Access: The action blocks of java probes can access the following data similar to existing behavior.
- Action block can access global, local and kernel script variables.
- Action block can access method arguments (Entry class variables) of primitive types.
- Action block can access the built-in variables.
- Action block can access Java application variables through fully qualified names, only static (class members).
x = some_package.app.class.var_x; //Access static/class member.
- Accessing Java application variables of primitive types is supported. They are implicitly converted, promoted, or cast to equivalent types in the Vue language without losing value, but the actual memory usage (size) may differ from that of the Java language.
The functions supported in the context of Java probe manager are listed in the following table:
Function | Description |
---|---|
stktrace() | Provides the Stack trace of the Java application (running thread) that is being traced. |
copy_userdata() | Copy data from java application into script variables. |
get_probe() | Returns the probe string. |
get_stktrace() | Returns the runtime stack trace. |
get_location_point() | Returns the current probe location. |
get_userstring() | Copy string data from java application. |
exit() | Exits from the probevue trace session. |
Changes to the probevue command:
Option | Description |
---|---|
-X | This option can be used (along with the -A option) to launch a Java application. In the current release, the user has to manually pass an additional string, agentlib:probevuejava, along with all the other options that are needed to run the Java application. |
For example:
probevue -X /usr/java5/bin/java -A -agentlib:probevuejava myjavaapp myscript.e
When running the 64-bit JVM, use "agentlib:probevuejava64" instead, as in:
probevue -X /usr/java5_64/bin/java -A -agentlib:probevuejava64 myjavaapp myscript.e
where myjavaapp is the Java class of the myjavaapp.java application.
Example ExtendedClass.java Source:
class BaseClass
{
static int i=10;
public static void test(int x)
{
i += x;
}
}
public class ExtendedClass extends BaseClass
{
public static void test(int x, String msg)
{
i += x;
System.out.print("Java: " + msg + "\n\n");
BaseClass.test(x);
}
public static void main(String[] args)
{
BaseClass.test(5);
ExtendedClass.test(10, "hello");
}
}
Example test.e script for above Java application:
@@uftjava:$__CPID:*:"BaseClass.test":entry
{
printf("BaseClass.i: %d\n", BaseClass.i);
printf("BaseClass.test: %d\n", __arg1);
stktrace(0, -1);
printf("\n");
}
@@uftjava:$__CPID:*:"ExtendedClass.test":entry
{
printf("BaseClass.i: %d\n", BaseClass.i);
printf("ExtendedClass.test: %d, %s\n", __arg1, __arg2);
stktrace(0, -1);
printf("\n");
}
Example ProbeVue session with above script:
# probevue -X /usr/java5/jre/bin/java \
-A "-agentlib:probevuejava ExtendedClass" test.e
Java: hello
BaseClass.i: 10
BaseClass.test: 5
BaseClass.test()+0
ExtendedClass.main()+1
BaseClass.i: 15
ExtendedClass.test: 10, hello
ExtendedClass.test()+0
ExtendedClass.main()+8
BaseClass.i: 25
BaseClass.test: 10
BaseClass.test()+0
ExtendedClass.test()+39
ExtendedClass.main()+8
Interval probe manager
The interval probe manager provides probe points that fire at a user-defined time interval. The probe points are not located in kernel or application code, but instead are based on wall clock time intervals.
The interval probe manager is useful for summarizing statistics collected over an interval of time. It accepts a 4-tuple probe specification in the following format:
@@interval:*:clock:<# milliseconds>
The interval probe manager filters probe events by process ID if one is provided in the second field. Assigning * to the second field indicates that the probe fires for all processes. The only value that the interval probe manager supports for the third field is the clock keyword, which identifies the probe specification as a wall clock probe. The fourth and last field, <# milliseconds>, identifies the number of milliseconds between firings of the probe. The interval probe manager requires that the value for this field consist only of the digits 0-9.
For interval probes without a process ID (non-profiling interval probes), the interval must be exactly divisible by 100. Thus, probe events that are 100 ms, 200 ms, 300 ms, and so on, apart are allowed. For interval probes with a process ID specified (profiling interval probes), the interval must be greater than or equal to the minimum interval allowed for the global root user, or exactly divisible by 10 for other users. Thus, probe events that are 10 ms, 20 ms, 30 ms, and so on, apart are allowed for normal users. Only one profiling interval probe can be active for a process.
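The interval rules can be summarized as a small validity check. The following Python sketch is a model of the rules as stated here, not the check that ProbeVue itself performs. The min_root_interval_ms parameter is a placeholder, because the actual minimum for the global root user is a system setting not given in this text, and rejecting a zero interval is an assumption.

```python
def interval_is_valid(value, pid_specified, is_root=False, min_root_interval_ms=1):
    """Check the <# milliseconds> tuple against the interval rules.

    min_root_interval_ms is a placeholder for the system-defined minimum
    that applies to the global root user (assumption, not a real default).
    """
    if not value.isascii() or not value.isdigit():
        return False                       # only the digits 0-9 are accepted
    ms = int(value)
    if ms == 0:
        return False                       # assumption: zero is rejected
    if not pid_specified:
        return ms % 100 == 0               # non-profiling: multiples of 100 ms
    if is_root:
        return ms >= min_root_interval_ms  # profiling, global root user
    return ms % 10 == 0                    # profiling, other users

print(interval_is_valid("200", pid_specified=False))
print(interval_is_valid("250", pid_specified=False))
print(interval_is_valid("30", pid_specified=True))
```

Under these rules, @@interval:*:clock:250 would be rejected, while the same interval with a process ID in the second field would be acceptable only for the global root user.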
The interval probe manager requires only basic dynamic tracing privileges. The interval probe manager enforces the following limits on the number of probes it supports to prevent malicious users from running the kernel out of memory by creating huge numbers of interval probes.
Limit | Value |
---|---|
Maximum number of interval probes per user | 32 |
Maximum number of interval probes in system | 1024 |
The interval probe manager does not support the following functions. If used inside an interval manager probe point, these functions will generate an empty string or zero as output.
- get_function
- get_probe
- get_location_point
When process ID is not specified, an interval probe can trigger in the context of any process depending upon when the probe fires since the probe event is based on wall clock time. Because of this, the ProbeVue framework does not allow the use of any of the following functions inside the interval probe manager's action block to prevent unauthorized access to a process's internal data. This security violation is caught only in the kernel. The Vue script will successfully compile but the session will fail to initialize.
- stktrace
- get_userstring
These functions provide no value when used from the probe manager. Even if you are the root user, you cannot call these functions inside the interval probe manager.
When the process ID is specified, the interval probe is triggered for all the threads within the process at the specified time interval. Because the probe is fired in the context of the process, the stktrace() function and the __pname built-in are allowed inside the interval probe manager’s action block, unlike when the process ID is not specified.
System trace probe manager
The system trace probe manager provides probe points wherever existing system trace hooks to trace channel zero (system event channel) occur, both within the kernel and within applications. To use this probe manager, you must have the kernel access privilege, and not be running in a WPAR.
The system trace probe manager accepts a 3-tuple probe specification in the following format:
@@systrace:*:<hookid>
where the hookid argument specifies the ID for the specific system trace hook of interest. The hookid argument consists of 4 hex digits, typically of the form hhh0. For example, to specify the hookid argument for the fork system call, specify 1390. See the /usr/include/sys/trchkid.h file for examples, such as HKWD_SYSC_FORK. The entries in this file are hook words, where the hookid value is in the upper halfword. Because hook words can be arbitrary, no validation of the hookid argument beyond checking that it is a valid hex string of up to 4 hex digits is performed. It is not an error to specify a hookid value that never occurs.
As a convenience, you can specify the hookid argument with fewer than 4 hex digits. In this case, first a trailing zero is assumed, and then additional leading zeroes as necessary to implicitly define the required 4 digits. For example, you can use 139 as an abbreviation of 1390. Similarly, 0100, 010, and 10 all specify the same hookid value, taken from HKWD_USER1.
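The abbreviation rule can be expressed in a few lines. The following Python sketch models the expansion described above and the extraction of a hookid from the upper halfword of a hook word. The function names are hypothetical, the * wildcard is not handled, and the hook word 0x13900000 used in the example is an illustrative value chosen to be consistent with the fork hookid 1390, not a value read from trchkid.h.

```python
HEX_DIGITS = "0123456789abcdefABCDEF"

def normalize_hookid(text):
    """Expand an abbreviated hookid to the full 4-hex-digit form."""
    if not 1 <= len(text) <= 4 or any(c not in HEX_DIGITS for c in text):
        raise ValueError("hookid must be 1 to 4 hex digits")
    if len(text) < 4:
        text += "0"                      # first, a trailing zero is assumed
    return text.rjust(4, "0").upper()    # then leading zeroes as necessary

def hookid_from_hookword(hookword):
    """Return the hookid held in the upper halfword of a 32-bit hook word."""
    return format((hookword >> 16) & 0xFFFF, "04X")

for abbreviation in ("139", "0100", "010", "10"):
    print(abbreviation, "->", normalize_hookid(abbreviation))
print(hookid_from_hookword(0x13900000))  # illustrative hook word value
```

Note that the trailing zero is added before any leading zeroes, so 10 expands to 0100 rather than 0010.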
You can specify the hookid argument with the * wildcard character. This will probe all system tracing, with likely unacceptable performance implications. Hence, such a specification must be used only when absolutely necessary.
The second tuple is reserved, and must be specified as an asterisk, as shown.
Only system trace events that actually occur and record system trace data trigger probes. In particular, a system trace probe can only occur when system trace is active. The systrace probe manager is an event-based probe manager. Hence, probe name, function name, and location point are not available. As the hookword is passed to the script, this is not a significant restriction.
A non-root user is limited to at most 64 systrace probes simultaneously enabled. No more than 128 explicit systrace probes can be enabled system-wide.
ProbeVue built-in register variables allow access to the data traced. You cannot use the __arg* variables for this purpose. There are two general styles for system tracing.
The following style is for the trchook(64)/utrchook(64) (or the equivalent TRCHKLx macros in C) hooks:
- __r3 contains the 16 bit hookid.
- __r4 contains the subhookid.
- __r5 contains traced data word D1.
- __r6 contains traced data word D2.
- __r7 contains traced data word D3.
- __r8 contains traced data word D4.
- __r9 contains traced data word D5.
Not all trace hooks contain all 5 data words. Undefined data words from a given trace hook will appear as zero. The Vue clause for a given hook ID must know exactly what and how much data its hook ID traces.
If the trace record was produced by one of the functions in the trcgen or trcgent family, use the following style:
- __r3 contains the 16 bit hookid.
- __r4 contains the subhookid.
- __r5 contains traced data word D1.
- __r6 contains the length of the traced data.
- __r7 contains the address of the traced data.
The following script shows a simple example of the systrace probe manager:
@@systrace:*:1390
{
if (__r4 == 0) { /* normal fork is traced with subhookid zero */
printf("HKWD_SYSC_FORK: %d forks child %d\n", __pid, __r5);
exit();
}
}
System trace must be active for the systrace probe to be triggered.
With appropriate privilege, a Vue script can itself generate system trace records using the "RAS events" Vue functions. However, the systrace probe manager does not detect trace records produced through a Vue script.
Extended system call probe manager (syscallx)
The syscallx probe manager, on the other hand, allows all base system calls to be traced. Base system calls are the set of system calls exported by the kernel and base kernel extensions, which are available immediately after boot-up. System calls that are exported from kernel extensions that may be loaded later are not supported. Either a specific system call or all system calls can be specified through the probe point tuple. However, unlike the syscall probe manager, the third field of the probe point tuple for the syscallx probe manager must identify the actual kernel entry point function. The syscallx probe manager also limits probes to fire in a specific process if the process ID is specified as the second field of the probe point tuple.
The following are some examples:
/* Probe point tuple to probe the read system call entry for all processes */
@@syscallx:*:kread:entry
/* Probe point tuple to probe the fork system call exit for process with ID 434 */
@@syscallx:434:kfork:exit
/* Probe point tuple to probe entry for all base system calls */
@@syscallx:*:*:entry
/* Probe point tuple to probe exit for all base system calls for process 744 */
@@syscallx:744:*:exit
System calls supported by the syscall probe manager
System call name | Kernel entry name |
---|---|
absinterval | absinterval |
accept | accept1 |
bind | bind |
close | close |
creat | creat |
execve | execve |
exit | _exit |
fork | kfork |
getgidx | getgidx |
getgroups | getgroups |
getinterval | getinterval |
getpeername | getpeername |
getpid | _getpid |
getppid | _getppid |
getpri | _getpri |
getpriority | _getpriority |
getsockname | getsockname |
getsockopt | getsockopt |
getuidx | getuidx |
incinterval | incinterval |
kill | kill |
listen | listen |
lseek | klseek |
mknod | mknod |
mmap | mmap |
mq_close | mq_close |
mq_getattr | mq_getattr |
mq_notify | mq_notify |
mq_open | mq_open |
mq_receive | mq_receive |
mq_send | mq_send |
mq_setattr | mq_setattr |
mq_unlink | mq_unlink |
msgctl | msgctl |
msgget | msgget |
msgrcv | __msgrcv |
msgsnd | __msgsnd |
nsleep | _nsleep |
open | kopen |
pause | _pause |
pipe | pipe |
plock | plock |
poll | _poll |
read | kread |
reboot | reboot |
recv | _erecv |
recvfrom | _enrecvfrom |
recvmsg | _erecvmsg |
select | _select |
sem_close | _sem_close |
sem_destroy | sem_destroy |
sem_getvalue | sem_getvalue |
sem_init | sem_init |
sem_open | _sem_open |
sem_post | sem_post |
sem_unlink | sem_unlink |
sem_wait | _sem_wait |
semctl | semctl |
semget | semget |
semop | __semop |
semtimedop | __semtimedop |
send | _esend |
sendmsg | _esendmsg |
sendto | _esendto |
setpri | _setpri |
setpriority | _setpriority |
setsockopt | setsockopt |
setuidx | setuidx |
shmat | shmat |
shmctl | shmctl |
shmdt | shmdt |
shmget | shmget |
shutdown | shutdown |
sigaction | _sigaction |
sigpending | _sigpending |
sigprocmask | sigprocmask |
sigsuspend | _sigsuspend |
socket | socket |
socketpair | socketpair |
stat | statx |
waitpid | kwaitpid |
write | kwrite |
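Because the syscallx probe manager expects the kernel entry name rather than the system call name, a lookup like the one in the table can be used to build probe point tuples. The following Python sketch copies only a few rows from the table; the function name and the dictionary are hypothetical helpers, not part of ProbeVue.

```python
# A few rows copied from the table (system call name -> kernel entry name).
KERNEL_ENTRY = {
    "read": "kread",
    "write": "kwrite",
    "open": "kopen",
    "fork": "kfork",
    "close": "close",
    "stat": "statx",
}

def syscallx_spec(syscall, location, pid=None):
    """Build a syscallx probe point tuple from a system call name."""
    if location not in ("entry", "exit"):
        raise ValueError("location must be 'entry' or 'exit'")
    # '*' selects all base system calls; otherwise translate the name.
    entry_name = "*" if syscall == "*" else KERNEL_ENTRY[syscall]
    pid_field = "*" if pid is None else str(pid)
    return f"@@syscallx:{pid_field}:{entry_name}:{location}"

print(syscallx_spec("read", "entry"))
print(syscallx_spec("fork", "exit", pid=434))
```

A name that is missing from the dictionary raises KeyError in this sketch; that only reflects the abbreviated table used here, not a limit of the syscallx probe manager.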
Running in a WPAR
Workload partitions or WPARs are virtualized operating system environments within a single instance of the AIX operating system. The WPAR environment is somewhat different from the standard AIX operating system environment.
Dynamic tracing is supported in the WPAR environment. By default, when creating a WPAR, only the PV_PROBEVUE_TRC_USER_SELF and the PV_PROBEVUE_TRC_USER privileges are assigned to the WPAR and the superuser (root) on a WPAR system will be granted these privileges. An admin user from the global partition can change the value of the default WPAR privilege set or can explicitly assign additional privileges when creating the WPAR.
Privileges in a WPAR generally have the same meanings as in the global partition. Be careful when assigning the PV_PROBEVUE_TRC_KERNEL or the PV_PROBEVUE_TRC_MANAGE privilege to a WPAR. Any user with the PV_PROBEVUE_TRC_KERNEL privilege can access global kernel variables, while a user with the PV_PROBEVUE_TRC_MANAGE privilege can change the values of ProbeVue parameters or shut down ProbeVue. These changes affect all users, even those in other partitions.
When you issue the probevue command in a WPAR, processes running in other WPARs or in the global partition are not visible to it. Because of this, you can only probe processes in your same WPAR. The probevue command will fail if the probe specification contains a process ID that is outside its partition. The PV_PROBEVUE_TRC_USER and PV_PROBEVUE_TRC_SYSCALL privileges in a WPAR only allow you to probe user space functions or system calls of processes that are in your WPAR. When probing system calls, the second field of the syscall probe specification must be set to a valid WPAR-visible process ID. Assigning the value * to the second field is not supported.
When a ProbeVue session is initiated in a mobile WPAR, it temporarily switches the WPAR to a non-checkpointable state. After the ProbeVue session terminates, the WPAR is checkpointable again.
I/O probe manager
The I/O probe manager provides capabilities to trace I/O operation events in various layers of the AIX I/O stack. Use the syscall probe manager to trace an application I/O request that is triggered by a read/write system call. Use the I/O probe manager to probe further into the syscall layer.
Use the I/O probe manager to analyze the response time of I/O operations of a block device, segregated into service time and queuing delay.
The following layers are supported:
- Logical File System (LFS)
- Virtual File System (VFS)
- Enhanced Journaled File Systems (JFS2)
- Logical Volume Manager (LVM)
- Small Computer System Interface (SCSI) disk driver
- Generic block devices
The primary use cases for I/O probe manager are as follows:
- Identify the following patterns of I/O usage of a device. Valid devices can be a disk, logical volume, or volume group, or file system (type or mount path) in a specified time period:
- I/O operation count
- Size of I/O operations
- Type of I/O operation (read/write)
- Sequential or random nature of I/O
- Get process or thread-wise usage information of a file system (type or mount path), logical volume, volume group, or disk.
- Get an end-to-end mapping of I/O flow among various layers (wherever possible).
- Monitor a specific I/O resource usage. For example:
- Trace any write operations of the /etc/password file.
- Trace read operation on block 0 of the hdisk0 device.
- Trace when a new logical volume is opened in root volume group (rootvg).
- For Multipath I/O (MPIO) disks, get path-specific information by the following actions:
- Get path-wise usage and response time information.
- Identify path switching or path failure.
- For I/O errors, get more details about the error in disk driver layer.
Probe specification
I/O probes must be specified in the following format in Vue script:
@@io:sub_type:io_event:operation_type:filter[|filter …]
This specification consists of five tuples that are separated by colons (:). The first tuple is always @@io.
Probe sub type
The second tuple signifies the sub type of the probe that indicates the layer of AIX I/O stack that contains the probe. This tuple can have one of the following values:
Second tuple (sub type) | Description |
---|---|
disk | This probe starts for disk driver events. Currently, the I/O probe manager supports only the scsidisk driver. |
lvm | This probe starts for Logical Volume Manager (LVM) events. |
bdev | This probe starts for any block I/O device. Disk, CD-ROM, diskette are examples of block devices. This sub type is used only when no other sub type is applicable. For example, if a block device is not a disk, volume group, or logical volume, this sub type is applicable. |
jfs2 | This probe starts for JFS2 file system events. |
vfs | This probe starts for any read/write operation on a file. |
For a disk type of second tuple, the third tuple can have the following values:
Sub type (Second tuple) | I/O event (Third Tuple) | Description |
---|---|---|
disk | entry | This probe starts whenever disk driver receives an I/O request to process. |
iostart | This probe starts when the disk driver picks up an I/O request from its ready queue and sends it down to lower layer (for example, adapter driver). A single original I/O request to disk driver can send multiple command requests (some might be driver-related task management command requests) to lower layer. However, sometimes the driver can combine multiple original requests and send a single request to lower layer. | |
iodone | This probe starts when the lower layer (for example, adapter driver) returns an I/O request (successful or failed) to disk driver. | |
exit | This probe starts when disk driver returns an I/O request (successful or failed) to its upper layer. |
The disk probes can access the following built-in variables: __iobuf, __diskinfo, __diskcmd (only in disk:iostart and disk:iodone), and __iopath (only in disk:iostart and disk:iodone). For every entry probe, a corresponding exit probe is defined that has the same __iobuf->bufid value available at both probe points. The entry event can be followed by multiple iostart events, but at least one of them must have the same __iobuf->bufid value. Every iostart event has a matching iodone event that has the same __iobuf->child_bufid value.
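The pairing of iostart and iodone events can be sketched in a short Vue script. This is a minimal sketch under stated assumptions, not a tuned tool: it assumes a disk named hdisk0, the associative array name start_ts is hypothetical, and it uses timestamp() and diff_time() in the same way as the example scripts for this probe manager. It reports how long the lower layer takes to service each command by matching the two events on __iobuf->child_bufid.

```
/* Key each outstanding command by child_bufid, which is the same
 * at the matching disk:iostart and disk:iodone events. */
@@io:disk:iostart:*:hdisk0 {
        start_ts[__iobuf->child_bufid] = (long long)timestamp();
}
@@io:disk:iodone:*:hdisk0 {
        if (start_ts[__iobuf->child_bufid]) {   /* only if iostart was recorded */
                ts_now = timestamp();
                dt = (long long)diff_time(start_ts[__iobuf->child_bufid], ts_now, MICROSECONDS);
                printf("child_bufid=%lld serviced in %lld usec\n", __iobuf->child_bufid, dt);
                start_ts[__iobuf->child_bufid] = 0;   /* clear the slot for reuse */
        }
}
```

Keying on child_bufid rather than bufid matters here because one original request can fan out into several commands, each with its own iostart and iodone pair.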
For an LVM type of second tuple, the third tuple can have the following values:
Sub type (second tuple) | I/O event (third tuple) | Description |
---|---|---|
lvm | entry | This probe starts whenever the LVM layer receives an I/O request to process. |
lvm | iostart | This probe starts when LVM picks an I/O request from its ready queue and sends it down to the lower layer (usually the disk driver). |
lvm | iodone | This probe starts when the lower layer (for example, the disk driver) returns an I/O request (successful or failed) to LVM. |
lvm | exit | This probe starts when LVM returns an I/O request (successful or failed) to its upper layer. |
The LVM probes can access the following built-in variables: __iobuf, __lvol, and __volgrp. Every entry probe has a corresponding exit probe, which has the same __iobuf->bufid value available at both probe points. The entry event can be followed by multiple iostart events, but at least one of them has the same __iobuf->bufid value. Every iostart event has a matching iodone event that has the same __iobuf->child_bufid value.
For generic block device probes, the third tuple can have the following values:
Sub type (second tuple) | I/O event (third tuple) | Description |
---|---|---|
bdev | iostart | This probe starts when I/O is initiated on any block device (for example, a disk, logical volume, or CD-ROM). It happens when the AIX devstrat kernel service is called by any code. |
bdev | iodone | This probe starts when a block I/O request completes, that is, when the AIX iodone kernel service is called by any code. |
The generic block device probes can access the built-in variable __iobuf. Every iostart event has a matching iodone event that has the same __iobuf->bufid value.
For JFS2 file system probes, the third tuple can have the following values:
Sub type (second tuple) | I/O event (third tuple) | Description |
---|---|---|
jfs2 | buf_map | This probe starts when a logical file extent gets mapped to an I/O buffer and is sent to the underlying logical volume. |
The JFS2 probes can access the built-in variable __j2info.
For Virtual file system (VFS) probes, the third tuple can have the following values:
Sub type (second tuple) | I/O event (third tuple) | Description |
---|---|---|
vfs | entry | This probe starts when any read/write operation on a file is initiated. |
vfs | exit | This probe starts when any read/write operation on a file is completed (whether success or failure). |
The VFS probes can access the built-in variable __file. For the same thread, every entry event is followed by an exit event that has the same __file->inode_id value.
Probe operation type
The fourth tuple indicates the type of I/O operation that is specified by the probe. The fourth tuple can have one of the following values:
Fourth tuple | Description |
---|---|
read | The probe starts for only the read operation. |
write | The probe starts for only the write operation. |
* | The probe starts for both read and write operations. |
Probe filter
The fifth tuple is the filter tuple that helps in filtering more specific probes according to the requirement. The possible values are subtype dependent. Multiple values can be specified separated by | character, and the probe starts if it matches any of those filters. If the value of the fifth tuple is *, no filtering occurs and the probe starts if other tuples match. If multiple selectors are specified, and one of them is *, it is equivalent to the whole tuple value of *.
For disk probes, the fifth tuple can have the following values:
Filter (fifth tuple) | Description |
---|---|
Disk name, for example hdisk0 | The probe action is run only for the particular disk. |
Disk type. Allowed symbols: FC, ISCSI, VSCSI, SAS | The probe action is run only for disks with the matching type. The symbols stand for Fibre Channel (FC), iSCSI (ISCSI), virtual SCSI (VSCSI), and Serial Attached SCSI (SAS). |
The following probe starts for hdisk0 or any other FC disk (at the disk entry event, for both read and write operation types):
@@io:disk:entry:*:hdisk0|FC
For Logical Volume Manager (LVM) probes, the fifth tuple can have the following values:
Filter (fifth Tuple) | Description |
---|---|
Logical volume name, for example hd5, lg_dumplv | The probe action is run only for the particular logical volume. |
Volume group name, for example rootvg | The probe action is run only for those logical volumes that belong to the particular volume group. |
The following probe starts for any logical volume that belongs to either root volume group (rootvg), or test volume group (testvg) (at iostart event, for write operation only):
@@io:lvm:iostart:write:rootvg|testvg
For generic block device probes, the fifth tuple can have the following values:
Filter (fifth tuple) | Description |
---|---|
Block device name, for example hdisk0, hd5, cd0 | The probe action is run only for the particular block device. |
Consider the following examples for generic block device probes:
@@io:bdev:iostart:*:cd0
@@io:bdev:iodone:read:hdisk3|hdisk5
For JFS2 file system probes, the fifth tuple can have the following values:
Filter (fifth tuple) | Description |
---|---|
File system mount path, for example /usr | The probe action is run only for the file system with the particular mount path. It must be a JFS2 file system; otherwise, ProbeVue rejects that probe specification. |
Consider the following example for the JFS2 file system probes:
@@io:jfs2:buf_map:*:/usr|/tmp
For Virtual file system (VFS) probes, the fifth tuple can have the following values:
Filter (fifth tuple) | Description |
---|---|
File system mount path, for example /tmp | The probe action is run for files that belong to the file system. |
File system type. The allowed symbols are JFS2, NAMEFS, NFS, JFS, CDROM, PROCFS, SFS, CACHEFS, NFS3, AUTOFS, POOLFS, VXFS, VXODM, UDF, NFS4, RFS4, CIFS, PMEMFS, AHAFS, STNFS, ASMFS | The probe action is run for files of the particular file system type. The symbols correspond to the AIX file systems defined in the exported header file sys/vmount.h. |
Consider the following examples for the Virtual file system (VFS) probes:
@@io:vfs:entry:read:JFS2
@@io:vfs:exit:*:/usr|JFS
I/O probe related built-in variables for Vue scripts
__iobuf built-in variable
You can use the special __iobuf built-in variable to access various information about the I/O buffer that is employed in the current I/O operation. It is accessible in probes of sub types disk, lvm, and bdev. Its member elements can be accessed by using the __iobuf->member syntax.
If the value of a member cannot be obtained, the value listed in the Invalid Value column is returned. This value is returned because of one of the following reasons:
- Page fault context is required, but the current probevctrl tunable value, num_pagefaults, is either 0 or not sufficient.
- The memory location that contains the value is paged out.
- Any other severe system error such as invalid pointer or corrupted memory.
The __iobuf built-in variable has the following members:
Member name | Type | Description | Invalid Value |
---|---|---|---|
blknum | unsigned long long | Starting block number of the I/O request. | 0xFFFFFFFFFFFFFFFF |
bcount | unsigned long long | Requested number of bytes in the I/O operation. | 0xFFFFFFFFFFFFFFFF |
bflags | unsigned long long | The flags that are associated with the I/O operation. The following symbols are available: B_READ, B_ASYNC, B_ERROR. The symbols can be used along with the bflags value to see whether a flag is set. For example, if (__iobuf->bflags & B_READ) is true, then it is a read operation. Note: There is no B_WRITE flag. If the B_READ flag is not set, it is considered to be a write operation. | 0 |
devnum | unsigned long long | The device number of the target device that is associated with the I/O operation. It has the device major number and minor number that is embedded in it. | 0 |
major_num | int | The major number of the target device of the I/O operation. | -1 |
minor_num | int | The minor number of the target device of the I/O operation. | -1 |
error | int | In case of any error in the I/O operation, this value is the error number. This value is defined in the exported errno.h header file. | -1 |
residue | unsigned long long | The remaining number of bytes from the original request that might not be read or written. On the I/O completion events, this value is ideally zero. But for read operation, a nonzero value might mean that you are trying to read more than what is available, which is acceptable. This value is considered only when error value is nonzero. | 0xFFFFFFFFFFFFFFFF |
bufid | unsigned long long | A unique number that is associated with the I/O request. While the I/O is in progress, the bufid value uniquely identifies the I/O request in all the events of a particular sub type (for example, in disk:entry, disk:iostart, disk:iodone, and disk:exit). If the __iobuf->bufid value matches, it is the same I/O request at various stages. | 0 |
parent_bufid | unsigned long long | If the value is not 0, this value provides the bufid of the upper layer buffer that is associated with this I/O request. You can use it to link the current I/O operation with the upper layer I/O request. For example, in a disk I/O request, the corresponding LVM I/O can be determined. Note: The parent_bufid field is not set in all code paths, and hence it is not always useful. Use the child_bufid field to link I/O requests between two adjacent layers. | 0 |
child_bufid | unsigned long long | If the value is not 0, this value provides the bufid of the new I/O request that is sent to the lower layer. The best events to record it are disk:iostart, lvm:iostart, and bdev:iostart. You can identify the I/O in the lower adjacent layer by matching the __iobuf->bufid value to this child_bufid value. For example, in lvm:iostart, you can record the __iobuf->child_bufid value. Then, in disk:entry, you can match it with __iobuf->bufid to identify the corresponding I/O request. | 0 |
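The child_bufid linkage between adjacent layers can be sketched as follows. This is a minimal sketch under stated assumptions: the volume group rootvg and the disk hdisk0 are placeholders, and the associative array name lvm_parent is hypothetical. It records the child_bufid value at lvm:iostart and looks it up again at disk:entry, where the same number appears as __iobuf->bufid.

```
/* At lvm:iostart, child_bufid names the request handed to the lower layer. */
@@io:lvm:iostart:*:rootvg {
        if (__iobuf->child_bufid != 0) {
                lvm_parent[__iobuf->child_bufid] = __iobuf->bufid;
        }
}
/* At disk:entry, the same number shows up as bufid. */
@@io:disk:entry:*:hdisk0 {
        if (lvm_parent[__iobuf->bufid]) {
                printf("disk bufid=%lld belongs to LVM bufid=%lld\n",
                        __iobuf->bufid, lvm_parent[__iobuf->bufid]);
                lvm_parent[__iobuf->bufid] = 0;   /* clear the slot for reuse */
        }
}
```

The sketch uses child_bufid rather than parent_bufid because, as the table notes, parent_bufid is not set in all code paths.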
__file built-in variable
You can use the __file special built-in variable to get various information about the file operation. It is available in probes of sub type vfs. Its member elements can be accessed by using the __file->member syntax.
If the value of a member cannot be obtained, the value listed in the Invalid Value column is returned. This value is returned because of one of the following reasons:
- Page fault context is required, but the current probevctrl tunable value num_pagefaults is either 0 or not sufficient.
- The memory location that contains the value is paged out.
- Any other severe system error such as invalid pointer or corrupted memory.
The __file built-in variable has the following members:
Member name | Type | Description | Invalid Value |
---|---|---|---|
f_type | int | Specifies the type of the file. It can match one of the built-in file type constant values. Note: The value might not match any of the built-in constants because the list does not include every possible file type, but only the most useful ones. | -1 |
fs_type | int | Specifies the type of the file system to which this file belongs. It can match one of the built-in file system type constant values. The built-in constants correspond to the AIX file system types defined in the exported header file sys/vmount.h. | -1 |
mount_path | char * | Specifies the path where the associated file system is mounted. | null string |
devnum | unsigned long long | Specifies the device number of the associated block device of the file. Both the major and minor numbers are embedded in it. If there is no associated block device, then it is 0. | 0 |
major_num | int | Specifies the major number of the associated block device of the file. | -1 |
minor_num | int | Specifies the minor number of the associated block device of the file. | -1 |
offset | unsigned long long | Specifies the current read/write byte offset of the file. | 0xFFFFFFFFFFFFFFFF |
rw_mode | int | Specifies the read/write mode of the file. It matches one of the built-in constant values: F_READ or F_WRITE. | -1 |
byte_count | unsigned long long | At the vfs:entry event, byte_count provides the byte count of the read or write request. At the vfs:exit event, it provides the number of bytes that remained unfulfilled. Therefore, the difference of this value between these two events determines how many bytes were processed in the operation. | 0xFFFFFFFFFFFFFFFF |
fname | char * | Specifies the name of the file (only base name, not path). | null string |
inode_id | unsigned long long | Specifies a system-wide unique number that is associated with the file. Note: It is different from the file inode number. | 0 |
path | path_t (new data type in Vue) | Specifies the complete file path. It can be printed by using printf() and the format specifier %p. | null string as file path |
error | int | If the read/write operation failed, the error number as defined in the exported errno.h header file. If there is no error, it is 0. | -1 |
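The byte_count and inode_id members can be combined to report how many bytes each read actually processed. This is a minimal sketch under stated assumptions: the /tmp mount path is a placeholder, the array names req_inode and req_bytes are hypothetical, and the arrays are keyed by the standard Vue __tid built-in because an entry event and its exit event occur on the same thread.

```
@@io:vfs:entry:read:/tmp {
        req_inode[__tid] = __file->inode_id;     /* which file this thread is reading */
        req_bytes[__tid] = __file->byte_count;   /* bytes requested */
}
@@io:vfs:exit:read:/tmp {
        if (req_inode[__tid] != 0 && req_inode[__tid] == __file->inode_id) {
                /* at exit, byte_count holds the bytes that remained unfulfilled */
                done = req_bytes[__tid] - __file->byte_count;
                printf("inode_id=%lld: requested=%lld, processed=%lld, error=%d\n",
                        __file->inode_id, req_bytes[__tid], done, __file->error);
                req_inode[__tid] = 0;
        }
}
```

The sketch does not check for the invalid value 0xFFFFFFFFFFFFFFFF in byte_count; a script intended for regular use would probably skip such samples.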
__lvol built-in variable
You can use the __lvol special built-in variable to get various information about the logical volume in an LVM operation. It is available in probes of sub type lvm. Its member elements can be accessed by using the __lvol->member syntax.
If the value of a member cannot be obtained, the value listed in the Invalid Value column is returned. There might be the following reasons for getting this invalid value:
- Page fault context is required, but the current probevctrl tunable value num_pagefaults is either 0 or not sufficient.
- The memory location that contains the value is paged out.
- Any other severe system error such as invalid pointer or corrupted memory.
The __lvol built-in variable has the following members:
Member name | Type | Description | Invalid Value |
---|---|---|---|
name | char * | The name of the logical volume. | null string |
devnum | unsigned long long | The device number of the logical volume. It has both major number and minor number that is embedded in it. | 0 |
major_num | int | The major number of the logical volume. | -1 |
minor_num | int | The minor number of the logical volume. | -1 |
lv_options | unsigned int | The options that are related to the logical volume. Built-in constants are defined for some option values, and a condition in the script can check whether one of them is set. Note: Not all possible values are defined, and hence other options might be present in the value. | 0xFFFFFFFF |
__volgrp built-in variable
You can use the __volgrp special built-in variable to get various information about the volume group in an LVM operation. It is available in probes of sub type lvm. Its member elements can be accessed by using the __volgrp->member syntax.
If the value of a member cannot be obtained, the value listed in the Invalid Value column is returned. The value could be invalid because of the following reasons:
- Page fault context is required, but the current probevctrl tunable value num_pagefaults is either 0 or not sufficient.
- The memory location that contains the value is paged out.
- Any other severe system error such as invalid pointer or corrupted memory.
The __volgrp built-in variable has the following members:
Member name | Type | Description | Invalid Value |
---|---|---|---|
name | char * | The name of the volume group. | null string |
devnum | unsigned long long | The device number of the volume group. It has major number and minor number that is embedded in it. | 0 |
major_num | int | The major number of the volume group. | -1 |
minor_num | int | The minor number of the volume group. Note: For a volume group, AIX always assigns 0 as the minor number. | -1 |
num_open_lvs | int | The number of open logical volumes that belong to this volume group. | -1 |
__diskinfo built-in variable
You can use the __diskinfo special built-in variable to get various information about the disk in a disk I/O operation. It is available in probes of sub type disk. Its member elements can be accessed by using the __diskinfo->member syntax.
If the value of a member cannot be obtained, the value listed in the Invalid Value column is returned. This value is returned because of one of the following reasons:
- Page fault context is required, but the current probevctrl tunable value num_pagefaults is either 0 or not sufficient.
- The memory location that contains the value is paged out.
- Any other severe system error such as invalid pointer or corrupted memory.
The __diskinfo built-in variable has the following members:
Member name | Type | Description | Invalid Value |
---|---|---|---|
name | char * | The name of the disk. | null string. |
devnum | unsigned long long | The device number of the disk. It has major number and minor number that are embedded in it. | 0 |
major_num | int | The major number of the disk. | -1 |
minor_num | int | The minor number of the disk. | -1 |
lun_id | unsigned long long | The Logical Unit Number (LUN) for the disk. | 0xFFFFFFFFFFFFFFFF |
transport_type | int | The transport type of the disk. It can match one of the built-in transport type constant values. | -1 |
queue_depth | int | The queue depth of the disk. It indicates the maximum number of simultaneous I/O requests that the disk driver can pass on to the lower layer (for example, the adapter). If the number of incoming I/O requests exceeds queue_depth, the disk driver holds the extra requests in its wait queue until the lower layer responds to at least one of the outstanding I/O requests. | -1 |
cmds_out | int | Number of outstanding I/O command requests to the lower layer (for example, the adapter). | -1 |
path_count | int | Number of MPIO paths of the disk (only if the disk is MPIO capable, else it is 0). | -1 |
reserve_policy | int | The SCSI reservation policy of the disk. It matches one of the built-in reservation policy constant values. Refer to the AIX MPIO documentation to know more about the reservation policies. | -1 |
scsi_flags | int | The SCSI flags of the disk. Built-in constants are defined for some flag values. Note: Not all flag values are defined, hence other flags might be present in the value. | 0 |
__diskcmd built-in variable
You can use the __diskcmd special built-in variable to get various information about the SCSI I/O command for the current operation. It is available in probes of sub type disk (but only for the iostart and iodone events). Its member elements can be accessed by using the __diskcmd->member syntax.
The value of a member might not be obtainable for one of the following reasons:
- Page fault context is required, but the current probevctrl tunable value num_pagefaults is either 0 or not sufficient.
- The memory location that contains the value is paged out.
- Any other severe system error such as invalid pointer or corrupted memory.
The __diskcmd built-in variable has the following members:
Member name | Type | Description |
---|---|---|
cmd_type | int | The type of the SCSI command (both type and subtype are merged together). Built-in constant values are available for the command types. Note: The built-in constants are bit position values, and hence their presence must be checked by using the & operator (the == operator must not be used). For example: __diskcmd->cmd_type & DK_IOCTL. |
retry_count | int | It indicates whether the I/O command is retried after any failure. Note: The value of 1 means that it is the first attempt. Any larger value indicates actual retries. |
path_switch_count | int | It indicates how many times the path was changed for this particular I/O operation (usually indicates some I/O path failure, either transient or permanent). |
status_validity | int | In case of any error, this value indicates whether it is a SCSI error or adapter error. It can match one of the following built-in constant values: SC_SCSI_ERROR or SC_ADAPTER_ERROR. If there is no error, then it is 0. |
scsi_status | int | If the status_validity field is set to SC_SCSI_ERROR, this field gives more details about the error. It can match one of the built-in SCSI status constant values. Note: Not all possible values are defined. Hence, scsi_status can have a value that does not match any of the built-in values. You can look up the corresponding SCSI command response code. |
adapter_status | int | If the status_validity field is set to SC_ADAPTER_ERROR, this field provides more information about the error. It can match one of the built-in adapter status constant values. |
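The retry_count and path_switch_count members can be used to spot commands that did not succeed on the first attempt. This is a minimal sketch that assumes a disk named hdisk0; it relies only on members listed in the __diskcmd and __diskinfo tables and on the rule that a retry_count of 1 means the first attempt.

```
@@io:disk:iodone:*:hdisk0 {
        /* retry_count is 1 on the first attempt, so anything larger is a retry */
        if (__diskcmd->retry_count > 1) {
                printf("retried command: attempts=%d, path switches=%d, outstanding=%d\n",
                        __diskcmd->retry_count, __diskcmd->path_switch_count,
                        __diskinfo->cmds_out);
        }
}
```

The probe is placed at iodone because __diskcmd is available only in the iostart and iodone events.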
__iopath built-in variable
You can use the __iopath special built-in variable to get various information about the I/O path for the current operation. It is available in probes of sub type disk for the iostart and iodone events only. Its member elements can be accessed by using the __iopath->member syntax.
If the value of a member cannot be obtained, the value listed in the Invalid Value column is returned. There might be the following reasons for getting this value:
- Page fault context is required, but the current probevctrl tunable value num_pagefaults is either 0 or not sufficient.
- The memory location that contains the value is paged out.
- Any other severe system error such as invalid pointer or corrupted memory.
The __iopath built-in variable has the following members:
Member name | Type | Description | Invalid Value |
---|---|---|---|
path_id | int | The ID of the current path (starting from 0). | -1 |
scsi_id | unsigned long long | The SCSI ID of the target on this path. | 0xFFFFFFFFFFFFFFFF |
lun_id | unsigned long long | The Logical Unit Number (LUN) on this path. | 0xFFFFFFFFFFFFFFFF |
ww_name | unsigned long long | The worldwide name of the target port on this path. | 0 |
cmds_out | int | The number of I/O commands outstanding on this path. | -1 |
__j2info built-in variable
__j2info is a special built-in variable that you can use to get various information about the JFS2 file system operation. It is available in probes of sub type jfs2. Its member elements can be accessed by using the __j2info->member syntax.
If the value of a member cannot be obtained, the value listed in the Invalid Value column is returned. There might be the following reasons for getting this value:
- Page fault context is required, but the current probevctrl tunable value num_pagefaults is either 0 or not sufficient.
- The memory location that contains the value is paged out.
- Any other severe system error such as invalid pointer or corrupted memory.
The __j2info built-in variable has the following members:
Member name | Type | Description | Invalid Value |
---|---|---|---|
inode_id | unsigned long long | A system-wide unique number that is associated with the file of the current operation. Note: It is different from the file inode number. | 0 |
f_type | int | Type of the file. The __file->f_type description provides the possible values. | -1 |
mount_path | char * | The path where the file system is mounted. | null string. |
devnum | unsigned long long | The device number of the underlying block device of the file system. It has both major number and minor number embedded. | 0 |
major_num | int | The major number of the underlying block device of the file system. | -1 |
minor_num | int | The minor number of the underlying block device of the file system. | -1 |
l_blknum | unsigned long long | The logical block number for this file operation. | 0xFFFFFFFFFFFFFFFF |
l_bcount | unsigned long long | The requested byte count between the logical blocks in this operation. | 0xFFFFFFFFFFFFFFFF |
child_bufid | unsigned long long | The bufid of the I/O request buffer that is sent down to the lower layer (for example, LVM). In that layer, it appears as __iobuf->bufid. | 0 |
child_blknum | unsigned long long | The block number of the I/O request buffer that is sent down to the lower layer (for example, LVM). In that layer, it appears as __iobuf->blknum. | 0xFFFFFFFFFFFFFFFF |
child_bcount | unsigned long long | The byte count of the I/O request buffer that is sent down to the lower layer (for example, LVM). In that layer, it appears as __iobuf->bcount. | 0xFFFFFFFFFFFFFFFF |
child_bflags | unsigned long long | The flags of the I/O request buffer that is sent down to the lower layer (for example, LVM). In that layer, it appears as __iobuf->bflags. | 0 |
Example scripts for I/O probe manager
- Script to trace any write operation to the /etc/passwd file:
int write(int, char *, int);
@@BEGIN {
    target_inodeid = fpath_inodeid("/etc/passwd");
}
@@syscall:*:write:entry {
    if (fd_inodeid(__arg1) == target_inodeid) {
        printf("write on /etc/passwd: timestamp=%A, pid=%lld, pname=[%s], uid=%lld\n",
            timestamp(), __pid, __pname, __uid);
    }
}
If the script is in a Vue file named etc_passwd.e, it can be run as:
# probevue etc_passwd.e
In another terminal, if the user (root) runs:
# mkuser user1
Then probevue displays an output similar to the following example:
write on /etc/passwd: timestamp=Mar/03/15 16:10:07, pid=14221508, pname=[mkuser], uid=0
- Script to find the maximum and minimum I/O operation time for a disk (for example, hdisk0) in a period. Also, find the block number, requested byte count, time of operation, and type of operation (read or write) corresponding to the maximum or minimum time.
long long min_time, max_time;
@@BEGIN {
min_time = max_time = 0;
}
@@io:disk:entry:*:hdisk0 {
ts_entry[__iobuf->bufid] = (long long)timestamp();
}
@@io:disk:exit:*:hdisk0 {
if (ts_entry[__iobuf->bufid]) { /* only if we recorded entry time */
ts_now = timestamp();
op_type = (__iobuf->bflags & B_READ) ? "READ" : "WRITE";
dt = (long long)diff_time(ts_entry[__iobuf->bufid], ts_now, MICROSECONDS);
if (min_time == 0 || dt < min_time) {
min_time = dt;
min_blknum = __iobuf->blknum;
min_bcount = __iobuf->bcount;
min_ts = ts_now;
min_optype = op_type;
}
if (max_time == 0 || dt > max_time) {
max_time = dt;
max_blknum = __iobuf->blknum;
max_bcount = __iobuf->bcount;
max_ts = ts_now;
max_optype = op_type;
}
ts_entry[__iobuf->bufid] = 0;
}
}
@@END {
printf("Maximum and minimum IO operation time for [hdisk0]:\n");
printf("Max: %lld usec, block=%lld, byte count=%lld, operation=%s, time of operation=[%A]\n",
max_time, max_blknum, max_bcount, max_optype, max_ts);
printf("Min: %lld usec, block=%lld, byte count=%lld, operation=%s, time of operation=[%A]\n",
min_time, min_blknum, min_bcount, min_optype, min_ts);
}
If the script is in a Vue file named disk_min_max_time.e, it can be run as:
# probevue disk_min_max_time.e
Generate some I/O activity on hdisk0 (for example, by using the dd command).
After a few minutes, if the command is terminated (by pressing CTRL-C), it prints output similar to the following:
^CMaximum and minimum IO operation time for [hdisk0]:
Max: 48174 usec, block=6927976, byte count=4096, operation=READ, time of operation=[Mar/04/15 03:31:07]
Min: 133 usec, block=6843288, byte count=4096, operation=READ, time of operation=[Mar/04/15 03:31:03]
Network probe manager
The network probe manager tracks incoming and outgoing network packets in a system (packet information as interpreted by the bpf module in AIX). The probe specification allows the user to specify Berkeley Packet Filter (BPF) filters, similar to tcpdump filter expressions, for granular tracking.
You can use built-in variables to collect packet header and payload information for Internet protocols, for example Ethernet, Internet Protocol Version 4/Version 6 (IPv4/v6), Transmission Control Protocol (TCP), User Datagram Protocol (UDP), Internet Control Message Protocol (ICMP), Internet Group Management Protocol (IGMP), and Address Resolution Protocol (ARP).
Network probe manager reports critical protocol-specific events (TCP state changes, round-trip times, retransmissions, UDP buffer overflows).
The network probe manager addresses the following primary use cases:
- Provide the following packet-specific information according to the bpf module based on IP address and ports:
  - Track the incoming and outgoing bytes for a connection.
  - Use the following built-ins to gather protocol header and payload information:
    - TCP flags (SYN, FIN), TCP sequence and acknowledgment number.
    - IPv4/IPv6 (IP addresses, protocol types: tcp, udp, icmp, igmp, and so on).
    - ICMP (packet type: ECHO REQUEST, ECHO RESPONSE, and so on).
  - Provide access to the complete raw network packet for probe script processing.
- Report the following protocol-related events:
  - Track TCP sender and receiver buffer full events.
  - TCP connection state changes from SYN-SENT state to ESTABLISHED state or from ESTABLISHED state to CLOSE state.
  - Monitor delta time between state changes (for example, time that is taken from SYN-SENT state to ESTABLISHED state).
  - Identify the listener (connection information) that discarded connections because the listener's queue is full.
  - Identify retransmissions (second and further retransmission for a packet) for TCP connections.
  - Identify the UDP socket that dropped packets because of insufficient receiving buffer.
Probe specification
The probe specification for the network probe manager contains three or five tuples that are separated by : (colon). The first tuple is always @@net.
Network probe manager supports two major categories of specifications: One category gathers packet-specific information and another category gathers protocol-specific information.
- Format to gather packet-specific information:
@@net:bpf:<interface1>|<interface2>|...:<protocol>:<filter>
- Format to gather protocol-specific information:
@@net:tcp:<event_name>
@@net:udp:<event_name>
Probe sub type
The second tuple signifies the sub type of the probe that indicates which layer of AIX network stack contains the probe. This tuple can have one of the following values (it cannot be *):
Second Tuple (sub type) | Description |
---|---|
bpf | This probe starts at network interface layer when a packet matches the specific filter. |
tcp | This probe starts for TCP protocol-specific events. |
udp | This probe starts for UDP protocol-specific events. |
Probe network event or gather network packet information
The third tuple is specific to particular sub type (specified in second tuple). It cannot have a value of *.
bpf-based probes
The specification contains 5 tuples for bpf-based probes that are described in the following table:
Second tuple (Sub type) | Subsequent tuples | Description |
---|---|---|
bpf | Third tuple: interface names | This tuple specifies an interface or a list of interfaces for which the packet information can be captured. Possible values are enX (for example, en0, en1) and lo0. The * value is not supported for this tuple. You can specify one or more interfaces at a time by using | as the delimiter. |
bpf | Fourth tuple: protocol | This tuple specifies the network protocol for which the probe starts. Possible values are ether, arp, rarp, ipv4, ipv6, tcp, udp, icmp4, icmp6, and igmp. Protocol-specific built-ins are populated for access in the Vue script. For example, a protocol value of ipv4 populates the __ip4hdr built-ins. The * value for this tuple indicates that the probe starts for all protocol types that match the specified filter. When the protocol is *, none of the built-in values that are supported by the network probe manager are available to Vue scripts. You can access the raw packet data of the requested size by using the Vue function copy_kdata() and map it to the corresponding protocol headers. Note: Specifying * as a value can be performance intensive because the probe starts for all incoming and outgoing packets on the specified interfaces that match the filter. Copies are also involved when the packet information spans multiple packet buffers. |
bpf | Fifth tuple: bpf filter string | This tuple specifies the bpf filter expression (filter expressions as specified in the tcpdump command). The filter expression must be enclosed in double quotation marks. The filter expression and the protocol that is specified in the fourth tuple must be compatible. The * value is not supported in this tuple. Refer to the tcpdump documentation for detailed information on filter expressions. |
- Specification format to access the built-in variables that are related to the Ethernet header (__etherhdr), IP header (__ip4hdr or __ip6hdr), and TCP header (__tcphdr) from the Vue script when interface en0 receives or sends a packet on port 23 (filter string "port 23"):
@@net:bpf:en0:tcp:"port 23"
- Specification format to access the built-in variables that are related to the Ethernet header (__etherhdr), IP header (__ip4hdr or __ip6hdr), and UDP header (__udphdr) from the Vue script when the system receives or sends a packet from host example.com (filter string "host example.com") on the en0 and en1 interfaces:
@@net:bpf:en0|en1:udp:"host example.com"
- Specification format to access the raw packet information when the system receives or sends a packet from or to host example.com on the en1 interface:
@@net:bpf:en1:*:"host example.com"
A bpf probe specification uses a bpf device. These devices are shared by ProbeVue, tcpdump, and any other application that uses the libpcap or bpf services for packet capture and injection. The number of bpf probes depends on the number of available bpf devices in the system. When a bpf probe is started, the __mdata variable contains the raw packet data. You can access the raw data of the requested size by using the Vue function copy_kdata() and map it to the ether_header, IP header, and so on. Use structures such as the following to find out the header and payload data information.
Example
Vue script to access the raw packet data when “*” is specified as the protocol.
/* Define the ether header structure */
struct ether_header {
    char ether_dhost[6];
    char ether_shost[6];
    short ether_type;
};
/* ProbeVue script to access and interpret the data from the raw packet */
@@net:bpf:en0:*:"port 23"
{
    /* Define the script local variables */
    __auto struct ether_header eth;
    __auto char *mb;
    /* __mdata contains the address of the packet data */
    mb = (char *) __mdata;
    printf("Network probevue\n");
    /*
     * Use the already available copy_kdata(…) Vue function to copy data of
     * the requested size (size of ether_header) from the mbuf data pointer to
     * the eth (ether_header) variable.
     */
    copy_kdata(mb, eth);
    printf("Ether Type from raw data :%x\n", eth.ether_type);
}
TCP probes
The specification contains three tuples for TCP probes, as described in the following table. The * value is not supported in the third tuple.
Second tuple (sub type) | Third tuple (event) | Description |
---|---|---|
tcp | state_change | This probe is started whenever the TCP state changes. |
tcp | send_buf_full | This probe is started whenever the send buffer full event occurs. |
tcp | recv_buf_full | This probe is started whenever the receive buffer full event occurs. |
tcp | retransmit | This probe is started whenever a packet is retransmitted for a TCP connection. |
tcp | listen_q_full | This probe is started whenever a server (listener socket) discards new connection requests because the listener’s queue is full. |
The __proto_info built-in variable provides the TCP connection (four-tuple) information (local IP address, remote IP address, local port, and remote port) whenever a TCP-related event occurs. The remote port and remote IP address have a value of NULL for the listen_q_full event.
Example
Probe specifications for TCP protocol state changes:
@@net:tcp:state_change
udp probes
For udp probes, the specification contains three tuples, as described in the following table. The * value is not supported in the third tuple.
Second tuple (sub type) | Third tuple (event) | Description |
---|---|---|
udp | sock_recv_buf_overflow | This probe is started whenever the receive buffer of the datagram or UDP socket overflows. |
The __proto_info built-in variable provides the UDP protocol-related data (source and destination IP addresses, and source and destination port numbers) whenever a socket receive buffer overflow event occurs.
Example
Probe specifications for UDP socket’s receive buffer overflow:
@@net:udp:sock_recv_buf_overflow
Network probe-related built-in variables for Vue scripts
Network-related events can be probed by using the following built-in variables.
__etherhdr built-in variable
The __etherhdr variable is a special built-in variable that is used to get the Ethernet header information from a filtered packet. This built-in variable is available when you probe the packet information at the interface layer with any one of these protocols: “ether”, “ipv4”, “ipv6”, “tcp”, “udp”, “icmp4”, “icmp6”, “igmp”, “arp”, and “rarp”. This variable is available in probes of sub type bpf. Its member elements can be accessed by using the syntax __etherhdr->member.
The __etherhdr built-in variable has the following members:
Member name | Type | Description |
---|---|---|
src_addr | mac_addr_t | Source MAC address. The data type mac_addr_t is used to store the MAC address. Use the format specifier “%M” to print the MAC address. |
dst_addr | mac_addr_t | Destination MAC address. The data type mac_addr_t is used to store the MAC address. Use the format specifier “%M” to print the MAC address. |
ether_type | unsigned short | This name indicates the protocol encapsulated in the payload
of an Ethernet frame. Protocols can be IPv4, IPv6, ARP, and REVARP. It can match one of the following built-in constant values for ether_type:
Refer the header files |
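As a minimal sketch of how these members might be read, the following Vue clause prints the Ethernet header fields of packets that match the "port 23" filter used elsewhere in this section. The interface name en0 and the filter string are assumptions chosen for illustration, and the %x specifier for ether_type follows the raw packet example rather than a documented requirement.

```vue
@@net:bpf:en0:tcp:"port 23"
{
    /* %M prints a mac_addr_t value as a MAC address */
    printf("src MAC:%M and dst MAC:%M\n", __etherhdr->src_addr, __etherhdr->dst_addr);
    /* ether_type identifies the protocol carried in the frame payload */
    printf("ether_type:%x\n", __etherhdr->ether_type);
}
```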
__ip4hdr built-in variable
The __ip4hdr variable is a special built-in variable that is used to get the IPv4 header information from a filtered packet. This variable is available when you probe the packet information at the interface layer with any one of these protocols: “ipv4”, “tcp”, “udp”, “icmp4”, and “igmp”. It has valid data when the IP version is IPv4. This variable is available in probes of sub type bpf. Its member elements can be accessed by using the syntax __ip4hdr->member.
This built-in variable has the following members:
Member name | Type | Description |
---|---|---|
src_addr | ip_addr_t | Source IP address. The data type ip_addr_t is used to store the IP address. Use the format specifier “%I” to print the IP address in dotted decimal format and the format specifier “%H” to print the host name. Host name printing is a costly operation. |
dst_addr | ip_addr_t | Destination IP address. The data type ip_addr_t is used to store the IP address. Use the format specifier “%I” to print the IP address in dotted decimal format and the format specifier “%H” to print the host name. Host name printing is a costly operation. |
protocol | unsigned short | This member name indicates the protocol that is used in the
data portion of the IP datagram. Protocols can be TCP, UDP, ICMP,
IGMP, FRAGMENTED, and so on. It can match one of the following built-in constant values for protocol.
Refer the header file |
ttl | unsigned short | Time to live or hop limit. |
cksum | unsigned short | IP header checksum. |
id | unsigned short | Identification number. This member is used for uniquely identifying the group of fragments of a single IP datagram. |
total_len | unsigned short | Total length. This value is entire packet (fragment) size, including IP header and data in bytes. |
hdr_len | unsigned short | Size of the IP header. |
tos | unsigned short | Type of service. |
frag_offset | unsigned short | Fragment offset. This value specifies the offset of a particular fragment relative to the beginning of the original unfragmented IP datagram. The first fragment has an offset of zero. It can match one of
the built-in constant
Refer the header file |
__ip6hdr built-in variable
The __ip6hdr variable is a special built-in variable that is used to get the IPv6 header information from a filtered packet. This variable is available when you probe the packet information at the interface layer with any one of these protocols: “ipv6”, “tcp”, “udp”, and “icmp6”. It has valid data when the IP version is IPv6. This variable is available in probes of sub type bpf. Its member elements can be accessed by using the syntax __ip6hdr->member.
This built-in variable has the following members:
Member name | Type | Description |
---|---|---|
src_addr | ip_addr_t | Source IP address. The data type ip_addr_t is used to store the IP address. Use the format specifier “%I” to print the IP address and the format specifier “%H” to print the host name. Host name printing is a costly operation. |
dst_addr | ip_addr_t | Destination IP address. The data type ip_addr_t is used to store the IP address. Use the format specifier “%I” to print the IP address and the format specifier “%H” to print the host name. Host name printing is a costly operation. |
protocol | unsigned short | This value indicates the protocol that is used in the data
portion of the IP datagram. Protocols can be TCP, UDP, and ICMPV6,
and so on. It can match one of the following built-in constant values for protocol: IPPROTO_TCP,IPPROTO_UDP, IPPROTO_ROUTING,
IPPROTO_ICMPV6, IPPROTO_NONE, IPPROTO_DSTOPTS, IPPROTO_LOCAL Refer
the header file |
hop_limit | unsigned short | Hop limit (time to live). |
total_len | unsigned short | Total length (payload length). The size of the payload including any extension headers. |
next_hdr | unsigned short | Specifies the type of the next header. This field usually specifies the transport layer protocol that is used by a packet's payload. When extension headers are present in the packet, this field indicates which extension header follows. The values are shared with those used for the IPv4 protocol field. |
flow_label | unsigned int | Flow label. |
traffic_class | unsigned int | Traffic class. |
__tcphdr built-in variable
The __tcphdr variable is a special built-in variable that is used to get the TCP header information from a filtered packet. This variable is available when you probe the packet information at the interface layer with the tcp protocol. It is available in probes of sub type bpf. Its member elements can be accessed by using the syntax __tcphdr->member.
The __tcphdr built-in variable has the following members:
Member name | Type | Description |
---|---|---|
src_port | unsigned short | Source port of the packet. |
dst_port | unsigned short | Destination port of the packet. |
flags | unsigned short | These values are the control bits, one bit for each flag, that are set to communicate control information. It can match one of the built-in constant flag values. To check whether a particular flag is present, perform a bitwise AND of the flags value with the corresponding built-in constant flag value.
Refer TCP documentation for detailed information about these
flags and refer the header file |
seq_num | unsigned int | Sequence number. |
ack_num | unsigned int | Acknowledgment number. |
hdr_len | unsigned int | TCP header length information |
cksum | unsigned short | Checksum. |
window | unsigned short | Window size. |
urg_ptr | unsigned short | Urgent pointer. |
__udphdr built-in variable
The __udphdr variable is a special built-in variable that is used to get the UDP header information from a filtered packet. This variable is available when you probe the packet information at the interface layer with udp as the protocol. It is available in probes of sub type bpf. Its member elements can be accessed by using the syntax __udphdr->member.
The __udphdr built-in variable has the following members:
Member name | Type | Description |
---|---|---|
src_port | unsigned short | Source port of the packet. |
dst_port | unsigned short | Destination port of the packet. |
length | unsigned short | UDP header and data length information. |
cksum | unsigned short | Checksum. |
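The UDP members can be read in the same way as the TCP members. The following is a minimal sketch, assuming that traffic to or from the placeholder host example.com arrives on interface en0 over IPv4 (so that __ip4hdr holds valid data). The %d specifiers mirror the TCP example in this section and are an assumption for the UDP fields.

```vue
@@net:bpf:en0:udp:"host example.com"
{
    /* IPv4 addresses of the datagram; valid only when the IP version is IPv4 */
    printf("src_addr:%I and dst_addr:%I\n", __ip4hdr->src_addr, __ip4hdr->dst_addr);
    /* UDP ports and the combined header and data length */
    printf("src port:%d and dst port:%d\n", __udphdr->src_port, __udphdr->dst_port);
    printf("udp length:%d\n", __udphdr->length);
}
```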
__icmp built-in variable
The __icmp variable is a special built-in variable that is used to get the ICMP header information from a filtered packet. This variable is available when you probe the packet information at the interface layer with the icmp protocol. It is available in probes of sub type bpf. Its member elements can be accessed by using the syntax __icmp->member.
This built-in variable has the following members:
Member name | Type | Description |
---|---|---|
type | unsigned short | Type of ICMP message. For example: 0 - echo reply, 8 - echo request, 3 - destination unreachable. Look in for all the types. For more information, refer to the standard network documentation. It can match one of the following built-in constant values for ICMP message types:
Refer the header file Note: All possible message type values are not
defined, and hence there can be other options present in the value.
|
code | unsigned short | Subtype of ICMP message. For each type of message, several different codes and subtypes are defined. For example, no route to destination, communication with destination administratively prohibited, not a neighbor, address unreachable, port unreachable. For more information, refer to the standard network documentation. It can match one of the following built-in constant values for ICMP sub types: ICMP_UNREACH_NET ICMP_UNREACH_HOST ICMP_UNREACH_PROTOCOL ICMP_UNREACH_PORT ICMP_UNREACH_NEEDFRAG ICMP_UNREACH_SRCFAIL ICMP_UNREACH_NET_ADMIN_PROHIBITED ICMP_UNREACH_HOST_ADMIN_PROHIBITED Subtype values for type 4 The subtype values for type 4 are as follows:
Subtype values for type 6 The subtype values for type 6 are as follows:
Subtype values for type 7 The subtype values for type 7 are as follows: ICMP_PARAMPROB_PTR ICMP_PARAMPROB_MISSING Refer the header file /usr/include/netinet/ip_icmp.h for message subtype values. Note: Not all possible message sub types values are defined, and hence there might be other options present in the message sub type value. |
cksum | unsigned short | Checksum. |
__icmp6 built-in variable
The __icmp6 variable is a special built-in variable that is used to get the ICMPv6 header information from a filtered packet. This variable is available when you probe the packet information at the interface layer with the icmp6 protocol. It is available in probes of sub type bpf. Its member elements can be accessed by using the syntax __icmp6->member.
The __icmp6 built-in variable has the following members:
Member name | Type | Description |
---|---|---|
type | unsigned short | Type of ICMPV6 message. This specifies the type of message, which determines the format of the remaining data. It can match one of the following built-in constant values for ICMPV6 types.
Refer the header file Note: Not all possible message type values are
defined, and hence there might be other options present in the value.
|
code | unsigned short | Subtype of ICMPV6 message. This value depends on the message type. It provides an extra level of message granularity. It can match one of the following built-in constant values for ICMPV6 sub types.
Refer the header file Note: Not all possible message sub type
values are defined, and hence there might be other options present
in the value.
|
cksum | unsigned short | Checksum. |
__igmp built-in variable
The __igmp variable is a special built-in variable that is used to get the IGMP header information from a filtered packet. This variable is available when you probe the packet information at the interface layer with the igmp protocol. It is available in probes of sub type bpf. Its member elements can be accessed by using the syntax __igmp->member.
The __igmp built-in variable has the following members:
Member name | Type | Description |
---|---|---|
type | unsigned short | Type of IGMP message. For example: It can match one of the following built-in constant values for IGMP Message types. Refer the header file Note: Not all possible message type values are defined, and hence there could be other options present in the value. |
code | unsigned short | Subtype of IGMP type. It can match one of the following built-in constant values for IGMP Message subtypes. Subtype values for type no 3. Note: Not all possible message
sub type values are defined, and hence there could be other options
present in the value.
|
cksum | unsigned short | IGMP Checksum value. |
group_addr | ip_addr_t | Group address that is reported or queried. This address is the multicast address that is queried when you are sending a Group-Specific or Group-and-Source-Specific Query. The field has a value of zero when you are sending a General Query. The data type ip_addr_t is used to store the group IP address. Use format specifier “I” to print the IP address. |
__arphdr built-in variable
The __arphdr variable is a special built-in variable that is used to get the ARP header information from a filtered packet. This variable is available when you probe the packet information at the interface layer with the arp or rarp protocol. It is available in probes of sub type bpf. The __arphdr member elements can be accessed by using the syntax __arphdr->member.
The __arphdr built-in variable has the following members:
Member name | Type | Description |
---|---|---|
hw_addr_type | unsigned short | Format of the hardware address type. This field identifies
the specific data-link protocol that is being used. It can match one of the following built-in constant values for data link protocol:
Refer the header file /usr/include/net/if_arp.h for protocol values. |
protocol_type | unsigned short | Format of the protocol address type. This field identifies
the specific network protocol that is being used. It can match one of the following built-in constant values for network protocol:
Refer the header file |
hdr_len | unsigned short | Mac or hardware address length. |
proto_len | unsigned short | Protocol or IP address length. |
operation | unsigned short | Specifies the operation that the sender is performing: 1 for
request, 2 for reply. It can match one of the following built-in constant values for network protocol:
Refer the header file |
src_mac_addr | mac_addr_t | Sender or source MAC address. The sender hardware or MAC address is stored in the mac_addr_t data type. Use the format specifier “%M” to print the sender MAC or hardware address. |
dst_mac_addr | mac_addr_t | Target or destination MAC address. The target hardware or MAC address is stored in the mac_addr_t data type. Use the format specifier “%M” to print the target MAC or hardware address. |
src_ip | ip_addr_t | Source or sender IP address. Use the format specifier “%I” to print the sender IP address. |
dst_ip | ip_addr_t | Target or destination IP address. Use the format specifier “%I” to print the target IP address. |
Example
Vue script to probe packet header information for packets received or sent over port 23. The script prints the source and destination node information and the TCP header length.
@@net:bpf:en0:tcp:"port 23"
{
    printf("src_addr:%I and dst_addr:%I\n", __ip4hdr->src_addr, __ip4hdr->dst_addr);
    printf("src port:%d\n", __tcphdr->src_port);
    printf("dst port:%d\n", __tcphdr->dst_port);
    printf("tcp hdr_len:%d\n", __tcphdr->hdr_len);
}
Output:
# probevue bpf_tcp.e
src_addr:10.10.10.12 and dst_addr:10.10.18.231
src port:48401
dst port:23
tcp hdr_len:20
..................
.................
__proto_info built-in variable
The __proto_info variable is a special built-in variable that is used to get the protocol information (source and destination IP addresses and ports) for TCP or UDP events. The __proto_info variable is available in probes of sub type tcp or udp. Its member elements can be accessed by using the syntax __proto_info->member.
The __proto_info built-in variable has the following members:
Member name | Type | Description |
---|---|---|
local_port | unsigned short | Local port |
remote_port | unsigned short | Remote port |
local_addr | ip_addr_t | Local address |
remote_addr | ip_addr_t | Remote address |
Additional information for TCP-specific events
The following built-in variables are available for TCP state change events:
Name | Type | Description |
---|---|---|
__prev_state | short | Previous state information for connection. |
__cur_state | short | Present state information for connection. |
It can match one of the following built-in constant values
for TCP states:
The values are defined in exported header file |
Example:
The following Vue script provides state change information for a particular connection:
@@net:tcp:state_change
when (__proto_info->local_addr == "10.10.10.1" &&
      __proto_info->remote_addr == "10.10.10.2" &&
      __proto_info->local_port == 8000 &&
      __proto_info->remote_port == 9000)
{
    printf("Previous state:%d and current_state:%d\n", __prev_state, __cur_state);
}
TCP retransmit event
Name | Type | Description |
---|---|---|
__nth_retransmit | unsigned short | Nth retransmission |
Examples
1. The following example identifies the listener that has discarded connections because the listener's queue is full.
@@net:tcp:listen_q_full
{
    printf("Listener IP address:%I and Port number is:%d\n", __proto_info->local_addr, __proto_info->local_port);
}
2. The following example identifies the connection that drops packets because of socket receive buffer overflows.
@@net:udp:sock_recv_buf_overflow
{
    printf("Connection information which drops packet due to socket buffer overflows:\n");
    printf("Local IP address:%I and Remote IP address:%I\n", __proto_info->local_addr, __proto_info->remote_addr);
    printf("local port :%d and remote port:%d\n", __proto_info->local_port, __proto_info->remote_port);
}
3. The following example identifies retransmissions (the second and any further retransmission of a packet) for a particular TCP connection.
@@net:tcp:retransmit
when (__proto_info->local_addr == "10.10.10.1" &&
      __proto_info->remote_addr == "10.10.10.2" &&
      __proto_info->local_port == 4000 &&
      __proto_info->remote_port == 5000)
{
    printf(" %d th retransmission for this connection\n", __nth_retransmit);
}
4. The following example identifies the connection information whenever a send buffer full event occurs.
@@net:tcp:send_buf_full
{
    printf("Connection information whenever send buffer full event occurs:\n");
    printf("Local IP address:%I and Remote IP address:%I\n", __proto_info->local_addr, __proto_info->remote_addr);
    printf("local port :%d and remote port:%d\n", __proto_info->local_port, __proto_info->remote_port);
}
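The recv_buf_full event from the TCP probes table can presumably be handled with the same pattern. The following is a minimal sketch, assuming that the __proto_info members are populated for this event in the same way as for send_buf_full:

```vue
@@net:tcp:recv_buf_full
{
    printf("Connection information whenever receive buffer full event occurs:\n");
    /* Connection four-tuple taken from the __proto_info built-in variable */
    printf("Local IP address:%I and Remote IP address:%I\n", __proto_info->local_addr, __proto_info->remote_addr);
    printf("local port :%d and remote port:%d\n", __proto_info->local_port, __proto_info->remote_port);
}
```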
Sysproc probe manager
Overview
The sysproc probe manager provides an infrastructure for users and administrators to dynamically trace process-related or thread-related data without knowing the internals of the sysproc subsystem.
The aspects of the sysproc subsystem that are of interest to a user or administrator are divided into the following main categories:
- Process (or thread) creation or termination
- Signal generation and delivery
- Scheduler and dispatcher events
- DR and CPU binding events
Process (or thread) creation or termination
- Did a process exit naturally or because of an error?
- When was a process or thread created or terminated, or when did a process call exec?
- How long did a process run?
- Track events when a thread receives or returns from an exception.
Signal generation and delivery
- Signal source and signal information for a specific target.
- Signal delivery of asynchronous signals.
- Trace signal clears.
- Trace events when a signal handler other than default is installed.
- Signal target and signal information for a specific source.
- Trace signal handler entry or exit.
Scheduler and dispatcher events
The scheduler and dispatcher dictate how a process or thread runs in the system. Administrators can analyze system performance by dynamically tracing the scheduler and dispatcher subsystem.
Dynamic tracing of the scheduler and dispatcher subsystem helps discover the reasons for retention of threads.
- Trace thread or threads that are enqueued or dequeued from the run queue.
- Trace events when any thread in the system is preempted.
- Trace when a thread is being put to sleep over an event.
- Trace when a sleeping thread is being woken up.
- Track the dispatch latency of a thread.
- Track virtual processor folding events.
- Trace change in any kernel thread priority.
Dynamic Reconfiguration (DR), and CPU binding events
This class of probes offers dynamic tracing capabilities to a user who tracks resources bound to a process.
- Track when a thread binding changes from one CPU to another.
- Track when resources are attached to or detached from a process.
- Track CPU binding events.
- Track start or end of a DR event.
Probe specification
The following format must be used in a Vue script to probe sysproc events:
@@sysproc:<sysproc_event>:<pid/tid/*>
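The first tuple, @@sysproc, indicates that this probe is specific to sysproc events.
The second tuple specifies the event to be probed.
The third tuple filters the events by process ID or thread ID; the * value probes the event for all processes and threads. Which process or thread the filter refers to depends on the event. For a signal send event, for example, either the process that sends the signal or the one that receives it can be of interest. The following information specifies the appropriate filters for such probe events.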
Probe points (events of interest)
The following table briefly describes all events that can be probed through the sysproc probe manager:
Probe (sysproc_event) | Description |
---|---|
forkfail | Track failures in fork interface. |
execfail | Track failures in exec interface. |
execpass | Track exec success. |
exit | Track exit of a process. |
threadcreate | Track creation of a kernel thread. |
threadterminate | Track termination of a kernel thread. |
threadexcept | Track process exceptions. |
sendsig | Track signal sent to a process by external sources. |
sigqueue | Track signals queued to a process. |
sigdispose | Track signal disposals. |
sigaction | Track signal handler installations and reinstallations. |
sighandlestart | Track when a signal handler is about to be called. |
sighandlefinish | Track when a signal handler completes. |
changepriority | Track when the priority of a process changes. |
onreadyq | Track when a kernel thread gets on a ready queue. |
offreadyq | Track when a kernel thread is moved out of ready queue. |
dispatch | Track when the system dispatcher is called to schedule a thread |
oncpu | Track when a kernel thread acquires CPU. |
offcpu | Track when a kernel thread relinquishes CPU. |
blockthread | Track when a thread is blocked from getting CPU. |
foldcpu | Track folding of a CPU core. |
bindprocessor | Track events when a process or thread is bound to a CPU. |
changecpu | Track events when a kernel thread changes CPU temporarily. |
resourceattach | Track events when a resource is attached to another. |
resourcedetach | Track events when a resource is detached from another. |
drphasestart | Track when a DR phase is initiated. |
drphasefinish | Track when a DR phase completes. |
Method to access data at a probe-point
ProbeVue allows data access through built-in variables. The built-in variables fall into the following three types, depending on where they are accessible:
1. Accessible at any probe point, irrespective of the probe manager. For example: __curthread.
2. Accessible throughout probes of a specific probe manager.
3. Accessible only at defined probes (events of interest).
The following built-in variables are of type 1:
- __trcid
- __errno
- __kernelmode
- __arg1 to __arg7
- __curthread
- __curproc
- __mst
- __tid
- __pid
- __ppid
- __pgid
- __uid
- __euid
- __ublock
- __execname
- __pname
The built-in variables are also classified as context specific and context independent. Context-specific built-ins provide data based on the execution context of the probe.
The AIX kernel operates in thread context or interrupt context. Context-specific built-ins produce correct results when the probe is started in thread or process context.
Results that are obtained from context-specific built-ins in interrupt execution context might be unexpected. Context-independent built-ins do not depend on the execution context and can be accessed safely irrespective of the probe execution environment.
Context specific built-in variables | Context independent built-in variables |
---|---|
__curthread | __trcid |
__curproc | __errno |
__tid | __kernelmode |
__pid | __arg1 to __arg7 |
__ppid | __mst |
__pgid | |
__uid | |
__euid | |
__ublock | |
__pname | |
__execname |
Probe points
Probe points are the specific events for which a probe is fired. The following sections describe the probe points.
forkfail
The forkfail probe starts when fork fails. This probe determines the reasons for the fork failure.
Syntax: @@sysproc:forkfail:<pid/tid/*>
Special built-in supported
__forkfailinfo
{
    fail_reason;
}
The fail_reason variable has one of the following values:
Reason | Description |
---|---|
FAILED_RLIMIT | Failed due to rlimit limitations |
FAILED_ALLOCATIONS | Failed due to internal resource allocations |
FAILED_LOADER | Failed at a loader stage |
FAILED_PROCDUP | Failed at procdup |
Other supported built-ins
__errno, __kernelmode, __arg1 to __arg7, __curthread, __curproc, __mst, __tid, __pid, __ppid, __pgid, __uid, __euid, __ublock, __execname, __pname.
Execution environment
Runs in process environment.
Example
The following example shows how to monitor all fork failures in the system that are caused by rlimit limitations.
@@BEGIN
{
    x = 0;
}
@@sysproc:forkfail:*
when (__forkfailinfo->fail_reason == FAILED_RLIMIT)
{
    printf("process %s with pid %llu failed to fork a child\n", __pname, __pid);
    x++;
}
@@END
{
    printf("Found %d failures during this vue session\n", x);
}
execfail
The execfail probe starts when an exec function call fails. Use the execfail probe to determine the reasons for the failure.
Syntax: @@sysproc:execfail:<pid/tid/*>
Reason | Description |
---|---|
FAILED_PRIVILEGES | New process failed to acquire or inherit privileges |
FAILED_COPYINSTR | New process failed to copy instruction |
FAILED_V_USERACC | New process failed to discard v_useracc regions |
FAILED_CLEARDATA | Failed during clearing data for new process |
FAILED_PROCSEG | Failed to establish process private segment |
FAILED_CH64 | Failed to convert to a 64-bit process |
FAILED_MEMATT | Failed to attach to a memory resource set |
FAILED_SRAD | Failed to attach to a srad |
FAILED_MSGBUF | Error message buffer length is zero |
FAILED_ERRBUF | Failed to allocate error message buffer |
FAILED_ENVAR | Failed to allocate environment variables |
FAILED_CPYSTR | Copy string error |
FAILED_ERRBUFCPY | Failed to copy the error messages from errmsg_buf |
FAILED_TOOLNGENV | Env too long for allocated memory |
FAILED_USRSTK | Failed to setup user stack |
FAILED_CPYARG | Failed to copy arglist to stack |
FAILED_INITPTRACE | Failed to init ptrace |
Other supported built-ins
__errno, __kernelmode, __arg1 to __arg7, __curthread, __curproc, __mst, __tid, __pid, __ppid, __pgid, __uid, __euid, __ublock, __execname, __pname.
Execution environment
Runs in process environment.
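Example
The following is a minimal sketch that mirrors the forkfail example and counts exec failures across the system. It relies only on the __pname and __pid built-ins listed above. The global variable name failures is a hypothetical name chosen for illustration, and the script does not inspect the failure reason.

```vue
@@BEGIN
{
    /* Hypothetical global counter of exec failures seen in this session */
    failures = 0;
}
@@sysproc:execfail:*
{
    printf("process %s with pid %llu failed to exec\n", __pname, __pid);
    failures++;
}
@@END
{
    printf("Found %d exec failures during this vue session\n", failures);
}
```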
exit
This probe starts when a process exits. Exit is also a system call and can be traced through the system call probe manager. Probing the exit event through the sysproc probe manager explains the nature and reasons of the exit. It also explains the reasons for a user thread that terminated in kernel space and did not return to user space.
Syntax: @@sysproc:exit:<pid/tid/*>
A program can exit because of the following reasons:
- On reaching a terminal condition when a user space program cannot proceed further.
- On receiving a terminal signal.
Special built-in supported
__exitinfo
{
    signo;
    returnval;
    iscore;
}
Where signo is the signal number that caused the process termination, and returnval is the value that is returned by exit. A nonzero signo value is valid only if the program is stopped by a signal. The iscore variable is set when a core is generated as a result of the process exit.
Other supported built-ins
__errno, __kernelmode, __arg1 to __arg7, __curthread, __curproc, __mst, __tid, __pid, __ppid, __pgid, __uid, __euid, __ublock, __execname, __pname.
Execution environment
Runs in process environment.
Example
The following example shows how to probe the exit event:
echo '@@sysproc:exit:* { printf (" %s %llu %llu\n", __pname, __pid,__exitinfo->returnval);}' | probevue
This command produces output similar to the following:
ksh 5833042 0
telnetd 7405958 1
dumpctrl 7405960 0
setmaps 7275006 0
termdef 7274752 0
hostname 7274754 0
id 8257976 0
id 8257978 0
uname 8257980 0
expr 8257982 1
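The signo member of __exitinfo can also serve as a filter. The following is a minimal sketch, assuming that signo is zero when a process exits normally and nonzero when a signal caused the termination. This reading of the signo description is an assumption and is not confirmed by the text.

```vue
@@sysproc:exit:*
when (__exitinfo->signo != 0)
{
    /* Report only the processes whose exit was caused by a signal */
    printf("%s %llu terminated by a signal\n", __pname, __pid);
}
```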
threadcreate
The threadcreate probe starts when a thread is created successfully.
Syntax: @@sysproc:threadcreate:<pid/tid/*>
The pid or tid value must be the process ID or thread ID of the process or thread that created the thread.
Special built-in supported
__threadcreateinfo
{
    tid;
    pri;
    policy;
}
Where tid is the thread ID of the new thread that is created, pri is the priority of the thread, and policy is the scheduling policy of the thread.
Policy | Description |
---|---|
SCHED_OTHER | default AIX scheduling policy |
SCHED_FIFO | first in-first out scheduling policy |
SCHED_RR | round robin scheduling policy |
SCHED_LOCAL | local thread scope scheduling policy |
SCHED_GLOBAL | global thread scope scheduling policy |
SCHED_FIFO2 | FIFO with RQHEAD after short sleep |
SCHED_FIFO3 | FIFO with RQHEAD all the time |
SCHED_FIFO4 | FIFO with weak preempt |
Other supported built-ins
__errno, __kernelmode, __arg1 to __arg7, __curthread, __curproc, __mst, __tid, __pid, __ppid, __pgid, __uid, __euid, __ublock, __execname, __pname.
Execution environment
Runs in process environment (user or kproc).
Example
The following example continuously prints, for every thread creation in the system, the name and process ID of the creating process, the ID of the newly created thread, and the creation time stamp.
echo '@@sysproc:threadcreate:*
{ printf ("%s %llu %llu %A\n",__pname,__pid,__threadcreateinfo->tid,timestamp());}' | probevue
Output similar to the following example is displayed.
nfssync_kproc 5439964 23921151 Feb/22/15 09:22:38
nfssync_kproc 5439964 24052201 Feb/22/15 09:22:38
nfssync_kproc 5439964 23920897 Feb/22/15 09:22:38
nfssync_kproc 5439964 22479285 Feb/22/15 09:22:55
nfssync_kproc 5439964 23920899 Feb/22/15 09:22:55
nfssync_kproc 5439964 22479287 Feb/22/15 09:22:55
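The policy member can narrow the trace to threads that are created with a particular scheduling policy. The following is a minimal sketch, assuming that the policy names in the table above (here SCHED_RR) are available as built-in constants in a when clause, in the same way that FAILED_RLIMIT is used in the forkfail example:

```vue
@@sysproc:threadcreate:*
when (__threadcreateinfo->policy == SCHED_RR)
{
    /* Same fields as the example above, limited to round robin threads */
    printf("%s %llu %llu %A\n", __pname, __pid, __threadcreateinfo->tid, timestamp());
}
```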
threadterminate
The threadterminate probe starts when a thread is terminated.
Syntax: @@sysproc:threadterminate:<pid/tid/*>
Special built-ins supported
None.
Other supported built-ins
__errno, __kernelmode, __arg1 to __arg7, __curthread, __curproc, __mst, __tid, __pid, __ppid, __pgid, __uid, __euid.
Execution environment
Runs in process environment (user or kproc).
Example
To continuously print the process name, the process ID, the ID of the terminating thread, and the termination timestamp for every thread that terminates in the system:
# echo '@@sysproc:threadterminate:* { printf ("%s %llu %llu %A\n",__pname,__pid,__tid,timestamp());}' | probevue
An output similar to the following example is displayed.
nfssync_kproc 5439964 23855555 Feb/22/15 09:59:30
nfssync_kproc 5439964 21758249 Feb/22/15 09:59:30
nfssync_kproc 5439964 23855557 Feb/22/15 09:59:30
threadexcept
This probe starts when a program exception occurs. A program exception is generated when the system detects a condition in which a program cannot continue normally. Some exceptions are fatal (for example, an illegal instruction), while others are recoverable (for example, an address space change).
Syntax: @@sysproc:threadexcept:<pid/tid/*>
Special built-ins supported
__threadexceptinfo
{
pid;
tid;
exception;
excpt_address;
}
where pid is the process ID of the process that received the exception, tid is the thread ID of the kernel thread that received the exception, excpt_address is the address that caused the exception, and exception can have one of the values listed in the following table.
Exception | Description |
---|---|
EXCEPT_FLOAT | Floating point exception |
EXCEPT_INV_OP | Invalid op-code |
EXCEPT_PRIV_OP | Privileged op in user mode |
EXCEPT_TRAP | Trap instruction |
EXCEPT_ALIGN | Code or data alignment |
EXCEPT_INV_ADDR | Invalid address |
EXCEPT_PROT | Protection |
EXCEPT_IO | Synchronous I/O |
EXCEPT_IO_IOCC | I/O exception from IOCC |
EXCEPT_IO_SGA | I/O exception from SGA |
EXCEPT_IO_SLA | I/O exception from SLA |
EXCEPT_IO_SCU | I/O exception from SCU |
EXCEPT_EOF | Reference beyond end-of-file (mmap) |
EXCEPT_FLOAT_IMPRECISE | Imprecise floating point exception |
EXCEPT_ESTALE_I | Stale text segment exception |
EXCEPT_ESTALE_D | Stale data segment exception |
EXCEPT_PT_WATCHP | Hit ptrace watchpoint |
Other supported built-ins
__errno, __kernelmode, __arg1 to __arg7, __curthread, __curproc, __mst, __tid, __pid, __ppid, __pgid, __uid, __euid.
Execution environment
Runs in process or interrupt environment. Built-ins such as __pid and __tid, which depend on the execution context, might not indicate the process or thread ID of interest. The members of the special built-in for this probe always report the process ID and thread ID of the process and thread that received the exception.
Example
The following example traces the program exceptions that are generated while a program runs under a debugger.
# cat threadexcept.e
@@sysproc:threadexcept:*
{
printf ("PID = %llu TID= %llu EXCEPTION=%llu ADDRESS = %llu\n ",__threadexceptinfo->pid,__threadexceptinfo->tid,__threadexceptinfo->exception,__threadexceptinfo->excpt_address);
}
Run a debugging session on a program compiled with debugging support:
# dbx a.out
Type 'help' for help.
Core file "core" is older than current program (ignored)
reading symbolic information ...
(dbx) stop in main
[1] stop in main
(dbx) r
[1] stopped in main at line 5
5 int a=5;
An output similar to the following example is displayed.
PID = 6816134 TID= 24052015 EXCEPTION=131 ADDRESS = 268436372
sendsig
This probe starts when a signal is sent to a process by an external source (another process, a process from user space, kernel streams, or interrupt context).
Syntax: @@sysproc:sendsig:<pid/*>
where pid is the process identifier of the target process that receives the signal. This probe does not accept a thread identifier, so results cannot be filtered for a specific thread.
Special built-ins
__sigsendinfo{
tpid; ← target pid
spid; ← source pid
signo; ← signal sent
}
where tpid is the target process identifier and spid identifies the source of the signal. The spid is non-zero when the signal is sent from user space or process context, and 0 when the signal is sent from an exception or interrupt context. The signal number is contained in signo.
Other supported built-ins
__errno, __kernelmode, __arg1 to __arg7, __curthread, __curproc, __mst, __tid, __pid, __ppid, __pgid, __uid, __euid.
Execution environment
Runs in process or interrupt environment. Built-ins such as __pid and __tid, which depend on the thread execution context, might not indicate the process or thread ID of interest. The members of the special built-in for this probe always report the intended process IDs. When this probe starts in process context, built-ins that depend on the execution context, such as __pid, __tid, and __curthread, provide information about the source process.
Example
To continuously print the signal source, signal target, and signal number of all signals:
echo '@@sysproc:sendsig:* {printf ("Source=%llu Target=%llu sig=%llu\n",__sigsendinfo->spid,__sigsendinfo->tpid,__sigsendinfo->signo);}' | probevue
An output similar to the following example is displayed.
Source=0 Target=6619618 sig=14
Source=0 Target=8257944 sig=20
Source=0 Target=8257944 sig=20
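Because spid is 0 for signals raised from exception or interrupt context, captured sendsig output can be separated by origin. The following Python sketch is an illustration only and is not part of ProbeVue. It assumes the Source=/Target=/sig= layout printed by the example above. The helper names and the last sample record are hypothetical.

```python
# Records in the format printed by the sendsig example above.
# The last record is a hypothetical process-to-process signal added for illustration.
SAMPLE_LINES = [
    "Source=0 Target=6619618 sig=14",
    "Source=0 Target=8257944 sig=20",
    "Source=0 Target=8257944 sig=20",
    "Source=8258004 Target=6095294 sig=31",
]

def parse_record(line):
    """Turn 'Source=0 Target=1 sig=2' into {'source': 0, 'target': 1, 'sig': 2}."""
    return {key.lower(): int(value)
            for key, value in (field.split("=", 1) for field in line.split())}

def split_by_origin(lines):
    """Return (records with spid == 0, records with a non-zero spid)."""
    from_interrupt, from_process = [], []
    for line in lines:
        if not line.strip():
            continue  # ignore blank lines in captured output
        record = parse_record(line)
        if record["source"] == 0:
            from_interrupt.append(record)
        else:
            from_process.append(record)
    return from_interrupt, from_process

if __name__ == "__main__":
    from_interrupt, from_process = split_by_origin(SAMPLE_LINES)
    print(len(from_interrupt), "signals from exception or interrupt context")
    print(len(from_process), "signals from process context")
```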
sigqueue
This probe starts when a queued signal is being sent to the process.
Syntax: @@sysproc:sigqueue:<pid/*>
Special built-ins
__sigsendinfo{
tpid; ← target pid
spid; ← source pid
signo; ← signal sent
}
Because POSIX signals are queued to a process, a thread identifier cannot be specified for this probe.
Other supported built-ins
__errno, __kernelmode, __arg1 to __arg7, __curthread, __curproc, __mst, __tid, __pid, __ppid, __pgid, __uid, __euid.
This probe starts in the context of the sending process. Hence, context-based built-ins refer to the sending process in this probe event.
Execution environment
This probe runs in process context.
Example
echo '@@sysproc:sigqueue:*{printf ("%llu %llu %llu\n",__sigsendinfo->spid,__sigsendinfo->tpid,__sigsendinfo->signo);}' | probevue
An output similar to the following example is displayed.
8258004 6095294 31
sigdispose
Syntax: @@sysproc:sigdispose:<pid/tid/*>
This probe starts when a signal is disposed to a target process. To filter this probe, specify the process ID of the process that receives the signal in the probe specification.
Special built-ins
__sigdisposeinfo{
tpid; ← target pid
ttid; ← target tid
signo; ← signal whose action is being taken.
fatal; ← will be set if the process is going to be killed as part of signal action
}
Other supported built-ins
__errno, __kernelmode, __arg1 to __arg7, __curthread, __curproc, __mst, __tid, __pid, __ppid, __pgid, __uid, __euid.
Execution environment
This probe can start from process or interrupt context. If started from interrupt context, this probe might not provide required value for context-based built-ins.
Example
To continuously print the process identifier, thread identifier, and signal number, and to indicate whether the signal disposal results in termination of the process, for all processes in the system:
cat sigdispose.e
@@sysproc:sigdispose:*
{
printf ("%llu %llu %llu %llu\n",__sigdisposeinfo->tpid,__sigdisposeinfo->ttid, __sigdisposeinfo->signo,__sigdisposeinfo->fatal);
}
An output similar to one shown below is observed.
5964064 20840935 14 0
1 65539 14 0
4719084 19530213 14 0
sigaction
Syntax: @@sysproc:sigaction:<pid/tid/*>
This probe starts when a signal handler is installed or replaced.
Special built-ins
__sigactioninfo{
old_sighandle; ← old signal handler function address
new_sighandle; ← new signal handler function address
signo; ← signal number
rpid; ← requester's pid
}
old_sighandle is 0 if a signal handler is installed for the first time.
Other supported built-ins
__errno, __kernelmode, __arg1 to __arg7, __curthread, __curproc, __mst, __tid, __pid, __ppid, __pgid, __uid, __euid.
Execution environment
This probe starts in process environment.
Example
To track the start and completion of all signal handlers in the system by using the sighandlestart and sighandlefinish probes:
@@sysproc:sighandlestart:*
{
signal[__tid] = __sighandlestartinfo->signo;
printf ("Signal handler at address 0x%x invoked for thread id %llu to handle signal %llu\n",__sighandlestartinfo->sighandle,__curthread->tid,__sighandlestartinfo->signo);
}
@@sysproc:sighandlefinish:*
{
printf ("Signal handler completed for thread id %llu for signal %llu\n",__curthread->tid,signal[__tid]);
delete (signal,__tid);
}
An output similar to the one shown below can be observed.
Signal handler at address 0x20001d58 invoked for thread id 19923365 to handle signal 20
Signal handler completed for thread id 19923365 for signal 20
Signal handler at address 0x10003400 invoked for thread id 20840935 to handle signal 14
Signal handler completed for thread id 20840935 for signal 14
Signal handler at address 0x10002930 invoked for thread id 19530213 to handle signal 14
Signal handler completed for thread id 19530213 for signal 14
Signal handler at address 0x300275d8 invoked for thread id 22348227 to handle signal 14
Signal handler completed for thread id 22348227 for signal 14
Signal handler at address 0x20001a3c invoked for thread id 65539 to handle signal 14
Signal handler completed for thread id 65539 for signal 14
sighandlefinish
This probe starts at signal handler completion.
Syntax: @@sysproc:sighandlefinish:<pid/tid/*>
Special built-ins supported: None.
Other supported built-ins
__errno, __kernelmode, __arg1 to __arg7, __curthread, __curproc, __mst, __tid, __pid, __ppid, __pgid, __uid, __euid.
Execution environment
Runs in process environment. The probe runs protected: a context switch is not allowed on the executing CPU.
changepriority
This probe starts when the priority of a process is being changed. This event is not enforced by the scheduler or dispatcher.
Syntax: @@sysproc:changepriority:<pid/tid/*>
Special built-ins supported
__chpriorityinfo{
pid;
old_priority; <- current priority
new_priority; <- new scheduling priority of the thread.
}
Execution environment
This probe runs in process environment.
Other supported built-ins
__errno, __kernelmode, __arg1 to __arg7, __curthread, __curproc, __mst, __tid, __pid, __ppid, __pgid, __uid, __euid, __ublock, __execname, __pname.
Example
To track all processes whose priority is being changed:
echo '@@sysproc:changepriority:* { printf ("%s priority changing from %llu to %llu\n",__pname,__chpriorityinfo->old_priority,__chpriorityinfo->new_priority);}' | probevue
An output similar to one shown below can be observed.
xmgc priority changing from 60 to 17
xmgc priority changing from 17 to 60
xmgc priority changing from 60 to 17
xmgc priority changing from 17 to 60
xmgc priority changing from 60 to 17
offreadyq
This probe starts when a thread is removed from a system run queue.
Syntax: @@sysproc:offreadyq:<pid/tid/*>
Special built-ins supported
__readyprocinfo{
pid; <- process id of thread becoming ready
tid; <- Thread id.
priority; <- priority of the thread
}
Other supported built-ins
__errno, __kernelmode, __arg1 to __arg7, __curthread, __curproc, __mst, __tid, __pid, __ppid, __pgid, __uid, __euid.
Execution environment
Runs in process or interrupt environment.
Use case: Trace the time taken by a thread that is performing an I/O operation to get back to the ready queue.
@@BEGIN
{
printf (" Pid Tid Time Delta\n");
}
@@sysproc:offreadyq:*
{
ready[__tid] = timestamp();
printf ("offreadyq: %llu %llu %W\n",__readyprocinfo->pid,__readyprocinfo->tid,ready[__tid]);
}
@@sysproc:onreadyq:*
{
if (diff_time(ready[__tid],0,MICROSECONDS))
{
auto:diff = diff_time (ready[__tid],timestamp(),MICROSECONDS);
printf ("onreadyq : %llu %llu %W %llu\n",__readyprocinfo->pid,__readyprocinfo->tid,ready[__tid],diff);
delete (ready,__tid);
}
}
An output similar to the following example is displayed.
Pid Tid Time Delta
offreadyq: 7799280 20709717 5s 679697µs
onreadyq : 7799280 20709717 5s 679697µs 6
offreadyq: 7799280 20709717 5s 908716µs
onreadyq : 7799280 20709717 5s 908716µs 3
offreadyq: 7799280 20709717 6s 680186µs
onreadyq : 7799280 20709717 6s 680186µs 5
offreadyq: 7799280 20709717 6s 710720µs
onreadyq : 7799280 20709717 6s 710720µs 4
offreadyq: 7799280 20709717 6s 800720µs
onreadyq : 7799280 20709717 6s 800720µs 2
offreadyq: 7799280 20709717 6s 882231µs
onreadyq : 7799280 20709717 6s 882231µs 2
offreadyq: 7799280 20709717 6s 962313µs
onreadyq : 7799280 20709717 6s 962313µs 2
offreadyq: 7799280 20709717 6s 980311µs
onreadyq : 7799280 20709717 6s 980311µs 2
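The Delta column on each onreadyq line is the time, in microseconds, that the thread spent off the ready queue. As a minimal sketch outside of ProbeVue, the following Python code averages that column per thread. It assumes the exact line layout printed by the script above, and the function name mean_delay_by_thread is hypothetical.

```python
from statistics import mean

# A few lines copied from the example output above.
SAMPLE = """\
offreadyq: 7799280 20709717 5s 679697µs
onreadyq : 7799280 20709717 5s 679697µs 6
offreadyq: 7799280 20709717 5s 908716µs
onreadyq : 7799280 20709717 5s 908716µs 3
offreadyq: 7799280 20709717 6s 680186µs
onreadyq : 7799280 20709717 6s 680186µs 5
"""

def mean_delay_by_thread(text):
    """Return {tid: mean Delta in microseconds} from the onreadyq lines."""
    delays = {}
    for line in text.splitlines():
        if not line.startswith("onreadyq"):
            continue  # only onreadyq lines carry the Delta column
        fields = line.split()
        # Counting from the end: Delta, microseconds part, seconds part, tid.
        tid, delta = int(fields[-4]), int(fields[-1])
        delays.setdefault(tid, []).append(delta)
    return {tid: mean(values) for tid, values in delays.items()}

if __name__ == "__main__":
    for tid, average in mean_delay_by_thread(SAMPLE).items():
        print(f"thread {tid}: mean delay {average:.2f} microseconds")
```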
onreadyq
This probe starts when a thread is enqueued on the system ready queue or when its position in the ready queue is modified.
Syntax: @@sysproc:onreadyq:<pid/tid/*>
Special built-ins supported
__readyprocinfo{
pid; <- process id of thread becoming ready
tid; <- Thread id.
priority; <- priority of the thread
}
Other supported built-ins
__errno, __kernelmode, __arg1 to __arg7, __curthread, __curproc, __mst, __tid, __pid, __ppid, __pgid, __uid, __euid.
Execution environment
Runs in process or interrupt environment.
dispatch
This probe starts when the system dispatcher is called to select a thread to run on a specific CPU.
Syntax: @@sysproc:dispatch:<pid/tid/*>
Special built-ins supported
__dispatchinfo{
cpuid; <- CPU where selected thread will run.
oldpid; <- pid of the thread currently running
oldtid; <- thread id of the thread currently running
oldpriority; <- priority of the thread currently running
newpid; <- pid of the new process selected for running
newtid; <- thread id of the thread selected for running
newpriority; <- priority of the thread selected for running
}
Other supported built-ins
__errno, __kernelmode, __arg1 to __arg7, __curthread, __curproc, __mst, __tid, __pid, __ppid, __pgid, __uid, __euid.
Execution environment
Runs in interrupt environment only.
Example
To print the thread IDs of the old thread and the selected thread on CPU 0, with the dispatch time relative to the start of the script:
echo '@@sysproc:dispatch:* when (__cpuid == 0){printf ("%llu %llu %W\n",__dispatchinfo->oldtid,__dispatchinfo->newtid,timestamp());}' | probevue
An output similar to the one shown below can be observed.
24641983 20709717 0s 48126µs
20709717 23593357 0s 48164µs
23593357 20709717 0s 48185µs
20709717 23593357 0s 48214µs
23593357 20709717 0s 48230µs
20709717 23593357 0s 48288µs
23593357 261 0s 48303µs
261 20709717 0s 48399µs
Example II
To print the time that threads spend on a CPU between dispatch events, with the CPU ID passed as the first argument of the script:
@@BEGIN
{
printf ("Thread cpu Time-Spent\n");
}
@@sysproc:dispatch:* when (__cpuid == $1)
{
if (savetime[__cpuid] != 0)
auto:diff = diff_time (savetime[__cpuid],timestamp(),MICROSECONDS);
else
diff = 0;
savetime[__cpuid] = timestamp();
printf ("%llu %llu %llu\n",__dispatchinfo->oldtid,__dispatchinfo->cpuid,diff);
}
# probevue cputime.e 6
Thread cpu Time-Spent
3146085 6 0
3146085 6 9995
3146085 6 10002
3146085 6 10008
3146085 6 99988
3146085 6 100006
3146085 6 99995
3146085 6 99989
3146085 6 100010
3146085 6 100001
3146085 6 100000
3146085 6 99998
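As can be observed, in the absence of any other thread competing for this CPU, thread 3146085 is re-dispatched on the CPU at regular intervals: about 10 milliseconds (10,000 microseconds) for the first few samples and about 100 milliseconds (100,000 microseconds) afterward.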
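The Time-Spent column is produced by diff_time with the MICROSECONDS unit, so converting the values to milliseconds makes the dispatch intervals easier to read. The following Python sketch works through that arithmetic on the sample values. It is an illustration only, and the helper names are hypothetical.

```python
US_PER_MS = 1_000  # microseconds per millisecond

# Time-Spent values in microseconds, copied from the sample run above.
TIME_SPENT_US = [0, 9995, 10002, 10008, 99988, 100006,
                 99995, 99989, 100010, 100001, 100000, 99998]

def to_milliseconds(values_us):
    """Convert a list of microsecond values to milliseconds."""
    return [value / US_PER_MS for value in values_us]

def nearest_interval(value_us, candidates_us=(10_000, 100_000, 1_000_000)):
    """Return the candidate interval (in microseconds) closest to value_us."""
    return min(candidates_us, key=lambda candidate: abs(candidate - value_us))

if __name__ == "__main__":
    for us, ms in zip(TIME_SPENT_US, to_milliseconds(TIME_SPENT_US)):
        print(f"{us:>7} us = {ms:8.3f} ms (closest to {nearest_interval(us)} us)")
```

None of the sample values is close to 1,000,000 microseconds (one second). They cluster around 10,000 and 100,000 microseconds.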
oncpu
This probe starts when a new process or thread acquires a CPU.
Syntax: @@sysproc:oncpu:<pid/tid/*>
where pid is the process identifier and tid is the thread identifier of the process or thread that is acquiring the CPU.
Special built-ins supported
__dispatchinfo{
cpuid; <- CPU where selected thread will run.
newpid; <- pid of the new process selected for running
newtid; <- thread id of the thread selected for running
newpriority; <- priority of the thread selected for running
}
Other supported built-ins
__errno, __kernelmode, __arg1 to __arg7, __curthread, __curproc, __mst, __tid, __pid, __ppid, __pgid, __uid, __euid.
Execution environment
Runs in interrupt environment only.
Example
To print the time spent by the threads of syncd on all CPUs:
#!/usr/bin/probevue
@@BEGIN
{
printf ("PROCESSID THREADID CPU TIME\n");
}
@@sysproc:oncpu:$1
{
savetime[__cpuid] = timestamp();
}
@@sysproc:offcpu:$1
{
if (savetime[__cpuid] != 0)
auto:diff = diff_time (savetime[__cpuid],timestamp(),MICROSECONDS);
else
diff = 0;
printf ("%llu %llu %llu %llu\n",
__dispatchinfo->oldpid,
__dispatchinfo->oldtid,
__dispatchinfo->cpuid,
diff);
}
# cputime.e `ps aux|grep syncd| grep -v grep| cut -f 6 -d " "`
An output similar to the following example is displayed.
3735998 18612541 0 2
3735998 15663427 0 1
3735998 15073557 0 1
3735998 18743617 0 1
3735998 18874693 0 1
3735998 18809155 0 15
3735998 18940231 0 20
3735998 18547003 0 1
3735998 19267921 0 1
3735998 19071307 0 17
3735998 18678079 0 1
3735998 18481465 0 1
3735998 19202383 0 15
3735998 19005769 0 1
3735998 19136845 0 19
3735998 6160689 0 190
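Each output line reports one on-CPU interval, so the TIME column (in microseconds, as requested from diff_time) can be summed to get a total per process. The following Python sketch shows one way to do that outside of ProbeVue. It assumes the four-column layout printed by the script above, and the function name cpu_time_by_process is hypothetical.

```python
from collections import defaultdict

# Header and first rows copied from the example output above.
SAMPLE = """\
PROCESSID THREADID CPU TIME
3735998 18612541 0 2
3735998 15663427 0 1
3735998 15073557 0 1
3735998 18743617 0 1
"""

def cpu_time_by_process(text):
    """Return {pid: total TIME in microseconds} from the captured output."""
    totals = defaultdict(int)
    for line in text.splitlines():
        fields = line.split()
        # Data rows have four numeric fields; this also skips the header.
        if len(fields) != 4 or not all(field.isdigit() for field in fields):
            continue
        pid, _tid, _cpu, microseconds = map(int, fields)
        totals[pid] += microseconds
    return dict(totals)

if __name__ == "__main__":
    for pid, total in cpu_time_by_process(SAMPLE).items():
        print(f"pid {pid}: {total} microseconds on CPU")
```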
offcpu
This probe starts when a process or thread is dispatched from a CPU.
Syntax: @@sysproc:offcpu:<pid/tid/*>
Special built-ins supported
__dispatchinfo{
cpuid; <- CPU where selected thread will run.
newpid; <- pid of the new process selected for running
newtid; <- thread id of the thread selected for running
newpriority; <- priority of the thread selected for running
}
Other supported built-ins
__errno, __kernelmode, __arg1 to __arg7, __curthread, __curproc, __mst, __tid, __pid, __ppid, __pgid, __uid, __euid.
Execution environment
Runs in interrupt environment only.
blockthread
This probe starts when a thread is blocked from running on a CPU. Blocking is a form of sleep in which the thread sleeps without holding any resources.
Syntax: @@sysproc:blockthread:*
Special built-ins supported
__sleepinfo{
pid;
tid;
waitchan; <-- wait channel of this sleep.
}
Other supported built-ins
__errno, __kernelmode, __arg1 to __arg7, __curthread, __curproc, __mst, __tid, __pid, __ppid, __pgid, __uid, __euid.
Execution environment
Runs in interrupt environment only.
foldcpu
This probe starts when a CPU core is about to be folded. This probe does not happen in process context and must not be filtered with a pid or tid.
Syntax: @@sysproc:foldcpu:*
Special built-ins supported
__foldcpuinfo{
cpuid; <- logical cpu id which triggers core folding
gpcores; <- general purpose (unfolded, non-exclusive) cores available.
}
Other supported built-ins
__errno, __kernelmode, __arg1 to __arg7.
Example
To track all CPU folding events in the system:
@@sysproc:foldcpu:*
{
printf ("foldcpu cpuid=%d gpcores=%d\n",__foldcpuinfo->cpuid,__foldcpuinfo->gpcores);
}
bindprocessor
Syntax: @@sysproc:bindprocessor:<pid/tid/*>
This probe starts when a thread or process is bound to a CPU. Bindprocessor is a permanent event and must not be confused with temporary CPU switches.
Special built-ins supported
__bindprocessorinfo{
ispid; <- 1 if the CPU is bound to a process; 0 for a thread
id; <- thread or process id
cpuid;
}
Other supported built-ins
__errno, __kernelmode, __arg1 to __arg7, __curthread, __curproc, __mst, __tid, __pid, __ppid, __pgid, __uid, __euid.
Execution environment
Runs in process environment.
changecpu
This probe starts when a thread changes CPU temporarily. This event is more likely to be captured during CPU funneling events or intentional jumps by some kprocs to perform CPU-related tasks (for example, the xmgc process jumps to all CPUs to manage kernel heaps).
Syntax: @@sysproc:changecpu:*
Special built-ins supported
__changecpuinfo
{
oldcpuid; <-source CPU
newcpuid; <- target CPU
pid;
tid; <-Thread id
}
Other supported built-ins
__errno, __kernelmode, __arg1 to __arg7, __curthread, __curproc, __mst, __tid, __pid, __ppid, __pgid, __uid, __euid.
Execution environment
Runs in process environment.
Example
@@sysproc:changecpu:*
{
printf ("changecpu PID=%llu TID=%llu old_cpuid=%d new_cpuid= %d \n",
__changecpuinfo->pid,__changecpuinfo->tid,__changecpuinfo->oldcpuid,__changecpuinfo->newcpuid);
}
An output like the one shown below may be observed.
changecpu PID=852254 TID=1769787 old_cpuid=26 new_cpuid= 27
changecpu PID=852254 TID=1769787 old_cpuid=-1 new_cpuid= 0
changecpu PID=852254 TID=1769787 old_cpuid=0 new_cpuid= 1
changecpu PID=852254 TID=1769787 old_cpuid=1 new_cpuid= 2
resourceattach
This probe is fired when a resource is attached to another resource in the system.
Syntax: @@sysproc:resourceattach:*
Special built-ins supported
__srcresourceinfo{
type;
subtype;
id; <- resource type identifier
offset; <-offset if a memory resource
length; <- length if a memory resource
policy;
}
__tgtresourceinfo{
type;
subtype;
id; <- resource type identifier
offset; <-offset if a memory resource
length; <- length if a memory resource
policy;
}
Resource type | Description |
---|---|
R_NADA | Nothing - invalid specification |
R_PROCESS | Process |
R_RSET | Resource set |
R_SUBRANGE | Memory range |
R_SHM | Shared Memory |
R_FILDES | File identified by an open file |
R_THREAD | Thread |
R_SRADID | SRAD identifier |
R_PROCMEM | Process Memory |
Other supported built-ins
__errno, __kernelmode, __arg1 to __arg7, __mst.
Execution environment
Runs in process environment.
resourcedetach
This probe is fired when a resource is detached from another resource in the system.
Syntax: @@sysproc:resourcedetach:*
Special built-ins supported
__srcresourceinfo{
type;
subtype;
id; <- resource type identifier
offset; <-offset if a memory resource
length; <- length if a memory resource
policy;
}
__tgtresourceinfo{
type;
subtype;
id; <- resource type identifier
offset; <-offset if a memory resource
length; <- length if a memory resource
policy;
}
Resource type | Description |
---|---|
R_NADA | Nothing - invalid specification |
R_PROCESS | Process |
R_RSET | Resource set |
R_SUBRANGE | Memory range |
R_SHM | Shared Memory |
R_FILDES | File identified by an open file |
R_THREAD | Thread |
R_SRADID | SRAD identifier |
R_PROCMEM | Process Memory |
Other supported built-ins
__errno, __kernelmode, __arg1 to __arg7, __mst, __tid, __pname.
Execution environment
Runs in process environment.
drphasestart
This probe is fired when a dynamic reconfiguration (DR) handler is about to be called.
Syntax: @@sysproc:drphasestart:*
Special built-ins supported
__drphaseinfo{
dr_operation; ← dr operation
dr_flags;
dr_phase;
handler_rc; ← always 0 in drphasestart
}
dr_operation can have one of the following values:
- DR_RM_MEM_OPER
- DR_ADD_MEM_OPER
- DR_RM_CPU_OPER
- DR_ADD_CPU_OPER
- DR_CPU_SPARE_OPER
- DR_RM_CAP_OPER
- DR_ADD_CAP_OPER
- DR_RM_RESMEM_OPER
- DR_PMIG_OPER
- DR_WMIG_OPER
- DR_WMIG_CHECKPOINT_OPER
- DR_WMIG_RESTART_OPER
- DR_SOFT_RES_CHANGES_OPER
- DR_ADD_MEM_CAP_OPER
- DR_RM_MEM_CAP_OPER
- DR_CPU_AFFINITY_REFRESH_OPER
- DR_AME_FACTOR_OPER
- DR_PHIB_OPER
- DR_ACC_OPER
- DR_CHLMB_OPER
- DR_ADD_RESMEM_OPER
dr_flags can have the following values:
- DRP_FORCE
- DRP_RPDP
- DRP_DOIT_SUCCESS
- DRP_PRE_REGISTERED
- DRP_CPU
- DRP_MEM
- DRP_SPARE
- DRP_ENT_CAP
- DRP_VAR_WGT
- DRP_RESERVE
- DRP_PMIG
- DRP_WMIG
- DRP_WMIG_CHECKPOINT
- DRP_WMIG_RESTART
- DRP_SOFT_RES_CHANGES
- DRP_MEM_ENT_CAP
- DRP_MEM_VAR_WGT
- DRP_CPU_AFFINITY_REFRESH
- DRP_AME_FACTOR
- DRP_PHIB
- DRP_ACC_UPDATE
- DRP_CHLMB
Other supported built-ins
__errno, __kernelmode, __arg1 to __arg7, __tid.
Execution environment
Runs in process or interrupt environment.
Example