Performance Tuning -- The vmstat Tool


Contents

About this document
    Related documentation
About vmstat
Summary statistics

About this document

This document provides an overview of the output of the vmstat command. This information applies to AIX Versions 4.x.

Related documentation

The fields produced by the s, f, and [Drives] options of vmstat are fully documented in the AIX Performance Tuning Guide, publication number SC23-2365, and in the online product documentation.

The product documentation library is also available:
http://www.rs6000.ibm.com/resource/aix_resource/Pubs/index.html


About vmstat

Although a system may have sufficient real resources, it may perform below expectations if logical resources are not allocated properly.

Use vmstat to determine real and logical resource utilization. It samples kernel tables and counters, normalizes the results, and presents them in a readable format.

By default, vmstat sends its report to standard output, but the output can be redirected to a file.

vmstat is normally invoked with an interval and a count specified. The interval is the length of time in seconds over which vmstat is to gather and report data. The count is the number of intervals to run. If no parameters are specified, vmstat will report a single record of statistics for the time since the system was booted. There may have been inactivity or fluctuations in the workload, so the results may not represent current activity. Be aware that the first record in the output presents statistics since the last boot (except when invoked with the -f or -s option). In many instances, this data can be ignored.
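The difference between the first record and the later records comes from how cumulative kernel counters become rates. The Python sketch below is a simplified model of that idea, not vmstat's actual implementation. The input format (a list of elapsed-time and cumulative-count pairs) and the function name interval_rates are hypothetical.

```python
def interval_rates(snapshots, uptime_at_first):
    """Model how cumulative counters become per-record rates.

    snapshots: list of (elapsed_seconds, cumulative_count) pairs; elapsed
    time is measured from the first snapshot (hypothetical input format).
    uptime_at_first: seconds since boot when the first snapshot was taken.
    """
    _, first_count = snapshots[0]
    # Record 1: the counter has been accumulating since boot, so the rate
    # is averaged over the whole uptime.
    rates = [first_count / uptime_at_first]
    # Later records: change in the counter divided by change in time.
    for (t0, c0), (t1, c1) in zip(snapshots, snapshots[1:]):
        rates.append((c1 - c0) / (t1 - t0))
    return rates


# A counter that averaged 10 events/s since boot, then ran at 5/s, then 0/s.
print(interval_rates([(0, 36000), (1, 36005), (2, 36005)], uptime_at_first=3600))
# [10.0, 5.0, 0.0]
```

The first value reflects the whole time since boot, which is why it can differ sharply from current activity and can often be ignored.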

vmstat reports statistics about processes, virtual memory, paging activity, faults, CPU activity, and disk transfers. Options and parameters recognized by this tool are indicated by the usage prompt:

           vmstat [-fs] [Drives] [Interval] [Count]

The following figure shows sample output. The smallest unit of work reported is the kernel thread (kthr), so the r and b columns under that heading represent the number of threads, not processes, placed on these queues.

    -------------------------------------------------------------------- 
   |                                                                    |
   |  kthr    memory          page           faults           cpu       |
   |  -----  --------   ----------------- --------------  -----------   |
   |  r  b   avm  fre   re pi po fr sr cy  in   sy   cs   us sy id wa   |
   |  0  0   6747 1253   0  0  0  0  0  0  114  10   22   0  1  26 0    |
   |  1  0   6747 1253   0  0  0  0  0  0  113  118  43   17 4  79 0    |
   |  0  0   6747 1253   0  0  0  0  0  0  118  99   33   8  3  89 0    |
   |                                                                    |
    -------------------------------------------------------------------- 
   Figure: Sample output from vmstat 1 3

kthr

The columns under the kthr heading in the output provide information about the average number of threads on various queues.

r

The r column indicates the average number of kernel threads on the run queue at one-second intervals.

This field indicates the number of runnable threads. The system counts the number of ready-to-run threads once per second and adds that number to an internal counter. vmstat then subtracts the initial value of this counter from the end value and divides the result by the number of seconds in the measurement interval. This value is typically less than five with a stable workload. If this value increases rapidly, look for an application problem. If many threads (especially CPU-intensive ones) are competing for the CPU, they may well be scheduled in round-robin fashion. If each one executes for a complete or partial time slice, the number of runnable threads could easily exceed 100.
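The averaging described above (accumulate once per second, then take the difference of the counter and divide by the interval length) can be sketched in Python as follows. The same arithmetic applies to the b column. The helper names are hypothetical; the sample output shows whole numbers, while this sketch leaves the result unrounded.

```python
def accumulate_once_per_second(counter, queue_lengths):
    """Simulate the kernel adding the queue length to a counter each second."""
    for length in queue_lengths:
        counter += length
    return counter


def average_queue_length(counter_start, counter_end, interval_seconds):
    """Average number of threads on the queue over the interval."""
    if interval_seconds <= 0:
        raise ValueError("interval must be positive")
    return (counter_end - counter_start) / interval_seconds


counter_start = 500  # counter value when the interval begins (arbitrary)
counter_end = accumulate_once_per_second(counter_start, [2, 3, 1, 0, 4])
print(average_queue_length(counter_start, counter_end, 5))  # 2.0
```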

b

The b column shows the average number of kernel threads on the wait queue at one-second intervals (awaiting resource, awaiting input/output).

Kernel threads are placed on the wait queue when scheduled for execution and are waiting for one of their process pages to be paged in. Once a second, the system counts the threads waiting and adds that number to an internal counter. vmstat then subtracts the initial value from the end value and divides the result by the number of seconds in the measurement interval. This value is usually near zero. Do not confuse this with wa -- waiting on input/output (I/O).

NOTE: On an SMP system, there will be an additional blocked process shown in the b column. This is for the lrud kproc that is part of the Virtual Memory Manager's (VMM) page-replacement algorithm.

Also, on a system with a mounted compressed journaled file system (JFS), there will be an additional blocked process: the jfsc kproc.

memory

The columns under the memory heading provide information about real and virtual memory.

avm

The avm column gives the average number of pages allocated to paging space. (In AIX, a page contains 4096 bytes of data.)

When a process executes, space for working storage is allocated on the paging devices (backing store). The avm value can therefore be used to calculate the amount of paging space assigned to executing processes: dividing avm by 256 gives the number of megabytes (MB), systemwide, allocated to paging space, because 256 pages of 4096 bytes make 1 MB. The lsps -a command also provides information on individual paging spaces. Configure enough paging space on the system that the paging space in use does not approach 100 percent. When fewer than 128 unallocated pages remain on the paging devices, the system will begin to kill processes to free some paging space.
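The page-to-megabyte conversion is simple arithmetic; this minimal Python sketch applies it to the avm value from the sample output. Keep in mind the caveat about deferred paging space allocation in AIX 4.3.2 and later, under which avm no longer equals the paging space actually allocated.

```python
PAGE_SIZE_BYTES = 4096                           # AIX page size, as stated above
PAGES_PER_MB = (1024 * 1024) // PAGE_SIZE_BYTES  # 1 MB / 4 KB = 256 pages


def avm_to_mb(avm_pages):
    """Convert an avm value (in 4 KB pages) to megabytes."""
    return avm_pages / PAGES_PER_MB


# avm from the sample output
print(avm_to_mb(6747))  # 26.35546875
```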

Versions of AIX before 4.3.2 allocated paging space blocks for pages of memory as the pages were accessed. On a large-memory machine whose application set rarely or never requires paging, these paging space blocks were allocated but never needed. AIX Version 4.3.2 implements deferred paging space allocation, in which paging space blocks are not allocated until paging is necessary, which helps reduce the paging space requirements of the system.

The avm value in vmstat indicates the number of virtual memory (working storage) pages that have been accessed but not necessarily paged out. Under the previous policy of "late page space allocation", avm had the same definition. However, because the VMM allocated a paging space disk block for each working page that was accessed, the number of paging space blocks allocated was equal to avm. Blocks were allocated when working pages were accessed so that, if the pages later had to be paged out of memory, disk blocks on the paging space logical volumes would already be available to receive them. On systems that never page out to paging space, it is a waste of disk space to have as many paging space disk blocks as there are pages of memory. With the deferred policy, paging space disk blocks are allocated only for the pages that actually need to be paged out.

The avm number grows as more processes are started or existing processes use more working storage. Likewise, it shrinks as processes exit or free working storage.
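The contrast between the two policies can be illustrated with a toy Python model. It is a hypothetical simplification, not the VMM's bookkeeping: it ignores blocks being freed and only counts how many paging space blocks each policy would hold for a given set of accessed and paged-out working pages.

```python
def paging_space_blocks(accessed_pages, paged_out_pages, policy):
    """Toy model: paging space blocks held for a set of working pages.

    accessed_pages: set of working-storage page IDs that have been touched
    (what avm counts under either policy).
    paged_out_pages: subset actually written to paging space.
    policy: "late" (before 4.3.2) or "deferred" (4.3.2 and later).
    """
    if not paged_out_pages <= accessed_pages:
        raise ValueError("only accessed pages can be paged out")
    if policy == "late":
        # A block is reserved as soon as a working page is accessed.
        return len(accessed_pages)
    if policy == "deferred":
        # A block is allocated only when a page actually has to be paged out.
        return len(paged_out_pages)
    raise ValueError(f"unknown policy: {policy!r}")


accessed = set(range(1000))  # avm would be 1000 in both cases
paged_out = {1, 2, 3}
print(paging_space_blocks(accessed, paged_out, "late"))      # 1000
print(paging_space_blocks(accessed, paged_out, "deferred"))  # 3
```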

fre

The fre column shows the average number of free memory frames. A frame is a 4096-byte area of real memory.

The system maintains a buffer of memory frames, called the free list, that will be readily accessible when the VMM needs space. The nominal size of the free list varies depending on the amount of real memory installed. On systems with 64MB of memory or more, the minimum value (MINFREE) is 120 frames. For systems with less than 64MB, the value is two times the number of MB of real memory, minus 8. For example, a system with 32MB would have a MINFREE value of 56 free frames.

If the fre value is substantially above the MAXFREE value (defined as MINFREE plus 8), the system is unlikely to be thrashing (continuously paging in and out). Conversely, if the system is thrashing, the fre value will certainly be small. Most UNIX operating systems, including AIX, use nearly all available memory for disk caching, so there is no cause for alarm if the fre value oscillates between MINFREE and MAXFREE.
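The MINFREE and MAXFREE rules above reduce to a few lines of Python. The describe_fre helper is hypothetical: it simplifies "substantially above MAXFREE" to "above MAXFREE", and its advice for values below MINFREE (check pi and po) is this sketch's own suggestion rather than a rule from the text.

```python
def minfree_frames(real_memory_mb):
    """MINFREE in 4 KB frames, per the rule described in the text."""
    if real_memory_mb >= 64:
        return 120
    return 2 * real_memory_mb - 8


def maxfree_frames(real_memory_mb):
    """MAXFREE is MINFREE plus 8 frames."""
    return minfree_frames(real_memory_mb) + 8


def describe_fre(fre, real_memory_mb):
    """Rough reading of a fre sample (thresholds simplified for illustration)."""
    minfree = minfree_frames(real_memory_mb)
    maxfree = maxfree_frames(real_memory_mb)
    if fre > maxfree:
        return "above MAXFREE: thrashing is unlikely"
    if fre >= minfree:
        return "between MINFREE and MAXFREE: normal when memory is used for caching"
    return "below MINFREE: free list is small; check pi and po"


print(minfree_frames(32), maxfree_frames(32))  # 56 64
print(describe_fre(1253, real_memory_mb=128))  # fre from the sample output
```

Note that both branches of minfree_frames give 120 at exactly 64 MB, so the rule is continuous at the boundary.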

page

The columns under the page heading provide information about page faults and paging activity.

re

The re column shows the number (rate) of pages reclaimed.

Reclaimed pages can satisfy an address translation fault without initiating a new I/O request (the page is still in memory). This includes pages that have been put on the free list but are accessed again before they are reassigned. It includes pages previously requested by VMM for which I/O has not yet been completed or those pre-fetched by VMM's read-ahead mechanism but hidden from the faulting segment.

pi

The pi column details the number (rate) of pages paged in from paging space.

Paging space is the part of virtual memory that resides on disk. It is used as an overflow when memory is overcommitted. Paging space consists of paging logical volumes dedicated to storing working-set pages that have been stolen from real memory. When a stolen page is referenced by the process, a page fault occurs and the page must be read into memory from paging space. Because hardware, software, and application configurations vary so widely, there is no single "good" value for this field.

One theory is that five page-ins per second should be the upper limit. Use this theoretical maximum as a reference but do not adhere to it rigidly. This field is important as a key indicator of paging space activity. If a page-in occurs, then there must have been a previous page-out for that page. It is also likely in a memory-constrained environment that each page-in will force a different page to be stolen and, therefore, paged out.
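To use the five-page-ins-per-second figure as a reference rather than a rule, one option is to flag the intervals that exceed it and then look more closely at those intervals. This Python sketch assumes a list of pi values taken in order from vmstat output; it skips the first record because that record reports averages since boot. The function name is hypothetical.

```python
REFERENCE_PI_PER_SEC = 5  # rule of thumb from the text, not a hard limit


def intervals_over_reference(pi_values, reference=REFERENCE_PI_PER_SEC):
    """Return indices of vmstat records whose pi rate exceeds the reference.

    Record 0 is skipped because it reports averages since boot.
    """
    return [i for i, pi in enumerate(pi_values) if i > 0 and pi > reference]


print(intervals_over_reference([9, 0, 12, 3, 6]))  # [2, 4]
```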

po

The po column shows the number (rate) of pages paged out to paging space.

Whenever a page of working storage is stolen, it is written to paging space. If not referenced again, it remains on the paging device until the process terminates or disclaims the space. Subsequent references to addresses contained within the faulted-out pages result in page faults, and the pages are paged in individually by the system. When a process terminates normally, any paging space allocated to that process is freed. If the system is reading in a significant number of persistent pages, you may see an increase in po without corresponding increases in pi. This situation does not necessarily indicate thrashing, but may warrant investigation into data access patterns of the applications.

fr

The fr column details the number (rate) of pages freed.

As the VMM page-replacement routine scans the Page Frame Table (PFT), it applies selection criteria to choose which pages to steal to replenish the free list of available memory frames. The total number of pages stolen by the VMM -- both working (computational) and file (persistent) pages -- is reported as a rate per second. A freed page does not necessarily mean that any I/O has taken place. For example, if a persistent storage (file) page has not been modified, it will not be written back to the disk. If I/O is not necessary, minimal system resources are required to free a page.

sr

The sr column details the number (rate) of pages scanned by the page-replacement algorithm.

The VMM page-replacement code scans the PFT and steals pages until the number of frames on the free list is at least the MAXFREE value. The page-replacement code may have to scan many entries in the Page Frame Table before it can steal enough to satisfy the free list requirements. With stable, unfragmented memory, the scan rate and free rate may be nearly equal. On systems with multiple processes using many different pages, the pages are more volatile and disjointed. In this scenario, the scan rate may greatly exceed the free rate.
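The relationship between scan rate and free rate can be summarized as a ratio of pages scanned per page freed. This is a minimal Python sketch; the ratio and the function name are illustrative and are not a vmstat field.

```python
def scan_to_free_ratio(sr, fr):
    """Pages scanned per page freed in one vmstat interval.

    A ratio near 1 matches the stable, unfragmented case described above;
    a much larger ratio means the page-replacement code examined many
    frames for each one it could steal. Returns None if nothing was freed.
    """
    if fr == 0:
        return None
    return sr / fr


print(scan_to_free_ratio(120, 100))  # 1.2
print(scan_to_free_ratio(900, 100))  # 9.0
```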

cy

The cy column provides the rate of complete scans of the Page Frame Table.

cy shows how many times (per second) the page-replacement code has scanned the Page Frame Table. Since the free list can be replenished without a complete scan of the PFT and because all of the vmstat fields are reported as integers, this field is usually zero.

faults

The columns under the faults heading in the vmstat output provide information about process control.

in

The in column shows the number (rate) of device interrupts.

This column shows the number of hardware or device interrupts (per second) observed over the measurement interval. Examples of interrupts are disk request completions and the 10 millisecond clock interrupt. Since the latter occurs 100 times per second, the in field is always greater than 100.
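Because the clock contributes a fixed baseline to the in column, subtracting it gives a rough estimate of other device interrupts. This Python sketch assumes exactly 100 clock interrupts per second are included in in, as stated above; verify that assumption before applying it to multiprocessor systems. The function name is hypothetical.

```python
CLOCK_TICKS_PER_SEC = 100  # one clock interrupt every 10 ms, per the text


def other_interrupts(in_rate, clock_rate=CLOCK_TICKS_PER_SEC):
    """Estimate the non-clock (device) share of the in column."""
    return max(in_rate - clock_rate, 0)


# in values from the sample output
print([other_interrupts(v) for v in (114, 113, 118)])  # [14, 13, 18]
```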

sy

The sy column details the number (rate) of system calls.

Resources are available to user processes through well-defined system calls. These calls instruct the kernel to perform operations for the calling process and exchange data between the kernel and the process. Since workloads and applications vary and different calls perform different functions, it is impossible to say how many system calls per second are too many.

cs

The cs column shows the number (rate) of context switches.

The physical CPU resource is subdivided into logical time slices of 10 milliseconds each. Assuming a process is scheduled for execution, it will run until its time slice expires, it is preempted, or it voluntarily gives up control of the CPU. When another process is given control of the CPU, the context, or working environment, of the previous process must be saved and the context of the new process must be loaded. AIX has a very efficient context switching procedure, so each switch is inexpensive in terms of resources. Any significant increase in context switches is cause for further investigation.

cpu

The information under the cpu heading in the vmstat output provides a breakdown of CPU usage.

us

The us column shows the percent of CPU time spent in user mode.

Processes execute in user mode or system (kernel) mode. When in user mode, a process executes within its code and does not require kernel resources to perform computations, manage memory, set variables, and so on.

sy

The sy column details the percent of CPU time spent in system mode.

If a process needs kernel resources, it must execute a system call and switch into system mode to make that resource available. I/O to a drive, for example, requires calls to open the device, seek, and read and write data. This field shows the percent of time the CPU was in system mode. Optimum use would have the CPU working 100 percent of the time. This holds true in the case of a single-user system with no need to share the CPU. Generally, if us+sy time is below 90 percent, a single-user system is not considered CPU constrained. However, if us+sy time on a multi-user system exceeds 80 percent, processes may spend time waiting in the run queue, and response time and throughput might suffer.
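The us+sy guidelines above can be expressed as a small Python check. It is a sketch of the rules of thumb only; the function name and the returned strings are hypothetical, and a single-user result at or above 90 percent is reported as "possibly" constrained because the text does not say more than that.

```python
def cpu_busy_assessment(us, sy, multi_user):
    """Apply the us+sy rules of thumb to one vmstat record."""
    busy = us + sy
    if multi_user:
        # Above 80 percent, threads may queue for the CPU.
        return "possible run-queue delays" if busy > 80 else "ok"
    # Single-user: below 90 percent is not considered CPU constrained.
    return "not CPU constrained" if busy < 90 else "possibly CPU constrained"


# Second record of the sample output: us=17, sy=4
print(cpu_busy_assessment(17, 4, multi_user=True))  # ok
```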

id

The id column details CPU idle time (percent) with no pending local disk I/O.

If there are no processes available for execution (the run queue is empty), the system dispatches a process called wait. The ps report (with the -k or g option) identifies this as kproc with a process ID (PID) of 516. Do not worry if your ps report shows a high aggregate time for this process; it simply means there have been significant periods of time when no other processes could run. If there are no I/Os pending to a local disk, all time charged to the wait process is classified as idle time.

wa

The wa column details CPU idle time (percent) with pending local disk I/O.

If there is at least one outstanding I/O to a local disk when the wait process is running, the time is classified as "waiting on I/O". A wa value over 40 percent could indicate that the disk subsystem may not be balanced properly, or it may be the result of a disk-intensive workload. If there is only one process available for execution -- often the case on a technical workstation -- there may be no way to avoid waiting on I/O.

NOTE: The wa column on SMP machines running AIX Version 4.3.2 or earlier is somewhat exaggerated. This is due to the method used in calculating wio.

Method used in AIX 4.3.2 and earlier AIX Versions

At each clock interrupt on each processor (100 times a second in AIX), a determination is made as to which of four categories (usr/sys/wio/idle) to charge the last 10 milliseconds of time. If the CPU was busy in user mode at the time of the clock interrupt, the tick is charged to usr. If the CPU was busy in kernel mode, the tick is charged to sys. If the CPU was NOT busy, a check is made to see whether any disk I/O is in progress. If any disk I/O is in progress, the wio category is incremented. If NO disk I/O is in progress and the CPU is not busy, the idle category gets the tick.
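The tick classification above can be modeled in Python. This sketch also suggests why wa is exaggerated on SMP machines: the check is whether any disk I/O is in progress, so every idle processor is charged wio while even one I/O is outstanding. The function names and the processor states are hypothetical inputs for illustration.

```python
from collections import Counter


def classify_tick(cpu_state, disk_io_in_progress):
    """Charge one 10 ms tick to usr, sys, wio, or idle (4.3.2 and earlier)."""
    if cpu_state == "user":
        return "usr"
    if cpu_state == "kernel":
        return "sys"
    # CPU not busy: any disk I/O in progress makes the tick wait-on-I/O.
    return "wio" if disk_io_in_progress else "idle"


def percentages(ticks):
    """Convert a list of tick categories into vmstat-style percentages."""
    counts = Counter(ticks)
    total = len(ticks)
    return {cat: 100 * counts[cat] / total for cat in ("usr", "sys", "wio", "idle")}


# One tick on each of four processors: one busy in user mode, three idle,
# and a single disk I/O outstanding somewhere in the system.
states = ["user", "idle", "idle", "idle"]
ticks = [classify_tick(s, disk_io_in_progress=True) for s in states]
print(percentages(ticks))  # {'usr': 25.0, 'sys': 0.0, 'wio': 75.0, 'idle': 0.0}
```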


Summary statistics

vmstat with the -s option reports absolute counts of various events since the system was booted. There are 23 separate events reported in the vmstat -s output; the following four have proven most helpful. The remaining 19 fields cover a variety of activities, from address translation faults to lock misses to system calls. The information in those 19 fields is also valuable but is less frequently used.

page ins

The page ins field shows the number of systemwide page-ins.

When a page is read from disk to memory, this count is incremented. It is a count of VMM-initiated read operations and, with the page outs field, represents the real I/O (disk reads and writes) initiated by the VMM.

page outs

The page outs field shows the number of systemwide page-outs.

This count is incremented each time a page is written from memory to disk. The page outs field value is a total count of VMM-initiated write operations and, with the page ins field, represents the total amount of real I/O initiated by the VMM.

paging space page ins

The paging space page ins field is the count of ONLY pages read from paging space.

paging space page outs

The paging space page outs field is the count of ONLY pages written to paging space.

Using the summary statistics

The four preceding fields can be used to indicate how much of the system's I/O is for persistent storage. If the value for paging space page ins is subtracted from the (systemwide) value for page ins, the result is the number of pages that were read from persistent storage (files). Likewise, if the value for paging space page outs is subtracted from the (systemwide) value for page outs, the result is the number of persistent pages (files) that were written to disk.
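The subtraction described above is shown in this minimal Python sketch. The input counts are made-up illustrative values, not measurements.

```python
def persistent_page_io(page_ins, page_outs, ps_page_ins, ps_page_outs):
    """Pages read from and written to persistent storage (files).

    Systemwide counts minus the paging space counts leave the file I/O.
    """
    return page_ins - ps_page_ins, page_outs - ps_page_outs


# Hypothetical vmstat -s counts
file_reads, file_writes = persistent_page_io(50000, 20000, 1200, 3400)
print(file_reads, file_writes)  # 48800 16600
```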

Remember that these counts apply to the time since system initialization. If you need counts for a given time interval, run vmstat -s at the start of the interval and again at the end. The differences between corresponding fields of the two reports are the counts for the interval. It is easiest to redirect each report to a file and then perform the subtraction.
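Once two reports have been saved (for example, with vmstat -s > start.txt and later vmstat -s > end.txt), the deltas can be computed with a short Python script. This sketch assumes each line of the report is a count followed by a label, such as "   1600 page ins"; check the exact labels on your system before relying on the field names used here.

```python
FIELDS = ("page ins", "page outs", "paging space page ins", "paging space page outs")


def parse_vmstat_s(text):
    """Map label -> count for lines shaped like '   12345 page ins'."""
    counts = {}
    for line in text.splitlines():
        parts = line.split(None, 1)
        if len(parts) == 2 and parts[0].isdigit():
            counts[parts[1].strip()] = int(parts[0])
    return counts


def interval_deltas(start_text, end_text, fields=FIELDS):
    """Count of each field during the interval between two reports."""
    start, end = parse_vmstat_s(start_text), parse_vmstat_s(end_text)
    return {f: end[f] - start[f] for f in fields}


# Illustrative report text, not real output
start = "  1000 page ins\n   400 page outs\n    50 paging space page ins\n    70 paging space page outs\n"
end = "  1600 page ins\n   900 page outs\n    80 paging space page ins\n   170 paging space page outs\n"
print(interval_deltas(start, end))
```

In practice, start_text and end_text would be read from the two saved files.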






[ Doc Ref: 90605226914708     Publish Date: Dec. 05, 2000     4FAX Ref: 6220 ]