Case study of a situation in which disks are overloaded

You can identify overloaded disks and the dbspaces that reside on those disks. After you identify the overloaded disks, you can correct the problem.

The following case study illustrates a situation in which the disks are overloaded. Starting from an initial report from a user, the study shows the steps that were taken to isolate the symptoms and identify the problem, and it describes the correction that was needed.

A database application that is not delivering the expected throughput is being examined to see how performance can be improved. The operating-system monitoring tools reveal that a high proportion of process time is spent idle, waiting for I/O. The database server administrator increases the number of CPU VPs to make more processors available to handle concurrent I/O. However, throughput does not increase, which indicates that one or more disks are overloaded.
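For example, CPU VPs can be added dynamically with the onmode utility. The following command is a sketch; the number of VPs to add depends on the number of processors in the host:

    # Add two CPU virtual processors to the running server.
    onmode -p +2 cpu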

To verify the I/O bottleneck, the database server administrator must identify the overloaded disks and the dbspaces that reside on those disks.

To identify overloaded disks and the dbspaces that reside on those disks:

  1. To check the asynchronous I/O (AIO) queues, use onstat -g ioq. Figure 1 shows the output.
    Figure 1. Output from the onstat -g ioq option
    AIO I/O queues:
    q name/id    len maxlen totalops  dskread dskwrite  dskcopy
    opt   0      0      0        0        0        0        0
    msc   0      0      0        0        0        0        0
    aio   0      0      0        0        0        0        0
    pio   0      0      1        1        0        1        0
    lio   0      0      1      341        0      341        0
    gfd   3      0      1      225        2      223        0
    gfd   4      0      1      225        2      223        0
    gfd   5      0      1      225        2      223        0
    gfd   6      0      1      225        2      223        0
    gfd   7      0      0        0        0        0        0
    gfd   8      0      0        0        0        0        0
    gfd   9      0      0        0        0        0        0
    gfd  10      0      0        0        0        0        0
    gfd  11      0      28      32693     29603     3090        0
    gfd  12      0      18      32557     29373     3184        0
    gfd  13      0      22      20446     18496     1950        0

    In Figure 1, the maxlen and totalops columns show significant results:

    • The maxlen column shows the largest backlog of I/O requests that accumulated in each queue. The values for the last three queues (gfd 11 through gfd 13) are far larger than those for any other queue in the listing.
    • The totalops column shows that roughly 100 times more I/O operations were completed through the last three queues than through any other queue in the listing.

    The maxlen and totalops columns indicate that the I/O load is severely unbalanced.
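    The imbalance is easier to see when the gfd queues are ranked by activity. The following sketch, which assumes the column layout that Figure 1 shows, prints the totalops (field 5) and maxlen (field 4) values for each gfd queue, sorted by totalops in descending order:

      # Rank gfd queues by totalops; the busiest queues print first.
      onstat -g ioq | awk '$1 == "gfd" { print $5, $4, $1, $2 }' | sort -rn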

    Another way to check I/O activity is to use onstat -g iov. This option provides a slightly less detailed display for all VPs.

  2. To check the AIO activity for each disk device associated with each queue, use onstat -g iof, as Figure 2 shows.
    Figure 2. Partial output from the onstat -g iof option
    gfd pathname         bytes read     page reads  bytes write    page writes io/s
    3   /dev/infx5       85456896       41727       207394816      101267      572.9
    	op type     count          avg. time
    	seeks       0              N/A
    	reads       13975          0.0015
    	writes      51815          0.0018
    	kaio_reads  0              N/A
    	kaio_writes 0              N/A

    Depending on how your chunks are arranged, several queues can be associated with the same device.
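    Because onstat statistics are cumulative (from server startup, or from the last onstat -z reset), sampling the display at intervals gives a better picture of the current load. A minimal sketch, assuming the per-device summary lines contain a /dev pathname as in Figure 2:

      # Print the per-device summary lines every 10 seconds.
      while true; do
          date
          onstat -g iof | grep '/dev/'
          sleep 10
      done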

  3. To determine the dbspaces that account for the I/O load, use onstat -d, as Figure 3 shows.
    Figure 3. Output from the onstat -d option
    Dbspaces
    address  number   flags    fchunk   nchunks  flags    owner    name
    c009ad00 1        1        1        1        N        informix rootdbs
    c009ad44 2        2001     2        1        N T      informix tmp1dbs
    c009ad88 3        1        3        1        N        informix oltpdbs
    c009adcc 4        1        4        1        N        informix histdbs
    c009ae10 5        2001     5        1        N T      informix tmp2dbs
    c009ae54 6        1        6        1        N        informix physdbs
    c009ae98 7        1        7        1        N        informix logidbs
    c009aedc 8        1        8        1        N        informix runsdbs
    c009af20 9        1        9        3        N        informix acctdbs
    9 active, 32 total

    Chunks
    address  chk/dbs offset   size     free     bpages   flags pathname
    c0099574 1   1   500000   10000    9100              PO-   /dev/infx2
    c009960c 2   2   510000   10000    9947              PO-   /dev/infx2
    c00996a4 3   3   520000   10000    9472              PO-   /dev/infx2
    c009973c 4   4   530000   250000   242492            PO-   /dev/infx2
    c00997d4 5   5   500000   10000    9947              PO-   /dev/infx4
    c009986c 6   6   510000   10000    2792              PO-   /dev/infx4
    c0099904 7   7   520000   25000    11992             PO-   /dev/infx4
    c009999c 8   8   545000   10000    9536              PO-   /dev/infx4
    c0099a34 9   9   250000   450000   4947              PO-   /dev/infx5
    c0099acc 10  9   250000   450000   4997              PO-   /dev/infx6
    c0099b64 11  9   250000   450000   169997            PO-   /dev/infx7
    11 active, 32 total

In the Chunks output, the pathname column indicates the disk device. The chk/dbs column lists the number of each chunk and the number of the dbspace to which it belongs. In this case, only one chunk is defined on each of the overloaded disks, and each of those chunks belongs to dbspace number 9.

The Dbspaces output shows the name of the dbspace that is associated with each dbspace number. In this case, all three of the overloaded disks are part of the acctdbs dbspace.
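This cross-reference can also be scripted. The following sketch assumes the onstat -d column layout that Figure 3 shows; /dev/infx5 is one of the overloaded devices from this case study:

    # List the chunk number ($2) and dbspace number ($3) of every
    # chunk that resides on the device of interest.
    onstat -d | awk '$NF == "/dev/infx5" { print "chunk", $2, "dbspace", $3 }'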

Although the original disk configuration allocated three entire disks to the acctdbs dbspace, the activity within this dbspace suggests that three disks are not enough. Because the load is roughly equal across the three disks, the tables do not appear to be poorly laid out or improperly fragmented. However, you might get better performance by adding fragments on other disks to one or more large tables in this dbspace, or by moving some tables to other disks with lighter loads.
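For example, if a large table in acctdbs uses round-robin fragmentation, you can add a fragment in a less heavily loaded dbspace with an ALTER FRAGMENT statement. The following sketch is illustrative only; the table name acct_detail, the database name acctdb, and the dbspace name sparedbs are invented for this example:

    # Add a round-robin fragment in another dbspace (hypothetical names).
    echo 'ALTER FRAGMENT ON TABLE acct_detail ADD sparedbs;' | dbaccess acctdb -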

