CA Performance Handbook
for DB2 for z/OS
About the Contributors from Yevich, Lawson and Associates Inc.
DAN LUKSETICH is a senior DB2 DBA. He works as a DBA, application architect, presenter,
author, and teacher. Dan has over 17 years of experience working with DB2 as a DB2 DBA, application
architect, system programmer, and COBOL and BAL programmer, working on major
implementations in z/OS, AIX, and Linux environments.
His experience includes DB2 application design and architecture, database administration,
complex SQL and SQL tuning, performance audits, replication, disaster recovery, stored
procedures, UDFs, and triggers.
SUSAN LAWSON is an IBM Gold Consultant for DB2 and zSeries. She has authored the IBM ‘DB2 for
z/OS V8 DBA Certification Guide’, the ‘DB2 for z/OS V7 Application Programming Certification
Guide’, and the ‘DB2 for z/OS V9 DBA Certification Guide’ (2007). She also co-authored several
books including ‘DB2 High Performance Design and Tuning’ and ‘DB2 Answers’, and is a
frequent speaker at user group and industry events. (Visit DB2Expert.com)
About CA
CA (NYSE: CA), one of the world's largest information technology (IT) management
software companies, unifies and simplifies the management of enterprise-wide IT for greater
business results. Our vision, tools and expertise help customers manage risk, improve service,
manage costs and align their IT investments with their business needs. CA Database
Management encompasses this vision with an integrated and comprehensive solution for
database design and modeling, database performance management, database administration,
and database backup and recovery across multiple database systems and platforms.
Copyright © 2007 CA. All Rights Reserved. One CA Plaza, Islandia, N.Y. 11749. All trademarks, trade names, service marks, and logos referenced herein belong to their respective companies.
This handbook assumes a good working knowledge of DB2 and SQL, and is designed to help
you build good performance into the application, database, and the DB2 subsystem. It provides
techniques to help you monitor DB2 for performance, and to identify and tune production
performance problems.
Many people spend a lot of time thinking about DB2 performance, while others spend no time
at all on the topic. Nonetheless, performance is on everyone’s mind when one of two things
happens: an online production system does not respond within the time specified in a service
level agreement (SLA), or an application uses more CPU (and thus costs more money) than is
desired. Most of the efforts involved in tuning during these situations revolve around fire
fighting, and once the problem is solved the tuning effort is abandoned.
This handbook aims to provide information that performance fire fighters and performance
tuners need. It also provides the information necessary to design databases for high
performance, to understand how DB2 processes SQL and transactions, and to tune those
processes for performance. For database performance management, the philosophy of
performance tuning begins with database design, and extends through (database and
application) development and production implementation.
An understanding of DB2 performance, even at a high level, goes a long way toward ensuring
well-performing applications, databases, and subsystems. Among other things, this handbook provides:
• DB2 subsystem performance information, and advice for subsystem configuration and tuning
• Commentary for proper application design for performance
• Real-world tips for performance tuning and monitoring
Most people don’t inherently care about performance. They want their data, they want it now,
they want it accurate, they want it all the time, and they want it to be secured. Of course, once
they get these things they are also interested in getting the data fast, and at the lowest cost
possible. Enter performance design and tuning.
Getting the best possible performance from DB2 databases and applications means a number
of things. First and foremost, it means that users do not need to wait for the information they
require to do their jobs. Second, it means that the organization using the application is not
spending too much money running the application. The methods for designing, measuring, and
tuning for performance vary from application to application. The optimal approach depends
upon the type of application. Is it an online transaction processing (OLTP) system? A data
warehouse? A batch processing system?
Your site may use one or all of the following to measure performance:
• User complaints
• Overall response time measurements
• Service level agreements (SLAs)
• System availability
• I/O time and waits
• CPU consumption and associated charges
• Locking time and waits
In other situations you may be dealing with packaged software applications that cause
performance problems. In these situations it may be difficult to tune the application but quite
possible to improve performance through database and subsystem tuning. (See Chapter 9.)
Designing an application with regards to performance will help avoid many performance
problems after the application has been deployed. Proactive performance engineering can
help eliminate redesign, recoding, and retrofitting during implementation in order to satisfy
performance expectations, or alleviate performance problems. Proactive performance
engineering allows you to analyze and stress test the designs before they are implemented.
Proper techniques (and tools) can turn performance engineering research and testing from
a lengthy process inhibiting development to one that can be performed efficiently prior
to development.
There are many options for proper index and table design for performance. These options
depend on the type of application that is going to be implemented, and specifically on the
type of data access (random versus sequential, read versus update, etc.). In addition, the
application itself has to be properly designed for performance. This involves understanding the
application data access patterns, as well as the units of work (UOWs) and inputs. Taking these
things into consideration, and designing the application and database accordingly is the best
defense against performance problems. (See Chapters 4 and 7.)
When it comes to database access, finding and fixing the worst performing SQL statements is
the top priority. However, how does one prioritize the worst culprits? Are problems caused by
the statements that are performing table space scans, or the ones with matching index access?
Performance, of course, is relative, and finding the least efficient statement is a matter of
understanding how that statement is performing relative to the business need that it serves.
(See Chapter 6.)
CHAPTER 2
SQL and Access Paths
One of the most important things to understand about DB2, and DB2 performance, is that
DB2 offers a level of abstraction from the data. We don’t have to know where exactly the data
physically exists. We don’t need to know where the datasets are, we don’t need to know the
access method for the data, and we don’t need to know what index to use. All we need to know
is the name of the table, and the columns we are interested in. DB2 takes care of the rest. This
is a significant business advantage in that we can quickly build applications that access data
via a standardized language that is independent of any access method or platform. This
enables ultra-fast development of applications, portability across platforms, and all the power
and flexibility that comes with a relational or object-relational design. What we pay in return
is more consumption of CPU and I/O.
Using DB2 for z/OS can be more expensive than more traditional programming and access
methods on the z/OS platform. Yes, VSAM and QSAM are generally cheaper than DB2. Yes,
flat file processing can be cheaper than DB2, sometimes! However, with continually shrinking
people resources and skill sets on the z/OS platform, utilizing a technology such as DB2 goes
a long way in saving programming time, increasing productivity, and portability. This means
leverage, and leverage goes a long way in the business world.
So, with the increased cost of using DB2 over long standing traditional access methods on
the z/OS platform, can we possibly get the flexibility and power of DB2, along with the best
performance possible? Yes, we can. All we need is an understanding of the DB2 optimizer, the
SQL language, and the trade-offs between performance, flexibility, portability, and people costs.
The optimizer is responsible for interpreting your queries, and determining how to access your
data in the most efficient manner. However, it can only utilize the best of the available access
paths; it cannot consider access paths that do not exist. It also can only
deal with the information that it is provided. That is, it is dependent upon the information we
give it. This is primarily the information stored in the DB2 system catalog, which includes the
basic information about the tables, columns, indexes and statistics. It doesn’t know about
indexes that could be built, or anything about the possible inputs to our queries (unless all
literal values are provided in a query), input files, batch sequences, or transaction patterns. This
is why an understanding of the DB2 optimizer, statistics, and access paths is very important,
but also important is the design of applications and databases for performance. These topics
are covered in the remainder of this guide, and so this chapter serves only as an education in
the optimizer, statistics, and access paths, and is designed to build a basis upon which proper
SQL, database, and application tuning can be conducted.
The DB2 optimizer is cost based, and so catalog statistics are critical to proper performance.
The optimizer uses a variety of statistics to determine the cost of accessing the data in tables.
These statistics can be cardinality statistics, frequency distribution statistics, or histogram
statistics. The type of query (static or dynamic), use of literal values, and runtime optimization
options will dictate which statistics are used. The cost associated with various access paths
will affect whether or not indexes are chosen, or which indexes are chosen, the access path,
and table access sequence for multiple table access paths. For this reason, maintaining accurate,
up-to-date statistics is critical to performance.
It should be noted that catalog statistics and database design are not the only things the DB2
optimizer uses to calculate and choose an efficient access path. Additional information, such as
central processor model, number of processors, and size of the RID pool, various installation
parameters, and buffer pool sizes are used to determine the access path. You should be aware
of these factors as you tune your queries.
There are three flavors of DB2 catalog statistics. Each provides different details about the data
in your tables, and DB2 will utilize these statistics to determine the best way to access the data
dependent upon the objects and the query.
Cardinality Statistics
DB2 needs an accurate estimate of the number of rows that qualify after applying various
predicates in your queries in order to determine the optimal access path. When multiple tables
are accessed the number of qualifying rows estimated for the tables can also affect the table
access sequence. Column and table cardinalities provide the basic, but critical information for
the optimizer to make these estimates.
Cardinality statistics reflect the number of rows in a table, or the number of distinct values for a
column in a table. These statistics provide the main source of what is known as the filter factor,
a percentage of the number of rows expected to be returned, for a column or a table. DB2 will
use these statistics to determine such things as whether access is via an index scan
or a table space scan, and, when joining tables, which table to access first. For example, for the
following statement embedded in a COBOL program:
SELECT EMPNO
FROM EMP
WHERE SEX = :SEX
DB2 has to choose what is the most efficient way to access the EMP table. This will depend
upon the size of the table, as well as the cardinality of the SEX column, and any index that
might be defined upon the table (especially one on the SEX column). It will use the catalog
statistics to determine the filter factor for the SEX column. In this particular case, 1/COLCARDF,
COLCARDF being a column in the SYSIBM.SYSCOLUMNS catalog table. The resulting fractional
number represents the number of rows that are expected to be returned based upon the
predicate. DB2 will use this number, the filter factor, to make decisions as to whether or not to
use an index (if available), or table space scan. For example, if the column cardinality of the
SEX column of the EMP table is 3 (male, female, unknown), then we know there are 3 unique
occurrences of a value of SEX in the table. In this case, DB2 determines a filter factor of 1/3, or
33% of the values. If the table cardinality of the table, as reflected in the CARDF column of the
SYSIBM.SYSTABLES catalog table, is 10,000 employees then the estimated number of rows
returned from the query will be 3,333. DB2 will use this information to make decisions about
which access path to choose.
If the predicate in the query above is used in a join to another table, then the filter factor
calculated will be used not only to determine the access path to the EMP table, but it could
also influence which table will be accessed first in the join.
SELECT E.EMPNO, D.DEPTNAME
FROM EMP E
INNER JOIN
DEPT D
ON E.WORKDEPT = D.DEPTNO
WHERE E.SEX = :SEX
In addition to normal cardinality statistics, statistics can be gathered upon groups of columns,
which is otherwise known as columns correlation statistics. This is very important to know
because again the DB2 optimizer can only deal with the information it is given. Say, for
example you have the following query:
SELECT E.EMPNO
FROM EMP E
WHERE E.SALARY > :SALARY
AND E.EDLEVEL = :EDLEVEL
Now, imagine in a truly hypothetical example that the amount of money that someone is paid
is correlated to the amount of education they have received. For example, an intern being paid
minimum wage won’t be paid as much as an executive with an MBA. However, if the DB2
optimizer is not given this information it has to multiply the filter factor for the SALARY
predicate by the filter factor for the EDLEVEL predicate. This may result in an exaggerated filter
factor for the table, and could negatively influence the choice of index, or the table access
sequence of a join. For this reason, it is important to gather column correlation statistics on
any columns that might be correlated. More conservatively, it may be wise to gather column
correlation statistics on any columns referenced together in the WHERE clause.
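As a minimal sketch (using the DB2 sample database names, which may differ at your site), correlation
statistics for the SALARY and EDLEVEL columns of the EMP table could be gathered with the COLGROUP
option of RUNSTATS:
RUNSTATS TABLESPACE DSN8D81A.DSN8S81E
TABLE(DSN8810.EMP)
COLGROUP(SALARY, EDLEVEL)
The COLGROUP keyword tells RUNSTATS to collect cardinality for the group of columns as a whole,
rather than only for each column individually.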
Frequency Distribution Statistics
Skewed distributions can have a negative influence on query performance if DB2 does not
know about the distribution of the data in a table and/or the input values in a query predicate.
In the following query:
SELECT EMPNO
FROM EMP
WHERE SEX = :SEX
DB2 has no idea what the input value of the :SEX host variable is, and so it uses the default
formula of 1/COLCARDF. Likewise, if the following statement is issued:
SELECT EMPNO
FROM EMP
WHERE SEX = 'M'
and most of the values are ‘M’ or ‘F’, and DB2 has only cardinality statistics available, then the
same formula and filter factor applies. This is where distribution statistics can pay off.
DB2 has the ability to collect distribution statistics via the RUNSTATS utility. These statistics
reflect the percentage of frequently occurring values. That is, you can collect the information
about the percentage of values that occur most or least frequently in a column in a table. These
frequency distribution statistics can have a dramatic impact on the performance of the
following types of queries:
• Dynamic or static SQL with embedded literals
• Static SQL with host variables bound with REOPT(ALWAYS)
• Static or dynamic SQL against nullable columns with host variables that cannot be null
• Dynamic SQL with parameter markers bound with REOPT(ONCE), REOPT(ALWAYS), or
REOPT(AUTO)
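As a sketch (again using the sample object names), frequency statistics for the most common values
of the SEX column could be collected like this:
RUNSTATS TABLESPACE DSN8D81A.DSN8S81E
TABLE(DSN8810.EMP)
COLGROUP(SEX) FREQVAL COUNT 3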
So, if we gathered frequency distribution statistics for our SEX column in the previous
examples, we may find the following frequency values in the SYSIBM.SYSCOLDIST table
(simplified in this example):
VALUE FREQUENCY
‘M’ .49
‘F’ .49
‘U’ .01
Now, if DB2 has this information, and is provided with the following query:
SELECT EMPNO
FROM EMP
WHERE SEX = 'U'
It can determine that the value ‘U’ represents about 1% of the values of the SEX column. This
additional information can have dramatic effects on access path and table access sequence
decisions.
Histogram Statistics
Histogram statistics improve on distribution statistics in that they provide value-distribution
statistics that are collected over the entire range of values in a table. This goes beyond the
capabilities of distribution statistics in that distribution statistics are limited to only the most
or least occurring values in a table (unless they are, of course, collected all the time for every
single column which can be extremely expensive and difficult). To be more specific, histogram
statistics summarize data distribution on an interval scale, and span the entire distribution of
values in a table.
Histogram statistics divide values into quantiles. A quantile defines an interval of values. The
number of intervals is determined via a specification of quantiles in a RUNSTATS
execution. Thus a quantile will represent a range of values within the table. Each quantile will
contain approximately the same percentage of rows. This can go beyond the distribution
statistics in that all values are represented.
Now, the previous example is no good because we had only three values for the SEX column of
the EMP table. So, we’ll use a hypothetical example with our EMP sample table. Let’s suppose,
theoretically, that we needed to query on the middle initial of people’s names. Those initials are
most likely in the range of ‘A’ to ‘Z’. If we had only cardinality statistics then any predicate
referencing the middle initial column would receive a filter factor of 1/COLCARDF, or 1/26. If we
had distribution statistics then any predicate that used a literal value could take advantage of
the distribution of values to determine the best access path. That determination would be
dependent upon the value being recorded as one of the most or least occurring values. If it is
not, then the filter factor is determined based upon the difference in frequency of the remaining
values. Histogram statistics eliminate this guesswork.
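Assuming DB2 9 and the sample EMP table, histogram statistics over five quantiles on the MIDINIT
column could be collected with a RUNSTATS execution similar to this sketch:
RUNSTATS TABLESPACE DSN8D91A.DSN8S91E
TABLE(DSN8910.EMP)
COLGROUP(MIDINIT) HISTOGRAM NUMQUANTILES 5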
If we calculated histogram statistics for the middle initial, they may look something like this:
QUANTILE LOW VALUE HIGH VALUE CARDINALITY FREQUENCY
1 A G 5080 20%
2 H L 4997 19%
3 M Q 5001 20%
4 R U 4900 19%
5 V Z 5100 20%
Now, the DB2 optimizer has information about the entire span of values in the table, and any
possible skew. This is especially useful for range predicates such as this:
SELECT EMPNO
FROM EMP
WHERE MIDINIT BETWEEN 'A' AND 'G'
Access Paths
There are a number of access paths that DB2 can choose when accessing the data in your
database. This section describes those access paths, as well as when they are effective. The
access paths are exposed via use of the DB2 EXPLAIN facility. The EXPLAIN facility can be
invoked via one of these techniques:
• EXPLAIN SQL statement
• EXPLAIN(YES) BIND or REBIND parameter
• Visual Explain (DB2 V7, DB2 V8, DB2 9)
• Optimization Service Center (DB2 V8, DB2 9)
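As a brief sketch (the QUERYNO value and the query itself are arbitrary examples), an access path can
be externalized with the EXPLAIN statement:
EXPLAIN PLAN SET QUERYNO = 100 FOR
SELECT EMPNO
FROM EMP
WHERE WORKDEPT = 'D01'
followed by a query against the PLAN_TABLE such as:
SELECT QUERYNO, METHOD, ACCESSTYPE, MATCHCOLS, INDEXONLY, PREFETCH
FROM PLAN_TABLE
WHERE QUERYNO = 100
ORDER BY QBLOCKNO, PLANNO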
Table Space Scan
A table space scan is not necessarily a bad thing. It depends upon the nature of the request
and the frequency of access. Sure, running a table space scan every second in support of
online transactions is probably a bad thing. A table space scan in support of a report that spans
the content of an entire table once per day is probably a good thing.
For example, assume that the EMP table is partitioned on the EMPNO column, and the following
query is issued:
SELECT LASTNAME
FROM EMP
WHERE EMPNO BETWEEN '000100' AND '000200'
DB2 can choose an index based access method if there was an index available on the EMPNO
column. DB2 could also choose a table space scan, but utilize the partitioning key values to
access only the partition in which the predicate matches.
DB2 can also choose a secondary index for the access path, and still eliminate partitions using
the partitioning key values if those values are supplied in a predicate. For example, in the
following query:
SELECT LASTNAME
FROM EMP
WHERE WORKDEPT = 'D11'
AND EMPNO BETWEEN '000100' AND '000200'
DB2 could choose a partitioned secondary index on the WORKDEPT column, and eliminate
partitions based upon the range provided in the query for the EMPNO.
DB2 can employ partition elimination for predicates coded against the partitioning key using:
• Literal values (DB2 V7, DB2 V8, DB2 9)
• Host variables (DB2 V7 with REOPT(VARS), DB2 V8, DB2 9)
• Joined columns (DB2 9)
Index Access
DB2 can utilize indexes on your tables to quickly access the data based upon the predicates in
the query, and to avoid a sort in support of the use of the DISTINCT clause, as well as for the
ORDER BY and GROUP BY clauses, and INTERSECT, and EXCEPT processing.
There are several types of index access. The index access is indicated in the PLAN_TABLE by
an ACCESSTYPE value of either I, I1, N, MX, or DX.
DB2 will match the predicates in your queries to the leading key columns of the indexes
defined against your tables. This is indicated by the MATCHCOLS column of the PLAN_TABLE.
If the number of columns matched is greater than zero, then the index access is considered a
matching index scan. If the number of matched columns is equal to zero, then the index access
is considered a non-matching index scan. In a non-matching index scan all of the key values
and their record identifiers (RIDs) are read. A non-matching index scan is typically used if DB2
can use the index in order to avoid a sort when the query contains an ORDER BY, DISTINCT, or
GROUP BY clause, and also possibly for an INTERSECT, or EXCEPT.
The matching predicates on the leading key columns are equal (=) or IN predicates. This would
correspond to an ACCESSTYPE value of either “I” or “N”, respectively. The predicate that
matches the last index column can be an equal, IN, IS NOT NULL, or a range predicate (<, <=, >,
>=, LIKE, or BETWEEN). For example, for the following query assume that the EMPPROJACT
DB2 sample table has an index on the PROJNO, ACTNO, EMSTDATE, and EMPNO columns:
SELECT EMPTIME
FROM EMPPROJACT
WHERE PROJNO = 'AD3100'
AND ACTNO = 10
AND EMSTDATE > '2002-01-01'
AND EMPNO = '000010'
In the above example DB2 could choose a matching index scan matching on the PROJNO,
ACTNO, and EMSTDATE columns. It does not match on the EMPNO column due to the fact
that the previous predicate is a range predicate. However, the EMPNO column can be applied
as an index screening column. That is, since it is a stage 1 predicate (predicate stages are
covered in Chapter 3) DB2 can apply it to the index entries after the index is read. So, although
the EMPNO column does not limit the range of entries retrieved from the index it can eliminate
the entries as the index is read, and therefore reduce the number of data rows that have to be
retrieved from the table.
If all of the columns specified in a statement can be found in an index, then DB2 may
choose index only access. This is indicated in the PLAN_TABLE with the value “Y” in the
INDEXONLY column.
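For example, assuming a hypothetical index on the LASTNAME and FIRSTNME columns of the EMP
table, the following query could be satisfied with index only access, since every column it references
is contained in the index:
SELECT LASTNAME, FIRSTNME
FROM EMP
WHERE LASTNAME = 'HAAS'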
DB2 is also able to access indexes via multi-index access. This is indicated via an ACCESSTYPE
of “M” in the PLAN_TABLE. A multi-index access involves reading multiple indexes, or the
same index multiple times, gathering qualifying RID values together in a union or intersection
of values (depending upon the predicates in the SQL statement), and then sorting them in data
page number sequence. In this way a table can still be accessed efficiently for more complicated
queries. With multi-index access each operation is represented in the PLAN_TABLE in an
individual row. These steps include the matching index access (ACCESSTYPE = “MX” or “DX”)
for regular indexes or DOCID XML indexes, and then unions or intersections of the RIDs
(ACCESSTYPE = “MU”, “MI”, “DU”, or “DI”).
The following example demonstrates a SQL statement that can perhaps utilize multi-index
access if there is an index on the LASTNAME and FIRSTNME columns of the EMP table.
SELECT EMPNO
FROM EMP
WHERE LASTNAME = 'HAAS'
OR (LASTNAME = 'LUTZ' AND FIRSTNME = 'JENNIFER')
With the above query DB2 could possibly access the index twice, union the resulting RID
values together (due to the OR condition in the WHERE clause), and then access the data.
List Prefetch
The preferred method of table access, when access is via an index, is by utilizing a clustered
index. When an index is defined as clustered, then DB2 will attempt to keep the data in the
table in the same sequence as the key values in the clustering index. Therefore, when the table
is accessed via the clustering index, and the data in the table is well organized (as represented
by the catalog statistics), then the access to the data in the table will typically be sequential
and predictable. When the data in a table is accessed via a non-clustering index, or via the
clustering index with a low cluster ratio (the table data is disorganized), the access to the data
can be very random and unpredictable. In addition, the same data pages could be read
multiple times.
When DB2 detects that access to a table will be via a non-clustering index, or via a clustering
index that has a low cluster ratio, then DB2 may choose an index access method called list
prefetch. List prefetch will also be used as part of the access path for multi-index access and
access to the inner table of a hybrid join. List prefetch is indicated in the PLAN_TABLE via a
value of “L” in the PREFETCH column.
List prefetch will access the index via a matching index scan of one or more indexes, and
collect the RIDs into the RID pool, which is a common area of memory. The RIDs are then
sorted in ascending order by the page number, and the pages in the table space are then
retrieved in a sequential or skip sequential fashion. List prefetch will not be used if the
matching predicates include an IN-list predicate.
List prefetch will not be chosen as the access path if DB2 determines that the RIDs to be
processed will take more than 50% of the RID pool when the query is executed. Also, list
prefetch can terminate during execution if DB2 determines that more than 25% of the rows of
the table must be accessed. In these situations (called RDS failures) the access path changes
to a table space scan.
In general, a list prefetch is useful for access to a moderate number of rows when access to the
table is via a non-clustering or clustering index when the data is disorganized (low cluster
ratio). It is usually less useful for queries that process large quantities of data or very small
amounts of data.
Nested Loop Join
A nested loop join will repeatedly access the inner table as rows are accessed in the outer
table. Therefore, the nested loop join is most efficient when a small number of rows qualify for
the outer table, and is a good access path for transactions that are processing little or no data.
It is important that there exist an index that supports the join on the inner table, and for the
best performance that index should be a clustering index in the same order as the outer table.
In this way the join operation can be a sequential and efficient process. DB2 (DB2 9) can also
dynamically build a sparse index on the inner table when no index is available on the inner
table. It may also sort the outer table if it determines that the join columns are not in the same
sequence as the inner table, or the join columns of the outer table are not in an index or are in a
clustering index where the table data is disorganized.
The nested loop join is the method selected by DB2 when you are joining two tables together,
and do not specify join columns.
Hybrid Join
The preferred method for a join is to join two tables together via a common clustering index. In
these cases it is likely that DB2 will choose a nested loop join as the access method for small
amounts of data. If DB2 detects that the inner table access will be via a non-clustering index or
via a clustering index in which the cluster ratio is low, then DB2 may choose a hybrid join as
the access method. The hybrid join is indicated via a value of “4” in the METHOD column of
the PLAN_TABLE.
In a hybrid join DB2 will scan the outer table either via a table space scan or via an index, and
then join the qualifying rows of the outer table with the RIDs from the matching index entries.
DB2 also creates a RID list, as it does for list prefetch, for all the qualifying RIDs of the inner
table. Then DB2 will sort the outer table and RIDs, creating a sorted RID list and an
intermediate table. DB2 will then access the inner table via list prefetch, and join the inner table
to the intermediate table.
Hybrid join can outperform a nested loop join when the inner table access is via a non-clustering
index or clustering index for a disorganized table. This is especially true if the query
will be retrieving more than a trivial amount of data. That is, the hybrid join is better if the
equivalent nested loop join would be accessing the inner table in a random fashion, and thus
accessing the same pages multiple times. Hybrid join takes advantage of the list prefetch
processing on the inner table to read all the data pages only once in a sequential or skip
sequential manner. Thus, hybrid join works best when the query processes more than a tiny
amount of data (where a nested loop join would be better), but less than a very large amount of
data (where a merge scan join would be better). Since list prefetch is utilized, the hybrid join can experience
RID list failures. This can result in a table space scan of the inner table on initial access to the
inner table, or a restart of the list prefetch processing upon subsequent accesses (all within the
same query). If you are experiencing RID list failures in a hybrid join then you should attempt to
encourage DB2 to use a nested loop join.
Merge Scan Join
In a merge scan join DB2 will scan both tables in the order of the join columns. If there are no
efficient indexes providing the join order for the join columns, then DB2 may sort the outer
table, the inner table, or both tables. The inner table is placed into a work file, and the outer
table will be placed into a work file if it has to be sorted. DB2 will then read both tables, and
join the rows together via a match/merge process.
Merge scan join is best for large quantities of data. The general rule should be that if you are
accessing small amounts of data in a transaction then nested loop join is best. Processing large
amounts of data in a batch program or a large report query? Then merge scan join is generally
the best.
Star Join
A star join is another join method that DB2 will employ in special situations in which it detects
that the query is joining tables that are a part of a star schema design of a data warehouse. The
star join is indicated via the value of “S” in the JOIN_TYPE column of the PLAN_TABLE. The
star join is best described in the DB2 Administration Guide (DB2 V7, DB2 V8) or the DB2
Performance Monitoring and Tuning Guide (DB2 9).
Read Mechanisms
DB2 can apply some performance enhancers when accessing your data. These performance
enhancers can have a dramatic effect on the performance of your queries and applications.
Sequential Prefetch
When your query reads data in a sequential manner, DB2 can elect to use sequential prefetch
to read the data ahead of the query requesting it. That is, DB2 can launch asynchronous read
engines to read data from the index and table page sets into the DB2 buffers. This hopefully
allows the data pages to already be in the buffers in expectation of the query accessing them.
The maximum number of pages read by a sequential prefetch operation is determined by
the size of the buffer pool used. When the buffer pool is smaller than 1000 pages, then the
prefetch quantity is limited due to the risk of filling up the buffer pool. If the buffer pool has
more than 1000 pages then 32 pages can be read with a single physical I/O.
Sequential prefetch is indicated via a value of “S” in the PREFETCH column of the
PLAN_TABLE. DB2 9 will only use sequential prefetch for a table space scan. Otherwise, DB2 9
will rely primarily on dynamic prefetch.
Dynamic Prefetch and Sequential Detection
Dynamic prefetch can be activated for single cursors, or for repetitive queries within a single
application thread. This could improve the performance of applications that issue multiple
SQL statements that access the data as a whole sequentially. It is also important, of course,
to cluster tables that are commonly accessed together to possibly take advantage of
sequential detection.
It should be noted that dynamic prefetch is dependent upon the bind parameter
RELEASE(DEALLOCATE). This is because the area of memory used to track the last eight
pages accessed is destroyed and rebuilt for RELEASE(COMMIT). It should also be noted that
RELEASE(DEALLOCATE) is not effective for remote connections, and so sequential detection
and dynamic prefetch are less likely for remote applications.
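As a sketch (the collection and package names are hypothetical), a locally attached batch package
could be rebound to enable these techniques:
REBIND PACKAGE(BATCHCOL.PGM1) RELEASE(DEALLOCATE)
Keep in mind that RELEASE(DEALLOCATE) holds resources such as table space intent locks until the
thread is deallocated, so it is a trade-off rather than a free performance win.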
Index Lookaside
Index lookaside is another powerful performance enhancer that is active only after an initial
access to an index. For repeated probes of an index DB2 will check the most recently accessed index
leaf page to see if the key is found on that page. If it is, then DB2 will not read from the root page of
the index down to the leaf page, but will simply use the values from the leaf page. This can have a
dramatically positive impact on performance in that getpage operations and I/Os can be avoided.
Index lookaside depends on the query reading index data in a predictable pattern. Typically this
happens for transactions that access the index via a common cluster with other
tables. Index lookaside is also dependent upon the RELEASE(DEALLOCATE) bind parameter,
much in the same way as sequential detection.
Sorting
DB2 may have to invoke a sort in support of certain clauses in your SQL statement, or in
support of certain access paths. DB2 can sort in support of an ORDER BY or GROUP BY clause,
to remove duplicates (DISTINCT, EXCEPT, or INTERSECT), or in join or subquery processing.
Sorting in an application will generally be faster than sorting in DB2 for small result
sets. However, by placing the sort in the program you remove some of the flexibility, power, and
portability of the SQL statement, along with the ease of coding. Perhaps instead you can take
advantage of the ways in which DB2 can avoid a sort.
For example, assume an index exists on the WORKDEPT, LASTNAME, and FIRSTNME columns of the
EMP table, and the following query is issued:
SELECT *
FROM EMP
ORDER BY WORKDEPT, LASTNAME, FIRSTNME
In the above query the sort will be avoided if DB2 uses the index on those three columns to
access the table. DB2 can also avoid a sort if any number of the leading columns of an index
match the ORDER BY clause, or if the query contains an equals predicate on the leading
columns (ORDER BY pruning). So, both of the following queries will avoid a sort if the
index is used:
SELECT *
FROM EMP
ORDER BY WORKDEPT
SELECT *
FROM EMP
WHERE WORKDEPT = 'C01'
ORDER BY LASTNAME
DB2 can also avoid a sort for a GROUP BY if the grouping columns match the leading columns
of an index, as in the following query:
SELECT WORKDEPT, AVG(SALARY)
FROM EMP
GROUP BY WORKDEPT
A DISTINCT, on the other hand, can generally avoid a sort only when a unique index exists on the
selected columns, so a common technique is to code a GROUP BY on all of the selected columns
in place of the DISTINCT and let a non-unique index avoid the sort. That is, instead of:
SELECT DISTINCT WORKDEPT, JOB
FROM EMP
you could code:
SELECT WORKDEPT, JOB
FROM EMP
GROUP BY WORKDEPT, JOB
When using this technique, make sure you document it properly so other programmers
understand why you are grouping on all columns.
Parallelism
DB2 can utilize something called parallel operations for access to your tables and/or indexes.
This can have a dramatic impact on performance for queries that process large
quantities of data across multiple table and index partitions. The response time for these data
or processor intensive queries can be dramatically reduced. There are two types of query
parallelism: query I/O parallelism and query CP parallelism.
I/O parallelism manages concurrent I/O requests for a single query. It can fetch data using
these multiple concurrent I/O requests into the buffers, and significantly reduce the response
time for large, I/O bound queries.
Query CP parallelism can break a large query up into smaller queries, and then run the smaller
queries in parallel on multiple processors. This can also significantly reduce the elapsed time
for large queries.
You can enable query parallelism by utilizing the DEGREE(ANY) parameter of a BIND
command, or by setting the value of the CURRENT DEGREE special register to the value of
‘ANY’. Be sure, however, that you utilize query parallelism only for larger queries, as there is
overhead in starting the parallel tasks. For small queries this can actually be a performance
detriment.
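A minimal sketch of enabling parallelism for dynamic SQL (the query shown is simply an example of a
larger, data intensive statement):
SET CURRENT DEGREE = 'ANY'
SELECT WORKDEPT, AVG(SALARY)
FROM EMP
GROUP BY WORKDEPT
For static SQL the equivalent is the DEGREE(ANY) parameter at bind time.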
DB2 can also utilize something called Sysplex query parallelism. In this situation DB2 can split
a query across multiple members of a data sharing group to utilize all of the processors on the
members for extreme query processing of very large queries.
CHAPTER 3
Predicates and SQL Tuning
This chapter will address the fundamentals of SQL performance and SQL processing in DB2.
In order to write efficient SQL statements, and to tune SQL statements, we need a basic
understanding of how DB2 optimizes SQL statements, and how it accesses your data.
When writing and tuning SQL statements, we should maintain a simple and important
philosophy. That being “filter as much as possible as early as possible”. We have to keep in
mind that the most expensive thing we can do is to travel from our application to DB2. Once
we are processing inside DB2 we need to do our filtering as close to the indexes and data
as possible.
We also need to understand how to reduce repeat processing. Keep in mind that the most
efficient SQL statement is the one that never executes. That is, we should do only what is
absolutely necessary to get the job done.
DB2 predicates are processed in two stages, known as stage 1 and stage 2 (and in this guide we
also describe a “stage 3” for filtering done in the application program). You can apply a very simple
philosophy in order to understand which of your predicates
might fall within one of these classifications. That is, the DB2 stage 1 engine understands your
indexes and tables, and can utilize an index for efficient access to your data. Only a stage 1
predicate can limit the range of data accessed on a disk. The stage 2 engine processes
functions and expressions, but is not able to directly access data in indexes and tables. Data
from stage 1 is passed to stage 2 for further processing, and so stage 1 predicates are generally
more efficient than stage 2 predicates. Stage 2 predicates cannot utilize an index, and thus
cannot limit the range of data retrieved from disk. Finally, a stage 3 predicate is a predicate that
is processed in the application layer. That is, filtering performed once the data is retrieved from
DB2 and processed in the application. Stage 3 predicates are the least efficient.
There is a table in the DB2 Administration Guide (DB2 V7 and DB2 V8) or the DB2
Performance Monitoring and Tuning Guide (DB2 9) that lists the various predicate forms, and
whether they are stage 1 indexable, stage 1, or stage 2. Only examples are given in this guide.
Stage 1 Indexable
The best performance for filtering will occur when stage 1 indexable predicates are used. But
just because a predicate is listed in the chart as stage 1 indexable does not mean it will be used
for index access or even processed in stage 1, as there are many other factors that must also be in
place. The first thing that determines whether a predicate is stage 1 is the syntax of the predicate;
the second is the type and length of constants used in the predicate. This is one area
which has improved dramatically with version 8 in that more data types can be promoted
inside stage 1 thus improving stage 1 matching. If the predicate, even though it would classify
as a stage 1 predicate, is evaluated after a join operation then it is a stage 2 predicate. All
indexable predicates are stage 1 but not all stage 1 predicates are indexable.
Stage 1 indexable predicates are predicates that can be used to match on the columns of an
index. The simplest example of a stage 1 predicate would be of the form <col op value>,
where col is a column of a table, op is an operator (=, >, <, >=, <=) and value represents a non-
column expression (an expression that does not contain a column from the table). Predicates
containing BETWEEN, IN (for a list of values), and LIKE (without a leading wildcard character)
can also be stage 1 indexable. The stage 1 indexable predicate, when an index is utilized,
provides the best filtering in that it can actually limit the range of data accessed from the index.
You should try to utilize stage 1 matching predicates whenever possible. Assuming that an
index exists on the EMPNO column of the EMP table, then the predicate in the following query
is a stage 1 indexable predicate:
SELECT LASTNAME
FROM EMP
WHERE EMPNO = '000010'
Other Stage 1
Just because a predicate is stage 1 does not mean that it can utilize an index. Some stage 1
predicates are not available for index access. These predicates (again, the best reference is the
chart in the IBM manuals) are generally of the form <col NOT op value>, where col is a column
of a table, op is an operator, and value represents a non-column expression, host variable, or
value. Predicates containing NOT BETWEEN, NOT IN (for a list of values), NOT LIKE, or LIKE
(with a leading wildcard character) are also non-indexable stage 1 predicates. Although
non-indexable stage 1 predicates cannot limit the range of data read from
an index, they are available as index screening predicates. The following is an example of a
non-indexable stage 1 predicate:
SELECT LASTNAME
FROM EMP
WHERE EMPNO NOT BETWEEN '000010' AND '000020'
Stage 2
The DB2 manuals give a complete description of when a predicate can be stage 2 versus stage
1. However, generally speaking stage 2 happens after data accesses and performs such things
as sorting and evaluation of functions and expressions. Stage 2 predicates cannot take
advantage of indexes to limit the data access, and are generally more expensive than stage 1
predicates because they are evaluated later in the processing.
Stage 2 predicates are generally those that contain column expressions, correlated subqueries,
and CASE expressions, among others. A predicate can also appear to be stage 1, but processed
as stage 2. For example, any predicate processed after a join operation is stage 2. Also,
although DB2 does a good job of promoting mismatched data types to stage 1 via casting (as
of version 8) some predicates with mismatched data types are stage 2. One example is a range
predicate comparing a character column to a character value that exceeds the length of the
column. The following are examples of stage 2 predicates (EMPNO is a character column of
fixed length 6):
SELECT LASTNAME
FROM EMP
WHERE SUBSTR(EMPNO, 1, 3) = '000'
SELECT LASTNAME
FROM EMP
WHERE EMPNO > '0000100'
Stage 3
A stage 3 predicate is a fictitious predicate we’ve made up to describe filtering that occurs
after the data has been retrieved from DB2. That is, filtering that is done in the application
program. You should have no stage 3 predicates. Performance is best served when all filtering
is done in the data server, and not in the application code. Imagine for a moment that a COBOL
program has read the EMP table. After reading the EMP table, the following statement is
executed:
IF WS-SALARY + WS-COMM < 50000
CONTINUE
ELSE
PERFORM PROCESS-ROW
END-IF.
This is an example of a stage 3 predicate. The data retrieved from the table isn’t used unless
the condition is false. Why did this happen? Because the programmer was told that no stage 2
predicates are allowed in SQL statements as part of a shop standard. The truth is, however,
that the following stage 2 predicate would always outperform a stage 3 predicate:
SELECT EMPNO
FROM EMP
WHERE SALARY + COMM >= 50000
Combining Predicates
It should be known that when simple predicates are connected together by an OR condition,
the resulting compound predicate will be evaluated at the higher stage of the two simple
predicates. The following example contains two simple predicates combined by an OR. The
first predicate is stage 1 indexable, and the second is non-indexable stage 1. The result is that
the entire compound predicate is stage 1 and not indexable:
SELECT EMPNO
FROM EMP
WHERE EMPNO = '000010'
OR SALARY <> 50000
In the next example the first simple predicate is stage 1 indexable, but the second (connected
again by an OR) is stage 2. Thus, the entire compound predicate is stage 2:
SELECT EMPNO
FROM EMP
WHERE EMPNO = '000010'
OR SUBSTR(LASTNAME, 1, 1) = 'H'
A Boolean term predicate is a simple or compound predicate that, when it evaluates to false for
a given row, makes the entire WHERE clause false for that row. Consider the following WHERE clause:
WHERE LASTNAME = 'HAAS'
AND MIDINIT = 'D'
If the predicate on the LASTNAME column evaluates as false, then the entire WHERE clause is
false. The same is true for the predicate on the MIDINIT column. DB2 can take advantage of
this in that it could utilize an index (if available) on either the LASTNAME column or the
MIDINIT column. This is opposed to the following WHERE clause:
WHERE LASTNAME = 'HAAS'
OR MIDINIT = 'D'
In this case if the predicate on LASTNAME or MIDINIT evaluates as false, then the rest of the
WHERE clause must be evaluated. These predicates are non-Boolean term.
We can modify predicates in the WHERE clause to take advantage of this to improve index
access. While the following query contains non-Boolean term predicates:
SELECT EMPNO
FROM EMP
WHERE (LASTNAME = 'HAAS' AND MIDINIT = 'I')
OR (LASTNAME = 'LUTZ' AND MIDINIT = 'D')
A redundant predicate can be added that, while making the WHERE clause functionally
equivalent, contains a Boolean term predicate, and thus could take better advantage of an
index on the LASTNAME column:
SELECT EMPNO
FROM EMP
WHERE ((LASTNAME = 'HAAS' AND MIDINIT = 'I')
OR (LASTNAME = 'LUTZ' AND MIDINIT = 'D'))
AND LASTNAME IN ('HAAS', 'LUTZ')
DB2 can also generate redundant predicates on its own through predicate transitive closure. For
example, given the following join:
SELECT *
FROM T1
INNER JOIN T2
ON T1.A = T2.B
WHERE T1.A = 1
DB2 will generate a redundant predicate on T2.B = 1. This gives DB2 more choices for filtering
and table access sequence. You can add your own redundant predicates, however DB2 will
not consider them when it applies predicate transitive closure, and so they will be redundant
and wasteful.
Predicates of the form <COL op value>, where the operation is an equals or range predicate,
are available for predicate transitive closure. So are BETWEEN predicates. However, predicates
that contain a LIKE, an IN, or subqueries are not available for transitive closure, and so it may
benefit you to code those predicates redundantly if you feel they can provide a benefit.
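For example, because LIKE predicates are not available for transitive closure, the second LIKE predicate
below has to be coded manually if it is to help DB2 filter T2 (the tables and the pattern are hypothetical):
SELECT *
FROM T1
INNER JOIN T2
ON T1.A = T2.B
WHERE T1.A LIKE 'AB%'
AND T2.B LIKE 'AB%'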
You need to look at your SQL statements and ask some questions. Is the statement needed?
Do I really need to run the statement again? Does the statement have a DISTINCT, and are
duplicates possible? Does the statement have an ORDER BY, and is the order important? Are
you repeatedly accessing code tables, and can the codes be cached within the program?
• Reduce the number of SQL statements. Each SQL statement is a programmatic call to the
DB2 subsystem that incurs fixed overhead for each call. Careful evaluation of program
processes can reveal situations in which SQL statements are issued that don’t need to be.
This is especially true for programmatic joins. If separate application programs are retrieving
data from related tables, then this can result in extra SQL statements issued. Code SQL joins
instead of programmatic joins. In one simple test of a programmatic join of two tables in a
COBOL program compared to the equivalent SQL join, the SQL join consumed 30% less CPU.
• Use stage 1 predicates. You should be aware of the stage 1 indexable, stage 1, and stage 2
predicates, and try to use stage 1 predicates for filtering whenever possible. See this section
in this chapter on promoting predicates to help determine if you can convert your stage 2
predicates to stage 1, or stage 1 non-indexable to indexable.
• Never use generic SQL statements. Generic SQL equals generic performance. Generic SQL
generally only has logic that will retrieve some specific data or data relationship, and leaves
the entire business rule processing in the application. Common modules, SQL that is used to
retrieve current or historical data, and modules that read all data into a copy area only to
have the caller use a few fields are all examples we’ve seen. Generic SQL and generic I/O
layers do not belong in high performing systems.
• Avoid unnecessary sorts. Avoiding unnecessary sorting is a requirement in any application, but
more so in a high performance environment, especially when a database is involved. Generally,
if ordering is always necessary then there should probably be indexes to support the
ordering. Sorting can be caused by GROUP BY, ORDER BY, DISTINCT, INTERSECT, EXCEPT,
and join processing. If ordering is required for a join process, it generally is a clear indication
that an index is missing. If ordering is necessary after a join process, then hopefully the result
set is small. The worst possible scenario is where a sort is performed as the result of the
SQL in a cursor, and the application only processes a subset of the data. In this case the sort
overhead needs to be removed. It is more an application of common sense. If the SQL has
to sort 100 rows and only 10 are processed by the application, then it may be better to sort
in the application, or perhaps an index needs to be created to help avoid the sort.
• Only sort truly necessary columns. When DB2 sorts data, the columns used to determine
the sort order actually appear twice in the sort rows. Make sure that you specify the sort
only on the necessary columns. For example, any column specified in an equals predicate in
the query does not need to be in the ORDER BY clause.
• Use the ON clause for all join predicates. By using explicit join syntax instead of implicit join
syntax you can make a statement easier to read, easier to convert between inner and outer
join, and harder to forget to code a join predicate.
• Avoid UNIONs (not necessarily UNION ALL). Are duplicates possible? If not, then replace
the UNION with a UNION ALL. Otherwise, see if it is possible to produce the same result
with an outer join, or in a situation in which the subselects of the UNION are all against the
same table try using a CASE expression and only one pass through the data instead.
• Use joins instead of subqueries. Joins can outperform subqueries for existence checking if
there are good matching indexes for both tables involved, and if the join won’t introduce any
duplicate rows in the result. DB2 can take advantage of predicate transitive closure, and also
pick the best table access sequence. These things are not possible with subqueries.
• Code the most selective predicates first. DB2 processes the predicates in a query in a
specific order:
– Indexable predicates are applied first in the order of the columns of the index
– Then other stage 1 predicates are applied
– Then stage 2 predicates are applied
Within each of the stages above DB2 will process the predicates in this order:
– All equals predicates (including single IN list and BETWEEN with only one value)
– All range predicates and predicates of the form IS NOT NULL
– Then all other predicate types
Within each grouping DB2 will process the predicates in the order they have been coded
in the SQL statement. Therefore, all SQL queries should be written to evaluate the most
restrictive predicates first to filter unnecessary rows earlier, reducing processing cost at a
later stage. This includes subqueries as well (within the grouping of correlated and non-
correlated).
• Use the proper method for existence checking (see the sketches after this list). For existence
checking using a subquery, an EXISTS predicate will generally outperform an IN predicate. For general existence checking
you could code a singleton SELECT statement that contains a FETCH FIRST 1 ROW ONLY
clause.
• Avoid unnecessary materialization. Running a transaction that processes little or no data?
You might want to use correlated references to avoid materializing large intermediate result
sets. Correlated references will encourage nested loop join and index access for transactions.
See the advanced SQL and performance section of this chapter for more details.
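The following sketches illustrate the two existence checking approaches described above (the tables,
columns, and values are only examples):
SELECT D.DEPTNO, D.DEPTNAME
FROM DEPT D
WHERE EXISTS
(SELECT 1
FROM EMP E
WHERE E.WORKDEPT = D.DEPTNO)
and, for general existence checking:
SELECT 1
FROM EMP
WHERE WORKDEPT = 'D01'
FETCH FIRST 1 ROW ONLY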
Promoting Predicates
We need to code SQL predicates as efficiently as possible, and so when you are writing
predicates you should make sure to use stage 1 indexable predicates whenever possible. If
you’ve coded a stage 1 non-indexable predicate, or a stage 2 predicate, then you should be
asking yourself if you can possibly promote those predicates to a more efficient stage.
If you have a stage 1 non-indexable predicate, can it be promoted to stage 1 indexable? Take
for example the following predicate:
WHERE ACCT_END_DATE <> '9999-12-31'
If we use the “end of the DB2 world” as an indicator of data that is still active, then why not
change the predicate to something that is indexable:
WHERE ACCT_END_DATE < '9999-12-31'
Do you have a stage 2 predicate that can be promoted to stage 1, or even stage 1 indexable?
Take for example this predicate:
WHERE ACCT_END_DATE + 30 DAYS > CURRENT DATE
The predicate above is applying date arithmetic to a column. This is a column expression,
which makes the predicate a stage 2 predicate. By moving the arithmetic to the right side of
the inequality we can make the predicate stage 1 indexable:
WHERE ACCT_END_DATE > CURRENT DATE - 30 DAYS
These examples can be applied as general rules for promoting predicates. It should be noted
that as of DB2 9 it is possible to create an index on an expression. An index on an expression
can be considered for improved performance of column expressions when it’s not possible to
eliminate the column expression in the query.
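A sketch of such an index follows; the index name and the expression are hypothetical:
CREATE INDEX XEMPTOTCOMP
ON EMP (SALARY + BONUS + COMM)
A predicate such as WHERE SALARY + BONUS + COMM > 100000 could then potentially use this index
even though it contains a column expression.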
Functions and expressions coded in your SQL statements are not free. For example, the SUBSTR
function in the following query:
SELECT SUBSTR(LASTNAME, 1, 1)
FROM EMP
will be executed for every row processed. If you need the ultimate in ease of coding, time to
delivery, portability, and flexibility, then code expressions and functions like this in your SQL
statements. If you need the ultimate in performance, then do the manipulation of data in your
application program.
CASE expressions can be very expensive, but CASE expressions will utilize “early out” logic
when processing. Take the following CASE expression as an example:
CASE WHEN C1 = 'A'
OR C1 = 'K'
OR C1 = 'T'
OR C1 = 'Z'
THEN 'OK' ELSE 'NOT OK' END
If most of the time the value of the C1 column is a blank, then the following functionally
equivalent CASE expression will consume significantly less CPU:
CASE WHEN C1 <> ' ' AND
(C1 = 'A'
OR C1 = 'K'
OR C1 = 'T'
OR C1 = 'Z')
THEN 'OK' ELSE 'NOT OK' END
DB2 will take the early out from the CASE expression at the first not-true condition of an AND, or the
first true condition of an OR. So, in addition to testing for the blank value above, all the other values
should be tested with the most frequently occurring values first.
CORRELATION In general, correlation encourages nested loop join and index access. This can
be very good for your transaction queries that process very little data. However, it can be bad
for your report queries that process vast quantities of data. The following query is generally a
very good performer when processing large quantities of data:
SELECT TAB1.EMPNO, TAB1.LASTNAME, TAB2.AVGSAL, TAB2.HDCOUNT
FROM
(SELECT EMPNO, LASTNAME, WORKDEPT
FROM EMP
WHERE JOB = 'SALESREP') AS TAB1
INNER JOIN
(SELECT WORKDEPT, AVG(SALARY) AS AVGSAL, COUNT(*) AS HDCOUNT
FROM EMP
GROUP BY WORKDEPT) AS TAB2
ON TAB1.WORKDEPT = TAB2.WORKDEPT
If there is one sales rep per department, or if most of the employees are sales reps, then the
above query would be the most efficient way to retrieve the data. In the query above, the entire
employee table will be read and materialized in the nested table expression called TAB2. It is
very likely that the merge scan join method will be used to join the materialized TAB2 to the
first table expression called TAB1.
Suppose now that the employee table is extremely large, but that there are very few sales reps,
or perhaps all the sales reps are in one or a few departments. Then the above query may not be
the most efficient due to the fact that the entire employee table still has to be read in TAB2, but
most of the results of that nested table expression won’t be returned in the query. In this case,
the following query may be more efficient:
SELECT TAB1.EMPNO, TAB1.LASTNAME, TAB2.AVGSAL, TAB2.HDCOUNT
FROM EMP TAB1,
TABLE(SELECT AVG(E2.SALARY) AS AVGSAL, COUNT(*) AS HDCOUNT
FROM EMP E2
WHERE E2.WORKDEPT = TAB1.WORKDEPT) AS TAB2
WHERE TAB1.JOB = 'SALESREP'
While this statement is functionally equivalent to the previous statement, it operates in a very
different way. In this query the employee table referenced as TAB1 will be read first, and then
the nested table expression will be executed repeatedly in a nested loop join for each row that
qualifies. An index on the WORKDEPT column is a must.
MERGE VERSUS MATERIALIZATION FOR VIEWS AND NESTED TABLE EXPRESSIONS When you
have a reference to a nested table expression or view in your SQL statement DB2 will possibly
merge that nested table expression or view with the referencing statement. If DB2 cannot
merge then it will materialize the view or nested table expression into a work file, and then
apply the referencing statement to that intermediate result. IBM states that merge is more
efficient than materialization. In general, that statement is correct. However, materialization
may be more efficient if your complex queries have the following combined conditions:
• Nested table expressions or view references, especially multiple levels of nesting
• Columns generated in the nested expressions or views via application of functions,
user-defined functions, or other expressions
• References to the generated columns in the outer referencing statement
In general, DB2 will materialize when some sort of aggregate processing is required inside the
view or nested table expression. So, typically this means that the view or nested table expression
contains aggregate functions, grouping (GROUP BY), or DISTINCT. If materialization is not
required then the merge process happens. Take, for example, a query of this general form:

SELECT MAX(CNT)
FROM
   (SELECT WORKDEPT, COUNT(*) AS CNT
    FROM EMP
    GROUP BY WORKDEPT) AS TAB1

DB2 will materialize TAB1 in the above example, because the nested table expression contains
an aggregate function and a GROUP BY. Now, take a look at the next query, in which the nested
table expression generates a column (COL1) from a CASE expression (the ACCT_BAL column
used here is illustrative):

SELECT AVG(COL1)
      ,SUM(COL1)
FROM
   (SELECT CASE WHEN ACCT_BAL < 0 THEN 0 ELSE ACCT_BAL END AS COL1
    FROM YLA.ACCT_TABLE) AS TAB1
In this query the nested table expression contains no DISTINCT, GROUP BY, or aggregate
functions, so DB2 will merge the inner table expression with the outer referencing statement.
Since there are two references to COL1 in the outer referencing statement, the CASE
expression in the nested table expression will be calculated twice during query execution.
The merged statement would look something like this:

SELECT AVG(CASE WHEN ACCT_BAL < 0 THEN 0 ELSE ACCT_BAL END)
      ,SUM(CASE WHEN ACCT_BAL < 0 THEN 0 ELSE ACCT_BAL END)
FROM YLA.ACCT_TABLE
For this particular query the merge is probably more efficient than materialization. However, if
you have multiple levels of nesting and many references to generated columns, merge can be
less efficient than materialization. In these specific cases you may want to introduce a non-
deterministic function into the view or nested table expression to force materialization. We use
the RAND() function.
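For example, a minimal sketch of this technique, assuming a view named MY_VIEW and
columns COL1 and COL2 (all hypothetical names), might look like this:

SELECT COL1, COL2
FROM
   (SELECT COL1, COL2, RAND() AS FORCE_MAT   -- RAND() is non-deterministic, so DB2 materializes TAB1
    FROM MY_VIEW) AS TAB1

The generated FORCE_MAT column is simply never referenced by the outer query.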
UNION IN A VIEW OR NESTED TABLE EXPRESSION You can place a UNION or UNION ALL into a
view or nested table expression. This allows for some really complex SQL processing, but also
enables you to create logically partitioned tables. That is, you can store data in multiple tables
and then reference them all together as one table in a view. This is useful for quickly rolling
through yearly tables, or to create optional table scenarios with little maintenance overhead.
While each SQL statement in a union in view (or table expression) results in an individual
query block, and SQL statements written against our view are distributed to each query block,
DB2 does employ a technique to prune query blocks for efficiency. DB2 can, depending upon
the query, prune (eliminate) query blocks at either statement compile time or during statement
execution. Consider an account history view of this general form, where the range predicates
that divide the data between the two history tables are illustrative:

CREATE VIEW V_ACCOUNT_HISTORY (ACCOUNT_ID, AMOUNT) AS
SELECT ACCOUNT_ID, AMOUNT
FROM HIST1
WHERE ACCOUNT_ID BETWEEN 1 AND 100000000
UNION ALL
SELECT ACCOUNT_ID, AMOUNT
FROM HIST2
WHERE ACCOUNT_ID BETWEEN 100000001 AND 200000000

Queries are then written against the view, for example:

SELECT *
FROM V_ACCOUNT_HISTORY
WHERE ACCOUNT_ID = 12000000
The predicate of this query contains the literal value 12000000, and this predicate is distributed
to both of the query blocks generated. However, DB2 will compare the distributed predicate
against the predicates coded in the UNION inside our view, looking for redundancies. In any
situations in which the distributed predicate renders a particular query block unnecessary, DB2
will prune (eliminate) that query block from the access path. So, when our distributed
predicates look like this:
. . .
. . .
DB2 will prune the query blocks generated at statement compile time based upon the literal
value supplied in the predicate. So, although in our example above two query blocks would
be generated, one of them will be pruned when the statement is compiled. DB2 compares the
literal predicate supplied in the query against the view with the predicates in the view. Any
unnecessary query blocks are pruned. So, since one of the resulting combined predicates is
impossible, DB2 eliminates that query block. Only one underlying table will then be accessed.
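Using the illustrative range predicates from the view definition above, the combined predicates
for the two query blocks would look roughly like this:

-- query block against HIST1: satisfiable, so it is kept
ACCOUNT_ID = 12000000 AND ACCOUNT_ID BETWEEN 1 AND 100000000
-- query block against HIST2: impossible, so it is pruned at bind time
ACCOUNT_ID = 12000000 AND ACCOUNT_ID BETWEEN 100000001 AND 200000000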
Query block pruning can happen at statement compile (bind) time, or at run time if a host
variable or parameter marker is supplied for a redundant predicate. So, let’s take the previous
query example, and replace the literal with a host variable (shown here as :H1):

SELECT *
FROM V_ACCOUNT_HISTORY
WHERE ACCOUNT_ID = :H1
If this statement was embedded in a program, and bound into a plan or package, two query
blocks would be generated. This is because DB2 does not know the value of the host variable in
advance, and distributes the predicate amongst both generated query blocks. However, at run
time DB2 will examine the supplied host variable value, and dynamically prune the query
blocks appropriately. So, if the value 12000000 was supplied for the host variable value, then
one of the two query blocks would be pruned at run time, and only one underlying table would
be accessed. This is a complicated process that does not always work. You should test it by
stopping one of the tables, and then running a query with a host variable that should prune the
query block on that table. If the statement is successful then runtime query block pruning is
working for you.
We can get query block pruning on literals and host variables. However, we can’t get query
block pruning on joined columns. In certain situations with many query blocks (UNIONS),
many rows of data, and many index levels for the inner view or table expression of a join, we
have recommended using programmatic joins in situations in which the query can benefit from
runtime query block pruning using the joining column. Please be aware of the fact that this is
an extremely specific recommendation, and certainly not a general recommendation. You
should also be aware of the fact that there are limits to UNION in view (or table expression),
and that you should always test to see if you get bind time or run time query block pruning. In
some cases it just doesn’t happen, and there are APARs out there that address the problems,
but are not comprehensive. So, testing is important.
You can influence proper use of runtime query block pruning by encouraging distribution of
joins and predicates into the UNION in view (see section below). This is done by reducing the
number of tables in the UNION, or by repeating host variables in predicates instead of, or in
addition to, using correlation. Take a look at the query below, shown in a general form with
illustrative host variable names:

SELECT {columns}
FROM V_ACCOUNT_HISTORY HIST
WHERE HIST.ACCT_ID = :acct-id
  AND HIST.HIST_EFF_DTE = :eff-dte
  AND HIST.UPD_TSP =
      (SELECT MAX(HIST2.UPD_TSP)
       FROM V_ACCOUNT_HISTORY HIST2
       WHERE HIST2.ACCT_ID = :acct-id
         AND HIST2.HIST_EFF_DTE = :eff-dte)
WITH UR;
The predicate on ACCT_ID in the subquery could have been correlated to the outer query, but
it isn't. The same goes for the HIST_EFF_DTE predicate in the subquery. The reason for
repeating the host variable references is to take advantage of runtime pruning; correlated
predicates would not have gotten the pruning.
Be aware of the fact that if you are moving a local batch process to a remote server, you are
going to lose some of the efficiencies that go along with the RELEASE(DEALLOCATE) bind
parameter, in particular sequential detection and index lookaside.
Proper Statistics
What can we say? DB2 utilizes a cost-based optimizer, and that optimizer needs accurate
statistical information about your data. Collecting the proper statistics is a must for good
performance. With each new version of DB2 the optimizer takes more advantage of catalog
statistics. This also means that with each new version DB2 is more dependent upon catalog
statistics. You should have statistics on every column referenced in every WHERE clause in
your shop. If you are using parameter markers and host variables then at the least you need
cardinality statistics. If you have skewed data, or are using literal values in your SQL
statements, then perhaps you need frequency distribution and/or histogram statistics. If you
suspect columns are correlated, you can gather column correlation statistics. To determine
whether two columns are correlated (the CITY and STATE columns of a CUSTOMER table are
used here as an illustration) you can run these two queries (DB2 V8 and DB2 9):

SELECT COUNT(DISTINCT CITY) * COUNT(DISTINCT STATE)
FROM CUSTOMER;

SELECT COUNT(*)
FROM (SELECT DISTINCT CITY, STATE
      FROM CUSTOMER) AS T;
If the number from the second query is lower than the number from the first query then the
columns are correlated.
You can also run GROUP BY queries against tables for columns used in predicates to count the
occurrences of values in these columns. These counts can give you a good indication as to
whether or not you need frequency distribution statistics or histogram statistics, and runtime
reoptimization for skewed data distributions.
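A minimal sketch of such a counting query, using the sample EMP table and its WORKDEPT
column as an illustration:

SELECT WORKDEPT, COUNT(*) AS OCCURRENCES
FROM EMP
GROUP BY WORKDEPT
ORDER BY OCCURRENCES DESC;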
Runtime Reoptimization
If your query contains a predicate with an embedded literal value then DB2 knows something
about the input to the query, and can take advantage of frequency distribution or histogram
statistics if available. This can result in a much improved filter factor, and better access path
decisions by the optimizer. However, what if DB2 doesn't know anything about your input
value, as in this query against the sample EMP table (the host variable name is illustrative):

SELECT *
FROM EMP
WHERE MIDINIT = :hv

In this case, if the values for the MIDINIT column are highly skewed, then DB2 could make an
inaccurate estimate of the filter factor for some input values.
DB2 can employ something called runtime reoptimization to help your queries. For static SQL
the option of REOPT(ALWAYS) is available. This bind option will instruct DB2 to recalculate
access paths at runtime using the host variable parameters. This can result in improved
execution time for large queries. However, if there are many queries in the package then they
will all get reoptimized. This could negatively impact statement execution time for these
queries. In situations where you use REOPT(ALWAYS) consider separating the query that can
benefit into its own package.
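A sketch of such a bind, with hypothetical collection and member names, might look like this:

BIND PACKAGE(BIGQRY_COLL) MEMBER(BIGQRY01) ACTION(REPLACE) REOPT(ALWAYS)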
For dynamic SQL statements there are three options, REOPT(ALWAYS), REOPT(ONCE), and
REOPT(AUTO). REOPT(AUTO) is DB2 9 only.
• REOPT(ALWAYS) This will reoptimize a dynamic statement with parameter markers based
upon the values provided on every execution.
• REOPT(ONCE) Will reoptimize a dynamic statement the first time it is executed based upon
the values provided for parameter markers. The access path will then be reused until the
statement is removed from the dynamic statement cache, and needs to be prepared again.
This reoptimization option should be used with care as the first execution should have good
representative values.
• REOPT(AUTO) Will track how the values for the parameter markers change on every
execution, and will then reoptimize the query based upon those values if it determines that
the values have changed significantly.
There is also a system parameter called REOPTEXT (DB2 9) that enables the REOPT(AUTO)
like behavior, subsystem wide, for any dynamic SQL queries (without NONE, ALWAYS, or
ONCE already specified) that contain parameter markers when it detects changes in the values
that could influence the access path.
The OPTIMIZE FOR clause is a way, within a SQL statement, to tell DB2 how many rows you
intend to process. DB2 can then make access path decisions in order to determine the most
efficient way to access the data for that quantity. The use of this clause will discourage such
things as list prefetch, sequential prefetch, and multi-index access. It will encourage index
usage to avoid a sort, and the nested loop join method. A value of 1 is the strongest
influence on these factors.
You should put the actual number of rows you intend to fetch in the OPTIMIZE FOR clause.
Misrepresenting the number of rows you intend to fetch can result in a more poorly
performing query.
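A minimal sketch (table, column, and host variable names are illustrative) of telling DB2 that a
query will fetch only about 20 rows:

SELECT ACCT_ID, ACCT_BAL
FROM ACCOUNT
WHERE ACCT_STATUS = :status
ORDER BY ACCT_BAL DESC
OPTIMIZE FOR 20 ROWS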
When DB2 joins tables together in an inner join it attempts to select the table that will qualify
the fewest rows first in the join sequence. If, for some reason, DB2 has chosen the incorrect
table first (maybe due to statistics or host variables) then you can attempt to change the table
access sequence by employing one or more of these techniques:
• Enable predicates on the table you want to be first. By increasing potential matchcols on this
table DB2 may select an index for more efficient access and change the table access sequence.
• Disable predicates on the table you don’t want accessed first. Predicate disablers are
documented in the DB2 Administration Guide (DB2 V7, DB2 V8) or the DB2 Performance
Monitoring and Tuning Guide (DB2 9). We do not recommend using predicate disablers.
• Force materialization of the table you want accessed first by placing it into a nested table
expression with a DISTINCT or GROUP BY. This could change the join type as well as the
join sequence. This technique is especially useful when a nested loop join is randomly
accessing the inner table.
• Convert joins to subqueries. When you code subqueries you tell DB2 the table access
sequence. Non-correlated subqueries are accessed first, then the outer query is executed,
and then any correlated subqueries are executed. Of course, this is only effective if the table
moved from a join to a subquery doesn’t have to return data.
• Convert a joined table to a correlated nested table expression. This will force another table to
be accessed first as the data for the correlated reference is required prior to the table in the
correlated nested table expression being accessed.
• Convert an inner join to a left join. By coding a left join you have absolutely dictated the table
join sequence to DB2, as well as the fact that the right table will filter no data.
• Add a CAST function to the join predicate for the table you want accessed first. By placing
this function on that column you will encourage DB2 to access that table first in order to
avoid a stage 2 predicate against the second table.
• Code an ORDER BY clause on the columns of the index of the table that you want to be
accessed first in the join sequence. This may influence DB2 to use that index to avoid the
sort, and access that table first.
• Change the order of the tables in the FROM clause. You can also try converting from implicit
join syntax to explicit join syntax and vice versa.
• Try coding a predicate on the table you want accessed first as a non-transitive closure
predicate, for example a non-correlated subquery against the SYSDUMMY1 table that
returns a single value rather than an equals predicate on a host variable or literal value. Since
the subquery is not eligible for transitive closure, DB2 will not generate the predicate
redundantly against the other table, and has less encouragement to choose that table first.
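As a minimal sketch (T1 and its STATUS column are hypothetical), instead of coding the first
form of the predicate you might code the second:

-- eligible for transitive closure; the predicate may be generated against the joined table
WHERE T1.STATUS = 'A'
-- not eligible for transitive closure; the predicate stays on T1 only
WHERE T1.STATUS = (SELECT 'A' FROM SYSIBM.SYSDUMMY1)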
If DB2 has chosen one index over another, and you disagree, then you can try one of these
techniques to influence index selection:
• Code an ORDER BY clause on the leading columns of the index you want chosen. This may
encourage DB2 to choose that index to avoid a sort.
• Add columns to the index to make the access index-only.
• Increase the index matchcols either by modifying the query or the index.
• You could disable predicates that are matching other indexes. The IBM manuals document
predicate disablers. We don’t recommend them.
• Try using the OPTIMIZE FOR clause.
It is very important to make the correct design choices when designing physical objects such
as tables, table spaces, and indexes — once a physical structure has been defined and implemented,
it is generally difficult and time-consuming to make changes to the underlying structure. The
best way to perform logical database modeling is to use strong guidelines developed by an
expert in relational data modeling, or to use one of the many relational database modeling
tools supplied by vendors. But it is important to remember that just because you can ‘press a
button’ to have a tool migrate your logical model into a physical model, does not mean that
the physical model is the most optimal for performance. There is nothing wrong with twisting
the physical design to improve performance as long as the logical model is not compromised
or destroyed.
DB2 objects need to be designed for availability, ease of maintenance, and overall performance,
as well as for business requirements. There are guidelines and recommendations for achieving
these design goals, but how each of these is measured will depend on the business and the
nature of the data.
DB2 9 introduces a new type of table space called a universal table space. A universal table
space is a table space that is both segmented and partitioned. Two types of universal table
spaces are available: the partition-by-growth table space and the range-partitioned table space.
Before DB2 9, partitioned tables required key ranges to determine the target partition for
row placement. Partitioned tables provide more granular locking and parallel operations by
spreading the data over more data sets. Now, in DB2 9, you have the option to partition
according to data growth, which enables segmented tables to be partitioned as they grow,
without the need for key ranges. As a result, segmented tables benefit from increased table
space limits and SQL and utility parallelism that were formerly available only to partitioned
tables, and you can avoid needing to reorganize a table space to change the limit keys.
A range-partitioned table space is a type of universal table space that is based on partitioning
ranges and that contains a single table. The new range-partitioned table space does not
replace the existing partitioned table space, and operations that are supported on a regular
partitioned or segmented table space are supported on a range-partitioned table space. You
can create a range-partitioned table space by specifying both SEGSIZE and NUMPARTS
keywords on the CREATE TABLESPACE statement. With a range-partitioned table space, you
can also control the partition size, choose from a wide array of indexing options, and take
advantage of partition-level operations and parallelism capabilities. Because the range-
partitioned table space is also a segmented table space, you can run table scans at the
segment level. As a result, you can immediately reuse all or most of the segments of a table
after the table has been dropped or a mass delete has been performed.
Range-partitioned universal table spaces follow the same partitioning rules as for partitioned
table spaces in general. That is, you can add, rebalance, and rotate partitions. The maximum
number of partitions possible for both range-partitioned and partition-by-growth universal
table spaces, as for partitioned table spaces, is controlled by the DSSIZE and page size.
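A minimal sketch of the two flavors of universal table space (database, table space, and buffer
pool names, and the sizes, are all illustrative):

-- range-partitioned universal table space: SEGSIZE plus NUMPARTS
CREATE TABLESPACE TSRANGE IN MYDB
  SEGSIZE 64
  NUMPARTS 12
  DSSIZE 4 G
  BUFFERPOOL BP1;

-- partition-by-growth universal table space: MAXPARTITIONS, no limit keys required
CREATE TABLESPACE TSGROW IN MYDB
  SEGSIZE 32
  MAXPARTITIONS 24
  BUFFERPOOL BP1;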
Clustering and partitioning can be completely independent, and we're given a lot of options
for organizing our data in a single dimension (clustering and partitioning are based on the
same key), dual dimensions (clustering inside each partition by a different key), or multiple
dimensions (combining different tables with different partitioning unioned inside a view). You
should choose a partitioning strategy based upon a concept such as application-controlled
parallelism, separating old and new data, grouping data by time, or grouping data by some
meaningful business entity (e.g. sales region or office location). Then within those partitions
you can cluster the data by your most common sequential access sequence.
There is a way to dismiss clustering for inserts. See the section in this chapter on append
processing.
There are several advantages to partitioning a table space. For large tables, partitioning is the
only way to store large amounts of data, but partitioning also has advantages for tables that
are not necessarily large. DB2 allows us to define up to 4096 partitions of up to 64 GB each
(however, total table size is limited depending on the DSSIZE specified). Non-partitioned table
spaces are limited to 64 GB of data. You can take advantage of the ability to execute utilities on
separate partitions in parallel. This also gives you the ability to access data in certain partitions
while utilities are executing on others. In a data-sharing environment, you can spread partitions
among several members to split workloads. You can also spread your data over multiple
volumes and need not use the same storage group for each data set belonging to the table
space. This also allows you to place frequently accessed partitions on faster devices.
Free Space
The FREEPAGE and PCTFREE clauses are used to help improve the performance of updates
and inserts by allowing free space to exist on table spaces. Performance improvements include
improved access to the data through better clustering of data, faster inserts, fewer row
overflows, and a reduction in the number of REORGs required. Some tradeoffs include an
increase in the number of pages, fewer rows per I/O and less efficient use of buffer pools, and
more pages to scan. As a result, it is important to achieve a good balance for each individual
table space and index space when deciding on free space, and that balance will depend on the
processing requirements of each table space or index space. When inserts and updates are
performed, DB2 will use the free space defined, and by doing this it can keep records in
clustering sequence as much as possible. When the free space is used up, the records must be
located elsewhere, and this is when performance can begin to suffer. Read-only tables do not
require any free space, and tables with a pure insert-at-end strategy (append processing)
generally don’t require free space. Exceptions to this would be tables with VARCHAR columns
and tables using compression that are subject to updates. When DB2 attempts to maintain
cluster during inserting and updating it will search nearby for free space and/or free pages for
the row. If this space is not found DB2 will exhaustively search the table space for a free place
to put the row before extending a segment or a data set. You can notice this activity by
gradually increasing insert CPU times in your application (by examining the accounting records),
as well as increasing getpage counts and relocated row counts. When this happens it's time for
a REORG, and perhaps a reevaluation of your free space quantities.
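A minimal sketch of setting free space (object names and values are illustrative); the new
values are applied the next time the data is loaded or reorganized:

ALTER TABLESPACE MYDB.MYTS
  PCTFREE 10
  FREEPAGE 0;

ALTER INDEX MYSCHEMA.MYINDEX
  PCTFREE 20
  FREEPAGE 31;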
Allocations
The PRIQTY and SECQTY clauses of the CREATE TABLESPACE and ALTER TABLESPACE SQL
statements specify the space that is to be allocated for the table space if the table space is
managed by DB2. These settings influence the allocation by the operating system of the
underlying VSAM data sets in which table space and index space data is stored. The PRIQTY
specifies the minimum primary space allocation for a DB2-managed data set of the table space
or partition. The primary space allocation is in kilobytes, and the maximum that can be
specified is 64 GB. DB2 will request a data set allocation corresponding to the primary space
allocation, and the operating system will attempt to allocate the initial extent for the data set in
one contiguous piece. The SECQTY specifies the minimum secondary space allocation for a
DB2-managed data set of the table space or partition. DB2 will request secondary extents in a
size according to the secondary allocation. However, the actual primary and secondary data set
sizes depend upon a variety of settings and installation parameters.
You can specify the primary and secondary space allocations for table spaces and indexes or
allow DB2 to choose them. Having DB2 choose the values, especially for the secondary space
quantity, increases the possibility of reaching the maximum data set size before running out of
extents. In addition, the MGEXTSZ subsystem parameter will influence the SECQTY
allocations, and when set to YES (NO is the default) changes the space calculation formulas to
help utilize all of the potential space allowed in the table space before running out of extents.
You can alter the primary and secondary space allocations for a table space. The secondary
space allocation will take immediate effect. However, since the primary allocation happens
when the data set is created, that allocation will not take effect until a data set is added
(depends upon the type of table space) or until the data set is recreated via utility execution
(such as a REORG or LOAD REPLACE).
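A minimal sketch of setting the allocations (names and quantities are illustrative; the quantities
are in kilobytes, and a value of -1 lets DB2 choose the allocation):

ALTER TABLESPACE MYDB.MYTS
  PRIQTY 7200
  SECQTY 720;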
Column Ordering
There are two reasons you want to order your columns in specific ways: to reduce CPU
consumption when reading and writing columns with variable length data, and to minimize the
amount of logging performed when updating rows. Which version of DB2 you are using will
impact how you, or how DB2, organizes your columns.
For reduced CPU when using variable length columns you’ll want to put your variable length
columns after all of your fixed length columns (DB2 V7 and DB2 V8). If you mix the variable
length columns and your fixed length columns together then DB2 will have to search for any
fixed or variable length column after the first variable length column, and this will increase CPU
consumption. So, in DB2 V7 or DB2 V8 you want to put the variable length columns after the
fixed length columns when defining your table. This is especially true for any read-only
applications. For applications in which the rows are updated, you may want to organize your
data differently (read on). Things change with DB2 9 as it employs something called reordered
row format. Once you move to new function mode in DB2 9 any new tablespace you create will
automatically have its variable length columns placed after the fixed length columns physically
in the table space, regardless of the column ordering in the DDL. Within each grouping (fixed
and variable) your DDL column order is respected. In addition to new table spaces any table
spaces that are REORGed or LOAD REPLACEd will get the reordered row format.
For reduced logging you'll want to order the columns in your DDL a little differently. For
high-update tables you'll want the columns that never change placed first in the row, followed
by the columns that change less frequently, followed by the columns that change all the
time (e.g. an update timestamp). So, to reduce logging you'll want your variable length columns
that never change in front of the fixed length columns that do change (DB2 V7 and DB2 V8).
This is because DB2 logs from the first byte changed to the last byte changed for fixed
length rows, and from the first byte changed to the end of the row for variable length rows if
the length changes (unless the table has been defined with DATA CAPTURE CHANGES, which
causes the entire before and after image to be logged for updates). This all changes once you've
moved to DB2 9 and the table space is using the reordered row format. In that case you have
no control over the placement of never-changing variable length columns in front of always-
changing fixed length columns. This can possibly mean increased logging for your heavy updaters.
To reduce the logging in these situations you can still order the columns such that the most
frequently updated columns are last, and DB2 will respect the order of the columns within the
grouping. You can also contact IBM about turning off the automatic reordered row format if
this is a concern for you.
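A minimal sketch of ordering columns for reduced logging (the table and columns are
illustrative), with the columns that never change first and the most frequently updated
columns last:

CREATE TABLE ACCT_ACTIVITY
  (ACCT_ID        INTEGER       NOT NULL   -- never changes
  ,ACCT_OPEN_DTE  DATE          NOT NULL   -- never changes
  ,ACCT_NAME      VARCHAR(40)   NOT NULL   -- changes rarely
  ,ACCT_STATUS    CHAR(1)       NOT NULL   -- changes occasionally
  ,ACCT_BAL       DECIMAL(11,2) NOT NULL   -- changes often
  ,LAST_UPD_TSP   TIMESTAMP     NOT NULL   -- changes on every update
  );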
Compression
Compression allows us to get more rows on a page and therefore see many of the following
performance benefits, depending on the SQL workload and the amount of compression:
• Higher buffer pool hit ratios
• Fewer I/Os
• Fewer getpage operations
• Reduced CPU time for image copies
There are also some considerations for processing cost when using compression, but that cost
is relatively low.
• The processor cost to decode a row using the COMPRESS clause is significantly less than the
cost to encode that same row.
• The data access path DB2 uses affects the processor cost for data compression. In general,
the relative overhead of compression is higher for table space scans and less costly for
index access.
Some data will not compress well so you should query the PAGESAVE column in
SYSIBM.SYSTABLEPART to be sure you are getting a savings (at least 50% is average). Data
that does not compress well includes binary data, encrypted data, and repeating strings. Also
you should never compress small tables/rows if you are worried about concurrency issues as
this will put more rows on a page.
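A minimal sketch of checking compression effectiveness from the catalog (PAGESAVE is
populated by RUNSTATS, so statistics must be current):

SELECT DBNAME, TSNAME, PARTITION, PAGESAVE
FROM SYSIBM.SYSTABLEPART
WHERE COMPRESS = 'Y'
ORDER BY PAGESAVE;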
Keep in mind that when you compress, the row is treated as varying length, with a potential
length change on every update. This means there is a potential for row relocation, causing high
numbers in NEARINDREF and FARINDREF. You are then doing more I/O to get to your data
because it has been relocated, and you will have to REORG to get it back to its original position.
Utilizing Constraints
Referential integrity (RI) allows you to define required relationships between and within tables.
The database manager maintains these relationships, which are expressed as referential
constraints, and requires that all values of a given attribute or table column also exist in some
other table column.
In general, DB2-enforced referential integrity is much more efficient than coding the equivalent
logic in your application program. In addition, having the relationships enforced in a central
location in the database is much more robust than making them dependent upon application
logic. Of course, you are going to need indexes to support the relationships enforced by DB2.
Remember that referential integrity checking has a cost associated with it and can become
expensive if used for something like continuous code checking. RI is meant for parent/child
relationships, not code checking. Better options for code checking include check constraints,
or, better still, putting codes in memory and checking them there.
Table check constraints will enforce data integrity at the table level. Once a table-check
constraint has been defined for a table, every UPDATE and INSERT statement will involve
checking the restriction or constraint. If the constraint is violated, the data record will not be
inserted or updated, and a SQL error will be returned.
A table check constraint can be defined at table creation time or later, using the ALTER TABLE
statement. The table-check constraints can help implement specific rules for the data values
contained in the table by specifying the values allowed in one or more columns in every row of
a table. This can save time for the application developer, since the validation of each data value
can be performed by the database and not by each of the applications accessing the database.
However, check constraints should, in general, not be used for data edits in support of data
entry. It's best to cache code values locally within the application and perform the edits local
to the application. This will avoid numerous trips to the database to enforce the constraints.
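A minimal sketch of a table check constraint (the names and the list of allowed values are
illustrative):

ALTER TABLE ACCOUNT
  ADD CONSTRAINT CK_ACCT_STATUS
  CHECK (ACCT_STATUS IN ('A', 'C', 'S'));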
Indexing
Depending upon your application and the type of access, indexing can be a huge performance
advantage or a performance bust. Is your application a heavy reader, or perhaps even a read-
only application? Then lots of indexes can be a real performance benefit. What if your application
is constantly inserting, updating, and deleting from your table? In that case lots of indexes can
be a detriment. When does it matter? Well, of course, it depends. Just remember this simple
rule: if you are adding a secondary index to a table, then for inserts and deletes, and perhaps
even updates, you are adding another random read to these statements. Can your application
afford that in support of the queries that may use the index? That's for you to decide.
Index Compression
As of DB2 9 an index can be defined with the COMPRESS YES option (COMPRESS NO is the
default). Index compression can be used where there is a desire to reduce the amount of disk
space an index consumes. Index compression is recommended for applications that do
sequential insert operations with few or no delete operations; random inserts and deletes can
adversely affect compression. Index compression is also recommended for applications
where the indexes are created primarily for scan operations.
A bufferpool that is used to create the index must be 8K, 16K, or 32K in size. The physical page
size for the index on disk will be 4K. The reason that the bufferpool size is larger than the page
size is that index compression only saves space on disk. The data in the index page is expanded
when read into the pool. So, index compression can possibly save you read time for sequential
operations, and perhaps random (but far less likely).
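A minimal sketch of a compressed index assigned to an 8K buffer pool (the index, table, and
column names are illustrative):

CREATE INDEX XACCT_HIST_SCAN
  ON ACCT_HIST (ACCT_STATUS, HIST_EFF_DTE)
  COMPRESS YES
  BUFFERPOOL BP8K0;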
Index compression can also have a significant impact on REORGs and index rebuilds, resulting
in significant savings there. Keep in mind, however, that if you use the COPY utility to back
up an index, the image copy is actually uncompressed.
Secondary Indexes
There are two types of secondary indexes, non-partitioning secondary indexes and data
partitioned secondary indexes.
NON-PARTITIONING SECONDARY INDEXES NPSIs are indexes that are used on partitioned
tables. They are not the same as the clustered partitioning key, which is used to order and
partition the data, but rather they are for access to the data. NPSIs can be unique or non-
unique. While you can have only one clustered partitioning index, you can have several NPSIs
on a table if necessary. NPSIs can be broken apart into multiple pieces (data sets) by using the
PIECESIZE clause on the CREATE INDEX statement. Pieces can vary in size from 256 KB to 64
GB — the best size will depend on how much data you have and how many pieces you want to
manage. If you have several pieces, you can achieve more parallelism on processes, such as
heavy INSERT batch jobs, by alleviating the bottlenecks caused by contention on a single data
set. As of DB2 V8 and beyond the NPSI can be the clustering index.
NPSIs are great for fast read access as there is a single index b-tree structure. They can,
however, grow extremely large and become a maintenance and availability issue.
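A minimal sketch of spreading an NPSI over multiple data sets (the names and the piece size
are illustrative):

CREATE INDEX XACCT_HIST_NPSI
  ON ACCT_HIST (ACCT_ID)
  PIECESIZE 4 G;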
DATA PARTITIONED SECONDARY INDEXES The DPSI index type provides us with many
advantages for secondary indexes on a partitioned table space over the traditional NPSIs
(Non-Partitioning Secondary Indexes) in terms of availability and performance.
The partitioning scheme of the DPSI will be the same as the table space partitions and the
index keys in ‘x’ index partition will match those in ‘x’ partition of the table space. Some of the
benefits that this provides include:
• Clustering by a secondary index
• Ability to easily rotate partitions
• Efficient utility processing on secondary indexes (no BUILD-2 phase)
• Allow for reducing overhead in data sharing (affinity routing)
DRAWBACKS OF DPSIS While there will be gains in furthering partition independence, some
queries may not perform as well. If a query has predicates that reference columns in a single
partition, and is therefore restricted to a single partition of the DPSI, it will benefit from this
new organization. The queries will have to be designed to allow for partition pruning through
the predicates in order to accomplish this. This means that at least the leading column of the
partitioning key has to be supplied in the query in order for DB2 to prune (eliminate) partitions
from the query access path. However, if the predicates reference only columns in the DPSI, the
query may not perform very well because it may need to probe several partitions of the index.
Other limitations of DPSIs include the fact that they cannot be unique (with some exceptions
in DB2 9) and that they may not be the best candidates for ORDER BYs.
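A minimal sketch of creating a DPSI on a partitioned table (the names are illustrative):

CREATE INDEX XACCT_HIST_DPSI
  ON ACCT_HIST (ACCT_STATUS)
  PARTITIONED;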
Rebuild or Recover?
As of DB2 V8 you can define an index as COPY YES. This means, as with a table space, you
can use the COPY and RECOVER utilities to backup and recover these indexes. This may be
especially useful for very large indexes. Be aware, however, that large NPSIs cannot be copied
in pieces, and the image copies can get very large. You'll need to have large data sets to hold the backup.
This could mean large quantities of tapes, or perhaps even hitting the 59 volume limit for a
data set on DASD. REBUILD will require large quantities of temporary DASD to support sorts,
as well as more CPU than a RECOVER. You should carefully consider whether your strategy for
an index should be backup and recover, or rebuild.
Materialized Query Tables
As of DB2 V8, one solution for expensive queries that repeatedly aggregate or join the same
data is the use of MQTs — Materialized Query Tables. This
allows you to precompute whole or parts of each query and then use computed results to
answer future queries. MQTs provide the means to save the results of prior queries and then
reuse the common query results in subsequent queries. This helps avoid redundant scanning,
aggregating and joins. MQTs are useful for data warehouse type applications.
MQTs do not completely eliminate optimization problems but rather move optimizations
issues to other areas. Some challenges include finding the best MQT for expected workload,
maintaining the MQTs when underlying tables are updated, ability to recognize usefulness of
MQT for a query, and the ability to determine when DB2 will actually use the MQT for a query.
Most of these types of problems are addressed by OLAP tools, but MQTs are the first step.
The main advantage of the MQT is that DB2 is able to recognize a summary query against the
source table(s) for the MQT, and rewrite the query to use the MQT instead. It is, however, your
responsibility to move data into the MQT, either via the REFRESH TABLE statement or by manually
moving the data yourself.
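A minimal sketch of a summary MQT over the sample EMP table (the MQT name and columns
are illustrative):

CREATE TABLE DEPT_SAL_MQT (WORKDEPT, AVGSAL, HDCOUNT) AS
  (SELECT WORKDEPT, AVG(SALARY), COUNT(*)
   FROM EMP
   GROUP BY WORKDEPT)
  DATA INITIALLY DEFERRED
  REFRESH DEFERRED
  MAINTAINED BY SYSTEM
  ENABLE QUERY OPTIMIZATION;

REFRESH TABLE DEPT_SAL_MQT;   -- repopulates the MQT from the source table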
Volatile Tables
As of DB2 V8, volatile tables are a way to prefer index access over table space scans or non-
matching index scans for tables that have statistics that make them appear to be small. They
are good for tables that shrink and grow, allowing matching index scans on tables that have
grown larger without new RUNSTATS.
They also improve support for cluster tables. Cluster tables are those tables that have groups
or clusters of data that logically belong together. Within each group rows need to be accessed
in the same sequence to avoid lock contention during concurrent access. The sequence of access
is determined by the primary key, and if DB2 changes the access path, lock contention can occur.
To best support cluster tables (volatile tables) DB2 will use index access whenever possible. This
will minimize application contention on cluster tables by preserving the access sequence by
primary key. We need to be sure indexes are available for single table access and joins.
The keyword VOLATILE can be specified on the CREATE TABLE or the ALTER TABLE
statements. If specified, you are basically forcing an access path of index access and no
list prefetch.
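A minimal sketch of marking a table as volatile at creation time (the table definition is
illustrative); the keyword can also be added later with ALTER TABLE:

CREATE TABLE WORK_QUEUE
  (QUEUE_ID   INTEGER      NOT NULL
  ,ITEM_KEY   CHAR(10)     NOT NULL
  ,ITEM_DATA  VARCHAR(200))
  VOLATILE;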
Clone Tables
In DB2 9 you can create a clone table on an existing base table at the current server by using
the ALTER TABLE statement. Although ALTER TABLE syntax is used to create a clone table,
the authorization granted as part of the clone creation process is the same as you would get
during regular CREATE TABLE processing. The schema (creator) for the clone table will be the
same as for the base table. You can create a clone table only if the base table is in a universal
table space.
To create a clone table, issue an ALTER TABLE statement with the ADD CLONE option.
The creation or drop of a clone table does not impact applications accessing base table data.
No base object quiesce is necessary and this process does not invalidate plans, packages, or
the dynamic statement cache.
You can exchange the base and clone data by using the EXCHANGE statement. To exchange
table and index data between the base table and clone table issue an EXCHANGE statement
with the DATA BETWEEN TABLE table-name1 AND table-name2 syntax. This is in essence a
method of performing an online load replace!
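A minimal sketch of the clone life cycle (the table names are illustrative):

ALTER TABLE ACCT_HIST
  ADD CLONE ACCT_HIST_CLONE;

-- load or insert the replacement data into ACCT_HIST_CLONE, then swap:
EXCHANGE DATA BETWEEN TABLE ACCT_HIST AND ACCT_HIST_CLONE;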
After a data exchange, the base and clone table names remain the same as they were prior
to the data exchange. No data movement actually takes place. The instance numbers in the
underlying VSAM data sets for the objects (tables and indexes) do change, and this has the
effect of changing the data that appears in the base and clone tables and their indexes. For
example, a base table exists with the data set name *.I0001.*. The table is cloned and the
clone's data set is initially named *.I0002.*. After an exchange, the base objects are named
*.I0002.* and the clones are named *.I0001.*. Each time that an exchange happens, the instance
numbers that represent the base and the clone objects change, which immediately changes the
data contained in the base and clone tables and indexes. You should also be aware of the fact
that when the clone is dropped and an uneven number of EXCHANGE statements have been
executed, the base table will have an *.I0002.* data set name. This could be confusing.
Traditionally, our larger database tables have been placed into partitioned tablespaces.
Partitioning helps with database management because it’s easier to manage several small
objects versus one very large object. There are still some limits to partitioning. For example,
each partition is limited to a maximum size of 64GB, a partitioning index is required (DB2 V7
only), and if efficient alternate access paths to the data are desired then non-partitioning indexes
(NPSIs) are required. These NPSIs are not partitioned, and exist as single large database indexes.
Thus NPSIs can present themselves as an obstacle to availability (i.e. a utility operation against
a single partition may potentially make the entire NPSI unavailable), and as an impediment to
database management as it is more difficult to manage such large database objects.
A UNION in a view can be utilized as an alternative to table partitioning in support of very large
database tables. In this type of design, several database tables can be created to hold different
subsets of the data that would otherwise have been held in a single table. Key values, similar to
what may be used in partitioning, can be used to determine which data goes into which of the
various tables. Take, for example, a view definition of this general form (the leading columns
of the select lists have been abbreviated here):

CREATE VIEW V_ACCOUNT_HISTORY
  (ACCOUNT_ID, ..., PAYMENT_TYPE, INVOICE_NUMBER)
AS
SELECT ACCOUNT_ID, ..., PAYMENT_TYPE, INVOICE_NUMBER
FROM ACCOUNT_HISTORY1
UNION ALL
SELECT ACCOUNT_ID, ..., PAYMENT_TYPE, INVOICE_NUMBER
FROM ACCOUNT_HISTORY2
UNION ALL
SELECT ACCOUNT_ID, ..., PAYMENT_TYPE, INVOICE_NUMBER
FROM ACCOUNT_HISTORY3
UNION ALL
SELECT ACCOUNT_ID, ..., PAYMENT_TYPE, INVOICE_NUMBER
FROM ACCOUNT_HISTORY4
By separating the data into different tables, and creating the view over the tables, we can create
a logical account history table with these distinct advantages over a single physical table:
• We can add or remove tables with very small outages, usually just the time it takes to drop
and recreate the view.
• We can partition each of the underlying tables, creating still smaller physical database
objects.
• NPSIs on each of the underlying tables could be much smaller and easier to manage than
they would be under a single table design.
• Utility operations could execute against an individual underlying table, or just a partition of
that underlying table. This greatly shrinks utility times against these individual pieces, and
improves concurrency. This truly gives us full partition independence.
• The view can be referenced in any SELECT statement in exactly the same way as a physical
table would be.
• Each underlying table could be as large as 16TB, logically setting the size limit of the table
represented by the view at 64TB.
• Each underlying table could be clustered differently, or could be a segmented or partitioned
tablespace.
• DB2 will distribute predicates against the view to every query block within the view, and
then compare the predicates. Any impossible predicates will result in the query block being
pruned (not executed). This is an excellent performance feature.
Append Processing
If you have a single index for read access, then having append processing may mean more
random reads. This may require more frequent REORGs to keep the data organized for read
access. Also, if you are partitioning, and the partitioning key is not the read index, then you will
still have random reads during insert to your non-partitioned index. You'll need to make sure
you have adequate free space to avoid index page splits.
Append processing can also be used to store historical or seldom read audit information. In
these cases you would want to partition based upon an ever-ascending value (e.g. a date) and
have all new data go to the end of the last partition. In this situation all table space maintenance,
such as copies and REORGs, will be against the last partition. All other data will be static and
will not require maintenance. You will possibly need a secondary index, or each read query will
have to be for a range of values within the ascending key domain.
The solution may be to build a look-up table that acts as a sparse index. This look up table will
contain nothing more than your ascending key values. One example would be dates, say one
date per month for every month possible in our database. If the historical data is organized and
partitioned by the date, and we have only one date per month (to further sub-categorize the
data), then we can use our new sparse index to access the data we need. Using user-supplied
dates as starting and ending points the look up table can be used to fill the gap with the dates
in between. This gives us the initial path to the history data. Read access is performed by
constructing a key during SELECT processing. So, in this example we’ll access an account
history table (ACCT_HIST) that has a key on HIST_EFF_DTE, ACCT_ID, and our date lookup
table called ACCT_HIST_DATES, which contains one column and one row for each legitimate
date value corresponding to the HIST_EFF_DTE column of the ACCT_HIST table.
CURRENT DATA ACCESS Current data access is easy; we can retrieve the account history data
directly from the account history table.
SELECT {columns}
FROM ACCT_HIST
RANGE OF DATA ACCESS Accessing a range of data is a little more complicated than simply
getting the most recent account history data. Here we need to use our sparse index history
date table to build the key on the fly. We apply the range of dates to the date range table, and
then join that to the history table (host variable names here are illustrative):

SELECT {columns}
FROM ACCT_HIST HIST
INNER JOIN
     ACCT_HIST_DATES DTE
ON   HIST.HIST_EFF_DTE = DTE.EFF_DTE
WHERE HIST.ACCT_ID = :acct-id
  AND DTE.EFF_DTE BETWEEN :start-date AND :end-date
FULL DATA ACCESS To access all of the data for an account we simply need a version of the
previous query without the date range predicate.
SELECT {columns}
FROM ACCT_HIST HIST
INNER JOIN
     ACCT_HIST_DATES DTE
ON   HIST.HIST_EFF_DTE = DTE.EFF_DTE
WHERE HIST.ACCT_ID = :acct-id
Denormalization “Light”
In many situations, especially those in which there is a conversion from a legacy flat file
based system to a relational database, there is a performance concern (or more importantly
a performance problem in that an SLA is not being met) for reading the multiple DB2 tables.
These are situations in which the application is expecting to read all of the data that was once
represented by a single record, but is now in many DB2 tables. In these situations many people
will begin denormalizing the tables. This is an act of desperation! Remember the reason you're
moving your data into DB2 in the first place, and that is for all the efficiency, portability,
flexibility, and faster time to delivery for your new applications. By denormalizing you are
throwing these advantages away, and you may as well have stayed with your old flat file
based design.
In some situations, however, the performance of reading multiple tables compared to the
equivalent single record read just isn't good enough. Well, instead of denormalization you could
possibly employ a "denormalization light" instead. This type of denormalization can be applied
to parent and child tables, when the child table data is in an optional relationship to the parent.
Instead of denormalizing the optional child table data into the parent table, simply add a column
to the parent table that indicates whether or not the child table has any data for that parent
key. This will require some additional application responsibility in maintaining
that indicator column. However, DB2 can utilize a during join predicate to avoid probing the
child table when there is no data for the parent key.
Take, for example, an account table and an account history table. The account may or may not
have account history, and so the following query would join the two tables together to list the
basic account information (balance) along with the history information if present:

SELECT {columns}
FROM ACCOUNT A
LEFT OUTER JOIN
     ACCT_HIST B
ON   A.ACCT_ID = B.ACCT_ID
In the example above the query will always probe the account history table in support of the
join, whether or not the account history table has data. We can employ our light denormalization
by adding an indicator column to the account table. Then we can use a during join predicate.
DB2 will only perform the join operation when the join condition is true. In this particular case
the access to the account history table is completely avoided when the indicator column
(called HIST_IND here for illustration) has a value not equal to "Y":
SELECT {columns}
FROM ACCOUNT A
LEFT OUTER JOIN
     ACCT_HIST B
ON   A.ACCT_ID = B.ACCT_ID
AND  A.HIST_IND = 'Y'
DB2 is going to test that indicator column first before performing the join operation, and supply
nulls for the account history table when data is not present as indicated.
You can imagine now the benefit of this type of design when you are doing a legacy migration
from a single record system to 40 or so relational tables with lots of optional relationships. This
form of denormalizing can really improve performance in support of legacy system access,
while maintaining the relational design for efficient future applications.
CHAPTER 5
EXPLAIN and Predictive Analysis
It is important to know something about how your application will perform prior to the
application actually executing in a production environment. There are several ways in which we
can predict the performance of our applications prior to implementation, and several tools can
be used. The important thing you have to ask when you begin building your application is “Is
the performance important?” If not, then proceed with development at a rapid pace, and then
fix the performance once the application has been implemented. What if you can’t wait until
implementation to determine performance? Well, then you’re going to have to predict the
performance. This chapter will suggest ways to do that.
EXPLAIN Facility
The DB2 EXPLAIN facility is used to expose query access path information. This enables
application developers and DBAs to see what access path DB2 is going to take for a query, and
if any attempts at query tuning are needed. DB2 can gather basic access path information in a
special table called the PLAN_TABLE (DB2 V7, DB2 V8, DB2 9), as well as detailed information
about predicate stages, filter factor, predicate matching, and dynamic statements that are
cached (DB2 V8, DB2 9).
When EXPLAIN is executed it can populate many EXPLAIN tables. The target set of EXPLAIN
tables depends upon the authorization id associated with the process. So, the creator (schema)
of the EXPLAIN tables is determined by the CURRENT SQLID of the person running the
EXPLAIN statement, or the OWNER of the plan or package at bind time. The EXPLAIN tables
are optional, and DB2 will only populate the tables that it finds under the SQLID or OWNER of
the process invoking the EXPLAIN. There are many EXPLAIN tables, which are documented in
the DB2 Performance Monitoring and Tuning Guide (DB2 9). Some of these tables are not
available in DB2 V7. The following tables can be defined manually, and the DDL can be found
in the DB2 sample library member DSNTESC:
• PLAN_TABLE The PLAN_TABLE contains basic access path information for each query
block of your statement. This includes, among other things, information about index usage,
and the number of matching index columns, which join method is utilized, which access
method is utilized, and whether or not a sort will be performed. The PLAN_TABLE forms the
basis for access path determination.
• DSN_STATEMNT_TABLE The statement table contains estimated cost information for the
cost of a statement. If the statement table is present when you EXPLAIN a query, then it will
be populated with the cost information that corresponds to the access path information for
the query that is stored in the PLAN_TABLE. For a given statement, this table will contain the
estimated processor cost in milliseconds, as well as the estimated processor cost in service
units. It places the cost values into two categories:
– Category A DB2 had enough information to make a cost estimation without using
any defaults.
– Category B DB2 had to use default values to make some cost calculations.
The statement table can be used to compare estimated costs when you are attempting to
modify statements for performance. Keep in mind, however, that this is a cost estimate, and
is not truly reflective of how your statement will be used in an application (given input
values, transaction patterns, etc). You should always test your statements for performance
in addition to using the statement table and EXPLAIN.
• DSN_FUNCTION_TABLE The function table contains information about user-defined
functions that are a part of the SQL statement. Information from this table can be
compared to the cost information (if populated) in the DB2 System Catalog table,
SYSIBM.SYSROUTINES, for the user-defined functions.
• DSN_STATEMENT_CACHE_TABLE This table is not populated by a normal invocation of
EXPLAIN, but instead by the EXPLAIN STMTCACHE ALL statement. Issuing the statement
will result in DB2 reading the contents of the dynamic statement cache, and putting runtime
execution information into this table. This includes information about the frequency of
execution of these statements, the statement text, the number of rows processed by the
statement, lock and latch requests, I/O operations, number of index scans, number of sorts,
and much more! This is extremely valuable information about the dynamic queries executing
in a subsystem. This table is available only for DB2 V8 and DB2 9.
Among the key PLAN_TABLE columns to examine are the following:
• INDEXONLY A value of “Y” in this column indicates that the access required could be fully
served by accessing the index only, and avoiding any table access.
• SORTN####, SORTC#### These columns indicate any sorts that may happen in support
of a UNION, grouping, or a join operation, among others.
• PREFETCH This column indicates whether or not prefetch may play a role in the access.
By manually running EXPLAIN, and examining the PLAN_TABLE, you can get good information
about the access path, indexes that are used, join operations, and any sorting that may be
happening as part of your query. If you have additional EXPLAIN tables created (those created
by Visual Explain among other tools) then those tables are populated automatically either by
using those tools, or by manually running EXPLAIN. You can also query those tables manually,
especially if you don't have the remote access required from a PC to use those tools. The
DB2 Performance Monitoring and Tuning Guide (DB2 9) documents all of these tables. These
tables provide detailed information about such things as predicate stages, filter factor,
partitions eliminated, parallel operations, detailed cost information, and more.
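A minimal sketch of manually running EXPLAIN and reading back the basic access path
information (the query number and the statement are illustrative):

EXPLAIN PLAN SET QUERYNO = 100 FOR
  SELECT *
  FROM ACCT_HIST
  WHERE ACCT_ID = 12345;

SELECT QBLOCKNO, PLANNO, METHOD, TNAME, ACCESSTYPE,
       MATCHCOLS, ACCESSNAME, INDEXONLY, PREFETCH
FROM PLAN_TABLE
WHERE QUERYNO = 100
ORDER BY QBLOCKNO, PLANNO;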
There is some information, however, that EXPLAIN does not tell you about your queries. You
have to be aware of this to effectively do performance tuning and predictive analysis. Here are
some of the things you cannot get from EXPLAIN:
• INSERT indexes EXPLAIN does not tell you the index that DB2 will use for an INSERT
statement. Therefore, it’s important that you understand your clustering indexes, and
whether or not DB2 will be using APPEND processing for your inserts. This understanding is
important for INSERT performance, and the proper organization of your data. See Chapter 4
for more information in this regard.
• Access path information for enforced referential constraints If you have INSERTS,
UPDATES, and DELETES in the program you have EXPLAINed, then any database enforced
RI relationships and associated access paths are not exposed in the EXPLAIN tables.
Therefore, it is your responsibility to make sure that proper indexes in support of the RI
constraints are established and in use.
• Predicate evaluation sequence The EXPLAIN tables do not show you the order in which
predicates of the query are actually evaluated. Please see Chapter 3 of this guide for more
information on predicate evaluation sequence.
• The statistics used The optimizer used catalog statistics to help determine the access path
at the time the statement was EXPLAINed. Unless you have historical statistics that happen
to correspond to the time the EXPLAIN was executed, then you don’t know what the
statistics looked like at the time of the EXPLAIN, or whether they are different now in a way
that could change the access path.
• The input values If you are using host variables in your programs then EXPLAIN knows
nothing about the potential input values to those host variables. This makes it important for
you to understand these values, what the most common values are, and if there is data skew
relative to the input values.
• The SQL statement The SQL statement is not captured in the EXPLAIN tables, although
some of the predicates are. If you EXPLAINed a statement dynamically, or via one of the
tools, then you know what the statement looks like. However, if you’ve EXPLAINed a package
or plan, then you are going to need to see the program source code.
• The order of input to your transactions Sure the SQL statement looks OK from an access
path perspective, but what is the order of the input data to the statement? What ranges are
being supplied? How many transactions are being issued? Is it possible to order the input, or
the data in the tables, in a manner in which access is most efficient? This is not covered in EXPLAIN
output. These things are discussed further, however, throughout this guide.
• The program source code In order to fully understand the impact of the access path that a
statement has used you need to see how that statement is being used in the application
program. So, you should always be looking at the program source code of the program the
statement is embedded in. How many times will the statement be executed? Is the
statement in a loop? Can we avoid executing the statement? Is the program issuing 100
statements on individual key values when a range predicate and one statement would
suffice? Is the program performing programmatic joins? These are questions that can
only be answered by looking at the program source code. The EXPLAIN output may show
a perfectly good and efficient access path, but the statement itself could be completely
unnecessary. (This is where a tool, or even a trace, can be very helpful to verify that the SQL
statements executed really are what is expected.)
The IBM Optimization Service Center for DB2 for z/OS (OSC) can be used to analyze previously
generated EXPLAIN output, or to gather EXPLAIN data and explain dynamic SQL statements.
The OSC is available for DB2 V8 and DB2 9. If you are using DB2 V7, then you can use the
DB2 Visual Explain product, which provides a subset of the functionality of the OSC.
DB2 Estimator
The DB2 Estimator product is another useful predictive analysis tool. This product runs on a
PC, and provides a graphical user interface for entering information about DB2 tables, indexes,
and queries. Table and index definitions and statistics can be entered directly, or imported via
DDL files. SQL statements can be imported from text files.
The table definitions and statistics can be used to accurately predict database sizes. SQL
statements can be organized into transactions, and then information about DASD models, CPU
size, access paths, and transaction frequencies can be set. Once all of this information is input
into Estimator, then capacity reports can be produced. These reports will contain estimates of
the DASD required, as well as the amount of CPU required for an application. These reports
are very helpful during the initial stage of capacity planning, that is, before any actual
programs or test data are available. The DB2 Estimator product will no longer be available after
DB2 V8, so you should download it today!
Another approach for large systems design is to spend little time considering performance
during development, and instead set aside project time for performance testing and tuning. This
frees the developers from having to consider performance in every aspect of their programming,
and gives them incentive to code more logic in their queries. This makes for faster
development time, and more logic in the queries means more opportunities for tuning (if all
the logic was in the programs, then tuning may be a little harder to do). Let the logical
designers do their thing, and let the database have a first shot at deciding the access path.
If you choose to make performance decisions during the design phase, then each performance
decision should be backed by solid evidence, and not assumptions. There is no reason why a
slightly talented DBA or developer can’t make a few test tables, generate some test data, and
write a small program or two to simulate a performance situation and test their assumptions
about how the database will behave. This gives the database the opportunity to tell you how to
design for performance. Reports can then be generated, and given to managers. It’s much
easier to make design decisions based upon actual test results.
Tools that you can use for testing statements, design ideas, or program processes include, but
are not limited to:
• REXX DB2 programs
• COBOL test programs
• Recursive SQL to generate data
• Recursive SQL to generate statements
• Data in tables to generate more data
• Data in tables to generate statements
Generating Data
In order to simulate program access you need data in tables. You could simply type some data
into INSERT statements, insert them into a table, and then use data from that table to generate
more data. Say, for example, that you have to test various program processes against a
PERSON_TABLE table and a PERSON_ORDER table. No actual data has been created yet, but
you need to test the access patterns of incoming files of orders. You can key some INSERT
statements for the parent table, and then use the parent table to propagate data to the child
table. For example, if the parent table, PERSON_TABLE, contained this data:
PERSON_ID   NAME
1           JOHN SMITH
2           BOB RADY
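Those two parent rows could be keyed in with a couple of simple INSERT statements (a sketch
that assumes just the two columns shown):

INSERT INTO YLA.PERSON_TABLE (PERSON_ID, NAME) VALUES (1, 'JOHN SMITH');
INSERT INTO YLA.PERSON_TABLE (PERSON_ID, NAME) VALUES (2, 'BOB RADY');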
Then a statement such as the following could be used to populate the child table, PERSON_ORDER,
with some test data (the PERSON_ORDER column names shown are illustrative):

INSERT INTO YLA.PERSON_ORDER
       (PERSON_ID, ORDER_NUM, PRODUCT_CD, QUANTITY, PRICE)
SELECT PERSON_ID, 1, 'B100', 10, 14.95
FROM   YLA.PERSON_TABLE
UNION ALL
SELECT PERSON_ID, 2, 'B120', 3, 1.95
FROM   YLA.PERSON_TABLE;

The child table would then contain the following rows:

PERSON_ID  ORDER_NUM  PRODUCT_CD  QUANTITY  PRICE
1          1          B100        10        14.95
1          2          B120         3         1.95
2          1          B100        10        14.95
2          2          B120         3         1.95
The statements could be repeated over and over to add more data, or additional statements
can be executed against the PERSON_TABLE to generate more PERSON_TABLE data.
Recursive SQL (DB2 V8, DB2 9) is an extremely useful way to generate test data. Take a look at
the following simple recursive SQL statement:
WITH TEMP (N) AS
  (SELECT 1
   FROM SYSIBM.SYSDUMMY1
   UNION ALL
   SELECT N + 1
   FROM TEMP
   WHERE N < 10)
SELECT N
FROM TEMP;
This statement generates the numbers 1 through 10, one row each. We can use the power of
recursive SQL to generate mass quantities of data that can then be inserted into DB2 tables,
ready for testing. The following is a piece of a SQL statement that was used to insert
300,000 rows of data into a large test lookup table. The table was quickly populated with data,
and a test conducted to determine the performance. It was quickly determined that the
performance of this large lookup table would not be adequate, but that couldn’t have been
known for sure without testing:
WITH LASTPOS (KEYVAL) AS
  (VALUES (0)
   UNION ALL
   SELECT KEYVAL + 1
   FROM LASTPOS
   WHERE KEYVAL < ...)
,STALETBL (STALE_IND) AS
  (VALUES ...)
SELECT ...
      ,CASE ... END AS PART_NUM
FROM LASTPOS
INNER JOIN STALETBL ON 1=1;
Generating Statements
Just as data can be generated so can statements. You can write SQL statements that generate
statements. Say, for example, that you needed to generate singleton select statements against
the EMP table to test a possible application process or scenario. You could possibly write a
statement such as this to generate those statements:
SELECT 'SELECT LASTNAME, SALARY FROM SUSAN.EMP WHERE EMPNO = '''
       CONCAT EMPNO CONCAT ''';'
FROM   SUSAN.EMP
WHERE  WORKDEPT IN ('C01', 'E01')
AND    RAND() < 0.33;
The above query will generate SELECT statements for approximately 33% of the employees in
departments "C01" and "E01". The output would look something like this:

SELECT LASTNAME, SALARY FROM SUSAN.EMP WHERE EMPNO = '000030';
SELECT LASTNAME, SALARY FROM SUSAN.EMP WHERE EMPNO = '000130';
SELECT LASTNAME, SALARY FROM SUSAN.EMP WHERE EMPNO = '000140';
You could also use recursive SQL statements to generate statements. During testing of high
performance INSERTs to an account history table, a recursive SQL statement was used to
generate 50,000 INSERT statements with randomized values.
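A minimal sketch of that technique follows; the ACCT_HIST table name, its columns, and the use
of RAND to randomize the values are assumptions for illustration only:

-- Generate 50,000 rows, then turn each row into an INSERT statement
WITH GEN (N) AS
  (SELECT 1
   FROM SYSIBM.SYSDUMMY1
   UNION ALL
   SELECT N + 1
   FROM GEN
   WHERE N < 50000)
SELECT 'INSERT INTO ACCT_HIST (ACCT_ID, TRAN_AMT) VALUES ('
       CONCAT CHAR(INT(RAND() * 1000000))
       CONCAT ', '
       CONCAT CHAR(DECIMAL(RAND() * 500, 9, 2))
       CONCAT ');'
FROM GEN;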
In one situation it was debated whether an entire application interface should utilize large
joins between parent and child tables, or whether all access should be via individual table
access (programmatic joins) for the greatest flexibility. Coding for both types of access
would be extra programming effort, but what is the cost of the programmatic joins for this
application? Two simple COBOL programs were coded against a test database; one with a two
table programmatic join, and the other with the equivalent SQL join. It was determined that the
SQL join consumed 30% less CPU than the programmatic join.
CHAPTER 6
Monitoring
It’s critical to design for performance when building applications, databases, and SQL
statements. You've designed the correct SQL, avoided programmatic joins, clustered commonly
accessed tables in the same sequence, and avoided inefficient repeat processing. Now,
your application is in production and is running fine. Is there more you can save? Most
certainly!
Which statement is the most expensive? Is it the tablespace scan that runs once per day, or
the matching index scan running millions of times per day? Are all your SQL statements sub-
second responders, and so you don’t need tuning? What is the number one statement in terms
of CPU consumption? All of these questions can be answered by monitoring your DB2
subsystems, and the applications accessing them.
DB2 provides facilities for monitoring the behavior of the subsystem, as well as the applications
that are connected to it. This is primarily via the DB2 trace facility. DB2 has several different
types of traces. In this chapter we’ll discuss the traces that are important for monitoring
performance, as well as how to use them effectively for proactive performance tuning.
DB2 Traces
DB2 provides a trace facility to help track and record events within a DB2 subsystem. There are
six types of traces:
• Statistics
• Accounting
• Audit
• Performance
• Monitor
• Global
This chapter will cover the statistics, accounting and performance traces as they apply to
performance monitoring. These traces should play an integral part in your performance
monitoring process.
Statistics Trace
The data collected in the statistics trace allows you to conduct DB2 capacity planning and to
tune the entire set of DB2 programs. The statistics trace reports information about how much
the DB2 system services and database services are used. It is a system wide trace and should
not be used for charge-back accounting. Statistics trace classes 1, 3, 4, 5, and 6 are the default
classes for the statistics trace if YES is specified for statistics on installation panel DSNTIPN.
If the statistics trace is started using the START TRACE command, then class 1 is the default class.
The statistics trace can collect information about the number of threads connected, the
number of SQL statements executed, the amount of storage consumed within the database
manager address space, deadlocks, timeouts, logging activity, buffer pool utilization, and much more.
This information is collected at regular intervals for an entire DB2 subsystem. The interval is
typically 10 or 15 minutes per record.
Accounting Trace
The accounting trace provides data that allows you to assign DB2 costs to individual
authorization IDs and to tune individual programs. The DB2 accounting trace provides
information related to application programs, including such things as:
• Start and stop times
• Number of commits and aborts
• The number of times certain SQL statements are issued
• Number of buffer pool requests
• Counts of certain locking events
• Processor resources consumed
• Thread wait times for various events
• RID pool processing
• Distributed processing
• Resource limit facility statistics
Accounting times are usually the prime indicator of performance problems, and most often
should be the starting point for analysis. DB2 times are classified as follows:
• Class 1: This time shows the time the application spent since connecting to DB2, including
time spent outside DB2.
• Class 2: This shows the elapsed time spent in DB2. It is divided into CPU time and
waiting time.
• Class 3: This elapsed time is divided into various waits, such as the duration of suspensions
due to waits for locks and latches or waits for I/O.
DB2 trace begins collecting this data at successful thread allocation to DB2, and writes a
completed record when the thread terminates or, in some cases, when the authorization ID
changes. Having the accounting trace active is critical for proper performance monitoring,
analysis, and tuning. When an application connects to DB2 it executes across address
spaces, and the DB2 address spaces are shared by perhaps thousands of users. The
accounting trace provides information about the time spent within DB2, as well as the overall
application time. Class 2 time is a component of class 1 time, and class 3 time is a component
of class 2 time.
Accounting data for class 1 (the default) is accumulated by several DB2 components during
normal execution. This data is then collected at the end of the accounting period; it does not
involve as much overhead as individual event tracing. On the other hand, when you start class
2, 3, 7, or 8, many additional trace points are activated. Every occurrence of these events is
traced internally by DB2 trace, but these traces are not written to any external destination.
Rather, the accounting facility uses these traces to compute the additional total statistics that
appear in the accounting record when class 2 or class 3 is activated. Accounting class 1 must
be active to externalize the information.
We recommend you set accounting classes 1, 2, 3, 7, and 8. Be aware that this can add between 4%
and 5% to your overall system CPU consumption. However, if you are already writing accounting
classes 1, 2, and 3, then adding 7 and 8 typically should not add much overhead. Also, if you are
using an online performance monitor then it could already have these classes started. If that is
the case, then adding SMF as a destination for these classes should not add any CPU overhead.
Performance Trace
The performance trace provides information about a variety of DB2 events, including events
related to distributed data processing. You can use this information to further identify a
suspected problem or to tune DB2 programs and resources for individual users or for DB2 as a
whole. To start a performance trace, you must use the -START TRACE(PERFM) command.
Performance traces cannot be automatically started. Performance traces are expensive to run,
and consume a lot of CPU. They also collect a very large volume of information. Performance
traces are usually run via an online monitor tool, or the output from the performance trace can
be sent to SMF and then analyzed using a monitor reporting tool, or sent to IBM for analysis.
Because performance traces can consume a lot of resources and generate a lot of data, there
are a lot of options when starting the trace to balance the information desired with the
resources consumed. This includes limiting the trace data collected by plan, package, trace
class, and even IFCID.
Performance traces are typically utilized by online monitor tools to track a specific problem for
a given plan or package. Reports can then be produced by the monitor software, and can detail
SQL performance, locking, and many other detailed activities. Performance trace data can also
be written to SMF records, and batch reporting tools can read those records to produce very
detailed information about the execution of SQL statements in the application.
Statistics Report
Since statistics records are typically collected at 10 or 15 minute intervals, quite a few
records can be collected on a daily basis. Your reporting software should be able to produce
either summary reports, which can gather and summarize the data for a period of time, or
detail reports, which can report on every statistics interval. Start with a daily summary report,
and look for specific problems within the DB2 subsystem. Once you detect a problem then
you can produce a detailed report to determine the specific period of time that the problem
occurred, and also coordinate the investigation with detailed accounting reports for the same
time period in an effort to attribute the problem to a specific application or process. Some of
the things to look for in a statistics report:
• RID Pool Failures There should be a section of the statistics report that reports the usage
of the RID pool for things such as list prefetch, multiple index access, and hybrid joins. The
report will also indicate RID failures. There can be RDS failures, DM failures, and failures
due to insufficient size. If you are getting failures due to insufficient storage you can increase
the RID pool size. However, if you are getting RDS or DM failures in the RID pool then there
is a good chance that the access path selected is reverting to a table space scan. In these
situations it is important to determine which applications are getting these RID failures.
Therefore, you need to produce a detailed statistics report that can identify the time of the
failures, and also produce detailed accounting reports that will show which threads are
getting the failures. Further investigation will have to be performed to determine the packages
within the plan, and DB2 EXPLAIN can be used to determine which statements are using list
prefetch, hybrid join, or multi-index access. You may have to test the queries to determine if
they are indeed the ones getting these failures, and if they are, you'll have to try to influence
the optimizer to change the access path (see Chapter 3 for SQL tuning).
• Bufferpool Issues One of the most valuable pieces of information coming out of the
statistics report is the section covering buffer utilization and performance. For each buffer
pool in use the report will include the size of the pool, sequential and random getpages,
prefetch operations, pages written, and number of sequential I/O’s, random I/O’s, and write
I/O’s, plus much more. Also reported are the number of times certain buffer thresholds have
been hit. One of the things to watch for is the number of synchronous reads for sequential
access, which may be an indication that the buffer pool is too small and that pages for a
sequential prefetch are stolen before they are used. Another thing to watch is whether or not
any critical thresholds are reached, if there are write engines not available, and whether or
not deferred write thresholds are triggering. It’s also important to monitor the number of
getpages per synchronous I/O, as well as the buffer hit ratio. Please see Chapter 8 for
information about subsystem tuning.
• Logging Problems The statistics report will give important information about logging. This
includes the number of system checkpoints, number of reads from the log buffer, active log
data sets, or archived log data sets, number of unavailable output buffers, and total log
writes. This could give you an indication as to whether or not you need to increase log buffer
sizes, or investigate frequent application rollbacks or other activities that could cause
excessive log reads. Please see Chapter 8 for information about subsystem tuning.
• EDM Pool Hit Ratio The statistics report will show how often database objects such as
DBD’s, cursor tables, and package tables are requested as well as how often those requests
have to be satisfied via a disk read to one of the directory tables. You can use this to
determine if the EDM pool size needs to be increased. You also get statistics about the use
of dynamic statement cache and the number of times statement access paths were reused
in the dynamic statement cache. This could give you a good indication about the size of
your cache, and its effectiveness, but it could also give you an indication of the potential
reusability of the statements in the cache. Please see Chapter 8 for more information about
the EDM pool, and Chapter 3 for information about tuning dynamic SQL.
• Deadlocks and Timeouts The statistics report will give you a subsystem wide perspective
on the number of deadlocks and timeouts your applications have experienced. You can use
this as an overall method of detecting deadlocks and timeouts across all applications. If the
statistics summary report shows a positive count you can use the detailed report to find out
at what time the problems are occurring. You can also use accounting reports to determine
which applications are experiencing the problem.
This has only been a sample of the fields on the statistics report, and the valuable information
they provide. You should be using your statistics reports on a regular basis, and using your
monitoring software documentation, along with the DB2 Administration Guide (DB2 V7,
DB2 V8) or DB2 Performance Monitoring and Tuning Guide (DB2 9) to interpret the
information provided.
Accounting Report
An accounting report will read the SMF accounting records to produce thread or application
level information from the accounting trace. These reports typically can summarize information
at the level of a plan, package, correlation ID, authorization ID, and more. In addition, you
have the option to produce one report per thread. This accounting detail report can give
very detailed performance information for the execution of a specific application process. If you
have accounting classes 1,2,3,7, and 8 turned on, then the information will be reported at both
the plan and package level.
You can use the accounting report to find specific problems within certain applications,
programs, or threads. Some of the things to look for in an accounting report include:
• Class 1 and Class 2 Timings Class 1 times (elapsed and CPU) include the entire application
time, including the time spent within DB2. Class 2 is a component of class 1, and represents
the amount of time the application spent in DB2. The first question to ask when an
application is experiencing a performance problem is “where is the time being spent?” The
first indication of the performance issue being DB2 will be a high class 2 time relative to the
class 1 time. Within class 2 you could be having a CPU issue (CPU time represents the
majority of class 2 time), or a wait issue (CPU represents very little of the overall class 2
time, but class 3 wait represents most of the time), or maybe your entire system is CPU
bound (class 2 overall elapsed time is not accounted for by class 2 CPU time and class 3 wait
time combined).
• Buffer Usage The accounting report contains buffer usage at the thread level. This
information can be used to determine how a specific application or process is using the
buffer pools. If you have situations in which certain buffers have high random getpage counts
you may want to look at which applications are causing the high number of random
getpages. You can use this thread level information to determine which applications are
accessing buffers randomly versus sequentially. Then perhaps you can see which objects the
application uses, and use this information to separate sequentially accessed objects from
randomly accessed objects into different buffer pools (see Chapter 8 on subsystem tuning).
The buffer pool information in the accounting report will also indicate just how well the
application is utilizing the buffers. The report can be used during buffer pool tuning to
determine the impact of buffer changes on an application.
• Package Execution Times If accounting classes 7 and 8 are turned on, then the accounting
report will show information about the DB2 processing on a package level. This information
is very important for performance tuning because it allows you to determine which programs
in a poorly performing application should be reviewed first.
• Deadlocks, Timeouts, and Lock Waits The accounting report includes information about the
number of deadlocks and timeouts that occurred on a thread level. It also reports the time
the thread spent waiting on locks. This will give you a good indication as to whether or not
you need to do additional investigation into applications that are having locking issues.
• Excessive Synchronous I/O’s Do you have a slow running job or online process? Exactly
what is slow about that job or process? The accounting report will tell you if there are a large
number of excessive random synchronous I/O’s being issued, and how much time the
application spends waiting on I/O. The information in the report can also be used to
approximate the efficiency of your DASD by dividing the total synchronous I/O wait time by
the number of synchronous I/O's to get the average wait per I/O.
• RID Failures The accounting report does give thread level RID pool failure information. This
is important in determining if you have access path problems in a specific application.
• High Getpage Counts and High CPU Time Often it is hard to determine if an application is
doing repeat processing when there are not a lot of I/O’s being issued. You should use the
accounting report to determine if your performance problem is related to an inefficient repeat
process. If the report shows a very high getpage count, or that the majority of elapsed time is
actually class 2 CPU time, then that may be an indication of an inefficient repeat process in
one of the programs for the plan. You can use the package level information to determine
which program uses the most CPU, and try to identify any inefficiencies in that program.
There’s a way to quickly assess application performance and identify the significant offenders
and SQL statements causing problems. You can quickly identify the "low-hanging fruit," report
on it to your boss, and change the application or database to support a more efficient path to
the data. Management support is a must, and an effective manner of communicating
performance tuning opportunities and results is crucial.
There’s been some concern about the performance impact of this level of DB2 accounting.
The IBM DB2 Administration Guide (DB2 V7, DB2 V8) or the DB2 Performance Monitoring
and Tuning Guide (DB2 9) states that the performance impact of setting these traces is
minimal and the benefits can be substantial. Tests performed at a customer site demonstrated
an overall system impact of 4.3 percent for all DB2 activity when accounting classes 1, 2, 3, 7,
and 8 are started. In addition, adding accounting classes 7 and 8 when 1, 2, and 3 are already
started has nominal impact, as does the addition of most other performance monitor
equivalent traces (i.e. your online monitor software).
You can process whatever types of reports you produce so that a concentrated amount of
information about DB2 application performance can be extracted. This information is reduced
to the amount of elapsed time and CPU time the application consumes daily and the number
of SQL statements each package issues daily. This highly specific information will be your first
clue as to which packages provide the best DB2 tuning opportunity. In a package level report,
the areas of interest are the Package Name, Total DB2 Elapsed, Total SQL Count, and Total
DB2 TCB time.
If you lack access to a reporting tool that can filter out just the pieces of information desired,
you can write a simple program in any language to read the standard accounting reports and
pull out the information you need. REXX is an excellent programming language well-suited to
this type of "report scraping," and you can write a REXX program to do such work in a few
hours. You could write a slightly more sophisticated program to read the SMF data directly to
produce similar summary information if you wish to avoid dependency on the reporting
software. Once the standard reports are processed and summarized, all the information for a
specific interval (say one day) can appear in a simple spreadsheet. You can sort the
spreadsheet by CPU descending. With high consumers at the top of the report, the low
hanging fruit is easy to spot. The following spreadsheet can be derived by extracting the fields
of interest from a package level summary report:
Package Executions Total Elapsed Total CPU Total SQL Elaps/Execution CPU/Execution Elapsed/SQL CPU/SQL
ACCT001 246745 75694.2992 5187.4262 1881908 0.3067 0.021 0.0402 0.0027
ACCT002 613316 26277.2022 4381.7926 1310374 0.0428 0.0071 0.02 0.0033
ACCTB01 8833 4654.4292 2723.1485 531455 0.5269 0.3082 0.0087 0.0051
RPTS001 93 6998.7605 2491.9989 5762 75.2554 26.7956 1.2146 0.4324
ACCT003 169236 33439.2804 2198.0959 1124463 0.1975 0.0129 0.0297 0.0019
RPTS002 2686 2648.3583 2130.2409 2686 0.9859 0.793 0.9859 0.793
HRPK001 281 4603.1262 2017.7179 59048 16.3812 7.1804 0.0779 0.0341
HRPKB01 21846 3633.5143 2006.6083 316746 0.1663 0.0918 0.0114 0.0063
HRBKB01 505 2079.5351 1653.5773 5776 4.1178 3.2744 0.36 0.2862
CUSTB01 1 4653.9935 1416.6254 7591111 4653.9935 1416.6254 0.0006 0.0001
CUSTB02 1 3862.1498 1399.9468 7971317 3862.1498 1399.9468 0.0004 0.0001
CUST001 246670 12636.0232 1249.7678 635911 0.0512 0.005 0.0198 0.0019
CUSTB03 280 24171.1267 1191.0164 765906 86.3254 4.2536 0.0315 0.0015
RPTS003 1 5163.3568 884.0148 1456541 5163.3568 884.0148 0.0035 0.0006
CUST002 47923 10796.5509 875.252 489288 0.2252 0.0182 0.022 0.0017
CUST003 68628 3428.4162 739.4523 558436 0.0499 0.0107 0.0061 0.0013
CUSTB04 2 1183.2068 716.2694 3916502 591.6034 358.1347 0.0003 0.0001
CUSTB05 563 1232.2111 713.9306 1001 2.1886 1.268 1.2309 0.7132
Look for some simple things to choose the first programs to address. For example, package
ACCT001 consumes the most CPU per day, and issues nearly 2 million SQL statements.
Although the CPU consumed per statement on average is low, the sheer quantity of
statements issued indicates an opportunity to save significant resources. If just a tiny amount
of CPU can be saved, it will quickly add up. The same applies to package ACCT002 and
packages RPTS001 and RPTS002. These are some of the highest consumers of CPU and they
also have a relatively high average CPU per SQL statement. This indicates there may be some
inefficient SQL statements involved. Since the programs consume significant CPU per day,
tuning these inefficient statements could yield significant savings.
ACCT001, ACCT002, RPTS001, and RPTS002 represent the best opportunities for saving CPU,
so examine those first. Without this type of summarized reporting, it’s difficult to do any sort of
truly productive tuning. Most DBAs and systems programmers who lack these reports and look
only at online monitors or plan table information are really just shooting in the dark.
Reporting to Management
To do this type of tuning, you need buy-in from management and application developers. This
can sometimes be the most difficult part because, unfortunately, most application tuning
involves costly changes to programs. One way to demonstrate the potential ROI for programming
time is to report the cost of application performance problems in terms of dollars. This is
easy and amazingly effective!
The summarized reports can present information at the application level. An in-house
naming standard can be used to combine all the performance information from various
packages into application-level summaries. This lets you classify applications and address
the ones that use the most resources.
For example, if the in-house accounting application has a program naming standard where all
program names begin with "ACCT," then the corresponding DB2 package accounting infor-
mation can be grouped by this header. Thus, the DB2 accounting report data for programs
ACCT001, ACCT002, and ACCT003 can be grouped together, and their accounting information
summarized to represent the "ACCT" application.
Most capacity planners have formulas for converting CPU time into dollars. If you get this
formula from the capacity planner, and categorize the package information by application, you
can easily turn your daily package summary report into an annual CPU cost per application.
A simple chart can be developed using an in-house naming standard and a CPU-to-dollars
formula to show the annual CPU cost per application. Give that chart to the boss and watch
the reaction! It is a really great tool for getting the resources allocated to get the job done.
Make sure you produce a "cost reduction" report, in dollars, once a phase of the tuning has
completed. This makes it perfectly clear to management what the tuning efforts have
accomplished and gives incentive for further tuning efforts. Consider providing a visual
representation of your data. A bar chart with before and after results can be highly effective in
conveying performance tuning impact.
Involve managers and developers in your investigation. It's much easier to tune with a team
approach where different team members can be responsible for different analysis.
Performance traces are expensive, sometimes adding as much as 20 percent to the overall
CPU costs. However, a short-term performance trace may be an effective tool for gathering
information on frequent SQL statements and their true costs.
If plan table information isn't available for the targeted package, then you can rebind that
package with EXPLAIN(YES). If it's hard to get an outage to rebind with EXPLAIN(YES), or if a
plan table is only available under a different owner ID, you could also BIND a copy of the package
with EXPLAIN(YES) (for example, execute a BIND into a special/dummy collection ID) rather than
rebinding it.
In our example, PLAN_TABLE data was extracted for two of the most expensive programs
identified above.
Here, our most expensive program issues a simple SQL statement with matching index access
to the PERSON_ACCT table, and it orders the result, which results in a sort (Method=3). The
programmer, when consulted, advised that the query rarely returns more than a single row of
data. In this case, a bubble sort in the application program replaced the DB2 sort. The bubble
sort algorithm was almost never used because the query rarely returned more than one row,
and the CPU associated with DB2 sort initialization was avoided. Since this query was
executing many thousands of times per day, the CPU savings were substantial.
While these statements may have not caught someone's eye just by looking at EXPLAIN
results, when combined with the accounting data, they screamed for further investigation.
• Trace or Monitor Report You can run a performance trace or watch your online monitor for
the packages identified as high consumers in your package report. This type of monitoring
will help to drill down to the high-consuming SQL statements within these packages.
• Plan Table Report Run extractions of plan table information for the high-consuming programs
identified in your package report. You may quickly find some bad access paths that can be
tuned quickly. Don't forget to consider the frequency of execution as indicated in the package
report. Even a simple thing such as a small sort may be really expensive if executed often.
• Index Report Produce a report of all indexes on tables in the database of interest. This
report should include the index name, table name, column names, column sequence,
cluster ratio, clustering, first key cardinality, and full key cardinality. Use this report when
tuning SQL statements identified in the plan table or trace/monitor report. There may be
indexes you can take advantage of, add, change, or even drop. Unused indexes create
overhead for INSERT, DELETE, and UPDATE processing, as well as for utilities.
• DDL or ERD You're going to need to know about the database. This includes relationships
between tables, column data types, and where the data resides. An Entity Relationship
Diagram (ERD) is the best tool for this, but if none is available, you can print out the Data
Definition Language (DDL) SQL statements used to create the tables and indexes. If the DDL
isn’t available, you can use a tool such as DB2LOOK (yes, you can use this against a
mainframe database) to generate the DDL.
Don’t overlook the importance of examining the application logic. This has to do primarily with
the quantity of SQL statements being issued. The best performing SQL statement is the one
that is never issued, and it's surprising how often application programs will go to the database
when they don't have to. The program may be executing the world’s best-performing SQL
statements, but if the data isn't used, then they're really poor-performing statements.
CHAPTER 7
Application Design and Tuning for Performance
While the majority of performance improvements will be realized via proper application design,
or by tuning the application, nowhere does "it depends" matter more than when dealing
with applications. This is because the performance design and tuning techniques you apply will
vary depending upon the nature, needs, and characteristics of your application. This chapter
will describe some general recommendations for improving performance in applications.
Caching Data
If you are accessing code tables for validating data, editing data, translating values, or
populating drop-down boxes then those code tables should be locally cached for better
performance. This is particularly true for any code tables that rarely change (if the codes are
frequently changing then perhaps it's not a code table). For a batch COBOL job, for example,
read the code tables you'll use into in-core tables. For larger code tables you can employ a
binary or quick search algorithm to quickly look up values. For CICS applications set up the
code in VSAM files, and use them as CICS data tables. This is much faster than using DB2
to look up the value for every CICS transaction. The VSAM files can be refreshed on regular
intervals via a batch process, or on an as-needed basis. If you are using a remote application
or Windows-based clients, you should read the code tables once when the application starts,
and cache the values locally to populate various on-screen fields or to validate the data
input on a screen.
In other situations, and especially for object- or service-oriented applications, you should always
check if an object has already been read in before reading it again. This will avoid blindly
and constantly rereading the data every time a method is invoked to retrieve the data. If you
are concerned that an object is not "fresh" and that you may be updating old data, then you
can employ a concept called optimistic locking. With optimistic locking you don't have to
constantly reread the object (the table or query to support the object). Instead you read it
once, and when you go to update the object you can check the update timestamp to see if
someone else has updated the object before you. This technique is described further in the
locking section of this chapter.
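A minimal sketch of the technique, assuming a hypothetical ACCT_TABLE with an UPDATE_TS
column that is set on every change, might look like this:

-- Read the row once and remember its last update timestamp
SELECT ACCT_NAME, UPDATE_TS
INTO   :ACCT-NAME, :LAST-READ-TS
FROM   ACCT_TABLE
WHERE  ACCT_ID = :ACCT-ID;

-- Later, update only if nobody else has changed the row in the meantime
UPDATE ACCT_TABLE
SET    ACCT_NAME = :NEW-ACCT-NAME
      ,UPDATE_TS = CURRENT TIMESTAMP
WHERE  ACCT_ID   = :ACCT-ID
AND    UPDATE_TS = :LAST-READ-TS;

-- SQLCODE +100 here means someone else updated the row first; reread and retry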
Traditionally people have used a “next-key” table, which contained one row of one column with
a numeric data type, to generate keys. This typically involved reading the column, incrementing
and updating the column, and then using the new value as a key to another table. These next-
key tables typically wound up being a huge application bottleneck.
There are several ways to generate key values inside DB2, two of which are identity columns
(DB2 V7, DB2 V8, DB2 9) and sequence objects (DB2 V8, DB2 9). Identity columns are
attached to tables, and sequence objects are independent of tables. If you are on DB2 V7 then
your only choice of the two is the identity column. However, there are some limitations to the
changes you can make to identity columns in DB2 V7 so you are best placing them in their own
separate table of one row and one column, and using them as a next key generator table.
The high performance solution for key generation in DB2 is the sequence object. Make sure
that when you use sequence objects (or identity columns) you utilize the CACHE and
ORDER settings according to your high performance needs. These settings will impact the
number of values that are cached in advance of a request, as well as whether or not the order
the values are returned is important. The settings for a high level of performance in a data
sharing group would be, for example, CACHE 50 NO ORDER.
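A sequence object tuned this way for a data sharing group might be created as follows (the
sequence name and data type shown are illustrative):

CREATE SEQUENCE ACCT_SEQ
  AS BIGINT
  START WITH 1
  INCREMENT BY 1
  CACHE 50
  NO ORDER;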
When using sequence objects or identity columns (or even default values, triggers, ROWIDs,
the GENERATE_UNIQUE function, the RAND function, and more), you can reduce the number
of SQL statements your application issues by utilizing a SELECT from a result table (DB2 V8,
DB2 9). In this case that result table will be the result of an INSERT. In the following example
we use a sequence object to generate a unique key, and then return that unique key back to
the application (the table, column, and sequence names are illustrative):
SELECT ACCT_ID
FROM FINAL TABLE
     (INSERT INTO ACCT_TABLE (ACCT_ID, ACCT_NAME)
      VALUES (NEXT VALUE FOR ACCT_SEQ, :ACCT-NAME));
Check the tips chapter for a cool tip on using the same sequence object for batch and online
key assignment!
DB2 does provide a feature known as an INSTEAD OF trigger (DB2 9) that allows somewhat of
a mapping between the object world and multiple tables in a database. You can create a view
that joins two tables that are commonly accessed together. Then the object based application
can treat the view as a table. Since the view is on a join, it is a read-only view. However, you
can define an INSTEAD OF trigger on that view to allow INSERTs, UPDATEs, and DELETEs
against the view. The INSTEAD OF trigger can be coded to perform the necessary changes to
the tables of the join using the transition variables from the view.
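A minimal sketch of the idea, reusing the hypothetical PERSON_TABLE and PERSON_ORDER tables
from the previous chapter, might look like this (DB2 9):

CREATE VIEW PERSON_ORDER_V AS
  SELECT P.PERSON_ID, P.NAME, O.ORDER_NUM, O.PRODUCT_CD, O.QUANTITY, O.PRICE
  FROM   YLA.PERSON_TABLE P
  JOIN   YLA.PERSON_ORDER O ON O.PERSON_ID = P.PERSON_ID;

CREATE TRIGGER PERSON_ORDER_INS
  INSTEAD OF INSERT ON PERSON_ORDER_V
  REFERENCING NEW AS N
  FOR EACH ROW MODE DB2SQL
  BEGIN ATOMIC
    -- Assumes the parent row already exists; only the child table is changed
    INSERT INTO YLA.PERSON_ORDER
           (PERSON_ID, ORDER_NUM, PRODUCT_CD, QUANTITY, PRICE)
    VALUES (N.PERSON_ID, N.ORDER_NUM, N.PRODUCT_CD, N.QUANTITY, N.PRICE);
  END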
In our tests of a simple two table SQL join versus the equivalent programmatic join (FETCH
loop within a FETCH loop in a COBOL program), the two table SQL join used 30% less CPU
than the programmatic join.
Multi-Row Operations
In an effort to reduce the quantity of SQL statements the application is issuing DB2 provides
for some multi-row operations. This includes:
• Multi-row fetch (DB2 V8, DB2 9)
• Multi-row insert (DB2 V8, DB2 9)
• MERGE statement (DB2 9)
These are in addition to the possibility of doing a mass INSERT, UPDATE, or DELETE operation.
MULTI-ROW FETCH Multi-row fetching gives us the opportunity to return multiple rows (up to
32,767) in a single API call with a potential CPU performance improvement somewhere around
50%. It works for static or dynamic SQL, and scrollable or non-scrollable cursors. There is also
support for positioned UPDATEs and DELETEs. The sample programs DSNTEP4 (which is
DSNTEP2 with multi-row fetch) and DSNTIAUL also can exploit multi-row fetch.
There are two reasons to take advantage of the multi-row fetch capability:
1. To reduce the number of statements issued between your program address space
and DB2.
2. To reduce the number of statements issued between DB2 and the DDF address space.
The first way to take advantage of multi-row fetch is to program for it in your application code.
The second way to take advantage of multi-row fetch is in our distributed applications that are
using block fetching. Once in compatibility mode in DB2 V8 the blocks used for block fetching
are built using the multi-row capability without any code change on our part. This results in
great savings for our distributed SQLJ applications. In one situation the observed benefit of this
feature was that a remote SQLJ application migrated from DB2 V7 to DB2 V8 did not experience
a CPU increase.
Coding for a multi-row fetch is quite simple. The basic changes, illustrated in the sketch after
this list, include:
• Adding the phrase "WITH ROWSET POSITIONING" to a cursor declaration
• Adding the phrases "NEXT ROWSET" and "FOR n ROWS" to the FETCH statement
• Changing the host variables to host variable arrays (for COBOL this is as simple as adding
an OCCURS clause)
• Placing a loop within your fetch loop to process the rows
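Here is a minimal embedded SQL sketch of those changes; the ACCT_TABLE table, its columns,
and the COBOL host variable array names are assumptions for illustration:

DECLARE ACCT_CSR CURSOR WITH ROWSET POSITIONING FOR
  SELECT ACCT_ID, ACCT_NAME
  FROM   ACCT_TABLE;

OPEN ACCT_CSR;

-- Each FETCH returns up to 50 rows into the host variable arrays
FETCH NEXT ROWSET FROM ACCT_CSR
  FOR 50 ROWS
  INTO :ACCT-ID-ARRAY, :ACCT-NAME-ARRAY;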
These changes are quite simple, and can have a profound impact on performance. In our tests
of a sequential batch program the use of 50 row fetch (the point of diminishing return for our
test) of 39 million rows of a table reduced CPU consumption by 60% over single-row fetch. In
a completely random test where we expected on average 20 rows per random key, our 20 row
fetch used 25% less CPU than the single-row fetch. Keep in mind, however, that multi-row
fetch is a CPU saver, and not necessarily an elapsed time saver.
When using multi-row fetch, the GET DIAGNOSTICS statement is not necessary, and should
be avoided due to high CPU overhead. Instead use the SQLCODE field of the SQLCA to
determine whether your fetch was successful (SQLCODE 000), if the fetch failed (negative
SQLCODE), or if you hit end of file (SQLCODE 100). If you received an SQLCODE 100 then you
can check the SQLERRD3 field of the SQLCA to determine the number of rows to process.
MULTI-ROW INSERT As with multi-row fetch reading multiple rows per FETCH statement,
a multi-row insert can insert multiple rows into a table in a single INSERT statement. The
INSERT statement simply needs to contain the “FOR n ROWS” clause, and the host variables
referenced in the VALUES clause need to be host variable arrays. IBM states that multi-row
inserts can result in as much as a 25% CPU savings over single row inserts. In addition, multi-
row inserts can have a dramatic impact on the performance of remote applications in that the
number of statements issued across a network can be significantly reduced.
The multi-row insert can be coded as ATOMIC, meaning that if one insert fails then the entire
statement fails, or it can be coded as NOT ATOMIC CONTINUE ON SQLEXCEPTION, which means
that the failure of any one of the inserts will only impact that one insert of the set.
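A minimal sketch of a multi-row insert (again assuming a hypothetical ACCT_TABLE and COBOL
host variable arrays) might look like this:

INSERT INTO ACCT_TABLE (ACCT_ID, ACCT_NAME)
  VALUES (:ACCT-ID-ARRAY, :ACCT-NAME-ARRAY)
  FOR :ROW-COUNT ROWS
  NOT ATOMIC CONTINUE ON SQLEXCEPTION;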
As with the multi-row fetch, the GET DIAGNOSTICS statement is not initially necessary, and
should be avoided for performance reasons unless it is needed. In the case of a failed non-atomic
multi-row insert you’ll get a SQLCODE of -253 if one or more of the inserts failed. Only then
should you use GET DIAGNOSTICS to determine which one failed. Remember, if you get a
SQLCODE of zero then all the inserts were a success, and there is no need for additional
analysis.
MERGE STATEMENT Many times applications are interfacing with other applications. In these
situations an application may receive a large quantity of data that applies to multiple rows of
a table. Typically the application would in this case perform a blind update. That is, the
application would simply attempt to update the rows of data in the table, and if any update
failed because a row was not found, then the application would insert the data instead. In other
situations, the application may read all of the existing data, compare that data to the new
incoming data, and then programmatically insert or update the table with the new data.
DB2 9 supports this type of processing via the MERGE statement. The MERGE statement
updates a target (table or view, or the underlying tables of a fullselect) using the specified
input data. Rows in the target that match the input data are updated as specified, and rows
that do not exist in the target are inserted. MERGE can utilize a table or an array of variables
as input.
Since the MERGE operates against multiple rows, it can be coded as ATOMIC or NOT
ATOMIC. The NOT ATOMIC option will allow rows that have been successfully updated or
inserted to remain if others have failed. The GET DIAGNOSTICS statement should be used
along with NOT ATOMIC to determine which updates or inserts have failed.
The following example shows a MERGE of rows on the employee sample table, using host
variable arrays as input:

MERGE INTO EMP AS EXISTING_TBL
  USING (VALUES (:EMPNO-ARRAY, :SALARY-ARRAY, :COMM-ARRAY, :BONUS-ARRAY)
         FOR :ROW-CNT ROWS) AS INPUT_TBL (EMPNO, SALARY, COMM, BONUS)
  ON INPUT_TBL.EMPNO = EXISTING_TBL.EMPNO
  WHEN MATCHED THEN UPDATE SET SALARY = INPUT_TBL.SALARY
                              ,COMM = INPUT_TBL.COMM
                              ,BONUS = INPUT_TBL.BONUS
  WHEN NOT MATCHED THEN INSERT (EMPNO, SALARY, COMM, BONUS)
       VALUES (INPUT_TBL.EMPNO, INPUT_TBL.SALARY,
               INPUT_TBL.COMM, INPUT_TBL.BONUS)
  NOT ATOMIC CONTINUE ON SQLEXCEPTION;
As with the multi-row insert operation, the use of GET DIAGNOSTICS should be limited.
Advanced DB2 features such as user-defined functions, stored procedures, triggers, and
constraints allow applications to be written quickly by pushing some of the logic of the
application into the database server. Most of the time advanced functionality can be
incorporated into the database using these features at a much lower development cost than
coding the feature into the application itself. A feature such as database enforced referential
integrity (RI) is a perfect example of something that is quite easy to implement in the
database, but would take significantly longer time to code in a program.
These advanced database features also allow application logic to be placed as part of the
database engine itself, making this logic more easily reusable enterprise wide. Reusing existing
logic will mean faster time to market for new applications that need that logic, and having the
logic centrally located makes it easier to manage than client code. Also, in many cases having
data intensive logic located on the database server will result in improved performance as that
logic can process the data at the server, and only return a result to the client.
Using advanced SQL for performance was addressed in Chapter 3 of this guide, and so let’s
address the other features here.
USER-DEFINED FUNCTIONS Functions are a useful way of extending the programming power
of the database engine. Functions allow us to push additional logic into our SQL statements.
User-Defined scalar functions work on individual values of a parameter list, and return a single
value result. A table function can return an actual table to a SQL statement for further
processing (just like any other table). User-defined functions (UDF) provide a major
breakthrough in database programming technology. UDFs actually allow developers and
DBAs to extend the capabilities of the database. This allows for more processing to be pushed
into the database engine, which in turns allows these types of processes to become more
centralized and controllable. Virtually any type of processing can be placed in a UDF, including
legacy application programs. This can be used to create some absolutely amazing results, as
well as push legacy processing into SQL statements. Once your processing is inside SQL
statements you can put those SQL statements anywhere. So, anywhere you can run your SQL
statements (say, from a web browser), you can run your programs! Just like complex
SQL statements, UDFs place more logic into the highly portable SQL statements.
Also just like complex SQL, UDFs can be a performance advantage or disadvantage. If the
UDFs process large amounts of data, and return a result to the SQL statement, there may be a
performance advantage over the equivalent client application code. However, if a UDF is used
to process data only then it can be a performance disadvantage, especially if the UDF is
invoked many times or embedded in a table expression, as data type casting (for SQL scalar
UDFs compared to the equivalent expression coded directly in the SQL statement) and task
switch overhead (external UDFs run in a stored procedure address space) are expensive (DB2
V8 relieves some of this overhead for table functions). Converting a legacy program into a UDF
in about a day’s time, invoking that program from a SQL statement, and then placing that SQL
statement where it can be accessed via a client process may just be worth that expense!
Simply put, if the UDF results in the application program issuing fewer SQL statements, or
getting access to a legacy process then chances are that the UDF is the right decision.
STORED PROCEDURES Stored procedures are becoming more prevalent on the mainframe, and
can be part of a valuable implementation strategy. Stored procedures can be a performance
benefit for distributed applications, or a performance problem. In every good implementation
there are trade-offs. Most of the trade-offs involve sacrificing performance for things like
flexibility, reusability, security, and time to delivery. It is possible to minimize the impact of
distributed application performance with the proper use of stored procedures.
Since stored procedures can be used to encapsulate business logic in a central location on
the mainframe, they offer a great advantage as a source of secured, reusable code. By using a
stored procedure the client will only need to have authority to execute the stored procedures
and will not need authority to the DB2 tables that are accessed from the stored procedures. A
properly implemented stored procedure can help improve availability. Stored procedures can be
stopped, queuing all requestors. A change can be implemented while access is prevented, and
the procedures restarted once the change has been made. If business logic and SQL access is
encapsulated within stored procedures there is less dependency on client or application server
code for business processes. That is, the client takes care of things like display logic and edits,
and the stored procedure contains the business logic. This simplifies the change process, and
makes the code more reusable. In addition, like UDFs stored procedures can be used to access
legacy data stores, and quickly web enable our legacy processes.
The major advantage to stored procedures is when they are implemented in a client/server
application that must issue several remote SQL statements. The network overhead involved in
sending multiple SQL commands and receiving result sets is quite significant, therefore proper
use of stored procedures to accept a request, process that request with encapsulated SQL
statements and business logic, and return a result will lessen the traffic across the network and
reduce the application overhead. If a stored procedure is coded in this manner then it can be a
significant performance improvement. Conversely, if the stored procedures contain only a few
or one SQL statement the advantages of security, availability, and reusability can be realized,
but performance will be worse than the equivalent single statement executions from the client
due to task switch overhead.
DB2 9 offers a significant performance improvement for stored procedures with the
introduction of native SQL procedures. These unfenced SQL procedures will execute as run
time structures rather than be converted into external C program procedures (as with DB2 V7
and DB2 V8). Running these native SQL procedures will eliminate the task switch overhead of
executing in the stored procedure address space. This represents a significant performance
improvement for SQL procedures that contain little program logic, and few SQL statements.
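As a simple sketch (the procedure, table, and column names are hypothetical), a DB2 9 native
SQL procedure is created by coding the body directly in the CREATE PROCEDURE statement,
without the FENCED or EXTERNAL options:

CREATE PROCEDURE GET_ACCT_BALANCE
  (IN  P_ACCT_ID  INTEGER
  ,OUT P_BALANCE  DECIMAL(15,2))
  LANGUAGE SQL
BEGIN
  SELECT ACCT_BALANCE
  INTO   P_BALANCE
  FROM   ACCT_TABLE
  WHERE  ACCT_ID = P_ACCT_ID;
END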
TRIGGERS AND CONSTRAINTS Triggers and constraints can be used to move application logic
into the database. The greatest advantage to triggers and constraints is that they are generally
data intensive operations, and these types of operations are better performers when placed
close to the data. These features consist of:
• Triggers
• Database Enforced Referential Integrity (RI)
• Table Check Constraints
A trigger is a database object that contains some application logic in the form of SQL
statements that are invoked when data in a DB2 table is changed. These triggers are installed
into the database, and are then dependent upon the table on which they are defined. SQL
DELETE, UPDATE, and INSERT statements can activate triggers. They can be used to replicate
data, enforce certain business rules, and to fabricate data. Database enforced RI can be used to
ensure that relationships from tables are maintained automatically. Child table data cannot be
created unless a parent row exists, and rules can be implemented to tell DB2 to restrict or
cascade deletes to a parent when child data exists. Table check constraints are used to ensure
values of specific table columns, and are invoked during LOAD, insert, and update operations.
Triggers and constraints ease the programming burden because the logic, in the form of SQL
is much easier to code than the equivalent application programming logic. This helps make
the application programs smaller and easier to manage. In addition, since the triggers and
constraints are connected to DB2 tables, they are centrally located rules that are universally
enforced. This helps to ensure data integrity across many application processes. Triggers can
also be used to automatically invoke UDFs and stored procedures, which can introduce some
automatic and centrally controlled intense application logic.
There are wonderful advantages to using triggers and constraints. They most certainly provide
for better data integrity, faster application delivery time, and centrally located reusable code.
Since the logic in triggers and constraints is usually data intensive their use typically
outperforms the equivalent application logic simply due to the fact that no data has to be
returned to the application when these automated processes fire. There is one trade-off for
performance, however. When triggers, RI, or check constraints are used in place of application
edits they can be a serious performance disadvantage. This is especially true if several edits on
a data entry screen are verified at the server. It could be as bad as one trip to the server and
back per edit. This would seriously increase message traffic between the client and the server.
For this reason, data edits are best performed at the client when possible.
It is important to understand that when you are working with triggers you need to respect the
triggers when performing schema migrations or changes to the triggering tables. The triggers
will, in some situations, have to be recreated in the same sequence they were originally
created. In certain situations trigger execution sequence may be important, and if there are
multiple triggers of the same type against a table then they will be executed in the order they
were defined.
Remember, a locally executing batch process that is processing the input data in the same
sequence as the clustering of your tables, and that is bound with RELEASE(DEALLOCATE), will
utilize several performance enhancers, especially dynamic prefetch and index lookaside, to
significantly improve the performance of these batch processes.
Searching
Search queries, as well as driving cursors (large queries that provide the input data to a batch
process), can be expensive queries. Here there is, once again, a trade off between the amount
of program code you are willing to write and the performance of your application.
If you code a generic search query, you will get generic performance. In the following example
the SELECT statement basically supports a direct read, a range read, and a restart read in one
statement. In order to enable this type of generic access a generic predicate has to be coded.
In most cases this means that for every SQL statement issued more data will be read than is
needed. This is due to the fact that DB2 has a limited ability to match on these types of
predicates. In the following statement the predicate supports a direct read, sequential read,
and restart read for at least one part of a three part compound key:
-- illustrative generic predicate; the column and host variable names are assumed
WHERE (COL1 = :COL1-MIN AND COL2 = :COL2-MIN AND COL3 BETWEEN :COL3-MIN AND :COL3-MAX)
OR (COL1 = :COL1-MIN AND COL2 > :COL2-MIN)
OR (COL1 > :COL1-MIN AND COL1 <= :COL1-MAX)
These predicates are very flexible; however, they are not the best performing. The predicate in
this example most likely results in a non-matching index scan even though an index on COL1,
COL2, COL3 is available. This means that the entire index will have to be searched each time
the query is executed. This is not a bad access path for a batch cursor that is reading an entire
table in a particular order. For any other query, however, it is a detriment. This is especially true
for online queries that are actually providing three columns of data (all min and max values are
equal). For larger tables the CPU and elapsed time consumed can be significant.
The very best solution is to have separate predicates for the various combinations of key
columns provided. This will allow DB2 to use matching index access for each combination of key
parts provided, as shown in the example below.
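For example (the column names follow the COL1, COL2, COL3 index mentioned above; the table name, the extra select column, and the host variables are assumed), the program might issue one of several statements depending on how many key parts are provided:
-- all three key columns provided: three matching index columns
SELECT COL1, COL2, COL3, DATA_COL
FROM KEY_TABLE
WHERE COL1 = :COL1
AND COL2 = :COL2
AND COL3 = :COL3
-- only the first two key columns provided: two matching index columns
SELECT COL1, COL2, COL3, DATA_COL
FROM KEY_TABLE
WHERE COL1 = :COL1
AND COL2 = :COL2
Each variation lets DB2 match on every key column that is actually supplied, at the cost of more statements in the program.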
This will dramatically increase the number of SQL statements coded within the program, but
will also dramatically increase the statement performance.
If the additional statements are not desired then there is another choice for the generic
predicates. This would involve adding a redundant Boolean term predicate. These Boolean
term predicates will enable DB2 to match on one column of the index. Therefore, for this
WHERE clause:
-- illustrative WHERE clause; the column and host variable names are assumed
WHERE ((COL1 = :COL1-MIN AND COL2 = :COL2-MIN AND COL3 BETWEEN :COL3-MIN AND :COL3-MAX)
OR (COL1 = :COL1-MIN AND COL2 > :COL2-MIN)
OR (COL1 > :COL1-MIN AND COL1 <= :COL1-MAX))
AND COL1 >= :COL1-MIN
The addition of this redundant predicate does not affect the result of the query, but allows DB2
to match on the COL1 column.
Name searching can be a challenge. Once again we are faced with multiple queries to solve
multiple conditions, or one large generic query to solve any request. In many cases it pays to
study the common input fields for a search, and then code specific queries that match those
columns only, and are supported by an index. Then, the generic query can support the less
frequently searched on fields.
We have choices for coding our search queries. Let’s say that we need to search for two
variations of a name to try and find someone in our database. The following query can be
coded to achieve that (in this case a name reversal):
-- the LAST_NAME column and predicates are assumed for illustration
SELECT PERSON_ID
FROM PERSON_TBL
WHERE (LAST_NAME = ‘RADY’ AND FIRST_NAME = ‘BOB’) OR
(LAST_NAME = ‘BOB’ AND FIRST_NAME = ‘RADY’);
This query gets the job done, but uses multi-index access at best. Another way you could code
the query is as follows:
-- the WHERE clauses are assumed; they mirror the name reversal search above
SELECT PERSON_ID
FROM PERSON_TBL
WHERE LAST_NAME = ‘RADY’ AND FIRST_NAME = ‘BOB’
UNION ALL
SELECT PERSON_ID
FROM PERSON_TBL
WHERE LAST_NAME = ‘BOB’ AND FIRST_NAME = ‘RADY’;
This query gets better index access, but will probe the table twice. The next query uses a
common table expression (DB2 V8, DB2 9) to build a search list, and then divides that table
into the person table:
-- sketch only; the column list and the join to the person table are assumed
WITH SEARCH_LIST (FIRST_NAME, LAST_NAME) AS
(SELECT ‘BOB’, ‘RADY’ FROM SYSIBM.SYSDUMMY1
UNION ALL
SELECT ‘RADY’, ‘BOB’ FROM SYSIBM.SYSDUMMY1)
SELECT PERSON_ID
FROM PERSON_TBL P, SEARCH_LIST S
WHERE P.FIRST_NAME = S.FIRST_NAME
AND P.LAST_NAME = S.LAST_NAME;
This query gets good index matching and perhaps reduced probes. Finally, the next query
utilizes a during join predicate to probe on the first condition and only apply the second
condition if the first finds nothing. That is, it will only execute the second search if the first finds
nothing and completely avoid the second probe into the table. Keep in mind that this query
may not produce the same results as the previous queries due to the optionality of the search:
-- sketch only; the select list and the search predicates are assumed
SELECT COALESCE(A.PERSON_ID, B.PERSON_ID)
FROM SYSIBM.SYSDUMMY1
LEFT OUTER JOIN
PERSON_TBL A
ON IBMREQD = ‘Y’
AND A.LAST_NAME = ‘RADY’ AND A.FIRST_NAME = ‘BOB’
LEFT OUTER JOIN
(SELECT PERSON_ID
FROM PERSON_TBL
WHERE LAST_NAME = ‘BOB’ AND FIRST_NAME = ‘RADY’) B
ON A.PERSON_ID IS NULL
Which search query is the best for your situation? Test and find out! Just keep in mind that
when performance is a concern there are many choices.
Existence Checking
What is the best for existence checking within a query? Is it a join, a correlated subquery, or a
non-correlated subquery? Of course it depends on your situation, but these types of existence
checks are always better resolved in a SQL statement than with separate queries in your
program. Here are the general guidelines:
SELECT SNAME
FROM S
WHERE S# IN
(SELECT S# FROM SP
WHERE P# = ‘P2’)
Non-correlated subqueries such as this are generally good when there is no available index for the
inner select but there is one on the outer table column (making the predicate indexable), or when
there is no index on either the inner or outer columns. We also like non-correlated subqueries when
a relatively small amount of data is provided by the subquery. Keep in mind that DB2 can transform
a non-correlated subquery to a join.
SELECT SNAME
FROM S
WHERE EXISTS
(SELECT * FROM SP
WHERE SP.S# = S.S#
AND SP.P# = ‘P2’)
-- the correlation predicate is assumed; it parallels the non-correlated example above
Correlated subqueries such as this are generally good when there is a supporting index available on
the inner select and there is a cost benefit in reducing repeated executions against the inner table
and avoiding the distinct sort required for a join. They may also be a benefit if the inner query would
return a large amount of data when coded as a non-correlated subquery, as long as there is a
supporting index for the inner query. Also, the correlated subquery can outperform the equivalent
non-correlated subquery if an index on the outer table is not used (DB2 chose a different index
based upon other predicates) and one exists in support of the inner table.
-- the select list and join predicates are assumed for illustration
SELECT DISTINCT SNAME
FROM S, SP
WHERE S.S# = SP.S#
AND SP.P# = ‘P2’
A join such as this may be best if supporting indexes are available and most rows hook up in the join. Also, if the
join results in no extra rows returned then the DISTINCT can also be avoided. Joins can provide
DB2 the opportunity to pick the best table access sequence, as well as apply predicate
transitive closure.
Which existence check method is best for your situation? We don’t know, but you have choices
and should try them out! It should also be noted that as of DB2 9 it is possible to code an
ORDER BY and FETCH FIRST in a subquery, which can provide even more options for existence
checking in subqueries!
For singleton existence checks you can code FETCH FIRST and ORDER BY clauses in a
singleton select. This could provide the best existence checking performance in a stand
alone query:
-- the select list, predicate, and host variable names are assumed for illustration
SELECT 1 INTO :HV-EXISTS
FROM TABLE
WHERE COL1 = :HV-COL1
FETCH FIRST 1 ROW ONLY
Lock avoidance also needs frequent commits so that other processes do not have to acquire
locks on updated pages, and this also allows for page reorganization to occur to clear the
“possibly uncommitted” (PUNC) bit flags in a page. Frequent commits allow the commit log
sequence number (CLSN) on a page to be updated more often, since it is dependent on the
oldest begin unit of recovery in the log.
The best way to avoid taking locks in your read-only cursors is to read uncommitted. Use the
WITH UR clause in your statements to avoid taking or waiting on locks. Keep in mind however
that using WITH UR can result in the reading of uncommitted, or dirty, data that may eventually
be rolled back. If you are using WITH UR in an application that will update the data, then an
optimistic locking strategy is your best performing option.
Optimistic Locking
With high demands for full database availability, as well as high transaction rates and levels of
concurrency, reducing database locks is always desired. With this in mind, many applications
are employing a technique called “optimistic locking” to achieve these higher levels of
availability and concurrency. This technique traditionally involves reading data with an
uncommitted read or with cursor stability. Update timestamps are maintained in all of the data
tables. This update timestamp is read along with all the other data in a row. When a direct
update is subsequently performed on the row that was selected, the timestamp is used to
verify that no other application or user has changed the data between the point of the read and
the update. This places additional responsibility on the application to use the timestamp on all
updates, but the result is a higher level of DB2 performance and concurrency.
Here is a hypothetical example of optimistic locking. First the application reads the data from a
table with the intention of subsequently updating:
-- the column and host variable names are assumed for illustration
SELECT DATA1, UPDATE_TS
INTO :HV-DATA1, :HV-UPDATE-TS
FROM TABLE1
WHERE KEY1 = :HV-KEY1
WITH UR
Here the data has been changed and the update takes place.
-- the timestamp predicate shown is the essence of the optimistic locking check
UPDATE TABLE1
SET DATA1 = :HV-NEW-DATA1,
UPDATE_TS = CURRENT TIMESTAMP
WHERE KEY1 = :HV-KEY1
AND UPDATE_TS = :HV-UPDATE-TS
If the data has changed then the update will get a SQLCODE of 100, and restart logic will have
to be employed for the update. This requires that all applications respect the optimistic locking
strategy and update timestamp when updating the same table.
As of DB2 9, IBM has introduced built-in support for optimistic locking via the ROW CHANGE
TIMESTAMP. When a table is created or altered, a special column can be created as a row
change timestamp. These timestamp columns will be automatically updated by DB2 whenever
a row of a table is updated. This built-in support for optimistic locking takes some of the
responsibility (that of updating the timestamp) out of the hands of the various applications
that might be updating the data.
Here is how the previous example would look when using the ROW CHANGE TIMESTAMP for
optimistic locking:
-- column and host variable names are assumed; the ROW CHANGE TIMESTAMP expression
-- replaces the user-maintained update timestamp column
SELECT DATA1, ROW CHANGE TIMESTAMP FOR TABLE1
INTO :HV-DATA1, :HV-ROW-TS
FROM TABLE1
WHERE KEY1 = :HV-KEY1
WITH UR
Here the data has been changed and the update takes place.
UPDATE TABLE1
SET DATA1 = :HV-NEW-DATA1
WHERE KEY1 = :HV-KEY1
AND ROW CHANGE TIMESTAMP FOR TABLE1 = :HV-ROW-TS
Heuristic control/restart tables have rows unique to each application process to assist in
controlling the commit scope using the number of database updates or time between commits
as their primary focus. There can also be an indicator in the table to tell an application that it is
time to stop at the next commit point. These tables are accessed every time an application
starts a unit-of-recovery (unit-of-work), which would be at process initiation or at a commit
point. The normal process is for an application to read the control table at the very beginning of
the process to get the dynamic parameters and commit time to be used. The table is then used
to store information about the frequency of commits as well as any other dynamic information
that is pertinent to the application process, such as the unavailability of some particular
resource for a period of time. Once the program is running, it both updates and reads from the
control table at commit time. Information about the status of all processing at the time of the
commit is generally stored in the table so that a restart can occur at that point if required.
Values in these tables can be changed either through SQL in a program or by a production
control specialist to be able to dynamically account for the differences in processes through
time. For example, you would probably want to change the commit scope of a job that is
running during the on-line day vs. when it is running during the evening. You can also set
indicators to tell an application to gracefully shut down, run different queries with different
access paths due to a resource being taken down, or go to sleep for a period of time.
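As a simple sketch of such a control table (the table name and columns are assumed; real implementations vary widely), the definition might look like this:
CREATE TABLE CHKPT_RESTART
(PROGRAM_NAME CHAR(8) NOT NULL,
COMMIT_FREQ INTEGER NOT NULL,
STOP_IND CHAR(1) NOT NULL WITH DEFAULT 'N',
RESTART_KEY VARCHAR(254) NOT NULL WITH DEFAULT,
LAST_COMMIT_TS TIMESTAMP NOT NULL WITH DEFAULT,
PRIMARY KEY (PROGRAM_NAME));
-- a unique index on PROGRAM_NAME must also be created to complete the primary key
The batch program reads its row at start-up to pick up COMMIT_FREQ and any restart position, and it updates RESTART_KEY and LAST_COMMIT_TS at every commit so that a restart can resume from that point.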
CHAPTER 8
Tuning Subsystems
There are several areas in the DB2 subsystem that you can examine for performance
improvements. These areas are components of DB2 that aid in application processing. This
chapter describes those DB2 components and presents some tuning tips for them.
Buffer Pools
Buffer pools are areas of virtual storage that temporarily store pages of table spaces or indexes.
When a program accesses a row of a table, DB2 places the page containing that row in a buffer.
When a program changes a row of a table, DB2 must (eventually) write the data in the buffer back
to disk, normally either at a DB2 system checkpoint or when a write threshold is reached. The write
thresholds are either a vertical threshold at the page set level or a horizontal threshold at the
buffer pool level.
The way buffer pools work is fairly simple by design, but it is tuning these simple operations
that can make all the difference in the world to the performance of our applications. The data
manager issues GETPAGE requests to the buffer manager who hopefully can satisfy the
request from the buffer pool instead of having to retrieve the page from disk. We often trade
CPU for I/O in order to manage our buffer pools efficiently. Buffer pools are maintained by
subsystem, but individual buffer pool design and use should be by object granularity and in
some cases also by application.
DB2 buffer pool management by design allows you to ALTER and DISPLAY
buffer pool information dynamically without requiring a bounce of the DB2 subsystem. This
improves availability by allowing us to dynamically create new buffer pools when necessary
and to also dynamically modify or delete buffer pools. We may find we need to ALTER
buffer pools a couple of times during the day because of varying workload characteristics. We will
discuss this when we look at tuning the buffer pool thresholds. Initial buffer pool definitions are set
at installation/migration, but they are often hard to configure at that time because the access
patterns against the objects are usually not well understood at installation. Regardless of what is
set at installation, we can use ALTER any time after the install to add or delete buffer pools,
resize the buffer pools, or change any of the thresholds. The buffer pool definitions are stored
in the BSDS (bootstrap data set), and we can move objects between buffer pools via an ALTER
INDEX/TABLESPACE and a subsequent START/STOP command of the object, as shown below.
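For example (the buffer pool, database, and table space names here are illustrative), a buffer pool can be resized and an object moved to it with commands such as:
-ALTER BUFFERPOOL(BP7) VPSIZE(50000)
ALTER TABLESPACE MYDB.MYTS BUFFERPOOL BP7;
-STOP DATABASE(MYDB) SPACENAM(MYTS)
-START DATABASE(MYDB) SPACENAM(MYTS)
The table space does not actually begin using the new buffer pool until its data sets are closed and reopened, which is why the STOP and START commands follow the ALTER.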
Pages
There are three types of pages in virtual pools:
• Available pages: pages on an available queue (LRU, FIFO, MRU) for stealing
• In-Use pages: pages currently in use by a process that are not available for stealing. In Use
counts do not indicate the size of the buffer pool, but this count can help determine
residency for initial sizing
• Updated pages: these pages are not ‘in-use’, not available for stealing, and are considered
‘dirty pages’ in buffer pool waiting to be externalized
There are four page sizes and several bufferpools to support each size:
BP0 – BP49 4K pages
BP8K0 – BP8K9 8K pages
BP16K0 – BP16K9 16K pages
BP32K0 – BP32K9 32K pages
Work file table space pages are only 4K or 32K. There is a DSNZPARM called DSVCI that
allows the control interval to match to the actual page size.
Our asynchronous page writes per I/O will change with each page size accordingly.
4K Pages 32 Writes per I/O
8K Pages 16 Writes per I/O
16K Pages 8 Writes per I/O
32K Pages 4 Writes per I/O
With these page sizes we can achieve better hit ratios and have less I/O because we can
fit more rows on a page. For instance, if we have a 2,200 byte row (perhaps in a data warehouse),
a 4K page would only be able to hold 1 row, while an 8K page could hold 3 rows, one more row
than two 4K pages would hold, and one less page lock as well if page locking is used. However,
we do not want to use these page sizes as a band-aid for what may be a poor design. You may
want to consider decreasing the row size based upon usage to get more rows per page.
DB2 breaks up these queues into multiple LRU chains. This way there is less overhead for
queue management because the latch that is taken at the head of the queue (actually on the
hash control block which keeps the order of the pages on the queue) will be latched less
because the queues are smaller. Multiple subpools are created for a large virtual buffer pool
and the threshold is controlled by DB2, not to exceed 4000 VBP buffers in each subpool. The
LRU queue is managed within each of the subpools in order to reduce the buffer pool latch
contention when the degree of concurrency is high. Stealing of these buffers occurs in a
round-robin fashion through the subpools.
FIFO — First-in, first-out can also be used instead of the default of LRU. With this method the
oldest pages are moved out regardless of how recently they were referenced. This decreases the
cost of doing a GETPAGE operation and reduces internal latch contention for high concurrency.
It should only be used where there is little or no I/O and where the table space or index is resident
in the buffer pool. We will have separate buffer pools for LRU and FIFO objects, and this can be set
via the ALTER BUFFERPOOL command with the PGSTEAL option of FIFO. LRU is the PGSTEAL
option default.
Asynchronous reads are several pages read per I/O for prefetch operations such as
sequential prefetch, dynamic prefetch, or list prefetch. Asynchronous writes are several pages
per I/O for operations such as deferred writes.
We want to control page externalization via our DWQT and VDWQT thresholds for best
performance and avoid surges in I/O. We do not want page externalization to be controlled by
DB2 system checkpoints because too many pages would be written to disk at one time causing
I/O queuing delays, increased response time and I/O spikes. During a checkpoint all updated
pages in the buffer pools are externalized to disk and the checkpoint recorded in the log
(except for the work files).
In very general terms during an on-line processing, DB2 should checkpoint about every 5 to 10
minutes, or some other value based on investigative analysis of the impact on restart time after
a failure. There are two real concerns for how often we take checkpoints:
• The cost and disruption of the checkpoints
• The restart time for the subsystem after a crash
Many times the cost and disruption of DB2 checkpoints are overstated. A DB2
checkpoint is a tiny hiccup, and it does not prevent processing from proceeding. However, having a
CHKFREQ setting that is too high along with large buffer pools and high write thresholds, such as
the defaults, can cause enough I/O to make the checkpoint disruptive. In trying to control
checkpoints, some users have increased the CHKFREQ value to make checkpoints less
frequent, but in effect made them much more disruptive. The situation is corrected by
reducing the amount written and increasing the checkpoint frequency, which yields much
better performance and availability. It is not only possible, but has been observed at some
installations, that a checkpoint every minute does not impact performance or availability. The
write efficiency at DB2 checkpoints is the key factor to observe in deciding whether CHKFREQ
can be reduced.
If the write thresholds (DWQT/VDQWT) are doing their job, then there is less work to
perform at each checkpoint. Also using the write thresholds to cause I/O to be performed in a
level, non-disruptive fashion is also helpful for the non-volatile storage in storage controllers.
However, even if we have our write thresholds (DWQT/VDQWT) set properly, as well as our
checkpoints, we could still see an unwanted write problem. This could occur if we do not have
our log datasets properly sized. If the active log data sets are too small then active log switches
will occur often. When an active log switch takes place a checkpoint is taken automatically.
Therefore, our logs could be driving excessive checkpoint processing, resulting in constant
writes. This would prevent us from achieving a high ratio of pages written per I/O because the
deferred write queue would not be allowed to fill as it should.
Sizing
Buffer pool sizes are determined by the VPSIZE parameter. This parameter determines the
number of pages to be used for the virtual pool. DB2 can handle large bufferpools efficiently, as
long as enough real memory is available. If insufficient real storage exists to back the bufferpool
storage requested, then paging can occur. Paging can occur when the bufferpool size exceeds
the available real memory on the z/OS image. DB2 limits the total amount of storage allocated
for bufferpools to approximately twice the amount of real storage (but less is recommended).
There is a maximum of 1TB total for all bufferpools (provided the real storage is available).
In order to size bufferpools it is helpful to know the residency rate of the pages for the object(s)
in the bufferpool.
One tuning option often used is altering VPSEQT to 0 to set the pool up for just random
use. When VPSEQT is altered to 0, the SLRU is no longer valid and the buffer pool becomes
totally random. Since only the LRU is used, all pages on the SLRU have to be freed.
This also disables prefetch operations in this buffer pool, which is beneficial for certain
strategies. However, there are problems with this strategy for certain buffer pools, and this will
be addressed later.
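As a simple illustration (the buffer pool number is arbitrary), the sequential steal threshold can be changed dynamically with a command such as:
-ALTER BUFFERPOOL(BP8) VPSEQT(0)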
Writes
The DWQT (Deferred Write Threshold), also known as the Horizontal Deferred Write
Threshold, is the percentage threshold that determines when DB2 starts turning on write
engines to begin deferred writes (32 pages/Async I/O). The value can be from 0 to 90%.
When the threshold is reached, write engines (up to 600 write engines as of this publication)
begin writing pages out to disk. Running out of write engines can occur if the write thresholds
are not set to keep a constant flow of updated pages being written to disk. This can occur and
if it is uncommon then it is okay, but if this occurs daily then there is a tuning opportunity. DB2
turns on these write engines, basically one vertical pageset, queue at a time, until a 10%
reverse threshold is met. When DB2 runs out of write engines it can be detected in the
statistics reports in the WRITE ENGINES NOT AVAILABLE indicator on Statistics report.
When setting the DWQT threshold a high value is useful to help improve hit ratio for updated
pages, but will increase I/O time when deferred write engines begin. We would use a low value
to reduce I/O length for deferred write engines, but this will increase the number of deferred
writes. This threshold should be set based on the referencing of the data by the applications.
If we choose to set the DWQT to zero so that all objects defined to the buffer pool are
scheduled to be written immediately to disk, then DB2 actually uses its own internal
calculations for exactly how many changed pages can exist in the buffer pool before they are
written to disk.
32 pages are still written per I/O, but it will take 40 dirty pages (updated pages) to trigger the
threshold so that the highly re-referenced updated pages, such as space map pages, remain in
the buffer pool.
When implementing LOBs (Large Objects), a separate buffer pool should be used and this
buffer pool should not be shared (backed by a group buffer pool in a data sharing environment).
The DWQT should be set to 0 so that for LOBS with LOG NO, force-at-commit processing
occurs and the updates continually flow to disk instead of surges of writes. For LOBs defined
with LOG YES, DB2 could use deferred writes and avoid massive surges at checkpoint.
The DWQT threshold works at a buffer pool level for controlling writes of pages to the buffer
pools, but for a more efficient write process you will want to control writes at the pageset/
partition level. This can be controlled via the VDWQT (Vertical Deferred Write Threshold), the
percentage threshold that determines when DB2 starts turning on write engines and begins
deferred writes for a given data set. This helps to keep a particular pageset/partition from
monopolizing the entire buffer pool with its updated pages. The value is 0 to 90%, with a
default of 10%. The VDWQT should always be less than the DWQT.
A good rule of thumb for setting the VDWQT is that if less than 10 pages are written per I/O,
set it to 0. You may also want to set it to 0 to trickle write the data out to disk. It is normally
best to keep this value low in order to prevent heavily updated pagesets from dominating the
deferred write area. Either a percentage of pages or an actual number of pages,
from 0 to 9999, can be specified for the VDWQT. You must set the percentage to 0 to use the
number specified. If you set it to 0,0, the system uses MIN(32, 1%), which is good for trickle I/O.
If we choose to set the VDWQT to zero, 32 pages are still written per I/O, but it will take 40
dirty pages (updated pages) to trigger the threshold so that the highly re-referenced updated
pages, such as space map pages, remain in the buffer pool.
It is a good idea to set the VDWQT using a number rather than a percentage because if
someone increases the buffer pool that means that now more pages for a particular pageset
can occupy the buffer pool and this may not always be optimal or what you want.
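For example (the buffer pool number and values are illustrative only; the right values depend on your workload), both write thresholds can be altered dynamically:
-ALTER BUFFERPOOL(BP2) DWQT(30) VDWQT(0,64)
Here the DWQT is lowered to 30% and the VDWQT is set to an absolute count of 64 pages per pageset/partition rather than a percentage.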
When looking at any performance report showing the amount of activity for the VDWQT
and the DWQT, you would want to see the VDWQT being triggered most of the time
(VERTIC.DEFER.WRITE THRESHOLD), and the DWQT triggered far less often (HORIZ.DEFER.WRITE
THRESHOLD). There can be no general ratios, since that would depend on both the activity
and the number of objects in the buffer pools. The bottom line is that we want to be
controlling I/O via the VDWQT, with the DWQT watching for and controlling activity across the
entire pool and, in general, writing out rapidly queuing pages. This will also assist in limiting
the amount of I/O that checkpoint would have to perform.
Parallelism
The VPPSEQT (Virtual Pool Parallel Sequential Threshold) is the percentage of the VPSEQT setting
that can be used for parallel operations. The value is 0 to 100%, with a default of 50%.
If this is set to 0 then parallelism is disabled for objects in that particular buffer pool. This can
be useful for buffer pools that cannot support parallel operations. The VPXPSEQT (Virtual
Pool Sysplex Parallel Sequential Threshold) is a percentage of the VPPSEQT to use for inbound
queries. It also defaults to 50%, and if it is set to 0, Sysplex query parallelism is disabled for queries
originating from the member to which the pool is allocated. In affinity data sharing environments
this is normally set to 0 to prevent inbound resource consumption of work files and buffer pools.
Stealing Method
The PGSTEAL option allows us to choose a page stealing method for a buffer pool. The default
is LRU (Least Recently Used), but FIFO (First In, First Out) is also an option. The FIFO option turns
off the overhead for maintaining the LRU queue and may be useful for objects that can completely
fit in the buffer pool or if the hit ratio is less than 1%.
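For example (the buffer pool number is illustrative), a pool dedicated to fully resident objects could be altered as follows:
-ALTER BUFFERPOOL(BP9) PGSTEAL(FIFO)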
Page Fixing
You can use the PGFIX keyword with the ALTER BUFFERPOOL command to fix a buffer pool in
real storage for an extended period of time. The PGFIX keyword has the following options:
• PGFIX(YES) The buffer pool is fixed in real storage for the long term. Page buffers are fixed
when they are first used and remain fixed.
• PGFIX(NO) The buffer pool is not fixed in real storage for the long term. Page buffers are
fixed and unfixed in real storage, allowing for paging to disk. PGFIX(NO) is the default option.
The recommendation is to use PGFIX(YES) for buffer pools with a high I/O rate, that is, a high
number of pages read or written. For buffer pools with zero I/O, such as some read-only data
or some indexes with a nearly 100% hit ratio, PGFIX(YES) is not recommended. In these cases,
PGFIX(YES) does not provide a performance advantage.
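For example (the buffer pool number is illustrative), a high-I/O pool could be page fixed with:
-ALTER BUFFERPOOL(BP1) PGFIX(YES)
Keep in mind that a PGFIX change takes effect the next time the buffer pool is allocated, not immediately.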
Internal Thresholds
The following thresholds are a percent of unavailable pages to total pages, where unavailable
means either updated or in use by a process.
SPTH
The SPTH (Sequential Prefetch Threshold) is checked before a prefetch operation is scheduled
and during buffer allocation for a previously scheduled prefetch. If the SPTH threshold is
exceeded, prefetch will either not be scheduled or will be canceled. PREFETCH DISABLED —
NO BUFFER (an indicator on the statistics report) is incremented every time a virtual buffer
pool reaches 90% of active unavailable pages, disabling sequential prefetch. This value should
always be zero. If this value is not 0, then it is a clear indication that you are probably
experiencing degradation in performance due to all prefetch being disabled. To eliminate this
you may want to increase the size of the buffer pool (VPSIZE). Another option may be to have
more frequent commits in the application programs to free pages in the buffer pool, as this will
put the pages on the write queues.
DMTH
The DMTH (Data Manager Threshold, also referred to as the Buffer Critical Threshold) occurs
when 95% of all buffer pages are unavailable (in use). The Buffer Manager will request all
threads to release any possible pages immediately. This occurs by setting GETPAGE/
RELPAGE processing to work by row instead of by page. After a GETPAGE and a single row is
processed, a RELPAGE is issued. This causes CPU to become high for objects in that buffer pool,
and I/O sensitive transactions can suffer. This can occur if the buffer pool is too small. You can
observe when this occurs by seeing a non-zero value in the DM THRESHOLD REACHED
indicator on the statistics reports. This is checked every time a page is read or updated. If this
threshold is not reached, then DB2 will access the page in the virtual pool once for each page
(no matter how many rows are used). If this threshold has been reached, then DB2 will access
the page in the virtual pool once for every row on the page that is retrieved or updated. This can
lead to serious performance degradation.
IWTH
The IWTH (Immediate Write Threshold) is reached when 97.5% of buffers are unavailable
(in use). If this threshold is reached then synchronous writes begin, and this presents a
performance problem. For example, if there are 100 rows in a page and there are 100 updates,
then 100 synchronous writes will occur, one by one for each row. Synchronous writes are not
concurrent with SQL, but serial, so the application will be waiting while the write occurs
(including 100 log writes, which must occur first). This causes large increases in I/O time. It is
not recorded explicitly in a statistics report, but DB2 will appear to be hung and you will see
synchronous writes begin to occur when this threshold is reached. Be careful with monitors
that send exception messages to the console when synchronous writes occur and refer to them
as IWTH reached; not all synchronous writes are caused by this threshold being reached, so
in those cases the condition is simply being reported incorrectly. See the following note.
Note: Be aware that in some performance reports the IWTH counter can also be
incremented when dirty pages on the write queue have been re-referenced, which
causes a synchronous I/O before the page can be used by the new process. This threshold
counter can also be incremented if more than two checkpoints occur before an updated page
is written, since this will cause a synchronous I/O to write out the page.
The output of the DISPLAY BUFFERPOOL command with the DETAIL option contains valuable
information such as prefetch information (Sequential, List, Dynamic Requests), Pages Read,
Prefetch I/O, and Disablement (No buffer, No engine). The incremental detail display shifts the
time frame every time a new display is performed.
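For example, to see the incremental detail statistics for buffer pool BP0 since the last display:
-DISPLAY BUFFERPOOL(BP0) DETAIL(INTERVAL)
Issuing this command on a regular schedule gives a useful sample of buffer pool behavior over time.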
RID Pool
The RID (Row Identifier) pool is used for storing and sorting RIDs for operations such as:
• List Prefetch
• Multiple Index Access
• Hybrid Joins
• Enforcing unique keys while updating multiple rows
The RID pool is used by the optimizer when it considers list prefetch and other RID processing. The
full RID pool can be used by any single user at run time, and if not enough space is available in the
RID pool at run time the result can be a table space scan. For example, if you want to retrieve
10,000 rows from a 100,000,000 row table and no RID pool is available, then a scan of
100,000,000 rows would occur, at any time and without external notification. The optimizer
assumes physical I/O will be less with a large pool.
Sizing
The default size of the RID pool is currently 8 MB, with a maximum size of 10,000 MB, and it is
controlled by the MAXRBLK installation parameter. The RID pool could be set to 0, which
would disable the types of operations that use the RID pool, and DB2 would not choose access
paths that the RID pool supports. The RID pool is created at startup time, but no space is
allocated until RID storage is actually needed. It is then allocated in 32KB blocks as needed,
until the maximum size you specified on installation panel DSNTIPC is reached. There are a
few guidelines for setting the RID pool size. You should have as large a RID pool as required,
because it benefits processing, and a RID pool that is too small can lead to performance
degradation. A commonly used guideline for sizing the RID pool is the number of concurrent
RID processing activities, times the average number of RIDs, times 2, times 5 bytes per RID.
Statistics to Monitor
There are three statistics to monitor for RID pool problems:
RIDS OVER THE RDS LIMIT This is the number of times list prefetch is turned off because the
RID list built for a single set of index entries is greater than 25% of the number of rows in the table.
If this is the case, DB2 determines that instead of using list prefetch to satisfy a query it would
be more efficient to perform a table space scan, which may or may not be good depending on
the size of the table accessed. Increasing the size of the RID pool will not help in this case. This
is an application issue for access paths and needs to be evaluated for queries using list prefetch.
There is one very critical issue regarding this type of failure. The 25% threshold is actually
stored in the package/plan at bind time; therefore, it may no longer match the real 25% value,
and in fact could be far less. It is important to know which packages/plans are using list prefetch,
and on what tables. If the underlying tables are growing, then the packages/plans that are
dependent on them should be rebound after a RUNSTATS utility has updated the statistics.
Key correlation statistics and better information about skewed distribution of data can also
help to gather better statistics for access path selection and may help avoid this problem.
RIDS OVER THE DM LIMIT This occurs when over 28 million RIDS were required to satisfy a
query. Currently there is a 28 million RID limit in DB2. The consequences of hitting this limit
can be fallback to a table space scan. In order to control this, you have a couple of options:
• Fix the index by doing something creative
• Add an additional index better suited for filtering
• Force list prefetch off and use another index
• Rewrite the query
• Maybe it just requires a table space scan
INSUFFICIENT POOL SIZE This indicates that the RID pool is too small.
Sort Pool
DB2 allocates at startup a sort pool in the private area of the DBM1 address space. DB2 uses
a special sorting technique called a tournament sort. During the sorting processes it is not
uncommon for this algorithm to produce logical work files called runs, which are intermediate
sets of ordered data. If the sort pool is large enough then the sort completes in that area.
More often than not the sort cannot complete in the sort pool and the runs are moved into the
work file database, especially if there are many rows to sort. These runs are later merged to
complete the sort. When work file database is used for holding the pages that make up the
sort runs, you could experience performance degradation if the pages get externalized to the
physical work files since they will have to be read back in later in order to complete the sort.
Size
The sort pool size defaults to 2MB unless specified. It can range in size from 240KB to 128MB
and is set with an installation DSNZPARM. The larger the Sort Pool (Sort Work Area) is, the
fewer sort runs are produced. If the sort pool is large enough, then the buffer pools and sort
work files may not be used. If the buffer pools and work file database are not used, performance
will be better due to less I/O. We want to size the sort pool and work file database large
because we do not want sorts to have pages being written to disk.
EDM Pool
Sizing
If the pool is too small, then you will see increased I/O activity in the following DB2 table
spaces, which support the DB2 directory:
DSNDB01.DBD01
DSNDB01.SPT01
DSNDB01.SCT02
Our main goal for the EDM pool is to limit the I/O against the directory and catalog. If the pool
is too small, then you will also see increased response times due to the loading of the SKCTs,
SKPTs, and DBDs, and due to re-preparing dynamic SQL statements because they could not
remain cached. By correctly sizing the EDM pools you can avoid unnecessary I/Os from
accumulating for a transaction. If a SKCT, SKPT or DBD has to be reloaded into the EDM pool,
this is additional I/O. This can happen if the pool pages are stolen because the EDM pool is too
small. Pages in the pool are maintained on an LRU queue, and the least recently used pages
get stolen if required. A DB2 performance monitor statistics report can be used to track the
statistics concerning the use of the EDM pools.
Efficiency
We can measure the following ratios to help us determine if our EDM pool is efficient. Think of
these as EDM pool hit ratios:
• CT requests versus CTs not in EDM pool
• PT requests versus PTs not in EDM pool
• DBD requests versus DBDs not in EDM pool
What you want is a ratio of at least 5 to 1 for each of the above (only 1 request out of 5 results in
a load); in other words, you are aiming for an 80% hit ratio or better.
In addition to the global dynamic statement caching in a subsystem, an application can also cache
statements at the thread level via the KEEPDYNAMIC(YES) bind parameter in combination
with not re-preparing the statements. In these situations the statements are cached at the
thread level in thread storage as well as at the global level. As long as there is not a shortage
of virtual storage the local application thread level cache is the most efficient storage for the
prepared statements. The MAXKEEPD subsystem parameter can be used to limit the amount
of thread storage consumed by applications caching dynamic SQL at the thread level.
Logging
Every system has some component that will eventually become the final bottleneck. Logging is
not to be overlooked when trying to get transactions through the systems in a high performance
environment. Logging can be tuned and refined, but the synchronous I/O associated with
logging and commits will always be there.
Log Reads
When DB2 needs to read from the log it is important that the reads perform well because
reads are normally performed during recovery, restarts and rollbacks — processes that you do
not want taking forever. An input buffer will have to be dedicated for every process requesting
a log read. DB2 will first look for the record in the log output buffer. If it finds the record there it
can apply it directly from the output buffer. If it is not in the output buffer then DB2 will look for
it in the active log data set and then the archive log data set. When it is found the record is
moved to the input buffer so it can be read by the requesting process. You can monitor the
success of reads from the output buffers and active logs in the statistics report. These reads
are the better performers. If the record has to be read in from the archive log, the processing
time will be extended. For this reason it is important to have large output buffers and active logs.
Log Writes
Applications move log records to the log output buffer using two methods — no wait or force.
The ‘no wait’ method moves the log record to the output buffer and returns control to the
application, however if there are no output buffers available the application will wait. If this
happens it can be observed in the statistics report when the UNAVAILABLE ACTIVE LOG BUFF
has a non-zero value. This means that DB2 had to wait to externalize log records due to the
fact that there were no available output log buffers. Successful moves without a wait are
recorded in the statistics report under NO WAIT requests.
A force occurs at commit time and the application will wait during this process, which is
considered a synchronous write.
Log records are then written from the output buffers to the active log datasets on disk either
synchronously or asynchronously. To know how often this happens you can look at the WRITE
OUTPUT LOG BUFFERS in the statistics report.
In order to improve the performance of the log writes there are a few options. First we can
increase the number of output buffers available for writing active log datasets, which is
performed by changing an installation parameter (OUTBUFF). You would want to increase this
if you are seeing that there are unavailable buffers. Providing a large buffer will improve
performance for log reads and writes.
Locking and Contention
SUSPENSION An application process is suspended when it requests a lock that is already held
by another application process and cannot be shared. The suspended process temporarily
stops running.
TIME OUT An application process is said to time out when it is terminated because it has been
suspended for longer than a preset interval. DB2 terminates the process and returns a -911 or
-913 SQL code.
DEADLOCK A deadlock occurs when two or more application processes hold locks on
resources that the others need and without which they cannot proceed.
The way DB2 issues locks is complex. It depends on the type of processing being done, the
LOCKSIZE parameter specified when the table was created, the isolation level of the plan or
package being executed, and the method of data access.
Thread Management
The thread management parameters on DB2 install panel DSNTIPE control how many threads
can be connected to DB2, and determine the main storage size needed. Improper allocation of
the parameters on this panel directly affects main storage usage. If the allocation is too high,
storage is wasted. If the allocation is too low, performance degradation occurs because users
are waiting for available threads.
You can use a DB2 performance trace record with IFCID 0073 to retrieve information about
how often thread create requests wait for available threads. Starting a performance trace can
involve substantial overhead, so be sure to qualify the trace with specific IFCIDS, and other
qualifiers, to limit the data collected.
The MAX USERS field on DSNTIPE specifies the maximum number of allied threads that can
be allocated concurrently. These threads include TSO users, batch jobs, IMS, CICS, and tasks
using the call attachment facility. The maximum number of threads that can be accessing
data concurrently is the sum of this value and the MAX REMOTE ACTIVE specification.
When the number of users trying to access DB2 exceeds your maximum, plan allocation
requests are queued.
Catalog and Directory
You can reorganize the catalog and directory. You should periodically run RUNSTATS on the
catalog and analyze the appropriate statistics that let you know when you need to reorganize.
Also, consider isolating the DB2 catalog and directory into their own buffer pool.
CHAPTER 9
Understanding and Tuning Your Packaged Applications
Many organizations today do not retain the manpower resources to build and maintain their
own custom applications. In many situations, therefore, these organizations are relying on “off
the shelf” packaged software solutions to help run their applications and automate some of
their business activities such as accounting, billing, customer management, inventory
management, and human resources. Most of the time these packaged applications are referred
to as enterprise resource planning (ERP) applications. This chapter will offer some tips as to
how to manage and tune your DB2 database for a higher level of performance with these
ERP applications.
These packaged applications are also typically written using an object oriented design, and
are created in such a way that they can be customized by the organization implementing the
software. This provides a great flexibility in that many of the tables in the ERP applications’
database can be used in a variety of different ways, and for a variety of purposes. With this
great flexibility also comes the potential for reduced database performance. OO design may
lead to an increase in SQL statements issued, and the flexibility of the implementation could
have an impact on table access sequence, random data access, and mismatching of predicates
to indexes.
Finally, when you purchase an application you very typically have no access to the source code,
or the SQL statements issued. Sometimes you can get a vendor to change a SQL statement
based upon information you provided about a performance problem, but this is typically not
the case. So, with that in mind the majority of your focus should be on first getting the
database organized in a manner that best accommodates the SQL statements, and then tune
the subsystem to provide the highest level of throughput. Of course, even though you can’t
change the SQL statements you should first be looking at those statements for potential
performance problems that can be improved by changing the database or subsystem.
The best way to find the programs and SQL statements that have the potential for elapsed and
CPU time savings is to utilize the technique described in Chapter 6 called “Overall Application
Performance Monitoring”. This would be the technique for packaged applications that are using
static embedded SQL statements. However, if your packaged application is utilizing dynamic
SQL then this technique will be less effective. In that case you are better off utilizing one or more
of the techniques outlined in Chapter 3 “Recommendations for Distributed Dynamic SQL”.
Whether the application is using static or dynamic SQL, it is best to capture all of the SQL
statements being issued and catalog them in some sort of document, or perhaps even in a
DB2 table. This can be achieved by querying the SYSIBM.SYSSTMT table by plan for statements in
a plan, or the SYSIBM.SYSPACKSTMT table by package for statements in a package. You can
query these tables using the plan or package names corresponding to the application. If the
application is utilizing dynamic SQL then you can capture the SQL statements by utilizing
EXPLAIN STMTCACHE ALL (DB2 V8 or DB2 9), or by running a trace (DB2 V7, DB2 V8, DB2
9). Of course if you run a trace you can expect to have to parse through a significant amount
of data. Keep in mind that by capturing these dynamic statements you are only seeing the
statements that have been executed during the time you have monitored. For example, the
EXPLAIN STMTCACHE ALL statement will only capture the statements that are residing in
the dynamic statement cache at the time the statement is executed.
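As a rough sketch (the collection ID is assumed, the exact SYSPACKSTMT columns available vary by DB2 version, and statement text may be split across multiple SEQNO rows), a query to pull the statements for an application's packages might look like this:
SELECT COLLID, NAME, STMTNO, SEQNO, STMT
FROM SYSIBM.SYSPACKSTMT
WHERE COLLID = 'ERP_COLLECTION'
ORDER BY NAME, STMTNO, SEQNO;
The result can be loaded into a DB2 table or a document to build the statement inventory described above.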
Once you’ve determined the packages and/or statements that are consuming the most
resources, or have the highest elapsed time, then you can begin your tuning effort. This will
involve subsystem and/or database tuning. You need to make a decision at this point as
subsystem tuning really has no impact on the application and database design. However, if you
make changes to the database, then you need to consider the fact that future product upgrades
or releases can undo the changes you have done.
• Memory and DSMAX These packaged applications typically have a lot of objects. You can
monitor to see which objects are most commonly accessed, and make those objects CLOSE
NO. Make sure your DSMAX installation parameter is large enough to avoid any closing of
datasets, and if you have enough storage make all of the objects CLOSE NO. You can also
make sure your output log buffer is large enough to avoid shortages if the application is
changing data quite often. You can consider using the KEEPDYNAMIC bind option for the
packages, but keep in mind that this can increase thread storage, and you’ll need to balance
the number of threads with the amount of virtual storage consumed by the DBM1 address
space. Make sure your buffer pools are page fixed.
Please refer to Chapter 8 of this guide for more subsystem tuning options. You can also refer to
the IBM redbook entitled “DB2 UDB for z/OS V8: Through the Looking Glass and What SAP
Found There” for additional tips on performance.
Of course you need to always be aware of the fact that any change you make to the database
can be undone on the next release of the software. Therefore, you should document every
change completely!
STATISTICS COLLECTION DB2 is always collecting statistics for database objects. The statistics
are kept in virtual storage and are calculated and updated asynchronously upon externalization.
In order to externalize them the environment must be properly set up. A new set of DB2
objects must be created in order to allow for DB2 to write out the statistics.
There are two tables (with appropriate indexes) that must be created to hold the statistics:
• SYSIBM.TABLESPACESTATS
• SYSIBM.INDEXSPACESTATS
These tables are kept in a database named DSNRTSDB, which must be started in order to
externalize the statistics that are being held in virtual storage. DB2 will then populate the tables
with one row per table space or index space, or one row per partition. For tables that are shared
in a data-sharing environment, each member will write its own statistics to the RTS tables.
Some of the important statistics that are collected for table spaces include: total number of
rows, number of active pages, and time of last COPY, REORG, or RUNSTATS execution. Some
statistics that may help determine when a REORG is needed include: space allocated, extents,
number of inserts, updates, or deletes (singleton or mass) since the last REORG or LOAD
REPLACE, number of unclustered inserts, number of disorganized LOBs, number of overflow
records created since last REORG. There are also statistics to help for determining when
RUNSTATS should be executed. These include: number of inserts/updates/deletes (singleton
and mass) since the last RUNSTATS execution. Statistics collected to help with COPY
determination include: distinct updated pages and changes since the last COPY execution
and the RBA/LRSN of first update since last COPY.
There are also statistics gathered on indexes. Basic index statistics include: total number of
entries (unique or duplicate), number of levels, number of active pages, space allocated and
extents. Statistics that help to determine when a REORG is needed include: time when the last
REBUILD, REORG or LOAD REPLACE occurred. There are also statistics regarding the number
of updates/deletes (real or pseudo, singleton or mass)/inserts (random and those that were
after the highest key) since the last REORG or REBUILD. These statistics are of course very
helpful for determining how our data physically looks after certain processes (e.g., batch inserts)
have occurred so we can take appropriate actions if necessary.
EXTERNALIZING AND USING REAL TIME STATISTICS There are different events that can trigger
the externalization of the statistics. The DSNZPARM STATSINT (default 30 minutes) is used to
control the externalization of the statistics at a subsystem level.
There are several processes that will have an effect on the real time statistics. Those processes
include: SQL, Utilities and the dropping/creating of objects.
Once externalized, queries can then be written against the tables. For example, a query against
the TABLESPACESTATS table can be written to identify when a table space needs to be copied
due to the fact that greater than 30 percent of the pages have changed since the last image
copy was taken.
SELECT NAME
FROM SYSIBM.SYSTABLESPACESTATS
WHERE ((COPYUPDATEDPAGES*100)/NACTIVE) > 30
This table can be used to compare the last RUNSTATS timestamp to the timestamp of the last
REORG on the same object to determine when RUNSTATS is needed. If the date of the last
REORG is more recent than the last RUNSTATS, then it may be time to execute RUNSTATS.
SELECT NAME
FROM SYSIBM.SYSTABLESPACESTATS
WHERE (JULIAN_DAY(REORGLASTTIME) > JULIAN_DAY(STATSLASTTIME))
This last example may be useful if you want to monitor the number of records that were
inserted since the last REORG or LOAD REPLACE that are not well-clustered with respect to
the clustering index. Ideally, ‘well-clustered’ means the record was inserted into a page that
was within 16 pages of the ideal candidate page (determined by the clustering index). The
SYSTABLESPACESTATS table value REORGUNCLUSTINS can be used to determine whether
you need to run REORG after a series of inserts.
SELECT NAME
FROM SYSIBM.SYSTABLESPACESTATS
WHERE ((REORGUNCLUSTINS*100)/TOTALROWS) > 10
There is also a DB2 supplied stored procedure to help with this process, and possibly even
work toward automating the whole determination/utility execution process. This stored
procedure, DSNACCOR, is a sample procedure which will query the RTS tables and determine
which objects need to be reorganized, image copied, updated with current statistics, have
taken too many extents, and those which may be in a restricted status. DSNACCOR creates
and uses its own declared temporary tables and must run in a WLM address space. The output
of the stored procedure provides recommendations by using a predetermined set of criteria
in formulas that use the RTS and user input for their calculations. DSNACCOR can make
recommendations for everything (COPY, REORG, RUNSTATS, EXTENTS, RESTRICT) or for
one or more of your choice and for specific object types (table spaces and/or indexes).
Tuning Tips
Back against a wall? Not sure why DB2 is behaving in a certain way, or do you need an answer
to a performance problem? It is always good to have a few ideas or tricks up your sleeve. Here
are some DB2 hints and tips, in no particular order.
We often see DISTINCT coded when it is not necessary and sometimes it comes from a lack of
understanding of the data and/or the process. Sometimes code generators create a DISTINCT
after every SELECT clause, no matter where the SELECT is in a statement. Also we have seen
some programmers coding DISTINCTs just as a safeguard. When considering the usage of
DISTINCT, the question to first ask is ‘Are duplicates even possible?’ If the answer is no, then
remove the DISTINCT and avoid the potential sort.
If duplicates are possible and not desired, then use DISTINCT wisely. Try to use a unique index
to help avoid a sort. Consider using DISTINCT as early as possible in complex queries in order
to get rid of the duplicates as early as possible. Also, avoid using DISTINCT more than once in
a query. A GROUP BY on all columns can utilize non-unique indexes and possibly avoid a sort.
Coding a GROUP BY on all columns can be more efficient than using DISTINCT, but it should be
carefully documented in your program and/or statement, as in the sketch below.
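As a minimal sketch (the table and column names are assumed), the two forms look like this:
SELECT DISTINCT LAST_NAME, FIRST_NAME
FROM PERSON_TBL;
-- potentially cheaper when a non-unique index on (LAST_NAME, FIRST_NAME) exists
SELECT LAST_NAME, FIRST_NAME
FROM PERSON_TBL
GROUP BY LAST_NAME, FIRST_NAME;
Both return one row per distinct name combination; the GROUP BY form simply gives DB2 another way to avoid the sort.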
Using Read Stability can lead to other known problems, such as the inability to get lock
avoidance, and potential share lock escalations because DB2 may escalate from an S lock on
the page to an S lock on the tablespace. Another problem that has been experienced is that
searched updates can deadlock under isolation RS. When an isolation level of RS is used, the
search operation for a searched update will read the page with an S lock. When it finds data to
update then it will change the S lock to an X lock. This could be exacerbated by a self-referencing
correlated subquery in the update (e.g. updating the most recent history or audit row). This is
because in that situation DB2 will read the data with an S lock, put the results in a workfile with
the RIDs, and then go back to do the update in RID sequence. As the transaction volume
increases these deadlocks are more likely to occur. These problems have been experienced
with applications that are running under Websphere and allowing it to determine the isolation
level, which defaults to RS.
So how do you control this situation? Well some possible solutions include the following.
1. Change the application server connect to use an isolation level of CS rather than RS. This
can be done by setting a JDBC database connection property. This is the best option, but
often the most difficult to get implemented.
2. Rebind the isolation RS package being used, with an isolation level of CS. This is a dirty
solution because there may be applications that require RS, or when the DB2 client
software is upgraded it may result in a new RS package (or a rebind of the package), and
we'll have to track that and perform a special rebind with every upgrade on every client.
3. Have the application change the statement to add WITH CS.
4. Set the RRULOCK=YES subsystem parameter (zparm), which acquires U locks, rather than S locks, for updates and deletes with ISO(RS) or ISO(RR). This could help to avoid deadlocks for some update statements.
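For option 3, a minimal sketch of a searched update with the isolation clause added, assuming (as the tip implies) that the statement accepts an isolation clause; the table, column, and host variable names are hypothetical:

   UPDATE ACCT_HIST
      SET HIST_STATUS = 'P'        -- hypothetical column and value
    WHERE ACCT_ID = :hv_acct_id
    WITH CS;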
Through researching this issue we have found no concrete reason why this is the default in
Websphere, but for DB2 applications this is wasteful and should be changed.
In V7, this was the case if the tablespace was defined with LOCKPART(YES), and the V8 test results were the same because LOCKPART(YES) is now the default for the tablespace. Once an update was issued, the table was no longer readable. This is working as designed, and the reason for this tip is to make sure that applications are aware that just issuing this statement does NOT immediately make a table unreadable.
Recursive SQL is another handy trick: a recursive common table expression can generate rows on the fly. The expression, here named GET_THOUSAND, starts from the single row in SYSIBM.SYSDUMMY1 and repeatedly selects C+1 from itself to build a list of numbers that is then read with a final SELECT FROM GET_THOUSAND.
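A minimal sketch of such an expression; the stopping condition and the final SELECT are reconstructed here (the limit of 1,000 is assumed from the name) and may differ from the original example:

   WITH GET_THOUSAND (C) AS
     (SELECT 1
        FROM SYSIBM.SYSDUMMY1
      UNION ALL
      SELECT C + 1
        FROM GET_THOUSAND
       WHERE C < 1000)          -- assumed stopping condition
   SELECT C
     FROM GET_THOUSAND;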
Another tip concerns joining to views that contain aggregation. A query that joins the ACCOUNT table (correlation name A) to HIST_VIEW (correlation name B), a view built over the ACCT_HIST table, on A.ACCT_ID = B.ACCT_ID can prevent DB2 from using the index on the ACCT_ID column of ACCT_HIST. Consider instead a query that accesses ACCT_HIST directly (correlation name X) with a predicate correlated on A.ACCT_ID, which will strongly encourage DB2 to use the index on the ACCT_ID column of the ACCT_HIST table.
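A hedged sketch of the two forms; the view definition, the MAX aggregation, and the column names are illustrative reconstructions, not the original statements:

   -- Joining to a view that aggregates ACCT_HIST by ACCT_ID
   SELECT A.ACCT_ID, B.LAST_HIST_DATE
     FROM ACCOUNT A
     JOIN HIST_VIEW B
       ON A.ACCT_ID = B.ACCT_ID;

   -- A correlated nested table expression against ACCT_HIST, which
   -- encourages DB2 to use the ACCT_ID index for each qualifying row
   SELECT A.ACCT_ID, B.LAST_HIST_DATE
     FROM ACCOUNT A
     JOIN TABLE (SELECT X.ACCT_ID, MAX(X.HIST_DATE) AS LAST_HIST_DATE
                   FROM ACCT_HIST X
                  WHERE X.ACCT_ID = A.ACCT_ID
                  GROUP BY X.ACCT_ID) AS B
       ON A.ACCT_ID = B.ACCT_ID;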
This recommendation is not limited to aggregate queries in views, but can apply to many
situations. Give it a try!
SECTION 1  About This Supplement
SECTION 2  Ensure Efficient Application SQL for DB2 for z/OS with CA Database Management
           CA SQL-Ease® for DB2 for z/OS
           CA Plan Analyzer® for DB2 for z/OS
SECTION 3  Ensure Efficient Object Access for DB2 for z/OS with CA Database Management
           CA Database Analyzer™ for DB2 for z/OS
           CA Rapid Reorg® for DB2 for z/OS
SECTION 4  Monitor Applications for DB2 for z/OS with CA Database Management
           CA Insight™ Database Performance Monitor for DB2 for z/OS
           CA Detector® for DB2 for z/OS
           CA Subsystem Analyzer for DB2 for z/OS
           CA Thread Terminator Tool
SECTION 5  Further Improve Application Performance for DB2 for z/OS with CA Database Management
           CA Index Expert™ for DB2 for z/OS
SECTION 1
About This Supplement
This supplement to the CA Performance Management Handbook for DB2 for z/OS provides
specific information on how CA Database Management for DB2 for z/OS addresses the
performance management challenges outlined in the handbook with an approach that reduces
CPU and DB2 resource overhead and streamlines labor-intensive tasks with automated
processes. The supplement provides information to ensure efficient application SQL, ensure
efficient object access, monitor applications, and improve application performance.
The primary audiences for the handbook are physical and logical database administrators.
The handbook assumes a good working knowledge of DB2 and SQL, and is designed to help
you build good performance into the application, database, and the DB2 subsystem. It provides
techniques to help you monitor DB2 for performance, and to identify and tune production
performance problems.
SECTION 2
Ensure Efficient Application SQL for DB2 for z/OS with CA Database Management
When developing DB2 applications, it is critical that the SQL statements within the application are as efficient as possible to ensure good database and application performance. Inefficient SQL can lead to poor application response times, high resource use, and high CPU costs within the DB2 system in which it runs.
It is much more cost-effective to develop efficient SQL from the outset, during development, than to identify and fix inefficient SQL running in a live application.
Sometimes, the original design of the SQL is not entirely under your control. You may be
implementing Enterprise Resource Planning (ERP) applications using dynamic SQL and
packaged solutions on your systems. It is still important to ensure the SQL within these
applications is efficient. Tuning SQL in ERP applications can yield substantial performance
gains and cost savings.
Coding efficient SQL is not always easy since there are often a number of ways to obtain the same result set. SQL should be written to filter data effectively while returning the minimum number of rows and columns to the application. Data filtering should be as efficient as possible using correctly coded and ordered predicates. Some SQL functions such as sorting should be avoided if possible. If sorting is necessary, you should ensure your database design supports
indexes to facilitate the sort ordering.
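As a simple illustration (hypothetical table and column names), a predicate that wraps the column in a function is evaluated as stage 2 and cannot use an index, while an equivalent range predicate is indexable:

   -- Stage 2, non-indexable: the column is buried inside a function
   SELECT ACCT_ID
     FROM ACCT_HIST
    WHERE YEAR(HIST_DATE) = 2007;

   -- Stage 1 and indexable: the column stands alone on one side of the predicate
   SELECT ACCT_ID
     FROM ACCT_HIST
    WHERE HIST_DATE BETWEEN '2007-01-01' AND '2007-12-31';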
Developing efficient SQL initially, and subsequently tuning the existing SQL in your application, is very hard to do manually. CA provides two products to assist you with SQL development and
tuning, namely CA SQL-Ease® for DB2 for z/OS and CA Plan Analyzer® for DB2 for z/OS.
CA SQL-Ease® for DB2 for z/OS
The programmer can also use the NOTES function as an online SQL reference manual.
Once the programmer has finished formatting the SQL statement, the SYNTAX function can be used to check that everything is syntactically correct. The STAND function can be used to
convert the SQL statement into a standard format for easy viewing, understanding and
documentation purposes.
The PRED function allows the programmer to check how efficient the predicates coded for
the SQL statement are. It shows whether DB2 can use an index to evaluate the predicate and
whether it will be evaluated as stage 1 or stage 2.
The EXPLAIN function can be used to determine the access path that will be used by DB2 for
the SQL statement. It shows how the data will be accessed, whether by tablespace scan or
index access, whether prefetch is used, and the level of DB2 locking. Estimated costs are shown in ms, service units and TIMERONS.
CA SQL-Ease also provides an Enhanced Explain function offering enhanced access path
information and uses an expert system to provide recommendations to show how the
efficiency of the SQL could be improved. The expert system provides recommendations in three categories: SQL coding guidelines to improve the efficiency of the SQL statement, predicate coding guidelines offering efficiency improvements for predicate filters, and physical object guidelines offering changes to the object which will improve SQL access to it.
CA SQL-Ease allows the programmer to manipulate the DB2 statistics for the object to allow
‘what-if’ scenarios to be played to see what effect different volumes of user data will have on
the efficiency of the SQL they are developing.
CA SQL-Ease also allows the programmer to execute the SQL statement to ensure the
expected result set is returned.
Enhanced Explain also offers eight different reports to fully document the SQL using summary
report, access path, cost estimate, predicate analysis, object dependency, RI dependency, tree
diagram and object statistics.
CA Plan Analyzer® for DB2 for z/OS
CA Plan Analyzer for DB2 for z/OS (CA Plan Analyzer) is designed to improve DB2
performance by efficiently analyzing SQL and provides in-depth SQL reports and
recommendations to show you how to fix resource-hungry and inefficient SQL statements in
your application. SQL can be analyzed from any source and at any level, from a single SQL statement or a group of SQL statements right up to a complete DB2 application.
CA Plan Analyzer can analyze SQL from DB2 plans, packages, DBRMs, SQL statements and
QMF queries from the DB2 Catalog. It can also analyze statements from a DBRMLIB, a file,
an exported QMF query or statements entered online. You can create logical groups of SQL
sources using SQL from any of the sources listed by creating a strategy. The strategy stores
the logical grouping and means you can create a logical grouping for an application or a group
of applications if you wish.
CA Plan Analyzer is also integrated with CA Detector for DB2 for z/OS so SQL can also be
analyzed online directly from within CA Detector’s application performance management
functions. This means that inefficient SQL identified in real time by CA Detector can be
analyzed by CA Plan Analyzer to help you fix your performance problem. This is especially
useful for ERP environments where dynamic SQL is used. CA Detector allows you to capture
problematic dynamic SQL statements and pass them to CA Plan Analyzer to identify the cause
of the problem.
CA Plan Analyzer uses the same Enhanced Explain capability and expert system as CA SQL-Ease to offer recommendations showing how you can improve the efficiency of your SQL by making changes to SQL coding, predicate coding, or physical objects.
CA Plan Analyzer adds further expert system support by providing recommendations to
improve SQL efficiency based on making changes to plans and packages. The same eight
Enhanced Explain reports are available namely: summary report, access path, cost estimate,
predicate analysis, object dependency, RI dependency, tree diagram and object statistics.
One of CA Plan Analyzer’s most powerful features provides you with the ability to monitor SQL
statements and their efficiency over time. Each time you use the Explain features of CA Plan
Analyzer, the results are stored away within an historical database. This means that as you
enhance your applications, or make physical database object changes, CA Plan Analyzer can compare your SQL access paths to those captured previously and highlight any changes which may
affect the performance of your application.
This feature is especially useful when you upgrade to a new DB2 version. Binding your plans
and packages on a new DB2 version may lead to different access paths. CA Plan Analyzer
can evaluate your access paths on the new DB2 version without needing to bind your plans
or packages. The powerful Compare feature allows you to identify changed SQL statements,
changed host variables and changed access paths with powerful filtering based on changes in
statement cost using ms, service units and TIMERONS. By using the Compare feature, you only
need to evaluate SQL statements whose access paths have changed rather than every SQL
statement in your entire application.
CA Plan Analyzer also gives you access to Statistics Manager for analyzing SQL performance
by projecting data statistics so you can create ‘what if’ scenarios based on different data
volumes within your DB2 objects.
CA Plan Analyzer further provides powerful features allowing you to identify problem SQL
within your applications. You can search for SQL whose access characteristics may be
undesirable, so for example you can find all statements that use exclusive (X) DB2 locking or
all statements that perform tablespace scans. Along similar lines you can search for problem
plans or packages which may use undesirable bind characteristics, such as all plans that use
uncommitted read as an isolation level.
In addition to Enhanced Explain, CA Plan Analyzer provides a whole series of plan, package,
DBRM, statement and object reports allowing the DBA to administer the total SQL environment.
Working directly from any report, you can BIND, FREE, or REBIND some or all of the plans or
packages listed by entering primary or line commands. All utilities can be executed both online
and in batch processing.
CA Plan Analyzer also provides a Stored Procedures Maintenance facility giving you the ability to create, update, browse, delete, and start stored procedures and maintain your DB2 SYSPROCEDURES table.
SECTION 3
Ensure Efficient Object Access for DB2 for z/OS with CA Database Management
Tuning your application SQL statements is critical, but all that effort could be wasted if the DB2 Catalog statistics for your application objects do not accurately reflect the actual data volumes stored in those objects, or if your physical DB2 objects become excessively disorganized.
Efficient SQL access paths rely on accurate DB2 Catalog statistics. When you bind your
application plans and packages, or if you are using dynamic SQL, the DB2 optimizer will use
DB2 Catalog statistics to help determine which access path to use for a given SQL statement.
For example, if the number of rows in a table is very small, DB2 may choose to use a
tablespace scan rather than index access. This provides reasonable results as long as the data
volumes in that table remain small. If the actual data volumes in the table become very large,
using a tablespace scan would likely cause application performance to suffer.
Keeping your DB2 Catalog statistics current requires you to collect RUNSTATS on a regular
basis. This is even more important today if you have ERP applications using dynamic SQL because data volumes can vary widely. Some tables can contain 0 rows one minute and then an hour later hold over a million. When would you run RUNSTATS on such an object? Even with dynamic SQL, the DB2 optimizer still has to rely on the DB2 Catalog statistics being
accurate to choose the best access path. The key then is to collect RUNSTATS regularly. Given
the immense volumes of data in large enterprises, running RUNSTATS can be a very expensive
and time-consuming exercise.
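For reference, a minimal IBM RUNSTATS utility control statement might look like the following; the database and tablespace names are illustrative, and the options you choose will depend on your environment:

   RUNSTATS TABLESPACE MYDB.MYTS
     TABLE(ALL) INDEX(ALL)
     SHRLEVEL CHANGE
     UPDATE ALL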
Application performance can also be badly impacted if your physical DB2 objects are
excessively disorganized. The more work that DB2 has to do to physically locate the desired
data rows, the longer your application will wait. ERP applications are notorious for spawning
highly disorganized data. First, ERP applications tend to perform high volumes of INSERT
statements which cause indexes to become badly disorganized and wasteful of space. Next,
since most ERP objects contain only variable length rows, any UPDATE to a row that causes
the row length to change can mean the row gets relocated away from its home page.
Finally, because of the high volumes of transactions common in such ERP systems, physical
objects can quickly become badly disorganized. This can impact the performance of your
application significantly.
CA Database Analyzer™ for DB2 for z/OS
CA Database Analyzer allows you to define Extract Procedures. These define a logical grouping of DB2 objects and support extensive wildcarding of object names. It is simple to specify that you want to include all objects based on a creator ID of, say, SAPR3, thus including every object for an SAP application under that schema name. So, a simple Extract
Procedure definition could include many thousands of DB2 objects for a complete ERP
application. Since object names are evaluated at run time, any newly created objects will
automatically be included.
Extract Procedures can be used to collect statistics, including RUNSTATS statistics, for all
objects which meet the selection criteria. The process uses multi-tasking so multiple objects
are processed in parallel. The process runs very quickly and does not impact DB2 or your DB2
applications.
In addition to collecting RUNSTATS statistics, an Extract Procedure will collect approximately twice the number of data points for each object that RUNSTATS does. The other major feature of statistics collection using an Extract Procedure is that all data points for every execution are stored away within a database. This allows for online reporting and also provides the
opportunity for identifying trends in data volumes and allows you to forecast when an event
will occur based on the current growth.
The reporting function allows you to view the data points which have been collected for each DB2 object. Twenty-one different reports are available for tablespace objects, while sixteen reports are provided for index objects. Each report can be viewed as a data query, a graph, a trend or a forecast, and you can view either the latest data or weekly, monthly, or yearly aggregated values.
Another powerful feature in CA Database Analyzer gives you the ability to selectively generate
DB2 utility jobs based on object data point criteria. So, for example, you can specify that you
want to run a REORG if the cluster ratio of any tablespace is lower than say 60%.
This is achieved by creating an Action Procedure. An Action Procedure defines two things. First, you specify the conditions under which you would like to trigger a DB2 utility. Over 100 example triggers are provided in the tool, and these can be evaluated against DB2 Catalog statistics, data points within the database, and DB2 Real Time Statistics. You may also set a threshold value for many of the triggers. Second, once you have specified your triggers, you specify which DB2 utilities you would like to run. This supports all of the IBM utility tools, all of the CA utility tools, and
various other utility programs and user applications.
By tying an Action Procedure to an Extract Procedure you link the objects you are interested in
to the triggers and trigger thresholds.
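Outside of the tool, a hedged catalog query along the following lines (assuming statistics have been collected recently) can list indexes whose cluster ratio has fallen below 60 percent:

   SELECT CREATOR, NAME, TBCREATOR, TBNAME, CLUSTERRATIOF
     FROM SYSIBM.SYSINDEXES
    WHERE CLUSTERRATIOF BETWEEN 0 AND 0.60   -- stored as a fraction; 0.60 = 60%
    ORDER BY CLUSTERRATIOF;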
Using Extract and Action Procedures in this way, you can not only ensure you have the latest statistics for your application objects, but also automatically generate utility jobs, such as CA Rapid Reorg, to REORG objects which are badly disorganized. This means you can REORG the objects which need to be reorganized and not waste time reorganizing other
objects which are not disorganized. This approach helps you keep your applications performing
at the optimum level whilst not wasting any of your valuable CPU resources.
CA Rapid Reorg® for DB2 for z/OS
High levels of performance are provided by using multi-tasking and parallel processing of partitions. Indexes can be built in parallel to the tablespace partitions, and non-partitioning indexes can either be completely rebuilt or updated. Sorting of data is performed efficiently, and data is sorted in its compressed form. Clever use of z/OS dataspaces allows buffer areas to be placed in fast storage rather than using slow work files. Lastly, a Log Monitor address space can be used to filter log records for use during the high-speed log apply phase of an online REORG.
With 24x7 application availability becoming the norm, online reorganization is a necessity.
An online REORG uses a shadow copy of the DB2 object where the reorganized data is written
and log records are applied. Once the REORG process has finished, the shadow copy of the DB2 object is switched to become the ‘live' copy of the object. This switch phase can cause
problems for applications, particularly ERP applications where long running transactions can
hold locks for many hours without committing.
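For orientation, a minimal IBM REORG TABLESPACE control statement for an online reorganization might look like the sketch below; the object name, mapping table, and limits are illustrative, and the exact options vary by REORG utility and release:

   REORG TABLESPACE MYDB.MYTS
     SHRLEVEL CHANGE
     MAPPINGTABLE MYID.MAP_TBL    -- mapping table must already exist
     MAXRO 30
     DRAIN ALL
     FASTSWITCH YES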
CA Rapid Reorg provides the greatest flexibility to control the switch phase for an online
REORG. Whatever REORG utility you use, there will be a short time when availability of the
DB2 object is suspended while the switch phase occurs. With CA Rapid Reorg, you get to
choose and control when access to the object is suspended. You can specify that the REORG wait until all application claims have completed before it suspends access. Once access is suspended, the switch happens quickly using FASTSWITCH. This approach avoids the situation, seen with other REORG tools, where a DRAIN issued at the switch phase has to wait for long-running transactions to complete while at the same time preventing new SQL claims from the application from running. CA Rapid Reorg provides the
best control over application availability for online REORG.
Using CA Rapid Reorg you can maintain the performance of your applications by ensuring
the data is fully organized. CA Rapid Reorg is so efficient you may find you can afford to run
REORG much more frequently, thus improving your application performance further.
SECTION 4
Monitor Applications for DB2 for z/OS with CA Database Management
After you have spent time and effort tuning your application SQL to be as efficient as possible
and tuning your DB2 objects for high performance, it is always disappointing when your
application performance does not meet your expectations. There could be many reasons for
this. You may have bottlenecks within the DB2 system itself and you need to find them and
take corrective action. Your application workload and data volumes may not be what you
expected and these can cause performance bottlenecks. Additionally, these volumes can
change over time and impact performance at some point in the future. Lastly, if you are running an ERP application that uses dynamic SQL, you are dealing with one of the most dynamic environments to manage and one of the hardest in which to maintain good, stable performance.
CA Insight™ Database Performance Monitor for DB2 for z/OS
CA Insight DPM provides a System Condition Monitor function. The System Condition Monitor
provides a single point for viewing overview status information for all DB2 systems you are
monitoring. This is the starting point for finding performance problems and then drilling down
to identify the cause.
DB2 system activity is collected based on user-defined time intervals. A data collector task is
used to collect the DB2 information for monitoring use. DB2 system information is displayed as
an accumulation of all intervals, or the difference or delta between the current and most recent
interval. Information is provided on buffer pool usage, EDM pool usage, storage usage, locks,
log activity and SQL activity counts. The data collector can be customized for each DB2
subsystem, making it easy to vary the set of collected performance information from one
subsystem to the next and get the precise amount of detail desired.
When viewing DB2 application thread activity, you can drill down into a thread for deeper
analysis of potential problems, including determining how long the thread has been active,
how much of that time is spent in DB2 and how much time is spent waiting for DB2 resources. Thread information includes SQL text, timing information, SQL counts, buffer pool activity, lock activity, Distributed Data Facility (DDF) data and Resource Limit Facility (RLF) data. Having identified a poorly performing SQL statement, you can invoke EXPLAIN to identify the access
path and see why you have that problem. From there you can make the decision as to how you
will fix the problem.
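Outside of the monitor, the same access path information can be obtained with a plain EXPLAIN, assuming a PLAN_TABLE exists under your authorization ID; the statement and query number here are illustrative:

   EXPLAIN PLAN SET QUERYNO = 100 FOR
     SELECT ACCT_ID
       FROM ACCT_HIST
      WHERE ACCT_ID = 12345;

   SELECT QUERYNO, QBLOCKNO, PLANNO, METHOD, ACCESSTYPE,
          ACCESSNAME, MATCHCOLS, PREFETCH
     FROM PLAN_TABLE
    WHERE QUERYNO = 100
    ORDER BY QBLOCKNO, PLANNO;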
One of the most powerful features of CA Insight DPM is the Exception Monitor. CA Insight
DPM includes hundreds of predefined exception conditions that can be selectively activated for
system events, application events and SQL events. When a DB2 processing limit is reached or
exceeded, the exception will be highlighted. This allows you to instantly see when a potential
performance issue exists and react to fix the problem. An exception can also be configured to
submit a user-defined Intelligent Module (IMOD) to automatically initiate corrective action.
By setting thresholds in the Exception Monitor, you do not need to constantly watch the
screens looking for potential performance problems. When they occur, CA Insight DPM will
notify you.
One further powerful feature of CA Insight DPM is the implementation of Insight Query
Language (IQL). All screen displays within CA Insight DPM are built using IQL. CA Insight
DPM is supplied with many built-in requests, all of which can be customized using IQL. You may also write new on-demand requests to address new requirements. This
makes CA Insight DPM very flexible and extensible for your environment and particular
monitoring needs.
CA Detector® for DB2 for z/OS
CA Detector® for DB2 for z/OS (CA Detector) is an application performance analysis tool for
DB2 which provides unique insight into your application workload and resource use. It uniquely
allows you to view activity at the application, plan, package, DBRM and SQL statement level.
CA Detector allows you to identify the programs and SQL statements that most significantly
affect your DB2 system performance.
CA Detector collects its data exclusively from accounting trace information. Accounting trace
data imposes a significantly lower monitoring overhead on your DB2 system than performance traces do. Additionally, the use of accounting data allows CA Detector to provide its unique focus on performance, not at the DB2 system level, but at the application level. This means you can tune your application to perform efficiently on your system, rather than tune your system to make up for deficiencies in your application.
CA Detector analyzes all SQL that runs on your system, both dynamic and static. Its low overhead means that you can run CA Detector 24x7 and never miss an application performance problem. All information collected by CA Detector is stored in a datastore which rolls into a history datastore driven by the user-defined interval period. This means you can store information concerning the performance of your system over time. Not only that, you can unload
this if you wish and process it to further refine your performance data.
With CA Detector, you specify which plans, packages, DBRMs and SQL statements belong to which application. This can be done using wildcards to include or exclude SQL sources from your defined application. CA Detector aggregates the information for all of the SQL sources
within your application into a single view of performance data for your application. You can drill
down from the application level to view the plans, from there the packages and from there
down to the individual SQL statements.
The unique capability of CA Detector is that on any display it always shows the most resource
intensive application, plan, package or SQL statement. For example, if you know you have a
poorly performing application, you can drill down to see which is the most resource intensive
plan. From there you can drill down and see which is the most resource intensive package, then
SQL statement. From there you can view the SQL statement text and even jump to CA Plan
Analyzer to take advantage of all of the enhanced explain capabilities of that tool to help solve
the problem.
This capability allows you to find the operations within your application which are consuming
the most DB2 resources based on how your users are using the application. This can highlight, for example, whether that single daily execution of a tablespace scan in your application is actually any worse in performance terms than the index scan which runs millions of times a day.
In addition to viewing data at the application level, CA Detector allows you to view
performance data at the DB2 system level and at the DB2 member level when DB2 data-
sharing is used.
In a similar way, CA Detector has the ability to look for SQL error conditions occurring within
your application. You may have SQL transactions which are constantly receiving a negative
SQL code when trying to update a table. Unless the application is written to notify you of those
failures, you will never know they are happening. CA Detector will trap the SQL errors and
show them to you in the SQL Error display. From here you can proactively go and fix the problem.
With the increased use of dynamic SQL within applications, particularly in ERP applications, it
is even more important that you can manage the performance of dynamic SQL statements in
the same way as static SQL. CA Detector collects performance information for dynamic SQL in
exactly the same way as it does for static SQL. There is no need to set an exception to do this.
CA Detector collects the SQL text for dynamic SQL and can even help you identify when
similar dynamic statements are failing to take advantage of the statement cache because of
the use of constants or literals in the SQL text. Making simple changes to use parameter markers and host variables can significantly improve the performance of your applications by taking full advantage of the dynamic SQL cache.
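For example (the statement is hypothetical), each distinct literal below produces a separate entry in the dynamic statement cache, while the parameter marker form is prepared once and reused for every value:

   -- Literal values: ACCT_ID = 1001, ACCT_ID = 1002, ... each miss the cache
   SELECT ACCT_TYPE
     FROM ACCOUNT
    WHERE ACCT_ID = 1001;

   -- Parameter marker: one cached statement, reused for every value
   SELECT ACCT_TYPE
     FROM ACCOUNT
    WHERE ACCT_ID = ?;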
CA Subsystem Analyzer for DB2 for z/OS
CA Subsystem Analyzer for DB2 for z/OS (CA Subsystem Analyzer) is a complementary
product to CA Detector and uses the same collection interval process and datastore. While CA
Detector looks at how SQL is performing within your applications, CA Subsystem Analyzer is
designed to show you how those applications are impacting your DB2 resources, such as buffer
pools, EDM pools, RID pools, DASD volumes, datasets and even dataset extents.
CA Subsystem Analyzer works in the same way as CA Detector and always shows you the
most active item in the list, whether it is a list of DASD volumes or a list of buffer pools. With
CA Subsystem Analyzer, you can take a broad, system-wide view to identify and examine the
most active DB2 databases, tablespaces, tables, indexes, buffer pools and DASD volumes, then
drill down logically to look at specific details.
Tight process flow integration between CA Subsystem Analyzer and CA Detector means that
you can seamlessly and automatically transition between the two products. For example,
having drilled down to view the busiest DB2 table in your system using CA Subsystem
Analyzer, you can select that you want to view the SQL accessing that table. You are then taken
seamlessly into CA Detector to view those SQL statements, from where you can jump to CA Plan Analyzer, if you like, to take advantage of the enhanced explain capabilities of that tool.
When trying to improve application performance, it is not always the application SQL which is
causing the problem. Sometimes you need to make changes to your DB2 system to improve
performance or remove a processing bottleneck.
It is sometimes hard to determine which objects are being used most frequently and should
be isolated from each other to avoid contention. With CA Subsystem Analyzer it is easy. For
example, you may notice that a certain buffer pool has very high activity. You can drill down to
see which active objects are using that buffer pool. You will see a list of tablespaces showing
the tablespace with the highest activity at the top. You can even drill down further to view the
most active table and from there view the SQL activity occurring on that table if you like. You
may decide that one of the tablespaces should be moved to a different buffer pool to reduce
contention and improve performance.
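If you do decide to move a tablespace, the change itself is a simple ALTER; the names below are hypothetical, the target buffer pool must already be defined with a non-zero size, and the change may not take full effect until the tablespace datasets are next closed and reopened:

   ALTER TABLESPACE MYDB.MYTS
     BUFFERPOOL BP2;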
CA Subsystem Analyzer allows you to view your system performance data at the DB2 system
level and at the DB2 member level when DB2 data-sharing is used.
CA Thread Terminator Tool
CA Thread Terminator is a very powerful tool that helps you manage your DB2 system and applications in real time. The tool allows you to make changes to your DB2 environment immediately, whereas making the same change using normal DB2 techniques might require you to recycle DB2.
You can make a whole range of changes to your system parameters, including changes to buffer pool sizes, security parameters, logging parameters, application values, performance values, storage sizes (including the EDM pool), thread parameters, and operator parameters. You can
also add or delete active log datasets.
CA Thread Terminator can also show you a list of all active DB2 threads for all DB2 systems.
From a thread, you can drill down and view the thread detail, including the SQL text. A large amount of information concerning in-DB2 times, wait times, and I/O counts can be seen for each thread. For dynamic SQL you can see the SQL text and also the contents of the host variables.
A common problem, particularly with ERP applications with dynamic SQL, is run-away threads.
These are SQL requests which may run for many hours and never finish. The problem with
these is that they consume system resources and potentially hold locks preventing other
transactions and utilities, such as REORG, from continuing. With CA Thread Terminator, you can first view the SQL and host variables for the run-away thread. You can evaluate the DB2 statistics and see the amount of resource the thread has consumed. You can then choose to cancel that thread to release the DB2 resources that it holds. The cancel request will work even if the status of the thread is not in DB2.
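The equivalent native DB2 commands for finding and cancelling a thread look like the following (the thread token is illustrative); as noted above, CA Thread Terminator can also cancel work that a plain -CANCEL THREAD cannot reach:

   -DISPLAY THREAD(*) TYPE(ACTIVE)
   -CANCEL THREAD(1234)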
Another useful feature of CA Thread Terminator allows you to terminate all threads accessing a particular table or pageset. This is useful if you have to REORG an object, but locks associated with threads are preventing the REORG from running.
SECTION 5
Further Improve Application Performance for DB2 for z/OS with CA Database Management
After designing and implementing an efficient SQL application in your production system,
sometimes performance does not meet your expectations. Alternatively, you may have
implemented an ERP application only to find that performance is poor.
Monitoring your application using tools like CA Detector and CA Insight DPM can highlight
where the performance problem lies. You may be able to make further SQL performance
improvements highlighted by the enhanced explain capability of CA Plan Analyzer.
Additionally, you may be able to make database and system improvements highlighted
by CA Subsystem Analyzer.
One area where performance can be improved further is effective index design. No doubt you
will have designed your indexes to support fast access to the table data. The problem, however, is that when you designed your indexes, you could never be certain what the distribution of your data would look like. It is only after you are in production, when production data volumes are loaded, that you can see how effective your indexes are. Worse still, if you are using an ERP system such as SAP, you may have had no control over the design of the indexes which are implemented.
Whatever indexes you have in your application, it may be that they do not provide the
performance benefits you expected. You may also have indexes which the DB2 optimizer never
chooses to use. Both of these situations can impact application performance.
CA Index Expert™ for DB2 for z/OS
With CA Index Expert, you can analyze an entire application quickly and efficiently, and make
intelligent index design decisions. This product saves time and reduces errors by analyzing
complex SQL column dependencies and reviewing SQL to determine which DB2 objects
are referenced.
CA Index Expert recommends indexing strategies at the application level with suggestions
based on actual SQL usage. This is an important benefit. Using data imported from CA Detector,
CA Index Expert knows how existing indexes and tables are referenced and how often.
Using CA Detector, it is easy to identify a problem SQL transaction. Analysis of the SQL using
the enhanced explain capability of CA Plan Analyzer may find that index access is not being
used for some reason. Remember, if this is an ERP application you may not be able to change
the SQL to resolve the problem. Using CA Index Expert you can obtain recommendations for changes to the index design which would allow index access to be used. Implementing the recommended index design can then resolve your application performance problem.
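The recommendation itself is implemented with ordinary DDL; for example, a hypothetical index on the columns referenced by the problem statement might look like:

   CREATE INDEX MYID.XACCT_HIST_01
     ON ACCT_HIST (ACCT_ID, HIST_DATE);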
CA, one of the world’s largest information technology (IT)
management software companies, unifies and simplifies
complex IT management across the enterprise for greater
business results. With our Enterprise IT Management vision,
solutions and expertise, we help customers effectively
govern, manage and secure IT.
HB05ESMDBMS01E MP321131007