AN INTRODUCTION TO ORACLE SQL STATEMENT TUNING
Bill Magee
21st Feb 2003

THE ORACLE OPTIMISER

Introduction

Prior to beginning the actual work to satisfy a query Oracle takes a look around and decides on what it thinks is the best way to perform the query. A table may have a number of indexes, they may be composite, unique, functional etc. The statement may have a number of joins across many tables with many conditions and there could be various ways to perform the same operation.

The optimiser gathers information about the various options and considers how a full table scan, an index scan and different ordering of those scans would result in different performance and workload.

There are two main modes in which the optimiser can operate. The Optimiser Mode as it is known is set on the database instance by the DBA's. You can find out which mode is in use with the following query.

   SELECT VALUE
   FROM   v$parameter
   WHERE  NAME = 'optimizer_mode';

   NAME
   ====
   CHOOSE
The possible values of optimizer_mode are FIRST_ROWS and ALL_ROWS explicitly use the cost based optimiser. RULE uses the rule based optimiser and CHOOSE allows Oracle to choose on an individual query basis. The variations on the ALL_ROWS or FIRST_ROWS indicate to Oracle whether it should aim to complete the entire job with the minimum of work, or whether it should aim to return it's first response as soon as possible.

In a batch operation where no users are involved it is preferable to have the entire job completed with the minimum of work whereas when you have a user waiting for a result, it is preferable to have the first item in a list appear while Oracle is still working on obtaining the rest of the list.

As a very simple rule of thumb, databases on Oracle versions upto 7.3 are usually configured to use the rule based optimiser. From version 8 onwards, the cost based optimiser became the default, recommended method and is usually in use.

Table and index statistics

For rule based optimisation, table and index statistics are not required. For cost based optimisation the statistics need to exist on atleast one of the tables being referenced.

Statistics are needed by the cost based optimiser in order for it to determine the best access path to the data you require. Consider a table containing the gender and age of 20000 people. If you wanted to obtain all the males aged 34 and the table had an index on gender and an index on age there would be two possibilities.

Consider both approaches. We know there are 20,000 people in the table. If we used the index to find all males it is fair to say we would return about 10,000 rows. We would then have 10,000 rows with no index available and so would have to search through the 10,000 taking out non 34 year olds from the result set. If we used the index to find all 34 year olds first, it would be fair to say we would return about 200 rows which we would then have to look through manually finding the males.

It is clear then that somehow Oracle needs to know the makeup of an index, ie how selective it is for the given data (or index expression). Oracle documentation is not particularly clear to me on the terminology used. They refer to selectivity and cardinality. I'll probably get shot down for this, but in my mind I call it the granularity of the index.

The granularity of the gender index could be said to be about 0.5 and the granularity of the age index about 0.01.

These figures can be obtained by instructing Oracle to Analyze the table and it's indexes.

   ANALYZE TABLE PEOPLE COMPUTE STATISTICS;
   ANALYZE TABLE PEOPLE COMPUTE STATISTICS FOR ALL INDEXES;
The resulting values can be seen in the system views USER_TABLES and USER_INDEXES. From Oracle 8 onwards (I don't know about 7 and don't have one here to check) you can use the DBMS_UTILITY package to have an entire schema or instance analyzed. I should mention that I have seen reports of erratic behaviour on 9 onwards when using DBMS_UTILITY to analyse the schema. You should check this for yourself.

Since writing this, it has been brought to my attention that Tom Kyte (a virtual reference library of Oracle knowledge) has said that for gathering statistics ...

   DBMS_UTILITY PACKAGE  : is deprecated
   ANALYZE TABLE COMMAND : works but is not preferred
   DBMS_STATS PACKAGE    : is the preferred method
Certainly the ANALYZE TABLE command is very straight forward to use, and to get you moving forward into SQL statement tuning it should suffice for the moment. DBMS_STATS is far more powerful, but is also quite involved.

When instructing Oracle to perform object or schema analysis you can also indicate whether it should compute the statistics (perform exact counts to get the statistics) or estimate the statistics (work on a subset of the data to come up with a reasonable estimation). For very large tables it is usually best to estimate the statistics.

Refer to Oracle documentation for exact syntax of the ANALYZE TABLE command, DBMS_UTILITY functions and DBMS_STATS functions applicable to your database version.

When SQL is being optimised through the RULE based optimiser there is no benefit to having statistics on your tables or indexes. The statistics are not used to effect the actual access paths.

When using the cost based optimiser you should also ensure that your statistics are up to date. The statistics on a table or index are correct at the time that table or index was last analysed. If the demographics of your table have changed substantially you should bring the statistics up to date by having the statistics re-estimated or computed.

The rule based optimiser

The rule based optimiser, as it's name suggests follows a set of fairly comprehensive rules for accessing data. The rules are ranked in order of usual performance and Oracle will always use the highest ranked rules to perform the data access.

Essentially, each predicate (part of your where clause) causes workload. By looking at what indexes are available and what would be the likely workload against that index, Oracle forms it's execution plan. For instance, Oracle would usually execute an equals predicate first rather than a range predicate over an index. Quite simply the equals predicate is generally more likely to return less data and so have less disk I/O.

The rules are a little more complex than this and the Oracle documentation goes into some detail explaining each of the rules and their ranks.

Prior to version 8, the rule based optimiser was the optimiser of choice, and the default for a database. The cost based optimiser was still a little flaky and the reliability of the rule based optimiser to produce consistent access plans was a definate advantage.

The cost based optimiser

The cost based optimiser determines the cost of various different access methods against a query and will always use the method with the lowest cost.

The Oracle documentation both gives a measurable unit to the cost (one unit of Disk I/O but can also be configured to meas