Lesson 7	Oracle rule-based optimizer
Objective	Describe the Functions of Oracle rule-based Optimization

Oracle rule-based Optimization

The Rule-Based Optimizer (RBO) in Oracle Database, unlike the Cost-Based Optimizer (CBO), does not rely on data statistics and cost evaluations to determine the most efficient execution plan for a query. Instead, the RBO uses a pre-defined set of rules to make this determination. In that sense, it can be considered to use a form of "heuristics." These rules, or heuristics, are based on the structure of the SQL statement and the specific operations involved, such as the type of join used or whether an index is available. The RBO prioritizes operations based on these rules, irrespective of the data distribution, the size of the tables, or the existence of indexes. For example, some of the heuristics used by the RBO include the following:

An equality predicate on a unique key or primary key is given top priority.
An equality predicate on a non-unique key is ranked next.
Range scans on indexed columns follow.
Full table scans are considered last.

These heuristics are static, and the RBO does not adapt to changing data or workload characteristics. This is in contrast to the CBO, which dynamically adjusts its decisions based on data statistics and estimates of the cost of different operations.
It's important to note that Oracle has deprecated the RBO and no longer supports it as of Oracle Database 11g. Oracle strongly recommends using the CBO, which provides a more adaptable and efficient approach to SQL optimization based on the characteristics of the data and the workload.
In conclusion, while the Oracle Rule-Based Optimizer can be said to use heuristics in the form of predefined rules to determine the access path for SQL statements, it's no longer the recommended or supported method for query optimization in Oracle databases. The Cost-Based Optimizer, using data statistics and cost estimations, is the current standard.

Three rule-based heuristics

The rule-based optimizer was Oracle's first attempt at processing SQL, and dates back to the first versions of Oracle. Oracle uses heuristics to determine the best method for accessing the data. A heuristic is a set of rules that allows the Oracle engine to guess which access path will run the fastest. These heuristics include:

Rule-based heuristics

A ranking scheme	Oracle knows which access paths are faster than other access paths.
Using an index if an index exists on a target table	If Oracle detects that it can use an index to access the table, the optimizer will choose that index. Using an index will speed access to the data.
Ordering the table names in the FROM clause	This rule determines the driving table. The driving table is retrieved first, and the rows from the second table are then merged into the result set from the first table. Therefore, it is essential that the second table return the least amount of rows based on the WHERE clause.

Oracle Ranking Scheme for SQL

In Oracle databases, it's common to hear the assertion that a full-table scan is the most expensive operation that can be performed. While it's true that full-table scans can be resource-intensive, they are not inherently the most costly operation in all situations. The cost or efficiency of a database operation is highly context-dependent and varies based on the nature of the data, the specific query, the structure of the table, and the overall database design. A full-table scan involves reading every row in a table to identify those that satisfy a query. It can be resource-intensive because it might require a large amount of disk I/O, particularly for large tables. However, there are scenarios where a full-table scan is the most efficient operation. For instance, if a query needs to retrieve a significant proportion of the rows in a table, a full-table scan could be faster than an index-based retrieval.
Index operations can sometimes be more expensive than full-table scans, especially in scenarios where a large number of random I/O operations are required. This might happen when a query must retrieve a high percentage of rows from a large table, and the rows are not adjacent on disk. In such cases, using an index can result in many disk I/O operations to retrieve the individual rows, which can be slower than reading all the rows in a continuous sequence as in a full-table scan. Additionally, certain complex operations such as sorting, aggregations, and certain types of joins can be resource-intensive, depending on the amount and distribution of data involved. Similarly, operations involving large LOBs (Large Objects) or extensive manipulation of CLOBs and BLOBs can be quite costly.
So, while full-table scans can be expensive, it's not accurate to categorically state that they are the most expensive operation to execute on an Oracle table. The true cost of a database operation depends on a variety of factors, and different operations can be more or less efficient depending on the specific circumstances. Oracle's Cost-Based Optimizer takes these factors into account when deciding the best execution plan for a query.

This list describes the amount of time that is required by Oracle to perform a query. T he fastest row access by ROWID happens very quickly since Oracle knows the exact data block where the row resides. The slowest, a full-table scan, could take hours to execute, depending upon the number of rows in the table.
The following list reflects the fastest to slowest time required to perform a query:

Single row access by ROWID
Single row by cluster join
Single row by hash cluster with unique or primary key
Single row by unique or primary key
Cluster join
Hash cluster key
Indexed cluster key
Using a multi-column (concatenated) index
Using a single column index
A bounded index range scan
A full-table scan

Which Optimizer?

The Oracle optimizer has two primary modes of operation:

cost-based or
rule-based.

To set the optimizer goal, you can specify CHOOSE (for cost-based) or RULE (for rule-based) for the OPTIMIZER_MODE parameter in your database's init.ora file. You can override the optimizer's default operations at the query and session level.
Setting OPTIMIZER_MODE to RULE invokes the rule-based optimizer (RBO), which evaluates possible execution paths and rates the alternative execution paths based on a series of syntactical rules. In general, the RBO is seldom used by new applications, and is found primarily in applications developed and tuned for earlier versions of Oracle.
Setting OPTIMIZER_MODE to CHOOSE invokes the cost-based optimizer (CBO). You can use the analyze command to generate statistics about the objects in your database. The generated statistics include the number of rows in a table and the number of distinct keys in an index. Based on the statistics, the CBO evaluates the cost of the available execution paths and selects the execution path that has the lowest relative cost. If you use the CBO, you need to make sure that you run the analyze command frequently enough for the statistics to accurately reflect the data within your database. If a query references tables that have been analyzed and tables that have not been analyzed, the CBO may decide to perform full table scans of the tables that have not been analyzed. To reduce the potential for unplanned full table scans, you should use either the RBO or the CBO consistently throughout your database.

Simple Set of Access Rules

Using this simple set of access rules, Oracle usually chooses the fastest method for returning the requested rows from the query. However, sometimes the rule-based optimizer makes mistakes such as choosing the wrong index for a table. This situation led to the development of Oracle hints^[1] that can be used to modify the access plan. We will discuss hints in detail later in this course.

Driving Table

The driving table should be the table with the least amount of rows returned, not just the table with the smallest total number of rows.
While the rule-based optimizer was sufficient for most queries, there were times when it failed to choose the fastest access path to the data The following series of images describe an example of such a failure.

1) Assume that we have a table called EMPLOYEE with an index on the DEPARTMENT column

2) Let us assume that there are only three departments on our database — 2) Let us further assume that there are only three departments on our database, 1) accounting, 2) finance, and 3) marketing.

3) Let us assume that accounting and finance has 2,500 employees each, but the marketing department has 10,000 employees. In other words, the rows are not equally distributed among the departments.

5) In this case, the fastest method for returning these rows is a full-table scan because two-thirds of the rows in the table are being requested by the query. Because the rule-based Optimizer does not know the distribution of values, it will choose an index range scan to service this query. When two-thirds of the rows are being accessed, an index range scan will be slower than a full-table scan.

Failure of rule-based Optimization

An Oracle Remote Database Administrator (DBA) may need to look at the default optimizer mode of their database in various circumstances. The default optimizer mode governs the general behavior of the Oracle SQL optimizer, a component that selects the most efficient execution plan for SQL statements. The two main modes are ALL_ROWS and FIRST_ROWS_n, and they impact how the Cost-Based Optimizer (CBO) optimizes queries.

Performance Tuning: If the DBA is conducting performance tuning activities, examining the default optimizer mode can provide important context. It can help understand the nature of any performance issues that might be observed, as different optimizer modes have different optimization strategies that can impact SQL performance.
Change in Workload Characteristics: If the workload characteristics of the database change, for example, transitioning from an OLTP (Online Transaction Processing) system to a DSS (Decision Support System) or vice versa, the DBA should re-evaluate the optimizer mode. ALL_ROWS mode is typically more suitable for DSS workloads that prioritize overall throughput, whereas FIRST_ROWS_n mode can be more appropriate for OLTP systems where response time for individual transactions is more important.
Upgrades and Migration: During Oracle upgrades or migration projects, the DBA should check the default optimizer mode to ensure it is set appropriately for the nature of the workload on the new version of the database.
SQL Tuning: When carrying out specific SQL tuning tasks, understanding the optimizer mode is crucial. The DBA can then see if changes to the mode may improve the performance of specific problem SQL statements.
Investigating Inconsistent Query Performance: If there are cases where a query's performance is inconsistent, checking the default optimizer mode can provide insights. For example, if the optimizer mode has been manually overridden at the session level or within specific SQL statements, this could lead to inconsistencies in performance.

The default optimizer mode is a key configuration setting that governs the behavior of Oracle's SQL optimizer. An Oracle Remote DBA needs to be aware of this setting during various activities such as performance tuning, workload changes, upgrades, SQL tuning, and performance troubleshooting.

Two Classes of Optimizer Modes

One of the first things the Oracle Remote DBA looks at is the default optimizer mode for their database. There are two classes of optimizer modes:

(RBO) the rule-based optimizer and
(CBO) the cost-based optimizer.

The Oracle init.ora parameters offer four values for the optimizer_mode parameter.

optimizer_mode = RULE

The first, and oldest optimizer mode is RULE. Under the rule-based optimizer, Oracle uses heuristics from the data dictionary in order to determine the most effective way to service an Oracle query and translate the declarative SQL command into an actual navigation plan to extract the data. In many pre-Oracle8i systems, rule-based optimization is faster than cost-based. In fact, Oracle Applications used rule-based optimization until release 11i.

optimizer_mode = FIRST_ROWS

This is a cost-based optimizer mode that will return rows as soon as possible, even if the overall query runs longer or consumes more resources. The FIRST_ROWS optimizer mode usually involves choosing a full index scan over a parallel full table scan. Because the FIRST_ROWS mode favors index scans over full table scans, the FIRST_ROWS mode is most appropriate for inline systems where an end user wants to see some results as quickly as possible.

optimizer_mode = ALL_ROWS

This is a cost-based optimizer mode that ensures that the overall query time is minimized, even if it takes longer to receive the first row. This usually involves choosing a parallel full table scan over a full index scan. Because the ALL_ROWS mode favors full table scans, the ALL_ROWS mode is best suited for batch-oriented queries where intermediate rows are not required for viewing.
To illustrate the difference between ALL_ROWS and FIRST_ROWS with an oversimplistic example, consider the following query:

Because of the shortcomings of the rule-based optimizer, Oracle knew that the best access decisions could only be made by the optimizer that was aware of the amount and distribution of the data in the target tables. Hence the cost-based optimizer was developed.
In the next lesson, the cost-based optimizer will be examined.

[1]Hint: This is an SQL compiler directive that tells Oracle to change an execution plan.