Lesson 1

Clustering Oracle Tables and Indexes

One advantage of a relational database is that users do not need to know anything about the physical structure of the database.
They can access all the data by simply using (SQL) Structured Query Language . However, underneath the logical structures manipulated by SQL, the data in your Oracle database is still stored on a disk. And, as with all data on disk, the performance of data retrieval is dependent to some extent on the physical movement of the disk head over the spinning disk. If you can minimize this disk head movement, you minimize the single largest bottleneck in data retrieval. Oracle allows you to create clustered tables and indexes, whose primary purpose is to minimize the inherent delays caused by disk head movement. This module covers all you need to know about clusters, including:

The advantages of clustering
How to create and size a cluster
How to create a cluster key
The purpose of a hash cluster
How to create a hash cluster
How to drop a cluster

Concise Overview of Oracle Database Clusters

Here's a concise overview of Oracle database clusters, addressing the points mentioned above, with a focus on their role in minimizing disk head movement and improving data retrieval performance:
Advantages of Clustering

Reduced Disk Head Movement: Clustering stores related data from multiple tables or indexes physically close together on disk, minimizing disk head movement and improving retrieval speed.
Improved Query Performance: By colocating data that is frequently accessed together (e.g., via joins), clusters reduce I/O operations, speeding up queries.
Efficient Storage: Clusters can reduce storage overhead by eliminating redundant data in certain scenarios, such as with index clusters.

How to Create and Size a Cluster

Creating a Cluster:

CREATE CLUSTER cluster_name (column_name datatype [,...])
[INDEX | HASHKEYS number]
[SIZE bytes] [TABLESPACE tablespace_name];

column_name datatype: Defines the cluster key, which determines how data is grouped.
SIZE: Specifies the average amount of space needed to store all rows with the same cluster key value. Proper sizing reduces wasted space and improves performance.
TABLESPACE: Assigns the cluster to a specific tablespace for storage management.

Sizing a Cluster:

Estimate the average size of rows sharing the same cluster key value.
Use the SIZE parameter to allocate sufficient space per cluster key to avoid overflow.
Analyze table data and query patterns to determine optimal cluster size (e.g., using DBMS_SPACE or ANALYZE).

How to Create a Cluster Key

The cluster key is defined in the `CREATE CLUSTER` statement and is used to group related rows.
Tables in the cluster must include the cluster key column(s).

Example:

CREATE CLUSTER emp_dept_cluster (deptno NUMBER(4))
SIZE 512 TABLESPACE users;

CREATE TABLE dept (
    deptno NUMBER(4) PRIMARY KEY,
    dname VARCHAR2(30)
) CLUSTER emp_dept_cluster (deptno);

CREATE TABLE emp (
    empno NUMBER(4),
    ename VARCHAR2(30),
    deptno NUMBER(4)
) CLUSTER emp_dept_cluster (deptno);

The `deptno` column is the cluster key, ensuring that employee and department rows with the same `deptno` are stored together.

Purpose of a Hash Cluster

A hash cluster uses a hash function on the cluster key to determine the physical storage location of data, eliminating the need for an index in many cases.

Purpose:
- Provides faster access for exact-match queries (e.g., WHERE cluster_key = value).
- Reduces disk I/O by directly mapping the cluster key to a disk location.
- Ideal for static data with predictable key values, such as lookup tables.
Trade-offs:
- Less effective for range queries or frequently updated data.
- Requires careful sizing to avoid hash collisions.

How to Create a Hash Cluster

Use the HASHKEYS clause to specify the number of unique hash values (this determines the number of slots for data).

Example:

CREATE CLUSTER product_hash_cluster (product_id NUMBER)
HASHKEYS 1000
SIZE 1024
TABLESPACE users;

CREATE TABLE products (
    product_id NUMBER,
    product_name VARCHAR2(50)
) CLUSTER product_hash_cluster (product_id);

HASHKEYS 1000: This allocates space for 1,000 unique hash values.
Optionally, you can specify HASH IS to use a custom hash function.

How to Drop a Cluster

Dropping a cluster removes the cluster and its associated tables unless the tables are explicitly retained.
Syntax:
```
DROP CLUSTER cluster_name [INCLUDING TABLES [CASCADE CONSTRAINTS]];
```
- INCLUDING TABLES: Drops all tables in the cluster.
- CASCADE CONSTRAINTS: Drops any referential integrity constraints on cluster tables.

Example:

DROP CLUSTER emp_dept_cluster INCLUDING TABLES CASCADE CONSTRAINTS;

Note: Ensure no dependent objects (e.g., triggers) exist, or the drop operation will fail.

Additional Notes

Types of Clusters:
- Index Clusters: Use a B-tree index on the cluster key for data retrieval. Suitable for range queries.
- Hash Clusters: Use a hash function for direct access, ideal for exact matches.
Performance Considerations:
- Clusters are most effective when data is frequently accessed together (e.g., parent-child relationships).
- Improper sizing or excessive updates can lead to overflow blocks, degrading performance.
When to Use Clusters:
- High join frequency between tables with the same key.
- Static or slowly changing data for hash clusters.
- Avoid clusters for highly dynamic data or queries that don’t use the cluster key.

Clustering Technique

Clustering is a method of storing tables that are intimately related and often joined together into the same area on disk. For example, instead of the BOOKSHELF table being in one section of the disk and the BOOKSHELF_AUTHOR table being somewhere else, their rows could be interleaved together in a single area, called a cluster. The cluster key is the column or columns by which the tables are usually joined in a query (for example, Title for the BOOKSHELF and BOOKSHELF_ AUTHOR tables). To cluster tables, you must own the tables you are going to cluster together. The following is the basic format of the create cluster command:

create cluster cluster
(column datatype [,column datatype]. . .) [other options];

The cluster name follows the table-naming conventions, and column datatype is the name and datatype you will use as the cluster key. The column name may be the same as one of the columns of a table you will put in this cluster, or it may be any other valid name. Here is an example:

create cluster BOOKandAUTHOR (Col1 VARCHAR2(100));

This creates a cluster (a space is set aside, as it would be for a table) with nothing in it. The use of Col1 for the cluster key is irrelevant; you will never use it again. However, its definition should match the primary key of the table to be added. Next, tables are created to be included in this cluster:

create table BOOKSHELF
(Title VARCHAR2(100) primary key,
Publisher VARCHAR2(20),
CategoryName VARCHAR2(20),
Rating VARCHAR2(2),
constraint CATFK foreign key (CategoryName)
references CATEGORY(CategoryName)
)
cluster BOOKandAUTHOR (Title);

Prior to inserting rows into BOOKSHELF, you must create a cluster index:

create index BOOKandAUTHORndx
on cluster BOOKandAUTHOR;

Recall that the presence of a cluster clause here precludes the use of a tablespace or storage clause. Note how this structure differs from a standard create table statement:

create table BOOKSHELF
(Title VARCHAR2(100) primary key,
Publisher VARCHAR2(20),
CategoryName VARCHAR2(20),
Rating VARCHAR2(2),
constraint CATFK foreign key (CategoryName)
references CATEGORY(CategoryName)
);

Although clusters are not appropriate for all types of data, in the right situation a cluster can provide a significant increase in performance. In the next lesson you will learn when to use clustered tables.