Clustering Tables   «Prev  Next»

Lesson 1

Clustering Oracle Tables and Indexes

One of the strengths of a relational database is that application developers and analysts work primarily with logical structures (tables, rows, columns) using SQL, without needing to manage physical disk layouts directly. However, an Oracle database still has to place blocks on storage somewhere, and performance is influenced by how efficiently Oracle can locate and read the blocks needed to satisfy a query.

In modern Oracle implementations (including Oracle 23ai), the “bottleneck” is rarely literal disk-head movement. The practical goal is still the same: reduce unnecessary I/O and improve locality so Oracle can return rows with fewer physical reads and fewer buffer gets. Clusters are one of Oracle’s physical design features that can help in a narrow set of workload patterns—primarily when multiple tables are frequently joined using the same key.

This module introduces Oracle clusters and shows when clustering is beneficial, how to build clusters correctly, and how to avoid designs that increase DML overhead or waste space.

  1. The advantages and trade-offs of clustering
  2. How to create and size an index cluster
  3. How to choose and define a cluster key
  4. The purpose of a hash cluster
  5. How to create and size a hash cluster
  6. How to drop a cluster safely
Oracle Cloud DBA

What “clustering” means in Oracle

In Oracle, a cluster is a database segment designed to store rows from one or more tables together based on a shared value called the cluster key. When rows are stored together by that key, Oracle can often retrieve related rows with fewer block reads—especially for join-heavy queries that repeatedly access the same key value.

Clustering is not a general performance “boost.” It is a physical design choice that trades flexibility for locality. You typically consider a cluster only when:

  • Two (or more) tables are joined very frequently using the same key column(s).
  • Queries commonly filter on the cluster key (or join by it) and then fetch rows from multiple clustered tables.
  • The clustered tables are relatively stable (fewer updates that cause row migration or frequent growth patterns).

Types of clusters

Index clusters

An index cluster stores rows by cluster key and uses a cluster index to locate the block(s) that hold the key value. This is the more common cluster type and is usually the better choice when you have range access patterns or mixed predicates (not only equality).

Hash clusters

A hash cluster uses a hash function on the cluster key to compute the target block location. This can be very fast for exact-match access (for example, WHERE key = :value) but is a poor fit for range queries and can degrade if sizing is wrong (hash collisions and overflow blocks).

Cluster key

The cluster key is the column (or columns) that Oracle uses to colocate rows. It should match how your workload joins or filters data. In a parent/child model, the cluster key is often the parent key that also exists as the child foreign key.

All tables you place in a cluster must include the cluster key column(s), and the table’s CLUSTER ... (key) clause tells Oracle which column(s) map to the cluster key definition.

Index cluster example

The example below clusters two tables on a shared department key, so department rows and employee rows for the same department can be stored close together. This can reduce block visits for join-heavy lookups by department.


CREATE CLUSTER emp_dept_cl
  (deptno NUMBER(4))
  SIZE 1024
  TABLESPACE users;

-- Required for an index cluster before loading data
CREATE INDEX emp_dept_cl_ix
  ON CLUSTER emp_dept_cl;

CREATE TABLE dept (
  deptno NUMBER(4) CONSTRAINT dept_pk PRIMARY KEY,
  dname  VARCHAR2(30)
) CLUSTER emp_dept_cl (deptno);

CREATE TABLE emp (
  empno NUMBER(4) CONSTRAINT emp_pk PRIMARY KEY,
  ename VARCHAR2(30),
  deptno NUMBER(4) CONSTRAINT emp_dept_fk REFERENCES dept(deptno)
) CLUSTER emp_dept_cl (deptno);

Notes:

  • SIZE is the estimated space needed to store all rows that share the same cluster key value (for example, the average “department + its employees”). Poor sizing leads to wasted space or overflow blocks.
  • The cluster index is not an index on a table. It is an index on the cluster structure that maps key values to the block(s) where clustered rows reside.

Hash cluster example

A hash cluster is optimized for equality lookups on the key. It is typically used for lookup-style access patterns where you repeatedly query by a specific key value.


CREATE CLUSTER product_hcl
  (product_id NUMBER)
  HASHKEYS 1000
  SIZE 1024
  TABLESPACE users;

CREATE TABLE products (
  product_id   NUMBER CONSTRAINT products_pk PRIMARY KEY,
  product_name VARCHAR2(50)
) CLUSTER product_hcl (product_id);

Practical guidance:

  • HASHKEYS should reflect the expected number of distinct key values.
  • If the cluster grows far beyond your assumptions, hash collisions and overflow blocks increase, and performance can become worse than a conventional indexed table.
  • Hash clusters are generally not a good fit for range predicates or highly volatile tables.

Dropping a cluster

Dropping a cluster removes the cluster segment and (optionally) its tables. In practice, you typically drop the tables explicitly as part of a controlled change, then drop the cluster.


DROP CLUSTER emp_dept_cl INCLUDING TABLES CASCADE CONSTRAINTS;

Use INCLUDING TABLES only when you explicitly intend to remove the cluster tables as part of the same operation. Otherwise, plan the migration (move tables out of the cluster) before dropping.

When not to use clusters

Clusters are specialized. In many modern systems, better outcomes come from appropriate indexing, partitioning, and query tuning. Avoid clustering when:

  • Queries rarely join by a single stable key (or access patterns are unpredictable).
  • Tables are subject to heavy insert/update activity that changes row sizes and increases overflow risk.
  • You rely on full scans or analytics where physical colocating by a join key adds little value.

In the next lesson, you will evaluate workload patterns and learn how to decide whether clustering is appropriate before implementing it.


SEMrush Software 1 SEMrush Banner 1