Physical Design   «Prev  Next»

Lesson 6Downside to Clustering
ObjectiveDescribe the trade-off that occurs when clustering Data

Data Clustering Trade Off

When you cluster data, you store frequently combined records from different tables as close together on the computer's hard drive as possible. The downside to clustering records from different tables is that records from the same table are no longer stored close to each other on the disk, which affects performance:
  1. Clustering slows down operations that require the RDBMS to search an entire table.
  2. Clustering may slow down data insertion because the RDBMS must search the disk for the last record in the table.
  3. Clustering may increase the time needed to update data in the fields used to define the cluster because the RDBMS must determine again which records to include in the cluster.

Clustering Purpose

The slowest part of a DBMS's actions is retrieving data from or writing data to a disk. If you can cut down on the number of times the DBMS must read from or write to a disk, you can speed up overall database performance. The trick to doing this is understanding that a database must retrieve an entire disk page of data at one time. The size of a page varies from one computing platform to another and it can be anywhere from 1024 bytes to 8 K, with 1 K being typical on a PC. Data always travel to and from disk in page-sized units. Therefore, if you store data that are often accessed together on the same disk page (or pages that are physically close together), you can speed up data access. This process is known as clustering[1] and is available with many large DBMSs (for example, Oracle).
In practice, a cluster is designed to keep together rows related by matching primary and foreign keys. To define the cluster, you specify a column or columns on which the DBMS should form the cluster and the tables that should be included. Then, all of the rows that share the same value of the column or columns on which the cluster is based are stored as physically close together as possible. As a result, the rows in a table may be scattered across several disk pages, but matching primary and foreign keys are usually on the same disk page. Clustering can significantly speed up join performance. However, just as with indexes, there are some trade-offs to consider when contemplating creating clusters:

  1. Because clustering involves physical placement of data in a file, a table can be clustered on only one column or combination of columns.
  2. Clustering can slow down performance of operations that require a scan of the entire table because clustering may mean that the rows of any given table are scattered throughout many disk pages.
  3. Clustering can slow down insertion of data.
  4. Clustering can slow down modifying data in the columns on which the clustering is based.

The next lesson introduces partitioning.

Clustering Disadvantages - Quiz

Before you move on to the next lesson, click the Quiz link below to reinforce your understanding of clustering.
Clustering Disadvantages - Quiz

[1] clustering: The term clustering has another meaning in the SQL standard. It refers to groups of catalogs (which in turn are groups of schemas) manipulated by the same DBMS. The use of the term in this section, however, is totally distinct from the SQL meaning.