Relational Concepts  «Prev 

Purpose Stages of Data Normalization

When we design a database for an enterprise, the main objective is to create an accurate representation of the data, relationships between the data, and constraints on the data that is pertinent to the enterprise. To help achieve this objective, we can use one or more database design techniques. One of those techniques is called Entity-Relationship (ER) modeling.
This module discusses the database design technique called normalization.
Normalization is a database design technique, which begins by examining the relationships (called functional dependencies) between attributes. Attributes describe some property of the data or of the relationships between the data that is important to the enterprise. Normalization uses a series of tests (described as normal forms) to help identify the optimal grouping for these attributes to ultimately identify a set of suitable relations that supports the data requirements of the enterprise.
While the main purpose of this module is to introduce the concept of functional dependencies and describe normalization up to Third Normal Form (3NF), later we will take a more formal look at functional dependencies and also consider later normal forms that go beyond 3NF.

Purpose of Normalization

Normalization is a technique for producing a set of relations with desirable properties, given the data requirements of an enterprise.
The purpose of normalization is to identify a suitable set of relations that support the data requirements of an enterprise. The characteristics of a suitable set of relations include the following:
  1. n the minimal number of attributes necessary to support the data requirements of the enterprise;
  2. n attributes with a close logical relationship (described as functional dependency) are found in the same relation;
  3. n minimal redundancy with each attribute represented only once with the important exception of attributes that form all or part of foreign keys , which are essential for the joining of related relations.
The benefits of using a database that has a suitable set of relations is that the database will be easier for the user to access and maintain the data, and take up minimal storage space on the computer.

First name and last name columns have repetitive information. As the table grows, this information will require more diskspace.

After normalizing, the First Name and Last Name columns have been moved into a separate table, and linked to the original table through the Employee ID column. This option stores the same data, but requires less storage space.

When normalizing tables, make sure that tables store only one type of entity. For example, timesheets and employees are each separate entities.

Each table should have a unique row identifier, which can be one or more columns that uniquely identifies a row in a table. Notice that the combination of the Employee ID and Week Ending Date columns uniquely identify each row of data in the TimeSheets table.

The Employees table uniquely identifies a row of data by the Employee ID column (the primary key) alone.

The benefit of normalization is that repetitive values are eliminated, unless those values are part of the primary key. Before normalization, the First Name and Last Name columns repeat data consistently.

After normalization, information is no longer repeated in these columns.