Relational Concepts  «Prev  Next»

Lesson 5 Data normalization
Objective Define data normalization.

Define Data Normalization

When you design the tables in your database, you also define the relationships between them. This process defines how the data will be stored in the tables. When you normalize a database, you organize the data in tables in such a way that the data does not repeat. Typically, you end up with a greater number of tables, each containing fewer columns and a greater number of defined relationships.
Generally, it is a good idea to normalize, because it will result in tables having more clustered indexes. Tables with clustered indexes usually require less disk space than tables that are not normalized, and they enable the software to execute queries more quickly. This is not always true, however, because normalizing increases the number of tables, and the more tables you use in a query, the more slowly the query is likely to execute. The basic concept of normalization is to remove repeating columns (as much as possible) and place them in one or more new tables. View the Slide Show below for an example of normalization.

Remove Repeating Columns

First name and last name columns have repetitive information. As the table grows, this information will require more diskspace.
1) First name and last name columns have repetitive information. As the table grows, this information will require more diskspace.

After normalizing, the First Name and Last Name columns have been moved into a separate table, and linked to the original table through the Employee ID column.
2) After normalizing, the First Name and Last Name columns have been moved into a separate table, and linked to the original table through the Employee ID column. This option stores the same data, but requires less storage space.

When normalizing tables, make sure that tables store only one type of entity. For example, timesheets and employees are each separate entities
3) When normalizing tables, make sure that tables store only one type of entity. For example, timesheets and employees are each separate entities.

Each table should have a unique row identifier, which can be one or more columns that uniquely identifies a row in a table.
4) Each table should have a unique row identifier, which can be one or more columns that uniquely identifies a row in a table. Notice that the combination of the Employee ID and Week Ending Date columns uniquely identify each row of data in the TimeSheets table.

The Employees table uniquely identifies a row of data by the Employee ID column (the primary key) alone.
5) The Employees table uniquely identifies a row of data by the Employee ID column (the primary key) alone.

The benefit of normalization is that repetitive values are eliminated, unless those values are part of the primary key. Before normalization, the First Name and Last Name columns repeat data consistently.
6) The benefit of normalization is that repetitive values are eliminated, unless those values are part of the primary key. Before normalization, the First Name and Last Name columns repeat data consistently.

After normalization, information is no longer repeated in these columns.
7) After normalization, information is no longer repeated in these columns.

Data Normalization Stages
The following table assesses the potential impact of normalization on performance, maintainability, extensibility, scalability, availability, and security. This table is directly related to one of the Microsoft 70-029 exam objectives.

Factor Consideration
Performance Normalizing your database can tend to hurt performance if data tables are very large and your queries need to join many tables together.
Maintainability Normalized tables are much easier to maintain than non-normalized tables, because normalized tables have less columns.
Extensibility Extensibility generally refers the ability to upgrade your applications in the future. Microsoft determines whether specific functions and features are available to support your existing database designs. Therefore, normalizing your database will not have any impact on its extensibility.
Scalability Scalability allows you to migrate your database onto other operating systems, such as Windows 89 or 2012. Scalability is affected when you incorporate database features that aren't supported by other operating systems. However, scalability is not impacted by normalization.
Availability Availability is a term that is used to describe SQL Server being on-line and available to process data. A highly normalized database that contains much data might affect the availability of the server while very intensive queries are being processed. Therefore, if you normalize your database, your server may not be available for other tasks if you have lots of data and lots of normalized tables. However, many different query techniques can be employed to minimize this, such as using Stored Procedures.
Security Security restrictions can be placed on tables or columns. However, it is easier and more convenient to place security restrictions on entire tables. Therefore, if you normalize your database, you can enhance security by restricting access to a table that represents a complete object, or entity, such as salary information.

Now that you have learned about data normalization, you will learn about data denormalization in the next lesson.

Data Normalization - Exercise

Before moving on to the next lesson, click the Exercise link below to get some hands-on practice with data normalization.
Data Normalization - Exercise