Database Analysis   «Prev  Next»
Lesson 4 First normal form
Objective Explain the requirements for first normal form.

First Normal Form of a Database Table

Many databases require that you plan your tables so they can handle multiple occurrences of the same type of item.
A book might have several authors. The following table shows one way to handle such a circumstance:

Book table containing columns that are not ideal for data storing
Book table containing columns that are not ideal for data storing.

The Author1 and Author2 fields are able to handle up to two authors for each book, but there are two obvious problems with the table:
  1. If a book has only one author, the Author2 column is wasted space.
  2. There is no place to store the name of a third, fourth, or succeeding author.

There is also a problem from a theoretical standpoint. One of the rules of tables is that every column in a table must represent a unique attribute of an entity. In the case of the Books table, the Author1 and Author2 columns represent the same attribute: an author. The technical term for columns that represent the same attribute is a repeating group. (Do not let the semantics fool you, since the lead and second authors of a book are separate individuals, but they are both members of the set of Authors.)
For a table to be in first normal form (1NF), the table must not contain any repeating groups.
first normal form (1NF): A table is in first normal form if it contains no repeating groups.

First Normal Form Defined

A table is in first normal form (1NF) if it meets the following criteria:
  1. The data are stored in a two-dimensional table.
  2. There are no repeating groups.
The key to understanding 1NF, therefore, is understanding the nature of a repeating group of data.

Understanding Repeating Groups

A repeating group is an attribute that has more than one value in each row of a table. For example, assume that you were working with an employee relation and needed to store the names and birth dates of the children from the employees. Because each employee can have more than one child, the names of the children and their birth dates each form a repeating group.
Note: A repeating group is directly analogous to a multivalued attribute in an ER diagram.

There is actually a very good reason why repeating groups are not permitted. To see what might happen if they were used, take a look at Figure 2 below, an instance of an employee table containing repeating groups. Notice that there are multiple values in a single row in both the children's names and the children's birth dates columns.
This presents two major problems:

EmpNum First Name Last Name Child Names Child Birthdates
1001 Jane Doe Mary, Sam 1/9/02, 5/15/04
1002 John Doe Lisa, David 1/9/00, 5/15/01
1003 Jane Smith John, Pat, Lee, Mary 10/5/04, 10/12/00, 6/6/2006, 8/21/04
1004 John Smith Michael 7/4/06
1005 Jane Jones Edward, Martha 10/21/05, 10/15/99
  1. There is no way to know exactly which birth date belongs to which child. It is tempting to say that we can associate the birth dates with the children by their positions in the list, but there is nothing to ensure that the relative positions will always be maintained.
  2. Searching the table is very difficult. If, for example, we want to know which employees have children born before 2005, the DBMS will need to perform data manipulation to extract the individual dates themselves. Given that there is no way to know how many birth dates there are in the column for any specific row, the processing overhead for searching becomes even greater.
The solution to these problems, of course, is to get rid of the repeating groups altogether.
The next lesson describes how to decompose a table to first normal form.