|Lesson 4 || First normal form |
|Objective || Explain the requirements for first normal form. |
First Normal Form
What are the requirements to place a table in first normal form?
In order to place a table in first normal form (1NF), the table must meet the following requirements:
- Atomicity: Each column in the table must hold only a single value, and that value should be indivisible. This means that a column should not contain multiple values or a list of values.
- Uniqueness: Each row in the table must be unique, meaning that no two rows should be identical.
- Primary key: The table must have a primary key, which is a column or set of columns that uniquely identifies each row in the table.
- No repeating groups: Each column in the table should hold a single value, and there should be no repeating groups of columns. This means that a table should not contain multiple columns that hold the same type of data.
- No partial dependencies: Each non-key column in the table should depend on the entire primary key, and not just a part of it. This means that a column should not be dependent on only a subset of the primary key columns.
By meeting these requirements, a table can be considered in first normal form (1NF), which is the first step towards achieving a normalized database design.
Multiple Occurrences of the same Type of Item
Many databases require that you plan your tables so they can handle multiple occurrences of the same type of item.
A book might have several authors. The following table shows one way to handle such a circumstance:
Book table containing columns that are not ideal for data storing.
The Author1 and Author2 fields are able to handle up to two authors for each book, but there are two obvious problems with the table:
- If a book has only one author, the Author2 column is wasted space.
- There is no place to store the name of a third, fourth, or succeeding author.
There is also a problem from a theoretical standpoint. One of the rules of tables is that every column in a table must represent a unique attribute
of an entity. In the case of the Books table, the Author1 and Author2 columns represent the same attribute: an author. The technical term for columns that represent the same attribute is a repeating group
. (Do not let the semantics fool you, since the lead and second authors of a book are separate individuals, but they are both members of the set of Authors.)
For a table to be in first normal form (1NF),
the table must not contain any repeating groups.
First Normal Form Defined
A table is in first normal form (1NF) if it meets the following criteria:
- The data are stored in a two-dimensional table.
- There are no repeating groups.
The key to understanding 1NF, therefore, is understanding the nature of a repeating group of data.
Understanding Repeating Groups
A repeating group is an attribute that has more than one value in each row of a table. For example, assume that you were working with an employee relation and needed to store the names and birth dates of
the children from the employees. Because each employee can have more than one child, the names of the children and their birth dates each form a repeating group.
Note: A repeating group is directly analogous to a multivalued attribute in an ER diagram.
Repeating Groups not permitted
There is actually a very good reason why repeating groups are not permitted. To see what might happen if they were used, take a look at Figure 2 below, an instance of an employee table containing repeating groups.
Notice that there are multiple values in a single row in both
- the children's names and
- the children's birth dates columns.
This presents two major problems:
||John, Pat, Lee, Mary
||10/5/04, 10/12/00, 6/6/2006, 8/21/04
- There is no way to know exactly which birth date belongs to which child.
It is tempting to say that we can associate the birth dates with the children by their positions in the list, but there is nothing to ensure that the relative positions will always be maintained.
- Searching the table is very difficult. If, for example, we want to know which employees have children born before 2005, the DBMS will need to perform data manipulation to extract the individual dates themselves.
Given that there is no way to know how many birth dates there are in the column for any specific row, the processing
overhead for searching becomes even greater.
The solution to these problems, of course, is to get rid of the repeating groups altogether.
The next lesson describes how to decompose a table to first normal form.
first normal form (1NF):
A table is in first normal form if it contains no repeating groups.