Lesson 7 | Normalization beyond 3NF |

Objective | Describe normalization beyond Third Normal Form |

Normalizing a table to 3NF will remove most of the anomalies from your tables, but there are a few that can slip through the cracks.

It is possible to create tables where a field can determine more than one value in another field. For example, if a CD contained stories written by more than one author and read by more than one reader, a CDNo value could determine more than one Author value and more than one Reader value. Records from such a table are shown in the following illustration.

It is possible to create tables where a field can determine more than one value in another field. For example, if a CD contained stories written by more than one author and read by more than one reader, a CDNo value could determine more than one Author value and more than one Reader value. Records from such a table are shown in the following illustration.

CD | ||

CDNo | ReaderID | AuthorID |

101 | Frye | Rawdah |

101 | Frye | Loge |

101 | Shoemaker | Yarborough |

102 | Shoemaker | Rawdah |

102 | Barksdale | Roth |

102 | Barksdale | Loge |

Note the table is flawed in several ways:

- Deleting the author from a CD destroys all information about the CD's reader(s).
- Deleting the reader from a CD destroys all information about the CD's author(s).
- You must know at least one author to enter information about a CD's readers.
- You must know at least one reader to enter information about a CD's authors.
- If a CD has more than one record associated with it, either an author name or a reader name is repeated in every record relating to that CD.

These anomalies are a result of the relation having *multi-valued dependencies*^{[1]}.
That is, in the preceding relation, a CDNo value can determine more than one Author value and more than one Reader value. Eliminating multivalued dependencies is the goal of *Fourth Normal Form (4NF)*.[2] 4NF is really part of advanced database design and is beyond the scope of this course.

Fourth Normal Form (4NF) is a crucial concept in database normalization, a process aimed at organizing data in a structured and efficient manner. It is an extension of the Third Normal Form (3NF) and focuses on eliminating certain types of redundancies not addressed by 3NF. To understand how 4NF distinguishes itself from 3NF, we must first briefly revisit the concepts of functional dependencies and 3NF.

Third Normal Form (3NF) is based on the idea of eliminating transitive dependencies, which occur when a non-prime attribute is functionally dependent on another non-prime attribute through a prime attribute. A relation is in 3NF if it satisfies the following conditions:

However, 3NF is not sufficient to address all redundancy issues in a database, particularly when dealing with multi-valued dependencies (MVDs). This is where Fourth Normal Form (4NF) comes into play.

Fourth Normal Form (4NF) deals with the elimination of multi-valued dependencies. A multi-valued dependency exists in a relation when two or more independent sets of attributes depend on a third set of attributes, leading to redundancy. A relation is in 4NF if it satisfies the following conditions:

Third Normal Form (3NF) is based on the idea of eliminating transitive dependencies, which occur when a non-prime attribute is functionally dependent on another non-prime attribute through a prime attribute. A relation is in 3NF if it satisfies the following conditions:

- The relation is in Second Normal Form (2NF).
- Every non-prime attribute is non-transitively dependent on every candidate key.

However, 3NF is not sufficient to address all redundancy issues in a database, particularly when dealing with multi-valued dependencies (MVDs). This is where Fourth Normal Form (4NF) comes into play.

Fourth Normal Form (4NF) deals with the elimination of multi-valued dependencies. A multi-valued dependency exists in a relation when two or more independent sets of attributes depend on a third set of attributes, leading to redundancy. A relation is in 4NF if it satisfies the following conditions:

- The relation is in Third Normal Form (3NF).
- There are no non-trivial multi-valued dependencies in the relation.

To better illustrate the difference between 3NF and 4NF, consider a relation R(A, B, C) with the following dependencies:

- A → B (A functionally determines B)
- A →→ C (A multi-valued dependency on C)

The relation is in 3NF because there are no transitive dependencies. However, it is not in 4NF due to the presence of a non-trivial multi-valued dependency. To achieve 4NF, the relation should be decomposed into two separate relations: R1(A, B) and R2(A, C).

In summary, Fourth Normal Form (4NF) distinguishes itself from Third Normal Form (3NF) by addressing multi-valued dependencies, which are not dealt with in 3NF. While 3NF is concerned with eliminating transitive dependencies, 4NF seeks to remove redundancies caused by independent sets of attributes that depend on a third set of attributes. By achieving 4NF, a database can further minimize redundancy and improve data integrity.

When designing a database, your goal is to normalize its tables to 3NF. Except in special circumstances, like those described above, 3NF will be sufficient to ensure you don't lose any important data and that your database will update, delete, and modify data efficiently. There are normal forms above 4NF (called Fifth Normal Form and domain/key normal form), but they are highly theoretical and not reasonable design goals when normalizing the tables of a database. The next lesson discusses denormalization.

[1]*multi-valued dependency:* A condition in which a primary-key value in one table determines multiple values in two or more tables.

[2]*Fourth Normal Form (4NF):* A relation is in 4NF if it all of its determinants are candidate keys and it contains no multi-valued dependencies.