| Lesson 7 | Normalization Beyond Third Normal Form |
| Objective | Explain why and how to normalize database schemas beyond Third Normal Form (3NF). |
Most practical database designs aim for Third Normal Form (3NF), which eliminates redundancy caused by partial and transitive dependencies. However, there are still situations where a 3NF schema can suffer from anomalies and unnecessary duplication. These cases typically involve multi-valued dependencies and motivate the need for Fourth Normal Form (4NF) and, in more advanced scenarios, even higher normal forms.
A table can satisfy all the rules for 3NF and still allow update, insert, and delete anomalies if it tries to represent multiple independent relationships in a single relation. This often happens when one key value determines multiple values in two or more different attributes.
Consider a table that stores information about audio CDs, their authors, and their readers (narrators). A single CD may have multiple authors and multiple readers. If we try to store all of this in one table, the result might look like this:
CD table with CD number, reader, and author
| CD | ||
| CDNo | ReaderID | AuthorID |
|---|---|---|
| 101 | Frye | Rawdah |
| 101 | Frye | Loge |
| 101 | Shoemaker | Yarborough |
| 102 | Shoemaker | Rawdah |
| 102 | Barksdale | Roth |
| 102 | Barksdale | Loge |
Although the table can be brought to 3NF with respect to functional dependencies, it still exhibits several serious problems:
These problems arise because the table contains a multi-valued dependency:
CDNo, there can be many independent AuthorID values.CDNo, there can also be many independent ReaderID values.The set of authors for a CD is logically independent from the set of readers for that CD, but the combined table forces every author–reader combination to appear as a separate row. Eliminating these multi-valued dependencies is the goal of Fourth Normal Form.
Fourth Normal Form is an extension of 3NF that targets multi-valued dependencies (MVDs). While 3NF eliminates transitive dependencies, 4NF goes further by ensuring that a relation does not contain multiple independent one-to-many relationships in the same table.
A relation is in Fourth Normal Form (4NF) if:
Informally, if a single key value determines two or more independent sets of values, those sets should be decomposed into separate tables so that each table contains exactly one such relationship.
In the CD example, the CD number (CDNo) independently determines a set of authors and a set of readers. To achieve 4NF, we decompose the table into two relations:
CD_Author (CDNo, AuthorID)
CD_Reader (CDNo, ReaderID)
Now, the authors of a CD and the readers of a CD are stored independently. This decomposition:
If you need to see all combinations of authors and readers for a given CD, you can reconstruct them using a join:
SELECT ca.CDNo,
ca.AuthorID,
cr.ReaderID
FROM CD_Author ca
JOIN CD_Reader cr
ON ca.CDNo = cr.CDNo;
This approach preserves all information without forcing every author–reader pairing to be stored as a separate physical row.
Beyond 4NF, database theory defines Fifth Normal Form (5NF) and Domain-Key Normal Form (DKNF), which handle increasingly specialized cases involving join dependencies and domain constraints. These forms are valuable in academic and highly specialized designs, but they are rarely explicit design targets in typical enterprise systems.
In practice, your design goals are usually:
Once the logical model is stable and normalized, you can evaluate performance requirements. For heavy reporting or analytics workloads, modern relational databases and cloud platforms often use denormalization, materialized views, or columnar storage to speed up complex queries, while still basing those optimizations on a well-normalized core schema.