Third Normal Form   «Prev  Next»

Lesson 7 Normalization Beyond Third Normal Form
Objective Explain why and how to normalize database schemas beyond Third Normal Form (3NF).

Normalization Beyond Third Normal Form (3NF)

Most practical database designs aim for Third Normal Form (3NF), which eliminates redundancy caused by partial and transitive dependencies. However, there are still situations where a 3NF schema can suffer from anomalies and unnecessary duplication. These cases typically involve multi-valued dependencies and motivate the need for Fourth Normal Form (4NF) and, in more advanced scenarios, even higher normal forms.

When 3NF Is Not Enough

A table can satisfy all the rules for 3NF and still allow update, insert, and delete anomalies if it tries to represent multiple independent relationships in a single relation. This often happens when one key value determines multiple values in two or more different attributes.

Consider a table that stores information about audio CDs, their authors, and their readers (narrators). A single CD may have multiple authors and multiple readers. If we try to store all of this in one table, the result might look like this:

CD table with CD number, reader, and author

CD
CDNo ReaderID AuthorID
101 Frye Rawdah
101 Frye Loge
101 Shoemaker Yarborough
102 Shoemaker Rawdah
102 Barksdale Roth
102 Barksdale Loge

Flaws in the Combined Table

Although the table can be brought to 3NF with respect to functional dependencies, it still exhibits several serious problems:

  1. Delete anomalies: Deleting the last row for a specific author on a CD may also remove information about which readers are associated with that CD.
  2. Delete anomalies (mirrored): Deleting the last row for a specific reader on a CD may also remove information about which authors are associated with that CD.
  3. Insert anomalies: You cannot add a new reader for a CD unless you also know at least one author, because both attributes are stored in the same row.
  4. Insert anomalies (mirrored): You cannot add a new author for a CD unless you also know at least one reader.
  5. Update anomalies and redundancy: Author and reader names are repeated across multiple rows for the same CD, increasing the risk of inconsistent updates.

These problems arise because the table contains a multi-valued dependency:

  • For a given CDNo, there can be many independent AuthorID values.
  • For the same CDNo, there can also be many independent ReaderID values.

The set of authors for a CD is logically independent from the set of readers for that CD, but the combined table forces every author–reader combination to appear as a separate row. Eliminating these multi-valued dependencies is the goal of Fourth Normal Form.

Fourth Normal Form (4NF)

Fourth Normal Form is an extension of 3NF that targets multi-valued dependencies (MVDs). While 3NF eliminates transitive dependencies, 4NF goes further by ensuring that a relation does not contain multiple independent one-to-many relationships in the same table.

Definition of 4NF

A relation is in Fourth Normal Form (4NF) if:

  1. The relation is already in Third Normal Form (3NF).
  2. Every non-trivial multi-valued dependency has a candidate key as its determinant.

Informally, if a single key value determines two or more independent sets of values, those sets should be decomposed into separate tables so that each table contains exactly one such relationship.

Decomposing to 4NF

In the CD example, the CD number (CDNo) independently determines a set of authors and a set of readers. To achieve 4NF, we decompose the table into two relations:


CD_Author  (CDNo, AuthorID)
CD_Reader  (CDNo, ReaderID)

Now, the authors of a CD and the readers of a CD are stored independently. This decomposition:

  • Removes redundant repetitions of author and reader names.
  • Prevents loss of information when an author or reader is deleted.
  • Allows independent maintenance of author–CD and reader–CD relationships.

If you need to see all combinations of authors and readers for a given CD, you can reconstruct them using a join:


SELECT  ca.CDNo,
        ca.AuthorID,
        cr.ReaderID
FROM    CD_Author ca
JOIN    CD_Reader cr
  ON    ca.CDNo = cr.CDNo;

This approach preserves all information without forcing every author–reader pairing to be stored as a separate physical row.

Higher Normal Forms and Practical Design

Beyond 4NF, database theory defines Fifth Normal Form (5NF) and Domain-Key Normal Form (DKNF), which handle increasingly specialized cases involving join dependencies and domain constraints. These forms are valuable in academic and highly specialized designs, but they are rarely explicit design targets in typical enterprise systems.

In practice, your design goals are usually:

  • Normalize operational schemas to 3NF to control redundancy and eliminate most anomalies.
  • Apply 4NF when you detect genuine multi-valued dependencies, as in the CD author/reader example.
  • Use higher normal forms selectively and only when they solve a clearly identified structural problem.

Once the logical model is stable and normalized, you can evaluate performance requirements. For heavy reporting or analytics workloads, modern relational databases and cloud platforms often use denormalization, materialized views, or columnar storage to speed up complex queries, while still basing those optimizations on a well-normalized core schema.

Key Takeaways

[1]Multi-valued dependency: A condition in which a single primary-key value determines multiple independent sets of attribute values (for example, multiple authors and multiple readers for the same CD).
[2]Fourth Normal Form (4NF): A relation is in 4NF if it is in 3NF and every non-trivial multi-valued dependency has a candidate key as its determinant.

SEMrush Software 5 SEMrush Banner 5