Third Normal Form   «Prev  Next»

Lesson 9 Denormalization
Objective Explain the rationale behind denormalizing tables and show how to denormalize a normalized design.

Denormalizing Tables: When and How to Move to a Lower Normal Form

In relational database design, the default goal is to normalize tables—typically to Third Normal Form (3NF) or Boyce-Codd Normal Form (BCNF)—to reduce redundancy and prevent update anomalies. However, in some situations you may intentionally introduce controlled redundancy to improve read performance or simplify frequently used queries. This intentional step down from a higher normal form is called denormalization.

What Is Denormalization?

Denormalization is the process of starting from a properly normalized schema and then:

to reduce the number of joins or calculations required by common queries. In other words, you trade some redundancy and potential anomalies for:

Because denormalization weakens some of the guarantees provided by higher normal forms, it should always be:

How Far Should You Normalize Before Denormalizing?

Database theory defines increasingly strict normal forms up to Fifth Normal Form (5NF) and beyond. In practice:

Once a table is normalized, you can deliberately move it to a lower normal form for two main reasons:

  1. To improve query or reporting performance on critical workloads
  2. To reduce complexity for common access paths (even if the underlying model is more complex)

The key idea is: normalize first for correctness, then selectively denormalize for performance.

Step-by-Step: How to Denormalize a Table

The denormalization process is not random; it follows a disciplined workflow:

  1. Start from a normalized design.
    Ensure your base schema is in at least 3NF or BCNF. This becomes your “source of truth” model.
  2. Identify performance pain points.
    Use query plans, execution statistics, or monitoring tools to find:
    • Queries with many joins over large tables
    • Reports or dashboards that run slowly but follow a stable pattern
    • Access paths that are read-heavy and rarely updated
  3. Decide what to duplicate or pre-join.
    For each hot query, ask:
    • Which tables are joined every time?
    • Which columns are repeatedly looked up but change rarely (e.g., manager name, category name)?
    These are good candidates for adding to a denormalized structure.
  4. Design the denormalized structure.
    Choose one of the following patterns:
    • Create a new denormalized table (e.g., a “summary” or “team” table) built from multiple normalized tables.
    • Add redundant columns (such as a cached total or display name) to an existing table.
  5. Define how redundancy will be maintained.
    Decide how the denormalized data stays in sync with the normalized source:
    • Triggers or stored procedures that update redundant columns
    • Application logic that writes to both normalized and denormalized structures
    • Scheduled refresh jobs (for reporting tables or materialized views)
  6. Test for correctness and performance.
    Validate that:
    • Results match the normalized model
    • Critical queries actually run faster
    • Maintenance operations (inserts, updates, deletes) remain acceptable

Example: Genre Team Management

Consider a bookstore or media company that tracks employees, the genres they work on, and the manager responsible for each genre. A fully normalized design might use two tables:


GenreManagement (GenreManager, GenreID)
GenreEmployees (EmployeeID, GenreManager)

In this design:

Now suppose your most common query is:

“List all employees on a genre team, showing EmployeeID, GenreID, and GenreManager on the same row.”

In the normalized design, every such query must join two tables:


SELECT  e.EmployeeID,
        gm.GenreID,
        gm.GenreManager
FROM    GenreEmployees   e
JOIN    GenreManagement  gm
   ON   e.GenreManager = gm.GenreManager;

If this join runs millions of times per day, it may become a performance hotspot. One denormalization option is to create a combined table that stores the joined data directly:


GenreTeam (EmployeeID, GenreManager, GenreID)

You can populate it initially from the normalized tables:


INSERT INTO GenreTeam (EmployeeID, GenreManager, GenreID)
SELECT  e.EmployeeID,
        gm.GenreManager,
        gm.GenreID
FROM    GenreEmployees   e
JOIN    GenreManagement  gm
   ON   e.GenreManager = gm.GenreManager;

Once populated, queries that list employees, their genre, and their manager can read from GenreTeam directly with no join:


SELECT EmployeeID, GenreID, GenreManager
FROM   GenreTeam;

The trade-off is that GenreManager and GenreID are now duplicated across multiple rows. If a genre manager’s name changes, you must update it in every affected row of GenreTeam (or refresh the table from the normalized sources).

When Not to Denormalize

Because denormalization introduces redundancy, it also re-introduces the classic anomalies that normalization was designed to avoid:

You should avoid denormalizing when:

For example, if your goal is simply to list all GenreManager values and their GenreID, the normalized table GenreManagement(GenreManager, GenreID) is ideal. A denormalized GenreTeam(EmployeeID, GenreManager, GenreID) would require scanning many more rows and filtering out duplicates, making this query slower instead of faster.

Best Practices for Denormalization

  1. Measure first, change later.
    Use execution plans and profiling tools to confirm that joins or aggregations are the real bottleneck.
  2. Denormalize narrowly.
    Add only the minimum additional columns or tables required to solve a specific performance problem.
  3. Preserve a normalized source of truth.
    Keep the normalized schema (or views over it) as the authoritative model; treat denormalized structures as performance layers.
  4. Automate consistency.
    Use triggers, application logic, materialized views, or scheduled refresh jobs to keep redundant data synchronized.
  5. Review periodically.
    Workloads change. A denormalization that helped last year might not be needed today and can sometimes be removed.


graph LR
A[Normalization] --> B[Data Integrity]
A --> C[Reduced Redundancy]
D[Denormalization] --> E[Improved Query Performance]
D --> F[Simpler Access Paths]

Definition

Denormalization: The deliberate process of moving a table or schema from a higher normal form (for example, 3NF or BCNF) to a lower normal form by introducing controlled redundancy, typically to improve read performance or simplify common queries.

SEMrush Software 7 SEMrush Banner 7