| Lesson 9 |
Denormalization |
| Objective |
Explain the rationale behind denormalizing tables and show how to denormalize a normalized design. |
Denormalizing Tables: When and How to Move to a Lower Normal Form
In relational database design, the default goal is to normalize tables—typically to Third Normal Form (3NF) or
Boyce-Codd Normal Form (BCNF)—to reduce redundancy and prevent update anomalies. However, in some situations you may
intentionally introduce controlled redundancy to improve read performance or simplify frequently used queries. This intentional step
down from a higher normal form is called denormalization.
What Is Denormalization?
Denormalization is the process of starting from a properly normalized schema and then:
- Combining two or more related tables into a single table, or
- Adding derived or duplicated columns to an existing table
to reduce the number of joins or calculations required by common queries. In other words, you trade some redundancy
and potential anomalies for:
- Fewer joins in hot (frequently executed) queries
- Simpler query patterns for common access paths
- Potentially lower latency on read-heavy workloads
Because denormalization weakens some of the guarantees provided by higher normal forms, it should always be:
- Measured – based on real performance data, not guesses
- Documented – so future developers understand why it was done
- Controlled – with mechanisms to keep redundant data consistent
How Far Should You Normalize Before Denormalizing?
Database theory defines increasingly strict normal forms up to Fifth Normal Form (5NF) and beyond. In practice:
- Most operational (OLTP) schemas are normalized to 3NF or BCNF.
- More advanced forms (4NF, 5NF) are used for special cases such as multi-valued or join dependencies.
Once a table is normalized, you can deliberately move it to a lower normal form for two main reasons:
- To improve query or reporting performance on critical workloads
- To reduce complexity for common access paths (even if the underlying model is more complex)
The key idea is: normalize first for correctness, then selectively denormalize for performance.
Step-by-Step: How to Denormalize a Table
The denormalization process is not random; it follows a disciplined workflow:
-
Start from a normalized design.
Ensure your base schema is in at least 3NF or BCNF. This becomes your “source of truth” model.
-
Identify performance pain points.
Use query plans, execution statistics, or monitoring tools to find:
- Queries with many joins over large tables
- Reports or dashboards that run slowly but follow a stable pattern
- Access paths that are read-heavy and rarely updated
-
Decide what to duplicate or pre-join.
For each hot query, ask:
- Which tables are joined every time?
- Which columns are repeatedly looked up but change rarely (e.g., manager name, category name)?
These are good candidates for adding to a denormalized structure.
-
Design the denormalized structure.
Choose one of the following patterns:
- Create a new denormalized table (e.g., a “summary” or “team” table) built from multiple normalized tables.
- Add redundant columns (such as a cached total or display name) to an existing table.
-
Define how redundancy will be maintained.
Decide how the denormalized data stays in sync with the normalized source:
- Triggers or stored procedures that update redundant columns
- Application logic that writes to both normalized and denormalized structures
- Scheduled refresh jobs (for reporting tables or materialized views)
-
Test for correctness and performance.
Validate that:
- Results match the normalized model
- Critical queries actually run faster
- Maintenance operations (inserts, updates, deletes) remain acceptable
Example: Genre Team Management
Consider a bookstore or media company that tracks employees, the genres they work on, and the manager responsible for each genre.
A fully normalized design might use two tables:
GenreManagement (GenreManager, GenreID)
GenreEmployees (EmployeeID, GenreManager)
In this design:
GenreManagement tells you which manager is responsible for which genre.
GenreEmployees tells you which manager each employee reports to for that genre team.
Now suppose your most common query is:
“List all employees on a genre team, showing EmployeeID, GenreID, and GenreManager on the same row.”
In the normalized design, every such query must join two tables:
SELECT e.EmployeeID,
gm.GenreID,
gm.GenreManager
FROM GenreEmployees e
JOIN GenreManagement gm
ON e.GenreManager = gm.GenreManager;
If this join runs millions of times per day, it may become a performance hotspot. One denormalization option is to create
a combined table that stores the joined data directly:
GenreTeam (EmployeeID, GenreManager, GenreID)
You can populate it initially from the normalized tables:
INSERT INTO GenreTeam (EmployeeID, GenreManager, GenreID)
SELECT e.EmployeeID,
gm.GenreManager,
gm.GenreID
FROM GenreEmployees e
JOIN GenreManagement gm
ON e.GenreManager = gm.GenreManager;
Once populated, queries that list employees, their genre, and their manager can read from GenreTeam directly
with no join:
SELECT EmployeeID, GenreID, GenreManager
FROM GenreTeam;
The trade-off is that GenreManager and GenreID are now duplicated across multiple rows. If a genre manager’s
name changes, you must update it in every affected row of GenreTeam (or refresh the table from the normalized sources).
When Not to Denormalize
Because denormalization introduces redundancy, it also re-introduces the classic anomalies that normalization was designed to avoid:
- Update anomalies: A manager’s name appears in many rows; if one row is updated incorrectly, results become inconsistent.
- Insert anomalies: You might not be able to represent certain facts cleanly without extra placeholder data.
- Delete anomalies: Removing the last employee in a team might also remove the only copy of that team’s metadata.
You should avoid denormalizing when:
- The tables involved are small or rarely joined.
- The performance problem can be solved with indexing, partitioning, or query tuning instead.
- The data changes very frequently, making it difficult to keep redundant copies in sync.
- Regulatory or auditing requirements demand a single, authoritative representation of each fact.
For example, if your goal is simply to list all GenreManager values and their GenreID, the normalized table
GenreManagement(GenreManager, GenreID) is ideal. A denormalized GenreTeam(EmployeeID, GenreManager, GenreID)
would require scanning many more rows and filtering out duplicates, making this query slower instead of faster.
Best Practices for Denormalization
-
Measure first, change later.
Use execution plans and profiling tools to confirm that joins or aggregations are the real bottleneck.
-
Denormalize narrowly.
Add only the minimum additional columns or tables required to solve a specific performance problem.
-
Preserve a normalized source of truth.
Keep the normalized schema (or views over it) as the authoritative model; treat denormalized structures as performance layers.
-
Automate consistency.
Use triggers, application logic, materialized views, or scheduled refresh jobs to keep redundant data synchronized.
-
Review periodically.
Workloads change. A denormalization that helped last year might not be needed today and can sometimes be removed.
graph LR
A[Normalization] --> B[Data Integrity]
A --> C[Reduced Redundancy]
D[Denormalization] --> E[Improved Query Performance]
D --> F[Simpler Access Paths]
Definition
Denormalization: The deliberate process of moving a table or schema from a higher normal form (for example, 3NF or BCNF)
to a lower normal form by introducing controlled redundancy, typically to improve read performance or simplify common queries.