Lesson 1
Normalization: 1NF, 2NF, and 3NF with Practical Examples
Normalization is a disciplined way to structure tables so that each fact is stored once and updates stay consistent. You don’t need advanced set theory, just a few rules and a repeatable workflow.
- Why normalize? Preserve information from your conceptual model (entities, attributes, relationships) while minimizing redundancy and update anomalies.
- How it fits: Works with both top-down (start from ERD) and bottom-up (start from attributes) design. Most real projects blend the two.
Learning Objectives
- Define normalization and functional dependency.
- Recognize and correct violations of First Normal Form (1NF), Second Normal Form (2NF), and Third Normal Form (3NF).
- Identify repeating groups, partial dependencies, and transitive dependencies.
Key Terms (Quick Ref)
- Repeating group: Multiple values of the same attribute stored in one row (e.g., Item1, Item2, …).
- Functional dependency (FD): Attribute(s)
X determine attribute Y (written X → Y).
- Partial dependency: In a composite key
(A,B), a non-key attribute depends on only A or B, not both.
- Transitive dependency:
A → B and B → C imply A → C; if C is non-key, that’s a 3NF violation.
First Normal Form (1NF)
Requirements:
- No repeating groups or arrays in a column; each column holds atomic values.
- Each row-column intersection stores one value of the declared type.
Fix: Move repeating groups to a separate child table so each repeated value becomes a separate row.
Second Normal Form (2NF)
Applies when a table’s primary key is composite.
- All non-key attributes must depend on the entire key.
- Violation: A column depends only on part of the composite key (→ partial dependency).
- Fix: Split attributes that depend on a subset of the key into their own table keyed by that subset.
Third Normal Form (3NF)
- No non-key attribute may depend on another non-key attribute (no transitive dependencies).
- Fix: Move the transitively determined attributes to a new table keyed by their direct determinant.
Database Design for Mere Mortals
Worked Example: From UNF to 3NF
Scenario: An order record includes customer and line item details in one table with repeating groups.
OrderUNF(OrderID, OrderDate, CustomerID, CustomerName,
Item1_ProductID, Item1_ProductName, Item1_Qty, Item1_UnitPrice,
Item2_ProductID, Item2_ProductName, Item2_Qty, Item2_UnitPrice, ...)
-- 1NF: Eliminate repeating groups (one line per row)
OrderLine1NF(OrderID, OrderDate, CustomerID, CustomerName,
ProductID, ProductName, Qty, UnitPrice)
-- Candidate key: (OrderID, ProductID) for OrderLine1NF
-- Check FDs (illustrative):
-- (OrderID → OrderDate, CustomerID)
-- (CustomerID → CustomerName)
-- (ProductID → ProductName)
-- (OrderID, ProductID → Qty, UnitPrice)
-- 2NF: Remove partial dependencies on a subset of the key
Order(OrderID, OrderDate, CustomerID)
Customer(CustomerID, CustomerName)
Product(ProductID, ProductName)
OrderLine(OrderID, ProductID, Qty, UnitPrice)
-- Now all non-key columns in OrderLine depend on the full key (OrderID, ProductID)
-- 3NF: Remove transitive dependencies (if any remain)
-- Example already handled: CustomerName depends on CustomerID, not on OrderID.
-- If UnitPrice were a catalog price (ProductID → UnitPrice),
-- move catalog price to Product and keep OrderLine.UnitPrice only if it captures
-- the historical price at order time (business rule).
Quick Checklist
- 1NF: Any multi-valued cells or repeating sets? Split them out.
- 2NF: With composite keys, do all non-key columns depend on the whole key?
- 3NF: Do any non-key columns determine other non-key columns? If yes, split.
Design Notes and Next Steps
- Information preservation: Keep every business fact from the conceptual model; you’re redistributing facts across tables, not losing them.
- Performance: Normalize first for correctness; consider selective denormalization later for read-heavy workloads, with constraints, indexes, and views.
- Continue with: Functional Dependencies and higher normal forms beyond 3NF.
