This module introduced database normalization from the ground up. It began with the definition and purpose of normalization, worked through the requirements and limitations of First Normal Form, then covered the functional dependency theory that underpins Second Normal Form, and concluded with the practical process for achieving 2NF and understanding where it falls short.
The following sections summarize the key concepts from each lesson, reinforce the connections between them, and provide a glossary of the terms introduced throughout the module.
Normalization is the process of applying increasingly stringent rules to a relational database to correct problems associated with poor design. The goal is to store each fact exactly once, make updates consistent, and prevent the three categories of anomaly that arise in poorly structured tables.
The three anomalies that normalization prevents are the same ones that motivate the entire process. An insertion anomaly occurs when a legitimate fact cannot be recorded until an unrelated fact exists - a new customer cannot be added until they place an order, or a new product cannot be added until someone buys it. An update anomaly occurs when the same fact is stored in multiple rows and a change to that fact must be applied to every row simultaneously - a missed row leaves the database in an inconsistent state. A deletion anomaly occurs when removing one fact unintentionally removes another - cancelling the last order for a customer also deletes the only record of that customer.
Normalization works equally well with top-down design (starting from an entity-relationship diagram) and bottom-up design (starting from a list of attributes). Most real projects blend both approaches. The normal forms provide a series of tests that confirm the schema is progressively more free of the structural conditions that allow anomalies to occur.
First Normal Form (1NF) establishes the structural prerequisites that every relation must satisfy before normalization can proceed. A table is in 1NF when every column contains only atomic (indivisible) values and there are no repeating groups within a single row.
A repeating group occurs when a single row stores multiple values for the same
attribute. The classic example is a Book table with columns Author1 and
Author2: these two columns represent the same attribute (author) and form a
repeating group. The 1NF correction is to move the repeating attribute into a separate child
table where each value occupies its own row, related back to the parent by a foreign key.
Achieving 1NF requires four steps: identify repeating groups, move them to their own relation, choose a primary key that uniquely identifies each row, and ensure every column contains atomic values. A table that stores comma-separated lists, arrays, or nested structures in a single cell violates 1NF regardless of any other properties it may have.
First Normal Form is necessary but not sufficient. A table in 1NF can still contain data redundancy, partial dependencies, transitive dependencies, and all three anomaly types. The 1NF requirements concern only the shape of the data - atomicity and the absence of repeating groups. They say nothing about the relationships between columns or about which entity each column actually describes.
A flat order table with columns for customer, order, and line item data in a single relation satisfies 1NF when all values are atomic. But that same table mixes facts about three distinct entities - customer, order, and item - and therefore stores customer information once per line item per order, creating exactly the redundancy and anomalies that normalization is designed to eliminate.
In a small number of specialized contexts - logging tables that are append-only, data warehouse fact tables that are never updated, or temporary ETL staging tables - stopping at 1NF may be defensible. In virtually all transactional OLTP systems, it is poor design.
Functional dependencies are the theoretical basis for all normalization beyond 1NF. A
functional dependency exists when the value of one attribute or set of attributes uniquely
determines the value of another attribute. In arrow notation: X → Y, read
as "X determines Y" or "Y is functionally dependent on X."
The direction of a functional dependency matters. In a Customer relation with primary key
CustID, the dependency CustID → CustName, Address, Phone is
valid: knowing the customer ID uniquely identifies all other fields. The reverse -
CustName, Address, Phone → CustID - is not valid, because multiple
customers can share the same name, address, or phone number.
The attribute or set of attributes on the left side of a functional dependency is called the determinant. Every determinant is a candidate key for the relation. A candidate key is any minimal set of attributes that could serve as a primary key - a set of attributes that uniquely identifies every row. The primary key is one candidate key that has been designated as the primary identifier. Alternate keys are the remaining candidate keys.
Two dependency types drive the transition from 1NF to 2NF and from 2NF to 3NF respectively.
A partial dependency occurs when a non-prime attribute (a column that is not
part of any candidate key) depends on only part of a composite primary key rather than the full
key. If the primary key is (OrderNo, ItemNo) and ItemTitle depends
only on ItemNo, that is a partial dependency. Partial dependencies are the target
of Second Normal Form.
A transitive dependency occurs when a non-prime attribute depends on another
non-prime attribute rather than directly on the primary key. If
EmployeeID → DeptID and DeptID → DeptName, then
DeptName is transitively dependent on EmployeeID through
DeptID. Transitive dependencies are the target of Third Normal Form.
Functional dependencies do not emerge from sample data alone - they reflect business rules. A dependency should be derived from requirements analysis: understanding what the data represents, what uniqueness constraints the business imposes, and how attributes relate to one another in the real world. Sample data can suggest dependencies, but those suggestions must always be verified against the actual business rules, not just the data currently in the table.
Twelve practical techniques for identifying functional dependencies include: analyzing business rules and requirements, examining sample data patterns, creating dependency diagrams, identifying candidate keys, evaluating attribute semantics, testing dependencies formally using Armstrong's axioms, looking for repeated data patterns, applying normalization steps progressively, conducting interviews or workshops, using ERD design tools, leveraging domain and historical knowledge, and verifying logical consistency through queries.
A practical two-question test for 2NF compliance: ask of each non-key column whether it is (1) an inherent characteristic of the entity the table represents, or (2) a foreign key linking the table to another table. If the answer to both is no, the field does not belong in that table and the table likely violates 2NF.
A relation is in Second Normal Form when it satisfies two conditions: it is in First Normal Form, and every non-prime attribute is fully functionally dependent on the entire primary key. The second condition only becomes relevant when the primary key is composite. A table with a single-column primary key that satisfies 1NF automatically satisfies 2NF, because there is no "part" of the key for a non-key attribute to partially depend on.
The Car table illustrates the 2NF problem. With primary key CarID, the columns
DealerID and DealerName are not properties of the car - they are
properties of the dealer. This functional dependency violation means that changing a dealer's
name requires updating every row for every car that dealer stocks. It also means that a dealer
cannot be recorded in the database until they have at least one car, and that deleting the last
car for a dealer removes the dealer's information entirely.
The five-step process for achieving 2NF: (1) verify the table is in 1NF, (2) identify all functional dependencies, (3) represent them in arrow notation to make partial dependencies visible, (4) identify columns that are not inherent characteristics or foreign keys of the entity the table represents, and (5) move those columns into a new table whose primary key is the part of the original composite key they depended on, and link the tables with a foreign key.
The general structural result of 2NF is: remove subsets of data that apply to multiple rows and place them in separate tables; create relationships between those tables through foreign keys. Each table after decomposition describes exactly one real-world entity, and every non-key column in that table describes a property of that entity.
Formally, a table T with functional dependency set F is in 2NF when, for any functional
dependency X → A implied by F where A is nonprime, X is not a proper subset
of any key of T. A database schema is in 2NF when all tables it contains satisfy this condition.
Normalization and entity-relationship modeling appear to be separate activities, but they produce nearly identical results because they analyze the same underlying reality from different angles. A relational schema produced by normalization and one produced by transforming an ER diagram into tables will converge on the same structure.
The normalization approach starts with a list of data items and a list of rules about how they are related. The ER modeling approach starts with entities and their relationships. Both ultimately produce tables where each table describes one entity, each column describes a property of that entity, and entities are linked through foreign keys. The consistency between functional dependencies identified during normalization and entities identified during ER modeling is the primary test that the design is on solid ground.
Database design is an iterative process. An initial design is created, checked against both the functional dependencies and the ER diagram, modified, and checked again. Functional dependencies and entities can be verified against each other at any stage in this process.
Second Normal Form eliminates partial dependencies, but it does not resolve all design problems. A table can be in 2NF and still contain transitive dependencies, data redundancy, and all three anomaly types. 2NF answers only one question: does every non-key attribute depend on the whole primary key? It does not ask whether non-key attributes depend on other non-key attributes.
The CD table provides a concrete example. With single-column primary key CDNo, the
table is automatically in 2NF. But CatName repeats across every CD in the same
category, creating update anomalies (renaming a category requires updating many rows), insertion
anomalies (a new category cannot be stored without a CD in it), and deletion anomalies (removing
the last CD in a category destroys the category record). The 2NF test passes; the table remains
poorly designed.
The Item/distributor example demonstrates transitive dependency. The table
Item(item_numb, title, distrib_numb, warehouse_phone_number) is in 2NF because its
primary key is a single column. But item_numb → distrib_numb and
distrib_numb → warehouse_phone_number, making the phone number transitively
dependent on the item number through the distributor. The same three anomalies result.
The five specific limitations of 2NF are: transitive dependencies are still permitted; redundancy can persist even without partial dependency; anomalies caused by transitive dependencies survive 2NF; 2NF is silent on many real-world constraints such as multivalued attributes and optional relationships; and mechanical over-normalization can introduce unnecessary join complexity without a justifying dependency. Third Normal Form addresses the first three by eliminating transitive dependencies.
Y → Z where
removal of any attribute from Y means the functional dependency no longer holds.The next module describes Third Normal Form.
Before moving on to the next module, take the following quiz to check your understanding of
normalization through 2NF:
Normalization -
Quiz