Normal Forms   «Prev  Next»

Lesson 12

Database Normalization: 1NF and 2NF Conclusion

This module introduced database normalization from the ground up. It began with the definition and purpose of normalization, worked through the requirements and limitations of First Normal Form, then covered the functional dependency theory that underpins Second Normal Form, and concluded with the practical process for achieving 2NF and understanding where it falls short.

The following sections summarize the key concepts from each lesson, reinforce the connections between them, and provide a glossary of the terms introduced throughout the module.

What Normalization Is and Why It Matters

Normalization is the process of applying increasingly stringent rules to a relational database to correct problems associated with poor design. The goal is to store each fact exactly once, make updates consistent, and prevent the three categories of anomaly that arise in poorly structured tables.

The three anomalies that normalization prevents are the same ones that motivate the entire process. An insertion anomaly occurs when a legitimate fact cannot be recorded until an unrelated fact exists - a new customer cannot be added until they place an order, or a new product cannot be added until someone buys it. An update anomaly occurs when the same fact is stored in multiple rows and a change to that fact must be applied to every row simultaneously - a missed row leaves the database in an inconsistent state. A deletion anomaly occurs when removing one fact unintentionally removes another - cancelling the last order for a customer also deletes the only record of that customer.

Normalization works equally well with top-down design (starting from an entity-relationship diagram) and bottom-up design (starting from a list of attributes). Most real projects blend both approaches. The normal forms provide a series of tests that confirm the schema is progressively more free of the structural conditions that allow anomalies to occur.

First Normal Form: The Foundation

First Normal Form (1NF) establishes the structural prerequisites that every relation must satisfy before normalization can proceed. A table is in 1NF when every column contains only atomic (indivisible) values and there are no repeating groups within a single row.

A repeating group occurs when a single row stores multiple values for the same attribute. The classic example is a Book table with columns Author1 and Author2: these two columns represent the same attribute (author) and form a repeating group. The 1NF correction is to move the repeating attribute into a separate child table where each value occupies its own row, related back to the parent by a foreign key.

Achieving 1NF requires four steps: identify repeating groups, move them to their own relation, choose a primary key that uniquely identifies each row, and ensure every column contains atomic values. A table that stores comma-separated lists, arrays, or nested structures in a single cell violates 1NF regardless of any other properties it may have.

What 1NF Does Not Guarantee

First Normal Form is necessary but not sufficient. A table in 1NF can still contain data redundancy, partial dependencies, transitive dependencies, and all three anomaly types. The 1NF requirements concern only the shape of the data - atomicity and the absence of repeating groups. They say nothing about the relationships between columns or about which entity each column actually describes.

A flat order table with columns for customer, order, and line item data in a single relation satisfies 1NF when all values are atomic. But that same table mixes facts about three distinct entities - customer, order, and item - and therefore stores customer information once per line item per order, creating exactly the redundancy and anomalies that normalization is designed to eliminate.

In a small number of specialized contexts - logging tables that are append-only, data warehouse fact tables that are never updated, or temporary ETL staging tables - stopping at 1NF may be defensible. In virtually all transactional OLTP systems, it is poor design.

Functional Dependencies: The Analytical Foundation

Functional dependencies are the theoretical basis for all normalization beyond 1NF. A functional dependency exists when the value of one attribute or set of attributes uniquely determines the value of another attribute. In arrow notation: X → Y, read as "X determines Y" or "Y is functionally dependent on X."

The direction of a functional dependency matters. In a Customer relation with primary key CustID, the dependency CustID → CustName, Address, Phone is valid: knowing the customer ID uniquely identifies all other fields. The reverse - CustName, Address, Phone → CustID - is not valid, because multiple customers can share the same name, address, or phone number.

The attribute or set of attributes on the left side of a functional dependency is called the determinant. Every determinant is a candidate key for the relation. A candidate key is any minimal set of attributes that could serve as a primary key - a set of attributes that uniquely identifies every row. The primary key is one candidate key that has been designated as the primary identifier. Alternate keys are the remaining candidate keys.

Two Types of Dependency Relevant to Normalization

Two dependency types drive the transition from 1NF to 2NF and from 2NF to 3NF respectively.

A partial dependency occurs when a non-prime attribute (a column that is not part of any candidate key) depends on only part of a composite primary key rather than the full key. If the primary key is (OrderNo, ItemNo) and ItemTitle depends only on ItemNo, that is a partial dependency. Partial dependencies are the target of Second Normal Form.

A transitive dependency occurs when a non-prime attribute depends on another non-prime attribute rather than directly on the primary key. If EmployeeID → DeptID and DeptID → DeptName, then DeptName is transitively dependent on EmployeeID through DeptID. Transitive dependencies are the target of Third Normal Form.

Identifying Functional Dependencies

Functional dependencies do not emerge from sample data alone - they reflect business rules. A dependency should be derived from requirements analysis: understanding what the data represents, what uniqueness constraints the business imposes, and how attributes relate to one another in the real world. Sample data can suggest dependencies, but those suggestions must always be verified against the actual business rules, not just the data currently in the table.

Twelve practical techniques for identifying functional dependencies include: analyzing business rules and requirements, examining sample data patterns, creating dependency diagrams, identifying candidate keys, evaluating attribute semantics, testing dependencies formally using Armstrong's axioms, looking for repeated data patterns, applying normalization steps progressively, conducting interviews or workshops, using ERD design tools, leveraging domain and historical knowledge, and verifying logical consistency through queries.

A practical two-question test for 2NF compliance: ask of each non-key column whether it is (1) an inherent characteristic of the entity the table represents, or (2) a foreign key linking the table to another table. If the answer to both is no, the field does not belong in that table and the table likely violates 2NF.

Second Normal Form: Eliminating Partial Dependencies

A relation is in Second Normal Form when it satisfies two conditions: it is in First Normal Form, and every non-prime attribute is fully functionally dependent on the entire primary key. The second condition only becomes relevant when the primary key is composite. A table with a single-column primary key that satisfies 1NF automatically satisfies 2NF, because there is no "part" of the key for a non-key attribute to partially depend on.

The Car table illustrates the 2NF problem. With primary key CarID, the columns DealerID and DealerName are not properties of the car - they are properties of the dealer. This functional dependency violation means that changing a dealer's name requires updating every row for every car that dealer stocks. It also means that a dealer cannot be recorded in the database until they have at least one car, and that deleting the last car for a dealer removes the dealer's information entirely.

The five-step process for achieving 2NF: (1) verify the table is in 1NF, (2) identify all functional dependencies, (3) represent them in arrow notation to make partial dependencies visible, (4) identify columns that are not inherent characteristics or foreign keys of the entity the table represents, and (5) move those columns into a new table whose primary key is the part of the original composite key they depended on, and link the tables with a foreign key.

The general structural result of 2NF is: remove subsets of data that apply to multiple rows and place them in separate tables; create relationships between those tables through foreign keys. Each table after decomposition describes exactly one real-world entity, and every non-key column in that table describes a property of that entity.

Formally, a table T with functional dependency set F is in 2NF when, for any functional dependency X → A implied by F where A is nonprime, X is not a proper subset of any key of T. A database schema is in 2NF when all tables it contains satisfy this condition.

The Relationship Between Normalization and ER Modeling

Normalization and entity-relationship modeling appear to be separate activities, but they produce nearly identical results because they analyze the same underlying reality from different angles. A relational schema produced by normalization and one produced by transforming an ER diagram into tables will converge on the same structure.

The normalization approach starts with a list of data items and a list of rules about how they are related. The ER modeling approach starts with entities and their relationships. Both ultimately produce tables where each table describes one entity, each column describes a property of that entity, and entities are linked through foreign keys. The consistency between functional dependencies identified during normalization and entities identified during ER modeling is the primary test that the design is on solid ground.

Database design is an iterative process. An initial design is created, checked against both the functional dependencies and the ER diagram, modified, and checked again. Functional dependencies and entities can be verified against each other at any stage in this process.

Limitations of Second Normal Form

Second Normal Form eliminates partial dependencies, but it does not resolve all design problems. A table can be in 2NF and still contain transitive dependencies, data redundancy, and all three anomaly types. 2NF answers only one question: does every non-key attribute depend on the whole primary key? It does not ask whether non-key attributes depend on other non-key attributes.

The CD table provides a concrete example. With single-column primary key CDNo, the table is automatically in 2NF. But CatName repeats across every CD in the same category, creating update anomalies (renaming a category requires updating many rows), insertion anomalies (a new category cannot be stored without a CD in it), and deletion anomalies (removing the last CD in a category destroys the category record). The 2NF test passes; the table remains poorly designed.

The Item/distributor example demonstrates transitive dependency. The table Item(item_numb, title, distrib_numb, warehouse_phone_number) is in 2NF because its primary key is a single column. But item_numb → distrib_numb and distrib_numb → warehouse_phone_number, making the phone number transitively dependent on the item number through the distributor. The same three anomalies result.

The five specific limitations of 2NF are: transitive dependencies are still permitted; redundancy can persist even without partial dependency; anomalies caused by transitive dependencies survive 2NF; 2NF is silent on many real-world constraints such as multivalued attributes and optional relationships; and mechanical over-normalization can introduce unnecessary join complexity without a justifying dependency. Third Normal Form addresses the first three by eliminating transitive dependencies.

Module Topics

  1. Define normalization - the process of applying increasingly stringent rules to a relational database to correct problems associated with poor design.
  2. Explain the requirements for First Normal Form (1NF) - a table is in 1NF if it contains no repeating groups and every column holds atomic values.
  3. Describe the limitations of 1NF - redundancy, partial dependencies, transitive dependencies, and anomalies all persist after 1NF is achieved.
  4. Identify repeating groups - an attribute that has more than one value in a table row.
  5. Explain the requirements for Second Normal Form (2NF) - a table is in 2NF when it is in 1NF and every non-key attribute is fully functionally dependent on the entire primary key.
  6. Identify functional dependencies - the one-way relationships between attributes where a value of one attribute uniquely determines a value of another.

Second Normal Form: Formal Summary

  1. Uses the concepts of functional dependencies and primary key.
  2. A prime attribute is an attribute that is a member of some candidate key K of the relation.
  3. Full functional dependency: a functional dependency Y → Z where removal of any attribute from Y means the functional dependency no longer holds.
  4. A relation schema R is in Second Normal Form (2NF) if every nonprime attribute A in R is fully functionally dependent on every key of R.

Glossary

  1. Candidate key: A field or combination of fields that can act as a primary key for a table, uniquely identifying each record. A relation may have multiple candidate keys.
  2. Concatenated primary key: A primary key made up of more than one field.
  3. Denormalization: To move a table to a lower normal form (for example, from 3NF to 2NF) for performance reasons, with documented justification.
  4. Determinant: The attribute or set of attributes on the left side of a functional dependency - the attribute whose value determines the values of other attributes. A determinant is always a candidate key.
  5. First Normal Form (1NF): A table is in first normal form if it contains no repeating groups and every column holds a single, atomic value.
  6. Functional dependency: A one-way relationship between two attributes such that, for any unique value of one attribute, there is only one value of the other attribute.
  7. Multi-field determinant: A determinant consisting of more than one field.
  8. Normal form: A defined standard structure for relational databases. Each successive normal form is more restrictive than the previous and eliminates a specific class of data anomaly.
  9. Normalization: The process of applying increasingly stringent rules to a relational database to correct problems associated with poor design.
  10. Partial dependency: A functional dependency in which a non-prime attribute depends on only part of a composite primary key rather than the full key. Eliminated by 2NF.
  11. Repeating groups: An attribute that has more than one value in a table row, violating First Normal Form.
  12. Requirements analysis: The stage in the database design process when designers determine what data the system needs to store and the conditions under which that data is accessed.
  13. Second Normal Form (2NF): A table is in second normal form when it is in 1NF and every non-prime attribute is fully functionally dependent on the entire primary key.
  14. Transitive dependency: A functional dependency in which a non-prime attribute depends on another non-prime attribute rather than directly on the primary key. Eliminated by 3NF.

The next module describes Third Normal Form.

Normalization - Quiz

Before moving on to the next module, take the following quiz to check your understanding of normalization through 2NF:
Normalization - Quiz

[1]normalization: The process of applying increasingly stringent rules to a relational database to correct any problems associated with poor design.

SEMrush Software 12 SEMrush Banner 12