| Lesson 6 | Limitations of First Normal Form (1NF) |
| Objective | Describe the limitations of first normal form and explain why databases in 1NF alone are still prone to redundancy and anomalies. |
First Normal Form (1NF) is the foundational level of database normalization. It establishes two structural requirements that every relation in a well-designed database must satisfy:
Phone1, Phone2, Phone3 columns, no comma-separated lists of values, no multi-valued
attributes embedded in a single field.These requirements eliminate the most obvious structural problems that arise when relational data is stored in flat, spreadsheet-like structures. A table that violates 1NF cannot be reliably queried, indexed, or joined in standard SQL. Achieving 1NF is therefore a prerequisite for everything that follows in the normalization process.
However, 1NF is only a prerequisite - it is not a destination. 1NF says nothing about redundancy, functional dependencies, or data anomalies. A table can satisfy every requirement of 1NF and still contain serious design problems that will cause data integrity failures in production. Understanding what 1NF does not guarantee is as important as understanding what it does.
The 1NF requirements concern the shape of the data - atomicity and the absence of repeating groups. They say nothing about the meaning of the data or the relationships between columns. Specifically, 1NF does not address:
A table can be perfectly 1NF-compliant and still suffer from all four of these problems simultaneously. The normalization levels that follow 1NF - Second Normal Form (2NF) and Third Normal Form (3NF) - were designed specifically to address these remaining issues.
Consider a flat order table of the kind commonly found in legacy systems and early database designs:
Order (
CustID, CustName, CustAddress, CustPhone,
OrderNo, OrderDate,
ItemNo, ItemTitle, ItemPrice, Qty, LineTotal, ShippedVia, ShippedDate
)
This schema has no repeating groups and every cell holds a single value - it satisfies 1NF. Yet it mixes data about three distinct real-world entities in a single table: customers, orders, and items. A customer can place many orders, and each order can contain many items, so the same customer data appears on every row for every item on every order that customer has ever placed.
The table exhibits all four classic normalization problems:
| Problem | Description | Consequence |
|---|---|---|
| Insertion Anomaly | Cannot add a new customer without placing an order. Cannot add a new inventory item until someone orders it. | Legitimate data cannot be entered until unrelated data exists. |
| Update Anomaly | Changing a customer's phone number requires updating every row for that customer - one row per item on every order they have placed. | If any row is missed, the database contains contradictory phone numbers for the same customer. |
| Deletion Anomaly | Cancelling an order deletes the only record of the customer if it was their only order, and deletes item details if no other order contains that item. | Accidental and irreversible loss of master data when operational data is removed. |
| Redundancy | Customer name, address, and phone are stored on every row for every item on every order. Item title and price are stored on every row that includes that item. | Wasted storage, slower queries, and increased risk of inconsistency. |
Each of these problems is a direct consequence of mixing data about multiple entities in a single table. The 1NF requirements did nothing to prevent this - they only ensured that the columns are atomic and that there are no repeating groups within a row.
The correct primary key for the order-line data in this flat table is composite:
(OrderNo, ItemNo). Each row represents one item on one order, so the combination of
order number and item number uniquely identifies each row.
Examining the functional dependencies in this table reveals the core problem. A functional dependency exists when the value of one column or set of columns determines the value of another column. In the flat Order table:
OrderNo → OrderDate, CustID, CustName, CustAddress - these columns depend
only on the order part of the composite key, not on the item part.ItemNo → ItemTitle, ItemPrice - these columns depend only on the item part
of the composite key, not on the order part.CustID → CustName, CustAddress, CustPhone - these columns depend on CustID,
which is itself a non-key column.The first two examples are partial dependencies: a non-prime attribute (a column that is not part of the primary key) depends on only part of a composite key rather than the full key. The third example is a transitive dependency: a non-prime attribute depends on another non-prime attribute rather than directly on the primary key.
1NF does nothing to eliminate either type of dependency. Partial dependencies are the target of Second Normal Form; transitive dependencies are the target of Third Normal Form. A table in 1NF with a composite key and partial dependencies is not in 2NF. A table in 2NF with transitive dependencies is not in 3NF.
The connection between partial dependencies and data anomalies is direct. Because
CustName and CustAddress depend on OrderNo rather than on
the full key (OrderNo, ItemNo), the same customer information is stored once per item
on each order. A customer who places a single order for five items has their name and address stored
five times. A customer who places ten orders with five items each has their name and address stored
fifty times.
This redundancy is the root cause of the update anomaly: changing the customer's address requires finding and updating all fifty rows. It is the root cause of the deletion anomaly: deleting the last item row on the last order deletes the customer's information entirely. And it is the root cause of the insertion anomaly: there is no way to record a customer's information until they have placed at least one order.
The solution is decomposition: separating the flat table into multiple tables, each describing exactly one real-world entity. Each table has its own primary key and contains only columns that depend on that key. The tables are then linked through foreign keys.
Applying 2NF and 3NF to the flat Order table produces four tables:
Customer (CustID PK, CustName, CustAddress, CustPhone)
Item (ItemNo PK, Title, CurrentPrice, ...)
Order (OrderNo PK, OrderDate, CustID FK REFERENCES Customer)
OrderLine (
OrderNo REFERENCES Order,
ItemNo REFERENCES Item,
Qty,
PriceAtSale, -- historical price at time of sale
ShippedVia,
ShippedDate,
PRIMARY KEY (OrderNo, ItemNo)
)
This decomposition eliminates every anomaly that the flat table suffered from:
PriceAtSale column in
OrderLine records the price at the time of the transaction. Changing the current price in the
Item table does not alter the historical record of what was charged on past orders.In a small number of specialized contexts, stopping at 1NF may be defensible:
In all three cases the justification is that the anomalies that normalization prevents do not arise because the tables are not subject to the operations - insert, update, delete - that trigger those anomalies. In virtually all transactional OLTP systems, however, stopping at 1NF is poor design that will cause data integrity problems in production.
Use the following checklist to assess whether a table needs further normalization:
First Normal Form is the essential starting point for database normalization, but it is only the first step. It guarantees that every cell contains a single, atomic value and that there are no repeating groups within a row. It does not guarantee that the table is free of redundancy, partial dependencies, transitive dependencies, or data anomalies.
A 1NF-compliant table that mixes data about multiple entities - customers, orders, and items in a single flat structure - will exhibit insertion anomalies, update anomalies, deletion anomalies, and data redundancy. These problems are not structural violations of 1NF; they are logical violations of higher normal forms.
Moving to Second Normal Form eliminates partial dependencies by ensuring that every non-key column depends on the full primary key. Moving to Third Normal Form eliminates transitive dependencies by ensuring that every non-key column depends directly on the primary key rather than on another non-key column. Both steps are almost always necessary for a robust, maintainable relational database design.
The next lesson formally defines Second Normal Form and shows exactly how to eliminate partial dependencies.