| Lesson 6 |
An Example of a Customer Table |
| Objective |
Describe what a table looks like if it is not normalized. |
Unnormalized Database Table Example (Customer Orders and Data Anomalies)
The previous lesson defined normalization and described the three anomalies it prevents. This lesson makes those concepts concrete by examining two real unnormalized tables — a customer orders table and a bookstore sales table — and working through exactly what goes wrong with each one. By the end of this lesson you will be able to look at any table and identify the redundancy patterns and anomaly risks that signal it needs to be normalized.
The Customer Orders Table — An Unnormalized Starting Point
The table below combines customer information and order information into a single flat structure. Examine it carefully before reading the analysis — the problems should become visible once you know what to look for.
| CustID |
Name |
City |
Item_ID |
Description |
Qty |
Total |
| 000001 |
Smith |
Tucson |
100101 |
Green Widgets |
1 |
$50.00 |
| 000001 |
Smith |
Tucson |
100102 |
Blue Widgets |
2 |
$100.00 |
| 000001 |
Smith |
Tucson |
100103 |
Yellow Widgets |
1 |
$50.00 |
| 000002 |
Jones |
L.A. |
100101 |
Green Widgets |
2 |
$100.00 |
| 000002 |
Jones |
L.A. |
100106 |
Orange Widgets |
1 |
$50.00 |
Relational Database Design
What the Table Contains
The table has seven columns spanning two distinct subjects. The first three columns — CustID, Name, and City — describe the customer. The remaining four columns — Item_ID, Description, Qty, and Total — describe the order line. Combining these two subjects into one table is the root cause of every problem this lesson will identify.
This is a five-row table representing two customers and five order lines. Customer Smith (CustID 000001) placed three orders: Green Widgets, Blue Widgets, and Yellow Widgets. Customer Jones (CustID 000002) placed two orders: Green Widgets and Orange Widgets. The data is simple and the volume is small — but even at this scale the problems are already present and measurable.
The Redundancy Problem — Quantified
Smith's name "Smith" and city "Tucson" each appear three times in the table — once for every order line. Jones's name and city each appear twice. In a table with five rows and two customers, four of the ten customer-attribute values stored are pure duplicates. That is a 40% redundancy rate on the customer columns before the table has grown beyond a handful of rows.
Scale this to a real retail system with ten thousand customers and an average of fifteen orders each. The customer name and city would each be stored 150,000 times instead of 10,000 times — 140,000 duplicate values per column. Add a full mailing address (street, state, zip, country) and the storage waste becomes significant. More importantly, every one of those duplicate values is an independent opportunity for the data to become inconsistent.
Notice also that the table has no unit price column. The Total column shows the total cost of each order line, but the per-unit price of each item is not stored anywhere. Green Widgets appear in two rows: Smith's order (1 unit, $50.00) and Jones's order (2 units, $100.00). The unit price of $50.00 per Green Widget can be inferred by dividing, but it is not explicit. If the price changes, there is no single value to update — the price is buried inside the Total column, mixed with the quantity. This is a design flaw separate from the customer data redundancy: the table is storing a derived value (Total) without storing the inputs needed to verify or recalculate it (unit price).
The Missing Primary Key
A primary key must uniquely identify each row in a table. In this table, CustID alone cannot be the primary key — Smith has three rows all with CustID 000001. Item_ID alone cannot be the primary key either — Green Widgets (Item_ID 100101) appears in two rows for two different customers.
The combination of CustID and Item_ID could serve as a composite primary key — the pair (000001, 100101) identifies exactly one row. But this composite key encodes a business assumption: that a customer can order each item only once. If Smith placed a second order for Green Widgets on a different date, the composite key would be violated. The absence of a proper primary key is not just a technical inconvenience — it reflects a deeper modeling problem: the table is trying to represent two entities (customers and order lines) with one key structure, and no single key structure works cleanly for both.
Anomaly Analysis — What Goes Wrong
The three data anomalies introduced in lesson 5 — insertion, update, and deletion — are all present in this table. Working through each one using the actual Smith and Jones data makes them concrete.
Insertion Anomaly in the Customer Table
Suppose a new customer, Williams of Phoenix, has been set up in the system and assigned CustID 000003, but has not yet placed any orders. There is no way to record Williams in this table. Every column in the table either describes the customer (CustID, Name, City) or describes an order line (Item_ID, Description, Qty, Total). Without an order line, the Item_ID, Description, Qty, and Total columns have no values to store — and if those columns are defined as NOT NULL (as they should be in a well-constrained table), the INSERT will be rejected by the database engine.
The customer exists. The business knows about the customer. But the schema has no place for a customer without a transaction. This is the insertion anomaly: valid data cannot be recorded because the table structure requires unrelated data to accompany it.
Update Anomaly in the Customer Table
Smith moves from Tucson to Denver. To update Smith's city, every row where CustID = 000001 must be updated simultaneously — three rows in this small table. The SQL is straightforward:
UPDATE orders
SET City = 'Denver'
WHERE CustID = '000001';
If this UPDATE executes successfully, all three rows are corrected. But if the transaction is interrupted — by a network timeout, an application crash, or a concurrent lock conflict — it is possible for one or two rows to be updated while the third retains "Tucson." The database now contains two different cities for the same customer. Every subsequent query that reads Smith's city may return a different answer depending on which row it encounters first. The data is inconsistent, and the inconsistency is invisible unless you specifically query for customers where the city value differs across rows.
In a normalized schema, Smith's city is stored in exactly one row in a Customers table. One UPDATE statement touches one row. There is no partial-update risk and no possibility of inconsistency.
Deletion Anomaly in the Customer Table
Jones decides to cancel all outstanding orders. All rows where CustID = 000002 are deleted. But those two rows are the only place where Jones's name and city are recorded. After the DELETE, the database contains no knowledge that Jones ever existed — not as a customer, not as a contact, not as an account. If the business needs to send Jones a cancellation confirmation, or retain Jones's record for compliance purposes, or simply know who cancelled their orders, that information is gone.
The deletion anomaly is particularly dangerous because it is silent. The DELETE succeeds without error. No constraint is violated. The database engine has no way to know that it just destroyed customer data alongside the order data, because both lived in the same rows.
SQL Consequences of the Unnormalized Structure
The anomalies above describe data integrity risks. The unnormalized structure also creates immediate practical problems for the SQL queries you write against the table — queries that should be simple become unnecessarily complex.
Querying for Unique Customers Requires Extra Work
A straightforward business question: how many distinct customers are in the system? Against a normalized Customers table, the answer is trivial:
SELECT COUNT(*) FROM Customers;
Against the unnormalized table, the same question requires deduplication:
SELECT COUNT(DISTINCT CustID) FROM orders;
This works, but it requires the developer to know that the table contains duplicate CustID values and to compensate for that with DISTINCT. Every developer who queries this table must carry that knowledge. If they forget, they get the wrong answer — five rows instead of two customers — without any error message to alert them.
Retrieving a list of customers with their cities has the same problem:
-- Wrong: returns five rows, one per order line
SELECT CustID, Name, City FROM orders;
-- Correct: requires GROUP BY or DISTINCT to deduplicate
SELECT DISTINCT CustID, Name, City FROM orders;
The DISTINCT keyword here is compensating for a schema design problem. It should not be necessary. The fact that it is necessary is a signal that the table is not modeling its data correctly.
Aggregation Becomes Ambiguous
Another business question: what is the total amount spent by each customer? Against the unnormalized table:
SELECT CustID, Name, SUM(Total) AS total_spent
FROM orders
GROUP BY CustID, Name;
This query works, but Name must appear in the GROUP BY clause alongside CustID — even though Name is not what we are grouping by conceptually. We are grouping by customer identity (CustID), and Name is just a label for that identity. Including Name in GROUP BY is harmless here because Name is functionally dependent on CustID — every row with CustID 000001 has Name "Smith." But if the data were inconsistent (an update anomaly had partially changed Smith's name), the GROUP BY would split Smith's rows into separate groups, producing incorrect totals.
In a normalized schema, this query joins two tables cleanly:
SELECT c.CustID, c.Name, SUM(o.Total) AS total_spent
FROM Customers c
JOIN Orders o ON c.CustID = o.CustID
GROUP BY c.CustID, c.Name;
The join makes the relationship explicit. The GROUP BY groups by the customer's primary key. The Name value comes from one authoritative source. There is no ambiguity and no risk of inconsistency affecting the aggregation result.
A Second Domain — The Bookstore Table
The customer orders table illustrates unnormalized structure in a retail context. The same patterns appear in every domain where data about multiple entities is combined into a single table. A bookstore sales table provides a second concrete example that reinforces the same lessons in a different context.
| BookID |
Title |
Author |
CustomerID |
CustomerName |
CustomerAddress |
SaleDate |
| 1 |
1984 |
Orwell |
101 |
John Doe |
123 Main St, Cityville |
2023-10-01 14:00 |
| 2 |
Brave New World |
Huxley |
101 |
John Doe |
123 Main St, Cityville |
2023-10-05 10:30 |
| 3 |
1984 |
Orwell |
102 |
Jane Smith |
456 Elm St, Townsville |
2023-10-02 15:45 |
| 4 |
To Kill a Mockingbird |
Lee |
101 |
John Doe |
123 Main St, Cityville |
2023-10-06 09:15 |
The Same Patterns in a Different Context
This table combines three subjects: books, customers, and sales transactions. John Doe's name and address appear three times — once for each purchase. The book "1984" by Orwell appears twice — once for John Doe's purchase and once for Jane Smith's purchase. The same redundancy, the same anomaly risks, and the same SQL complications that appeared in the customer orders table appear here.
The insertion anomaly: a new book cannot be added to the catalog until someone buys it. A new customer cannot be recorded until they make a purchase. The update anomaly: if John Doe moves, three rows must be updated simultaneously. If "1984" is re-catalogued under a different author attribution, two rows must be updated. The deletion anomaly: if BookID 1 is the only sale record linking John Doe to the database, deleting that sale erases John Doe entirely.
The CustomerAddress column introduces an additional problem not present in the customer orders table: composite attributes. The address "123 Main St, Cityville" combines street, city, and implicitly a zip code and country into a single column value. This violates 1NF atomicity. If the business later needs to query customers by city, or sort by zip code, or filter by country, the CustomerAddress column cannot support those queries without string parsing — a fragile, error-prone approach that no well-designed schema should require.
Characteristics of Unnormalized Tables
Both tables share the same set of structural characteristics that identify an unnormalized design. Understanding these characteristics as a checklist lets you diagnose any table you encounter, regardless of domain.
Redundant data is the most visible symptom. When the same value appears in multiple rows — a customer name, a city, a book title, an author — the table is storing facts about an entity that should have its own table. Redundancy is not just a storage problem; it is an integrity problem waiting to happen.
Data anomalies — insertion, update, and deletion — flow directly from redundancy. If a fact is stored in one place, changing or removing it is a single operation. If it is stored in many places, every operation on that fact requires coordinating multiple rows, and any failure in that coordination produces inconsistent data.
Lack of a clean primary key signals that the table is modeling multiple entities. When no single column or simple combination of columns can uniquely identify every row without encoding business assumptions, the table is almost certainly mixing entity types that should be separated.
Multi-valued attributes — columns that could contain multiple values for a single entity — violate 1NF atomicity. A phone number column that might contain "555-1234, 555-5678" for a customer with two phones, or an items column that lists multiple purchased products in one cell, are multi-valued attributes. Each value belongs in its own row in a properly structured table.
Composite attributes — columns that combine multiple distinct data points into one value — violate atomicity at the sub-field level. CustomerAddress is the clearest example: street, city, state, zip, and country are five separate facts combined into one string. Each should be its own column if the application ever needs to filter, sort, or validate them independently.
Transitive dependencies — where a non-key column depends on another non-key column rather than directly on the primary key — are the structural signature of a 3NF violation. In the bookstore table, Author depends on Title, and Title depends on BookID. Author is therefore transitively dependent on BookID through Title. Author belongs in a Books table where it is directly dependent on BookID, not in a Sales table where it is present only because Title dragged it along.
Complex queries for simple questions are the operational symptom of all the structural problems above. When retrieving a list of unique customers, or calculating per-customer totals, or finding all orders for a specific item requires DISTINCT, GROUP BY workarounds, or string parsing, the schema is forcing query complexity that normalization would eliminate.
Weak data integrity is the cumulative consequence. An unnormalized schema cannot enforce the consistency guarantees that a normalized schema provides through foreign key constraints and single-source storage. Data quality becomes a problem that application code must manage — and application code is far less reliable than database constraints for maintaining long-term data consistency.
How Normalization Resolves These Issues
Both tables are resolved by the same process: identify the distinct entities, give each entity its own table with its own primary key, and represent relationships between entities through foreign keys.
Decomposing the Customer Orders Table
The customer orders table contains three distinct entities: customers, items, and order transactions. Normalization produces three tables:
A Customers table stores each customer exactly once: (CustID, Name, City). Smith appears in one row. Jones appears in one row. Adding Williams requires one INSERT into one table with no order data required. Changing Smith's city requires one UPDATE to one row. Deleting Jones's orders does not touch the Customers table — Jones's customer record remains intact.
An Items table stores each product exactly once: (Item_ID, Description, UnitPrice). Green Widgets appear in one row with their unit price stored explicitly — the derived Total value problem is resolved because the price is now an authoritative stored fact, not a value inferred from dividing Total by Qty. If the price of Green Widgets changes, one row in the Items table is updated and every future order automatically reflects the new price.
An Orders table stores each transaction: (OrderID, CustID, Item_ID, Qty, Total). CustID is a foreign key referencing the Customers table. Item_ID is a foreign key referencing the Items table. The database engine enforces referential integrity — no order can reference a customer or item that does not exist, and no customer or item can be deleted while orders reference it (unless CASCADE is specified).
Decomposing the Bookstore Table
The bookstore table contains three entities: books, customers, and sales. Normalization produces:
A Books table: (BookID, Title, Author). Each book appears once. The transitive dependency (Author depending on Title) is eliminated because Author now lives directly alongside its BookID primary key.
A Customers table: (CustomerID, CustomerName, Street, City, State, Zip). The composite CustomerAddress column is decomposed into atomic columns — each addressable independently for filtering, sorting, and validation.
A Sales table: (SaleID, BookID, CustomerID, SaleDate). BookID and CustomerID are foreign keys. A new customer can be added without a sale. A new book can be catalogued without a purchase. Deleting a sale record leaves both the book and the customer intact.
What Each Decomposed Table Is Responsible For
The result of normalization is a schema where every table has exactly one responsibility: one entity, one set of facts, one primary key. The Customers table is responsible for customer identity and contact information — nothing else. The Items or Books table is responsible for product or catalog information — nothing else. The Orders or Sales table is responsible for recording the transaction — linking a customer to a product at a specific time and quantity — nothing else.
This single-responsibility structure is what makes a normalized schema reliable to query, easy to maintain, and straightforward to extend. When a new requirement emerges — add a phone number column to Customers, add a publisher column to Books, add a discount column to Orders — the change affects exactly one table and does not disturb the others. That extensibility is the practical dividend that normalization pays over the lifetime of a database system.
The next lesson examines the normalization of the customer orders table step by step, showing each decomposition decision and the resulting normalized schema.
