| Lesson 6 | Primary and Foreign Keys |
| Objective | Explain common mistakes involving primary and foreign keys in relational database design. |
Primary and foreign keys form the backbone of relational database integrity. The primary key uniquely identifies each record within a table, while the foreign key establishes a link between related tables. Poorly defined keys can lead to redundancy, data anomalies, and referential integrity problems. Below are the most frequent mistakes made when defining these crucial columns.
Every table in a relational database must have a primary key. Without one, there’s no guaranteed way to uniquely identify a row, leading to duplicate or inconsistent data. Always define an explicit primary key rather than relying on implicit uniqueness.
Primary keys should never contain meaningful or changing business data such as phone numbers, email addresses, or social security numbers. These values can change or be duplicated, violating key constraints. Instead, create a surrogate key—an arbitrary, immutable identifier such as an integer sequence or UUID. This ensures long-term consistency even when business rules evolve.
Foreign keys enforce the logical connection between tables, ensuring that relationships remain valid over time. Omitting foreign key constraints allows orphaned records to accumulate, breaking referential integrity. Always identify and declare foreign key relationships explicitly when you create your schema.
A foreign key in one table typically references a primary key in another table—but it can also reference any unique key. The essential requirement is that the referenced column(s) must be unique, allowing the database to maintain one-to-one or one-to-many consistency. This flexibility enables designers to reference candidate keys when appropriate.
When resolving many-to-many (M:N) relationships, you create a linking table—also known as an associative entity. This table must include the primary keys from both related entities, which together form a composite primary key. For example:
CREATE TABLE StudentCourses (
student_id INT NOT NULL,
course_id INT NOT NULL,
PRIMARY KEY (student_id, course_id),
FOREIGN KEY (student_id) REFERENCES Students(student_id),
FOREIGN KEY (course_id) REFERENCES Courses(course_id)
);
This approach ensures that each student–course pair is unique and prevents redundant enrollment records.
While normalization is essential to eliminate redundancy, taking it to extremes can reduce clarity and performance. Highly normalized schemas scatter related data across many small tables, increasing query complexity. Consider your workload and access patterns: if most data interactions are mediated through stored procedures or a controlled application layer, you can sometimes relax normalization slightly without risking data inconsistency.
For instance, one overzealous designer might split every attribute into its own table, producing a schema that is technically in Fifth Normal Form (5NF) but functionally unusable. A practical schema balances normalization with usability and performance.
Effective key design lies in finding balance—strong enough to enforce data integrity, yet flexible enough to evolve. Aim for:
The next lesson will explore mistakes related to relationships and referential integrity in greater depth.