Boyce-Codd Normal Form (BCNF), also referred to as 3.5NF, is a stricter version of Third Normal Form (3NF) used in relational database normalization. A table is in BCNF if and only if, for every non-trivial functional dependency (FD) of the form A → B, A is a superkey. This ensures that all determinants in functional dependencies are capable of uniquely identifying a tuple, eliminating certain anomalies not addressed by 3NF.
Understanding BCNF
A relation is in BCNF when:
- It is already in 3NF (no partial or transitive dependencies).
- For every functional dependency A → B, where A determines B, A must be a superkey (a set of attributes that uniquely identifies a tuple).
BCNF addresses anomalies that arise when a non-superkey attribute determines another attribute, which can lead to redundant data and update inconsistencies.
Example of a BCNF Violation and Resolution
Consider a table
CourseAssignment
that tracks which instructor teaches which student a specific subject:
Student |
Instructor |
Subject |
Alice |
Dr. Smith |
Math |
Bob |
Dr. Smith |
Math |
Alice |
Dr. Jones |
Physics |
- Candidate Key: {Student, Subject} (uniquely identifies each tuple).
- Functional Dependency: Instructor → Subject (each instructor teaches only one subject).
- Problem: The table is in 3NF because there are no partial or transitive dependencies. However, it violates BCNF because
Instructor
is not a superkey, yet it determines Subject
.
- Anomaly: If Dr. Smith changes the subject they teach (e.g., from Math to Calculus), multiple rows must be updated, risking inconsistencies.
Resolving the BCNF Violation
To achieve BCNF, decompose the table into two tables:
1. InstructorSubject Table (captures the dependency Instructor → Subject):
Instructor |
Subject |
Dr. Smith |
Math |
Dr. Jones |
Physics |
2. StudentInstructor Table (links students to instructors):
Student |
Instructor |
Alice |
Dr. Smith |
Bob |
Dr. Smith |
Alice |
Dr. Jones |
- Primary Key: {Student, Instructor}
- Foreign Key: Instructor references InstructorSubject(Instructor)
This decomposition ensures both tables are in BCNF because all functional dependencies have superkeys as determinants, and it eliminates the update anomaly.
Comparison of 3NF and BCNF
The following table outlines the similarities and differences between 3NF and BCNF:
Properties |
3NF |
BCNF |
Achievability |
Always achievable with lossless decomposition |
Not always achievable with lossless decomposition |
Data Redundancy |
May retain some redundancy due to non-key determinants |
Eliminates redundancy from non-superkey determinants |
Non-key Determinants |
Allows non-key attributes as determinants if they are part of a candidate key |
Does not allow non-superkey attributes as determinants |
Proposed by |
Edgar F. Codd |
Raymond F. Boyce and Edgar F. Codd |
Why BCNF Matters
BCNF is based on functional dependencies, where an attribute (or set of attributes) A determines another attribute B (A → B). By ensuring that A is always a superkey, BCNF prevents anomalies such as:
- Insertion Anomalies: Difficulty adding data due to dependency constraints.
- Update Anomalies: Inconsistent updates when a determinant changes.
- Deletion Anomalies: Loss of valid data when removing a tuple.
For example, in the
CourseAssignment
table, updating an instructor’s subject required changing multiple rows. The BCNF decomposition ensures updates are made in one place (the
InstructorSubject
table).
Higher Normal Forms
BCNF is a stronger definition of 3NF, proposed by Raymond F. Boyce and Edgar F. Codd. Beyond BCNF, Fourth Normal Form (4NF) addresses multivalued dependencies, and Fifth Normal Form (5NF) deals with join dependencies, further refining database design to eliminate complex anomalies.
- Conclusion:
- The schema is refactored into BCNF by decomposing tables to ensure all determinants in functional dependencies are superkeys.
- The decomposition process, as shown in the
CourseAssignment
example, eliminates anomalies while preserving data integrity.
- BCNF provides a robust foundation for database design, reducing redundancy and ensuring consistent data management.