Lesson 1

Third Normal Form (Intro)

Sound relational database design means taking the time to ensure your database conforms to certain rules. This module introduces techniques you can use to improve your database designs without worrying about the relational math.

When we design a database for an enterprise, the main objective is to

create an accurate representation of the data,
relationships between the data, and
constraints on the data that is pertinent to the enterprise.

To help achieve this objective, we can use one or more database design techniques. In this module we describe another database design technique called normalization^[1]. Normalization is a database design technique, which begins by examining the relationships (called functional dependencies) between attributes. Attributes describe some property of the data or of the relationships between the data that is important to the enterprise.

Learning Objectives

After completing the lessons in this module, you should be able to:

Explain the requirements for third normal form (3NF)
Identify transitive dependencies
Normalize a relation to 3NF
Describe Codd's 12 criteria for a fully relational RDBMS
Explain how Codd's 12 criteria relate to normalization
Explain what type of dependencies might require normalization beyond 3NF
Define denormalization and identify when it might be useful

Third Normal Form

As a data modeler working on normalizing database tables, understanding Third Normal Form (3NF) is crucial. 3NF is a level of database normalization that aims to reduce the duplication of data and ensure data integrity by organizing the data within a relational database. In essence, Third Normal Form is achieved when a database design meets all the requirements of the Second Normal Form (2NF) and additionally satisfies the following condition: every non-prime attribute of the table is non-transitively dependent on the primary key. In simpler terms, this means that there should be no transitive dependencies for non-key attributes. To elaborate, a transitive dependency in a database occurs when a non-key column depends on another non-key column, which in turn depends on the primary key. 3NF seeks to eliminate these transitive dependencies. By doing so, it ensures that:

The data stored in the database is free from insertion, update, and deletion anomalies. This means that the database structure allows efficient data manipulation and ensures the integrity of the data.
Every non-key attribute is directly dependent on the primary key, and not on any other non-key attribute. This reduces redundancy and dependency, making the database more streamlined and easier to maintain.

However, it's important to note that while 3NF can significantly improve database design, it may not always be practical or necessary to achieve this form in every database scenario. The decision to normalize a database to the third normal form should consider the specific requirements and constraints of the application, including factors like database performance, complexity, and the nature of the data being stored.
The next lesson discusses limitations of second normal form.

The the diagram show above is an Entity-Relationship Diagram (ERD) for a database in Third Normal Form (3NF).
Here are the characteristics of this diagram:

Normalization: The database design appears to be normalized to the third normal form. This is indicated by the separation of concerns, the absence of transitive dependencies, and the use of primary (PK) and foreign keys (FK).

Entities: There are five entities represented in the diagram: `Student`, `Course`, `Teacher`, `Subject`, and `Address Code`.

Primary Keys (PK): Each entity has a primary key, which uniquely identifies a record within the entity. These are `student ID`, `course ID`, `teacher ID`, `subject ID`, and `address code ID`.

Foreign Keys (FK): There are foreign keys used to create relationships between the entities. These are indicated as FK and they reference primary keys of other entities, ensuring referential integrity.

Attributes: Each entity has a set of attributes. For example, `Student` has `student name`, `fees paid`, `date of birth`, etc.

Relationships:

`Student` has a many-to-one relationship with `Course`, indicating that each student is enrolled in one course, but each course can have many students.

`Course` has a many-to-one relationship with `Teacher`, indicating that each course is taught by one teacher, but each teacher can teach many courses.

`Subject Enrollment` is a junction table that resolves the many-to-many relationship between `Student` and `Subject`, containing both `student ID` and `subject ID` as foreign keys.

`Student`, `Teacher`, and `Address Code` are related, indicating that both students and teachers have an address which is referenced from the `Address Code` entity.

Modular Design: The modular design of the database facilitates data integrity and reduces redundancy. For instance, addresses are stored once in the `Address Code` entity and referenced by `Student` and `Teacher` entities, rather than repeating address information in each record.

Referential Integrity: The use of foreign keys ensures that relationships between tables remain consistent. For example, a student's record can't reference an `address code ID` that doesn't exist in the `Address Code` entity.

Junction Table: `Subject Enrollment` is a junction table that connects students to subjects they are enrolled in, which is a characteristic way to handle many-to-many relationships in relational databases.

This design supports efficient querying and updates because it minimizes redundancy and ensures that each piece of information is stored only once. It also helps prevent update anomalies and maintains data integrity.

Polynomial Regression

Not all relationships are linear.
Linear Equation: y = mx + b
This is a first order or first degree polynomial, where the power of x is 1.
Higher order equations produce more complex curves.

Quadratic Equation:

x = \frac{- b \pm \sqrt{b^{2} - 4 \cdot a \cdot c}}{2 \cdot a}

Third order Equation:

y = a x^{3} + b x^{2} + c x + d

[1]Normalization: Normalization uses a series of tests (described as normal forms) to help identify the optimal grouping for these attributes to ultimately identify a set of suitable relations that supports the data requirements of the enterprise