Introduction to First, Second, and Third Normal Forms used in Relational Database Analysis
Relational databases work because they are based on logically sound elements of set theory.
Fortunately, you do not have to become an expert in a theoretical branch of mathematics to design efficient databases.
If you learn a few of the principles without delving into the underlying theory, you will be a better designer.
This module introduces techniques you can use to improve your database designs without worrying about the math.
As with many design problems, database design may be performed using two approaches: 1) bottom-up or 2) top-down.
A bottom-up design methodology (also called design by synthesis) considers the basic relationships among individual attributes as the starting point and uses those to construct relation schemas. This approach is not very popular in practice because it suffers from the problem of having to collect a large number of binary relationships among attributes as the starting point.
For practical situations, it is next to impossible to capture binary relationships among all such pairs of attributes. In contrast, a top-down design methodology (also called design by analysis) starts with a number of groupings of attributes into relations that exist together naturally (for example, on an invoice, a form, or a report).
The relations are then analyzed individually and collectively, leading to further decomposition until all desirable properties are met. The theory described in this module is applicable to both the 1) top-down and 2) bottom-up design approaches, but is more appropriate when used with the top-down approach.
Relational database design ultimately produces a set of relations. The implicit goals of the design activity are a) information preservation and b) minimum redundancy. Information is very hard to quantify, and for this reason we consider information preservation in terms of maintaining all concepts,
including attribute types, entity types, and relationship types as well as relationships, which are described using a model such as the ERD model.
Thus, the relational design must preserve all of these concepts, which are originally captured in the conceptual design after the conceptual to logical design mapping has occurred. Minimizing redundancy implies minimizing redundant storage of the same information and reducing the need for multiple updates to maintain consistency across multiple copies of the same information in response to real-world events that require making an update.
After completing the lessons in this module, you should be able to:
- Define the concept of normalization
- Explain the requirements for first normal form (1NF)
- Identify repeating groups
- Explain the requirements for second normal form (2NF)
- Identify functional dependencies
- Describe the limitations of 2NF
The next lesson introduces normalization.
Database Design and Theory
When we design a database for an enterprise, the main objective is to create an accurate representation of the data, relationships between the data, and constraints on the data that is pertinent to the enterprise.
To help achieve this objective, we can use one or more database design techniques.
Entity Relationship or (ER) modeling is one of those tecniques. In this module we describe another database design technique called normalization
is a database design technique, which begins by examining the relationships (called functional dependencies) between attributes.
Attributes describe some property of the data or of the relationships between the data that is important to the enterprise.
Normalization uses a series of tests (described as normal forms) to help identify the optimal grouping for these attributes to ultimately identify a set of suitable relations that supports the data requirements of the enterprise.
The main purpose of this module is to introduce
- the concept of functional dependencies and
- describe normalization up to Third Normal Form (3NF).
Later we take a more formal look at functional dependencies and also consider later normal forms that go beyond 3NF.
Database normalization prevents data redundancy
and makes the best possible usage of storage.
Purpose of Normalization
The purpose of normalization is to identify a suitable set of relations that support the data requirements of an enterprise. The characteristics of a suitable set of relations include the following:
- The minimal number of attributes necessary to support the data requirements of the enterprise;
- attributes with a close logical relationship (described as functional dependency) are found in the same relation;
- minimal redundancy with each attribute represented only once with the important exception of attributes that form all or part of foreign keys, which are essential for the joining of related relations.
The benefits of using a database that has a suitable set of relations is that
- the database will be easier for the user to access and maintain the data, and
- take up minimal storage space on the computer.