Database Normalization (1NF, 2NF, 3NF, and Projection-Join Overview)
Database normalization is one of the most important concepts in relational database design. Given any pool of entities and attributes, you can group them into tables in many different ways — and most of those ways will eventually cause problems. Normalization is the discipline of choosing table structures that avoid those problems systematically, producing a schema that is clean, consistent, and reliable to query with SQL.
The diagram below shows the complete progression in a single image: an unnormalized Orders table on the left, moving through First, Second, and Third Normal Form as redundancy is progressively eliminated and each fact is isolated into exactly one place.
Normalization progressively eliminates redundancy — splitting tables so each fact is stored exactly once and related through foreign keys.
What Is Database Normalization?
Database normalization is the process of organizing the fields and tables of a relational database to minimize redundancy and dependency. It involves dividing large, poorly structured tables into smaller, well-defined tables and establishing relationships between them using foreign keys. The objective is to isolate data so that additions, deletions, and modifications to a value need to be made in exactly one place — and then propagated automatically through the rest of the database via those defined relationships.
Edgar F. Codd, the inventor of the relational model, introduced the concept of normalization and the First Normal Form (1NF) in 1970. Codd defined the Second Normal Form (2NF) and Third Normal Form (3NF) in 1971. Codd and Raymond F. Boyce together defined the Boyce-Codd Normal Form (BCNF) in 1974. Informally, a relational database table is considered normalized when it reaches the Third Normal Form. Most 3NF tables are free of the insertion, update, and deletion anomalies that make unnormalized schemas unreliable.
Two Approaches to Normalization
There are two practical ways to approach normalization when designing a database schema.
The first is to work from an Entity-Relationship (ER) diagram. If the diagram is drawn correctly, straightforward rules exist for translating it into table definitions that avoid most relational design problems. The drawback is that it can be difficult to verify whether your ER diagram itself is correct — a flawed diagram produces a flawed schema regardless of how faithfully you follow the translation rules.
The second approach is to apply the theoretical concepts of normalization directly to your data. This requires a deeper understanding of functional dependencies and the normal form rules, but it often produces a more rigorous design because it forces you to reason about the data itself rather than the diagram representing it.
In practice, the most reliable approach combines both. Start with an ER diagram to establish the broad structure, then validate each resulting table against the theoretical normalization rules and revise where necessary. The diagram provides the high-level view; the theory provides the correctness check.
Why Normalization Matters for SQL Developers
As a SQL developer, you will spend most of your time writing queries against schemas that already exist. Understanding normalization helps you in two ways. First, it helps you read and reason about a schema quickly — a normalized schema has a predictable structure where each table has a clear, single subject and foreign keys make relationships explicit. Second, it helps you diagnose query problems: many performance issues and unexpected query results trace back to unnormalized schema decisions that cause data duplication, ambiguous joins, or missing rows.
A normalized schema also produces simpler SQL. When each fact lives in exactly one table, your SELECT statements are straightforward — join the tables that hold the data you need, filter by the conditions you care about, and the result is correct. An unnormalized schema forces you to work around its redundancies with GROUP BY clauses, DISTINCT keywords, and subqueries that compensate for data that should never have been duplicated in the first place.
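As a minimal sketch of that difference (orders_flat is a hypothetical name for the unnormalized table; customers matches the diagram's normalized panel), compare the query needed to list every customer:

```sql
-- Normalized: each customer is stored exactly once.
SELECT customer_name, customer_email
FROM customers;

-- Unnormalized: customers repeat once per order, so DISTINCT
-- is needed to compensate for the duplication.
SELECT DISTINCT customer_name, customer_email
FROM orders_flat;
```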
Relational database design takes a statement of requirements and produces a database schema — a collection of table definitions, column definitions, constraints, and relationships — that satisfies those requirements. The goal of logical database design is to produce a schema that is independent of two considerations: the physical implementation details of the storage system, and the specific applications that will query the database.
Requirements, Constraints, and Design Independence
Independence from physical implementation means that your table structure should not be designed around how the database engine happens to store data on disk. The engine's job is to store and retrieve data efficiently regardless of the physical layout; your job is to design a logical structure that correctly represents the domain. A design that is entangled with physical storage assumptions will break or become inefficient every time the storage engine changes.
Independence from specific applications means that your schema should be able to answer questions that were not anticipated at design time. It is generally not the case that all uses of a database are known when the schema is first defined. A schema designed to serve only the application queries that exist today will become a liability as new reporting requirements, new features, and new integrations emerge. A normalized schema, precisely because it stores each fact exactly once and makes relationships explicit, is far more adaptable to unanticipated queries than a denormalized schema optimized for a specific set of known access patterns.
Business Rules and the Entity-Relationship Model
A precise and complete requirements statement for a database design typically takes the form of two inputs: a set of business rules and an entity-relationship model.
Business rules describe the constraints and behaviors of the domain in plain language: "Each order belongs to exactly one customer." "A customer may have zero or more orders." "Each order line references exactly one product." These rules define the cardinality and participation constraints that will become foreign key definitions and NOT NULL constraints in the schema.
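As a sketch, those three rules might translate into DDL like this (table and column names are illustrative, not prescribed by the rules themselves):

```sql
CREATE TABLE customers (
    customer_id    INTEGER PRIMARY KEY,
    customer_name  VARCHAR(100) NOT NULL,
    customer_email VARCHAR(255) NOT NULL
);

CREATE TABLE products (
    product_id   INTEGER PRIMARY KEY,
    product_name VARCHAR(100) NOT NULL
);

-- "Each order belongs to exactly one customer": customer_id is
-- NOT NULL and a foreign key. "A customer may have zero or more
-- orders": nothing to declare; zero matching rows is the default.
CREATE TABLE orders (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customers (customer_id)
);

-- "Each order line references exactly one product": product_id is
-- NOT NULL and a foreign key.
CREATE TABLE order_lines (
    order_id   INTEGER NOT NULL REFERENCES orders (order_id),
    product_id INTEGER NOT NULL REFERENCES products (product_id),
    PRIMARY KEY (order_id, product_id)
);
```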
The entity-relationship model formalizes these rules into a diagram showing entities (which become tables), attributes (which become columns), and relationships (which become foreign keys). Together, business rules and the ER model provide the raw material from which a normalized schema is constructed.
When Requirements Are Clear, Design Choices Remain
Even when the requirements are fully specified, there are usually multiple valid schema designs that satisfy them. Some will be more normalized than others. Some will optimize for write performance at the cost of read complexity. Some will be simpler to implement but harder to extend. Relational database theory has very little science to offer regarding these higher-level design choices — what little formal guidance it provides is almost entirely within the domain of projection-join normalization, which defines precise rules for decomposing tables to eliminate specific classes of redundancy.
The Three Anomalies Normalization Prevents
The core motivation for normalization is not abstract elegance — it is the prevention of three specific categories of data corruption that arise from redundant, poorly structured schemas. These are insertion anomalies, update anomalies, and deletion anomalies. The unnormalized Orders table in the leftmost panel of the diagram illustrates all three.
Consider a table that stores order_id, customer_name, customer_email, product_name, and unit_price in a single flat structure — exactly the structure shown in the diagram's red UNNORMALIZED panel. Jane Doe's email address appears in every row for every order she has ever placed. Widget A's price of $9.99 appears in every row for every order that included Widget A.
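For concreteness, here is that flat structure as a sketch, with a few rows matching the examples used below (the name orders_flat is ours; the diagram simply labels it Orders):

```sql
CREATE TABLE orders_flat (
    order_id       INTEGER,
    customer_name  VARCHAR(100),
    customer_email VARCHAR(255),
    product_name   VARCHAR(100),
    unit_price     DECIMAL(10, 2)
);

-- Jane's email and Widget A's price are each stored twice already.
INSERT INTO orders_flat VALUES
    (1001, 'Jane Doe',  'jane@example.com', 'Widget A',  9.99),
    (1002, 'Jane Doe',  'jane@example.com', 'Widget B', 14.50),
    (1003, 'Bob Smith', 'bob@example.com',  'Widget A',  9.99);
```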
Insertion Anomalies
An insertion anomaly occurs when you cannot add a valid piece of information to the database without also adding unrelated information that you do not yet have. In the unnormalized Orders table, you cannot record a new customer — say, Alice Brown — until Alice places her first order. There is no way to store Alice's name and email address without an order_id to anchor the row. The customer exists in the real world, but the schema has no place for her until she generates a transaction. This is an insertion anomaly: the structure of the table prevents you from recording legitimate data.
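Using the orders_flat sketch from above, and assuming order_id is declared as the primary key, the anomaly looks like this:

```sql
-- Alice exists as a customer but has no order yet. A primary key
-- cannot be NULL, so there is no valid row that stores only her
-- name and email:
INSERT INTO orders_flat (order_id, customer_name, customer_email)
VALUES (NULL, 'Alice Brown', 'alice@example.com');
-- Rejected. Inventing a placeholder order_id instead would be
-- worse: it fabricates an order that never happened.
```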
Update Anomalies
An update anomaly occurs when changing a single real-world fact requires updating multiple rows, and any partial update leaves the database in an inconsistent state. In the unnormalized Orders table, if Jane Doe changes her email address, every row associated with Jane must be updated simultaneously. If the UPDATE statement runs successfully on some rows but fails partway through — due to a timeout, a transaction rollback, or an application bug — the database will contain two different email addresses for the same customer. The data is now inconsistent, and any query that relies on Jane's email will produce different results depending on which rows it encounters first.
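In terms of the orders_flat sketch, the single logical fact requires a multi-row change:

```sql
-- One real-world fact (Jane's new email) touches every Jane row.
UPDATE orders_flat
SET customer_email = 'jane.doe@newmail.example'
WHERE customer_name = 'Jane Doe';
-- If an application issues this change row by row, or the statement
-- is interrupted without a rollback, some rows keep the old address
-- and the table contradicts itself.
```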
Deletion Anomalies
A deletion anomaly occurs when deleting one piece of information unintentionally destroys other information that should be preserved. In the unnormalized Orders table, if order 1003 is the only order that Bob Smith has ever placed, deleting that order also deletes all knowledge that Bob Smith exists as a customer — his name and email address disappear from the database entirely. The business may need to retain customer records for compliance, marketing, or history purposes, but the schema makes it impossible to delete an order without potentially losing customer data.
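Again in terms of the orders_flat sketch:

```sql
-- Order 1003 is Bob Smith's only row. Deleting the order also
-- deletes the only record that Bob exists.
DELETE FROM orders_flat
WHERE order_id = 1003;

-- Now returns no rows: the customer vanished with the order.
SELECT DISTINCT customer_name, customer_email
FROM orders_flat
WHERE customer_name = 'Bob Smith';
```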
All three anomalies share a common root cause: the table is storing facts about multiple distinct entities — customers, products, and orders — in the same rows. Normalization eliminates these anomalies by separating those entities into their own tables, so that each fact about a customer lives only in the Customers table, each fact about a product lives only in the Products table, and the Orders table records only the transaction itself.
Functional Dependency — The Theory Behind Normalization
The theoretical foundation of normalization is the concept of functional dependency. A functional dependency exists between two attributes when the value of one attribute uniquely determines the value of another. In formal notation, if knowing the value of attribute A always tells you exactly what the value of attribute B will be, then B is functionally dependent on A, written A → B.
In the unnormalized Orders table, customer_email is functionally dependent on customer_name — knowing the customer's name tells you their email address. unit_price is functionally dependent on product_name — knowing the product name tells you its price. Neither customer_email nor unit_price depends directly on order_id, however: order_id determines the customer name and the product name, and only through those does it determine the email and the price. This mismatch between what directly determines what is precisely what the normal forms are designed to detect and eliminate.
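A claimed dependency A → B can be tested directly in SQL: group by A and look for groups with more than one distinct B. Using the orders_flat sketch from above:

```sql
-- If customer_name → customer_email holds, every name maps to one
-- email and this query returns no rows; any row it does return is
-- a violation of the dependency.
SELECT customer_name,
       COUNT(DISTINCT customer_email) AS distinct_emails
FROM orders_flat
GROUP BY customer_name
HAVING COUNT(DISTINCT customer_email) > 1;
```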
Understanding functional dependencies is the key to applying 2NF and 3NF correctly. 2NF requires that every non-key attribute be fully functionally dependent on the entire primary key, not just part of it. 3NF requires that every non-key attribute be directly dependent on the primary key, with no transitive dependencies passing through other non-key attributes. Both rules are direct applications of functional dependency analysis to table structure.
The Normal Forms — An Overview
The normal forms are a progression of increasingly strict rules for table structure. Each form addresses a specific class of redundancy or dependency problem that the previous form left unresolved. The diagram's four panels map directly to this progression.
First Normal Form (1NF) — Codd, 1970
A table is in First Normal Form when every column contains atomic values — values that cannot be further decomposed — and there are no repeating groups of columns. A cell that contains a comma-separated list of values ("Widget A, Widget B, Widget C") violates 1NF because those values are not atomic. A table that stores multiple phone numbers by creating phone_number_1, phone_number_2, and phone_number_3 columns violates 1NF because those columns are a repeating group.
The 1NF panel in the diagram shows the Orders table with each cell containing a single value and a checkmark confirming atomicity. The warning indicator remains because 1NF alone does not eliminate the redundancy of repeated customer emails and product prices — it only ensures that each individual cell is atomic. 1NF is the entry requirement for applying any further normalization.
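A sketch of the repeating-group fix for the phone-number example above (customer_phones is an illustrative name, assuming a customers table like the one sketched earlier):

```sql
-- One row per phone number replaces the phone_number_1 ... _3
-- columns, so a customer can have any number of phones without
-- changing the schema.
CREATE TABLE customer_phones (
    customer_id  INTEGER NOT NULL REFERENCES customers (customer_id),
    phone_number VARCHAR(20) NOT NULL,
    PRIMARY KEY (customer_id, phone_number)
);
```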
Second Normal Form (2NF) — Codd, 1971
A table is in Second Normal Form when it is in 1NF and every non-key attribute is fully functionally dependent on the entire primary key. 2NF violations only occur in tables with composite primary keys — keys made up of more than one column. If a non-key attribute depends on only part of the composite key rather than the whole key, it belongs in a separate table.
The 2NF panel in the diagram shows the split into a Customers table and an Orders table. Customer email now lives in the Customers table, where it is fully dependent on customer_id — its natural primary key. The Orders table retains order_id as its primary key and references Customers via a customer_id foreign key. The dashed arrow between them represents the relationship that replaced the embedded, redundant email address.
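Sketched as DDL, the 2NF panel looks roughly like this (the unit_price column remains for now; the 3NF step removes it next):

```sql
CREATE TABLE customers (
    customer_id    INTEGER PRIMARY KEY,
    customer_name  VARCHAR(100) NOT NULL,
    customer_email VARCHAR(255) NOT NULL
);

CREATE TABLE orders (
    order_id     INTEGER PRIMARY KEY,
    customer_id  INTEGER NOT NULL REFERENCES customers (customer_id),
    product_name VARCHAR(100) NOT NULL,
    unit_price   DECIMAL(10, 2) NOT NULL  -- still redundant; see 3NF
);
```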
Third Normal Form (3NF) — Codd, 1971
A table is in Third Normal Form when it is in 2NF and no non-key attribute is transitively dependent on the primary key through another non-key attribute. A transitive dependency exists when A → B and B → C, meaning C depends on A only indirectly, through B. In the Orders table, order_id → product_name and product_name → unit_price, so unit_price is transitively dependent on order_id through product_name. The price belongs to the product, not to the order.
The 3NF panel in the diagram shows the further split into three tables: Customers, Products, and Orders. unit_price now lives in the Products table, where it is directly dependent on product_id. The Orders table becomes a pure transaction record — it stores only order_id, customer_id (FK), and product_id (FK). The "✓ Normalized" badge in the diagram's 3NF panel marks the completion of the standard normalization process.
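And the 3NF panel, continuing the sketch above (customers is unchanged from the 2NF step):

```sql
CREATE TABLE products (
    product_id   INTEGER PRIMARY KEY,
    product_name VARCHAR(100) NOT NULL,
    unit_price   DECIMAL(10, 2) NOT NULL  -- the price lives with the product
);

CREATE TABLE orders (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customers (customer_id),
    product_id  INTEGER NOT NULL REFERENCES products (product_id)
);
```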
Boyce-Codd Normal Form (BCNF) — Codd and Boyce, 1974
Boyce-Codd Normal Form is a slightly stronger version of 3NF that addresses a narrow class of anomaly that can remain in 3NF tables when a table has multiple overlapping candidate keys. BCNF requires that for every non-trivial functional dependency A → B, A must be a superkey — a column or combination of columns that uniquely identifies each row. Most tables that satisfy 3NF also satisfy BCNF; the distinction matters only in specific multi-key scenarios that arise in complex schemas.
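The textbook illustration is a sketch like the following, in which each teacher teaches exactly one subject:

```sql
-- Key: (student, subject). The dependency teacher → subject holds,
-- but teacher is not a superkey. 3NF is satisfied (subject is part
-- of a candidate key), yet BCNF is violated: each teacher's subject
-- is stored redundantly, once per enrolled student.
CREATE TABLE enrollments (
    student VARCHAR(50) NOT NULL,
    subject VARCHAR(50) NOT NULL,
    teacher VARCHAR(50) NOT NULL,
    PRIMARY KEY (student, subject)
);
```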
Why 3NF Is the Practical Target
Higher normal forms exist beyond BCNF — Fourth Normal Form (4NF) addresses multi-valued dependencies, and Fifth Normal Form (5NF) addresses join dependencies — but they are rarely pursued in practice. The anomalies they address are uncommon in typical business data models, and the schema decompositions they require can produce tables so fine-grained that query complexity increases significantly without a commensurate benefit in data integrity. The industry consensus is that 3NF is the right stopping point for most schemas: it eliminates the three anomalies that cause real-world data corruption problems, while keeping the schema comprehensible and the SQL reasonably straightforward.
Projection-Join Normalization
Projection-Join Normalization is the formal theoretical strategy underlying the normal form progression. It describes the decomposition of a table into smaller tables (projections) in a way that guarantees the original table can be perfectly reconstructed by joining those smaller tables back together — a property known as the lossless join.
Projections as Vertical Table Subsets
A projection is a unary relational operation that returns a vertical subset of a table — a new table containing only the columns specified in the projection, with duplicate rows removed. When you normalize the Orders table into Customers, Products, and Orders, you are creating projections of the original table: the Customers projection contains customer_id, customer_name, and customer_email; the Products projection contains product_id, product_name, and unit_price; the Orders projection contains order_id, customer_id, and product_id.
Each projection contains only the columns that are functionally dependent on its primary key. This is the operational definition of normalization: a normalized table is a projection whose columns all describe one meaningful, well-defined entity.
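In SQL, SELECT DISTINCT over a subset of columns is exactly this projection operation. Using the orders_flat sketch from earlier:

```sql
-- The customer projection: a vertical subset, duplicates removed.
-- (Surrogate keys such as customer_id are assigned afterwards.)
SELECT DISTINCT customer_name, customer_email
FROM orders_flat;

-- The product projection.
SELECT DISTINCT product_name, unit_price
FROM orders_flat;
```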
Joins as the Reconstruction Operation
A join is a binary relational operation that combines two tables into one based on a matching condition between their columns. When you write a SQL JOIN between the Orders table and the Customers table on customer_id, you are performing the reconstruction operation that projection-join normalization guarantees is safe. The join retrieves exactly the data that would have been stored redundantly in the unnormalized table — but it retrieves it dynamically, from a single authoritative source, without any risk of inconsistency.
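As a sketch against the 3NF tables above, the reconstruction is a pair of joins:

```sql
-- Rebuild the flat view dynamically, from single authoritative sources.
SELECT o.order_id,
       c.customer_name,
       c.customer_email,
       p.product_name,
       p.unit_price
FROM orders AS o
JOIN customers AS c ON c.customer_id = o.customer_id
JOIN products  AS p ON p.product_id  = o.product_id;
```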
The Lossless Join Property
A decomposition is lossless when joining the projected tables back together produces exactly the original table — no rows are lost, no spurious rows are introduced. This property is what makes normalization safe: you are not discarding data when you decompose a table, you are reorganizing it. The lossless join guarantee ensures that every query result you could have produced from the original unnormalized table can still be produced from the normalized tables, by joining them on their foreign key relationships.
A decomposition that is not lossless — where the join produces extra rows that were not in the original table — is called a lossy decomposition, and it indicates a design error. The functional dependency analysis that underlies normalization is precisely the tool for ensuring that decompositions are lossless.
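Losslessness can be spot-checked in SQL with EXCEPT, assuming both the flat sketch and the populated normalized tables from above exist: each direction of the comparison should return zero rows.

```sql
-- Original rows the join fails to reproduce (should be empty):
SELECT order_id, customer_name, customer_email, product_name, unit_price
FROM orders_flat
EXCEPT
SELECT o.order_id, c.customer_name, c.customer_email,
       p.product_name, p.unit_price
FROM orders AS o
JOIN customers AS c ON c.customer_id = o.customer_id
JOIN products  AS p ON p.product_id  = o.product_id;
-- Swapping the two operands of EXCEPT checks the other direction:
-- spurious rows the join produces that the original never contained.
```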
Normalization Is a Rule — And Rules Can Be Broken
Normalization is a systematic discipline, not an absolute law. Like any design principle, it should be applied with judgment. If you are responsible for a table's structure, apply normalization only as far as it produces a demonstrable benefit. There will be cases where decomposing a table into separate related tables adds complexity without solving any real anomaly problem — and in those cases, leaving the table in a less normalized form is the right decision.
When Denormalization Makes Sense
Denormalization is the intentional violation of normalization rules in order to achieve a specific performance or simplicity goal. The most common motivation for denormalization is read performance: a normalized schema requires joins, and joins have a cost. For tables that are queried millions of times per second and rarely updated, storing redundant data to avoid a join can be a justified trade-off.
Data warehouses and analytical systems are the canonical example of intentional denormalization. A data warehouse is typically organized into star schemas or snowflake schemas — structures that deliberately denormalize dimension data into wide, flat tables to minimize the number of joins required for reporting queries. This is acceptable because data warehouse tables are loaded in bulk on a scheduled basis, not updated row-by-row by application transactions, so the update and deletion anomalies that normalization prevents are not a concern.
In transactional systems — the kind you are learning to work with in this course — normalization should be the default. The anomaly risks are real, the performance cost of joins is manageable with proper indexing, and the long-term maintainability of a normalized schema is significantly higher than that of a denormalized one.
Using Judgment in Real-World Database Design
The practical skill that normalization teaches is not the mechanical application of the 1NF, 2NF, and 3NF rules. It is the habit of asking, for every column in every table: does this value belong here? Is it describing this table's primary entity, or is it describing something else that has its own identity and should live in its own table? When you ask that question consistently, you will catch most normalization violations before they become anomaly problems — regardless of whether you can formally name the normal form they violate.
Normalization Methodology — Two Goals
The normalization process serves two fundamental goals that together define what a well-structured relational schema looks like.
The first goal is eliminating redundant data. Redundancy occurs when the same fact is stored in more than one place. Jane Doe's email address stored in fifty order rows is redundant — the fact that Jane's email is jane@example.com is a single piece of information about Jane, and it should be stored in exactly one row in a Customers table. Eliminating redundancy reduces storage consumption, but more importantly, it eliminates the possibility of the update anomaly: if Jane's email changes, exactly one row must be updated, and the change is immediately consistent everywhere the email is referenced.
The second goal is ensuring that data dependencies make sense — storing only related data in a table. This means that each table should describe exactly one kind of thing: one entity, one relationship, one concept. A table that describes both customer contact information and order transaction history and product pricing is describing three different things, and its columns have dependencies that cross entity boundaries. Separating those concerns into distinct tables produces a schema where every column is clearly in the right place, every relationship is explicit, and every query is straightforward to write.
Both goals reduce the space a database consumes and ensure that data is stored logically and consistently. The next lesson examines a concrete data table example that demonstrates these principles in practice.