Lesson 4	First normal form
Objective	Explain the requirements for first normal form.

First Normal Form

Question: What are the requirements to place a database table in first normal form?
To place a database table in the First Normal Form (1NF), there are generally a few key requirements that are typically considered, though these can vary slightly depending on specific interpretations and contexts:

Atomicity: Each cell or field in the table should contain only atomic values, meaning they cannot be divided further. For example, a column for names should not contain multiple names in a single cell.
Uniqueness of Rows: Each row in the table should be unique. This can often be achieved by introducing a primary key, which is a column or a set of columns that uniquely identifies each row in the table.
Consistent Data Types: Each column should contain data of a consistent type. For instance, a column designed to store dates should not contain text or numbers unrelated to dates.
No Repeating Groups or Columns: The table should not contain repeating groups or columns. In other words, there should not be multiple columns that store similar kinds of data that could be more efficiently represented as separate data in the same column.
Order Independence: The order in which the rows and columns appear should not affect the integrity of the data. The relational model of databases typically ensures this by its very nature.

It's important to note that these are guidelines and the application of 1NF can depend on the specific requirements and context of the database design project. The main goal of 1NF is to structure the database in a way that improves its reliability, eliminates redundancy, and simplifies its use.

Multiple Occurrences of the same Type of Item
Many databases require that you plan your tables so they can handle multiple occurrences of the same type of item.
A book might have several authors. The following table shows one way to handle such a circumstance:

Book
BookID	ISBN	Title	Author1	Author2	Date	Pages	Publisher	City
1	978-0321765723	The Lord of the Rings	J.R.R. Tolkien	Glenn Gould	1954	1216	Allen Unwin	London
2	978-0743273565	The Hitchhiker's Guide to the Galaxy	Douglas Adams	Byron Janis	1979	224	Pan Books	London

table containing columns that are not ideal for data storing.

The Author1 and Author2 fields are able to handle up to two authors for each book, but there are two obvious problems with the table:

If a book has only one author, the Author2 column is wasted space.
There is no place to store the name of a third, fourth, or succeeding author.

There is also a problem from a theoretical standpoint. One of the rules of tables is that every column in a table must represent a unique attribute of an entity. In the case of the Books table, the Author1 and Author2 columns represent the same attribute: an author. The technical term for columns that represent the same attribute is a repeating group. (Do not let the semantics fool you, since the lead and second authors of a book are separate individuals, but they are both members of the set of Authors.) For a table to be in first normal form (1NF),^[1] the table must not contain any repeating groups.

First Normal Form Defined

A table is in first normal form (1NF) if it meets the following criteria:

The data are stored in a two-dimensional table.
There are no repeating groups.

The key to understanding 1NF, therefore, is understanding the nature of a repeating group of data.

Understanding Repeating Groups
In the context of database normalization, a "repeating group" can refer to two different things, depending on the context:
1. Multi-valued attribute: This is the stricter definition and relates to the first normal form (1NF). A repeating group here signifies a single column that stores multiple values for a single entity. This violates 1NF, which requires each column to hold an atomic value (a single, indivisible piece of data) for each row. For example, a column named "phone numbers" containing multiple phone numbers separated by commas is a repeating group and would be normalized into a separate "phone numbers" table with a foreign key linking it to the original table.
2. Collection of related attributes: This is a broader definition used informally by some database designers. It refers to a set of columns that contain similar information for a single entity. While not technically violating any normal form rules, these groups can still benefit from normalization if they are frequently accessed together or contain redundant data. For example, an "address" group with columns like "street", "city", and "state" might be normalized into a separate "address" table to avoid redundancies and improve data integrity.

Here's a table summarizing the key differences:

Definition	Example	Violation of Normal Form?
Multi-valued attribute	Phone numbers: 555-1234, 555-5678	Yes, violates 1NF
Collection of related attributes	Address: Street, City, State	No, but can be normalized for efficiency and data integrity

Ultimately, recognizing and addressing repeating groups is crucial for creating well-structured, efficient, and maintainable databases. If you encounter a repeating group, analyze its nature and purpose to determine the appropriate normalization strategy.

A repeating group is an attribute that has more than one value in each row of a table. For example, assume that you were working with an employee relation and needed to store the names and birth dates of the children from the employees. Because each employee can have more than one child, the names of the children and their birth dates each form a repeating group.
Note: A repeating group is directly analogous to a multivalued attribute in an ER diagram.

Repeating Groups not permitted

There is actually a very good reason why repeating groups are not permitted. To see what might happen if they were used, take a look at Figure 2 below, an instance of an employee table containing repeating groups. Notice that there are multiple values in a single row in both

the children's names and
the children's birth dates columns.

This presents two major problems:

EmpNum	First Name	Last Name	Child Names	Child Birthdates
1001	Jane	Doe	Mary, Sam	1/9/02, 5/15/04
1002	John	Doe	Lisa, David	1/9/00, 5/15/01
1003	Jane	Smith	John, Pat, Lee, Mary	10/5/04, 10/12/00, 6/6/2006, 8/21/04
1004	John	Smith	Michael	7/4/06
1005	Jane	Jones	Edward, Martha	10/21/05, 10/15/99

There is no way to know exactly which birth date belongs to which child. It is tempting to say that we can associate the birth dates with the children by their positions in the list, but there is nothing to ensure that the relative positions will always be maintained.
Searching the table is very difficult. If, for example, we want to know which employees have children born before 2005, the DBMS will need to perform data manipulation to extract the individual dates themselves. Given that there is no way to know how many birth dates there are in the column for any specific row, the processing overhead for searching becomes even greater.

The solution to these problems, of course, is to get rid of the repeating groups altogether.
The next lesson describes how to decompose a table to first normal form.

[1]first normal form (1NF): A table is in first normal form if it contains no repeating groups.