Relational Constructs   «Prev  Next»

Lesson 9 Relational Database Null Values
Objective Explain the purpose of null values in a Relational Database.

Null Values in a Relational Database

In a relational database, a null value represents the absence of information or the inapplicability of a data element. The concept was introduced by Dr. E.F. Codd as part of his foundational work on the relational model and plays a central role in managing data integrity, schema flexibility, and query semantics. Understanding null values - what they mean, how they differ from blank and zero, and how the RDBMS handles them - is essential for designing and querying relational databases correctly.

The Five Purposes of Null Values

  1. Representation of missing information. When a specific attribute has not been provided or is not yet known for a given tuple, a null value indicates that the information is currently unavailable. This allows the database to store partial information without resorting to artificial default values - a practice that could lead to incorrect assumptions. A null in a phone number column means the phone number exists but is not known; it does not mean the phone number is zero or empty.
  2. Distinction from empty or zero values. Null distinguishes the absence of information from other valid data elements such as empty strings, zeros, or other defaults. A zero is a known numeric value. A blank is a known absence of text content. A null is neither - it means the value is unknown. The RDBMS treats these three cases differently in queries, comparisons, and calculations.
  3. Support for optional attributes. Some attributes may not apply to every tuple in a table. A customer who does not live in an apartment has no apartment number - that attribute is inapplicable to that customer. Null enables optional attributes to be represented in a schema without requiring redesign or the introduction of placeholder values that would distort query results.
  4. Preservation of data integrity. By explicitly representing the absence of information through null, the database can maintain integrity during joins, selections, and projections. Queries involving null can be designed to account for null semantics, ensuring that results accurately reflect the underlying data and its limitations rather than producing misleading output.
  5. Semantics in aggregate functions. Aggregate functions - COUNT, SUM, AVG, MIN, and MAX - are defined to ignore null values in their calculations. AVG computed over a set of salaries where some are null returns the average of the known salaries, not a distorted result caused by treating unknown salaries as zero. This behavior is consistent across SQL-compliant RDBMS platforms.

Every column in a table should ideally contain a value, but the real world produces situations where values are genuinely unknown. The SUPPLIER table below illustrates this: when you need to contact a supplier, you need their name, address, phone number, and fax. If you do not yet know one or more of those values, you need a way to represent that gap without compromising the integrity of the other data in the row.

SUPPLIER
SupplierId SupplierName SupplierAddress SupplierPhone SupplierFax

Columns of the SUPPLIER table: 1) SupplierId, 2) SupplierName, 3) SupplierAddress, 4) SupplierPhone, 5) SupplierFax.

Determining Column Values: Null, Blank, and Zero

When the value to enter into a field is unknown, it is called a null value. A null is distinct from a blank and from a zero, and the RDBMS treats each differently.

  • Null means the value is unknown - it may exist but has not been determined. If you do not know a supplier's phone number, enter null into the SupplierPhone column. The null indicates the phone number exists but is not on record.
  • Blank means the value is known to be absent or not applicable. If the supplier does not have a fax number, enter a blank into the SupplierFax column. A blank is not unknown - it is a deliberate indication that the attribute does not apply. You can represent a blank by entering a single space character in the field.
  • Zero is a valid INT or DECIMAL numeric value equal to zero. If a supplier provided a thousand free CDs to give to the next thousand customers who placed an order, the RetailPrice column for that CD would contain zero - a known, valid price, not an unknown value.

In practice, blank and null may appear similar on screen - both often display as empty in a query result table. However, the distinction matters to the RDBMS and to the accuracy of your data. If a value is truly unknown, enter null. How null is entered varies by RDBMS; one common method is to skip the field entirely by pressing Tab without typing a value.

Null, Zero, and Blank Values: Applied Examples

Consider how you would specify attribute values for a Customer entity when the value is not known in advance. Three attributes illustrate the three cases:

  1. Customer Apartment Number - blank. A customer may not live in an apartment, making this attribute inapplicable. The field should contain a blank. If the customer did live in an apartment, they would include the apartment number in their mailing address to ensure correct delivery. The absence of an apartment number is a known, intentional absence - not an unknown value.
  2. Customer Phone Number - null. The assumption is that every customer has a phone. If the phone number is not on record, null indicates that the value exists but is not yet known. Entering null is more honest than entering a placeholder value or leaving the field blank, because it signals to anyone querying the database that the data is missing rather than simply absent.
  3. Customer Order Total - zero. A customer could use a coupon to reduce their order total to zero. Zero is a valid, meaningful value for this field - it represents a known outcome, not a missing value. Treating a zero order total as null would be incorrect.

NULL as an SQL Reserved Keyword

NULL is a special marker in SQL used to indicate that a data value does not exist in the database. SQL NULL fulfills the requirement that all true RDBMS platforms support a representation of both missing information and inapplicable information. Codd introduced the lowercase Greek omega symbol (ω) to represent null in database theory. In SQL, NULL is a reserved keyword used to identify this special marker, and it appears in constructs such as IS NULL, IS NOT NULL, and COALESCE().

Three-Valued Logic and Null in SQL Queries

Null values introduce a complication into query evaluation. Standard Boolean logic has two states: true and false. When a null value is involved in a comparison, the result is neither true nor false - it is unknown. Relational databases therefore use three-valued logic, where the result of evaluating a logical expression can be true, false, or maybe (unknown).

Consider an Employees relation that includes a Salary column, where salary is null for some rows. A query asks for all employees whose salary exceeds $60,000:

SELECT * FROM Employees WHERE Salary > 60000;

For a row with a null salary, the RDBMS cannot evaluate whether null is greater than 60,000. The answer is not true and not false - it is unknown. The standard SQL behavior is to exclude such rows from the result, because the condition cannot be confirmed as true. If you specifically want to retrieve rows where salary is unknown, you must ask for them explicitly:

SELECT * FROM Employees WHERE Salary IS NULL;

The three-valued logic truth table governs how AND, OR, and NOT operations behave when one or more operands evaluate to unknown. Most RDBMS platforms adhere to the SQL standard truth table for null propagation, though developers should test null behavior on their specific platform when writing complex conditional logic.

How the RDBMS Stores and Handles NULL

Two requirements govern null handling across all SQL-compliant RDBMS platforms:

  1. Consistent storage. The RDBMS must store the same internal representation for null across all columns and data types. Null is not a space character or zero - it has its own distinct ASCII or Unicode marker. Although nulls typically display as blank in query result tables, their internal representation is unique and type-independent.
  2. Consistent query behavior. The RDBMS must handle nulls in a predictable, documented way during queries. The default behavior - excluding null rows from comparisons unless explicitly requested - must be consistent and well-defined so that developers can rely on it when writing queries and interpreting results.

Why Nulls Matter: The Account Balance Example

The importance of null becomes clear when considering the practical difference between a known zero and an unknown value. Consider an accounts payable balance:

  • Balance = 0 means the account has been paid in full. This is a desirable outcome - the company owes nothing.
  • Balance = null means the balance is unknown. This could indicate a data entry problem, a disputed invoice, or a system failure. An unknown balance on an accounts payable record is potentially a significant business problem requiring investigation.

Without null, there would be no way to distinguish between these two cases using a numeric column. A zero would have to serve double duty as both "paid in full" and "unknown" - making the data ambiguous and unreliable for financial reporting.

The concept of unknown values is not unique to relational databases. Any data management system must address what to do when a value is absent. The relational model's answer - a formal, type- independent null with defined query semantics - is one of its most important contributions to practical database design.

Codd's Rule 3: Systematic Treatment of NULL Values

Codd's Rule 3 addresses nulls directly as one of the twelve rules that define a fully relational DBMS. The rule states:

Null values (distinct from the empty character string, a string of blank characters, or any other number) are supported in the fully relational DBMS for representing missing information in a systematic way, independent of data type.

This rule has two important implications:

  • Defined nullability. The database engine must support the ability for any column of any data type to hold a null value. Null is not a data-type-specific concept - it applies equally to integer columns, text columns, date columns, and all others. A column that cannot hold null must explicitly declare NOT NULL; nullability is the default.
  • Optionality. The rule acknowledges that some relationships between data are optional. A customer may or may not have a second phone number. A supplier may or may not have a fax number. Null enables the schema to represent optional relationships without forcing every tuple to carry a value for every attribute.

Rule 3 distinguishes null from zero, blank, and N/A - all of which are values, not the absence of a value. A DBMS that uses zero or empty string to represent missing information violates the spirit of the relational model and creates data integrity problems that compound over time.

The next lesson discusses the difference between key columns and descriptor columns.


SEMrush Software 9 SEMrush Banner 9