Lesson 17

Relational Constructs and Constraints Conclusion

This module has brought together the core relational constructs used to transform database analysis into a structured relational database design. You began with the purpose of relational constraints, then moved through the database life cycle, relational notation, table structure, column rules, row rules, entity attributes, data domains, null values, key columns, primary keys, surrogate keys, concatenated primary keys, all-key relations, foreign keys, and referential integrity. These topics form a single design sequence. Each concept builds on the one before it, and together they explain how a relational schema represents business facts, identifies rows, constrains values, and protects relationships between tables.

A relational database is not simply a collection of tables. It is a logical model of a business environment. The tables, columns, rows, keys, domains, and constraints are the formal structures used to express that model. A table represents one subject or one relationship. A row represents one occurrence of that subject. A column represents one attribute. A domain restricts the values that may appear in a column. A primary key identifies rows. A foreign key connects related rows in different tables. Referential integrity ensures that those connections remain valid as data is inserted, updated, and deleted.

The important lesson is that relational design is about meaning before implementation. The designer first identifies the business objects, business rules, relationships, and facts that must be represented. Only after those meanings are understood should the design be translated into relational notation, table definitions, keys, and constraints. Good relational design is therefore a disciplined movement from business understanding to logical structure, and then from logical structure to a physical implementation in a relational database management system.

From Requirements to Relational Constructs

The database life cycle provides the larger framework for this module. A database begins with requirements analysis. During that stage, the designer identifies what the organization needs to store, what questions users need to answer, what transactions must be supported, and what rules must always remain true. These requirements are then translated into conceptual models, logical models, and eventually physical database structures.

In conceptual design, business objects are identified as entities. A customer, product, order, student, course, vendor, invoice, employee, department, or shipment may become an entity if the organization needs to store information about it. The characteristics of an entity become attributes. For example, a customer may have a customer identifier, last name, first name, street address, city, state, ZIP code, and phone number. At the conceptual level, the designer is still concerned primarily with business meaning, not with storage, indexing, or DBMS-specific syntax.

Logical design translates this conceptual structure into relational constructs. Entities become relations or tables. Attributes become columns. Entity instances become rows. Relationships are represented by primary key and foreign key structures. Constraints are added to preserve validity. Relational notation helps during this stage because it provides a compact textual form for documenting table names, attributes, primary keys, and foreign keys before writing SQL DDL.

Physical design and implementation come later. During physical design, the designer chooses data types, indexes, storage options, naming conventions, and DBMS-specific features. During implementation, the schema is created using SQL or a database modeling tool. This separation matters because it prevents premature implementation decisions from corrupting the logical design. A good schema starts with correct meaning, then becomes executable SQL.

Tables, Rows, Columns, and Domains

A table is the main structural unit of a relational database. Each table should represent one subject or one relationship. The rows in the table represent individual instances of that subject, and the columns represent the attributes used to describe each instance. In relational theory, a table is called a relation, a row is called a tuple, and a column is called an attribute. In practice, database designers and developers commonly use table, row, and column, but understanding both vocabularies is useful because database theory and database tools often use different terms for the same underlying ideas.

A valid relational table has structure and meaning. Its columns should be named clearly. Its rows should be distinguishable. Its values should be atomic. Its columns should contain values from a defined domain. It should not contain repeating groups. It should avoid unnecessary redundancy. It should have a primary key when rows need to be identified individually. If it participates in relationships with other tables, it should define the proper foreign keys.

Column design is especially important because columns define what values may be stored. A column should use a data type that matches its meaning. Dates should use date or timestamp types rather than strings. Monetary values should use precise numeric types such as DECIMAL rather than floating-point types. Identifiers should use stable formats. Codes should be constrained to valid values. A column should allow NULL only when missing or inapplicable data is meaningful for that attribute.

A domain is the set of permitted values for an attribute. The domain may be broad, such as all valid integers, or narrow, such as a fixed list of order status values. Domain design supports data integrity because it prevents invalid values from entering the database. Data types, NOT NULL constraints, CHECK constraints, lookup tables, and application rules may all help define or enforce a domain. However, the strongest designs place as many rules as possible in the database schema so that the rules apply no matter which application, script, import process, or user writes the data.

Null Values, Blanks, and Zeros

The lesson on null values clarified an important distinction in relational databases: NULL, blank, and zero are not the same thing. A null value means that a value is missing, unknown, inapplicable, or not yet supplied. A blank string is a known text value with no characters. A zero is a known numeric value. Treating these as equivalent causes incorrect queries, incorrect calculations, and misleading reports.

Null values are useful because they allow a database to store incomplete information without inventing false values. For example, a customer may not have an apartment number. An employee may not yet have a termination date. A product may not yet have a discontinued date. In each case, NULL communicates that the value is absent in a specific way. However, null values also require careful handling because SQL comparisons involving NULL produce unknown rather than true or false.

Good table design therefore requires deliberate nullability decisions. Columns that are required by the business rule should be declared NOT NULL. Columns that are optional or conditionally applicable may allow NULL. Primary key columns must not be NULL because a row without an identifier violates entity integrity. Foreign key columns may be nullable or mandatory depending on the relationship. In a customer-order design, an order's customer foreign key is usually mandatory because an order without a customer is normally invalid.

Key Columns and Descriptor Columns

The module distinguished between key columns and descriptor columns. A key column helps identify a row or participates in a relationship. A descriptor column describes the entity represented by the row. In a CUSTOMER table, CustID is a key column because it identifies the customer. CustLast, CustFirst, CustStreet, CustCity, and CustPhone are descriptor columns because they describe the customer.

This distinction helps the designer understand the role of each column. A key column should be stable, minimal, and unique when it serves as a primary key. A descriptor column should depend on the key and describe the subject of the table. When descriptor columns describe something other than the table's subject, the table may contain a design problem that should be corrected through decomposition or normalization.

Candidate keys are possible row identifiers. A table may have more than one candidate key. For example, a customer table might have a system-generated CustID and a unique email address. Either might identify a row, but only one candidate key is selected as the primary key. The remaining candidate keys can be enforced with UNIQUE constraints when the business rule requires them to remain unique.

Primary Keys and Surrogate Keys

A primary key is the selected candidate key used to uniquely identify each row in a table. It must be unique and not null. It should also be stable. A value that changes often is usually a poor primary key because changes ripple into related tables and create unnecessary maintenance risk. Primary keys are central to relational design because they give each row a reliable identity.

Some primary keys are natural keys. A natural key has business meaning outside the database. Examples may include an ISBN for a book or a government-issued identifier in systems where that identifier is legally and ethically appropriate to store. Other primary keys are surrogate keys. A surrogate key is a system-generated identifier with no independent business meaning, such as an integer sequence, identity column, UUID, or similar generated value.

Surrogate keys are common in modern database design because they are stable, compact, and independent of business attributes that may change. For example, a product code, email address, phone number, or account name may seem unique at first, but business rules can change. A surrogate key allows the descriptive attributes to change without changing the row's identity. However, surrogate keys do not eliminate the need to enforce natural uniqueness when the business requires it. A table may use a surrogate primary key and still define UNIQUE constraints on natural candidate keys.

The design choice is not simply "natural key versus surrogate key." The better question is which identifier should serve as the table's stable primary key, and which other candidate keys must be protected with constraints. A strong design uses primary keys for row identity and additional constraints for business uniqueness.

Concatenated Primary Keys and All-key Relations

A concatenated primary key, also called a composite primary key, uses more than one column to identify a row. Composite keys are common when no single column identifies a fact by itself. In an order detail table, the combination of Order Number and Item Number may identify one line item. Order Number alone is not enough because one order can contain many items. Item Number alone is not enough because the same item can appear on many orders. Together, the two columns identify the row.

Composite keys are especially important in associative tables. An associative table, also called an intersection table, bridge table, or junction table, resolves a many-to-many relationship. For example, an ITEM table and a LANGUAGES table may have a many-to-many relationship: one item can be available in many languages, and one language can apply to many items. The AVAILABLE_IN table resolves that relationship by storing valid item-language pairings.

An all-key relation occurs when every column in the table is part of the primary key. If AVAILABLE_IN contains only LangID and ItemNo, and both columns together form the primary key, then every column participates in the key. That makes the table an all-key relation. This is not a design error. It is a valid relational pattern when the only fact being stored is the relationship itself.

A table may stop being all-key if the relationship gains descriptive attributes. For example, if AVAILABLE_IN later includes TranslationStatus or AvailableDate, those columns may describe the item-language relationship but may not be part of the primary key. The table would still be a valid associative table, but it would no longer be an all-key relation. The classification changes because the meaning of the relationship has expanded.

Foreign Keys and Parent-Child Relationships

A foreign key is a column or group of columns in a child table that references a key in a parent table. The referenced key is usually the primary key, although relational databases may also allow references to a UNIQUE candidate key. The foreign key is the mechanism that implements a relationship between tables in the schema.

The CUSTOMER and ORDER example shows the basic pattern. CUSTOMER.CustID is the primary key of the CUSTOMER table. ORDER.OrderNo is the primary key of the ORDER table. ORDER.CustID is a foreign key that references CUSTOMER.CustID. This means one customer can have many orders, and each order belongs to a valid customer.

A foreign key is not merely a column that happens to have the same name as a primary key in another table. Matching names improve readability, but names do not create the relationship. The declared foreign key constraint creates the relationship. A foreign key column could be named OrderedByID and still reference Customer.CustID if the constraint declares that relationship.

Foreign keys may be mandatory or optional. A mandatory foreign key is declared NOT NULL and requires every child row to reference a parent row. An optional foreign key allows NULL, meaning that the child row is not currently associated with a parent row for that relationship. Referential integrity checks non-null foreign key values. If the value is present, it must match a valid referenced key.

Referential Integrity

Referential integrity is the rule that every non-null foreign key value in a child table must match an existing referenced key value in the parent table. This rule prevents orphan records. An orphan record is a child row that points to a parent row that does not exist.

In the CUSTOMER and ORDER example, an order with CustID 101 is valid only if customer 101 exists in the CUSTOMER table. If no customer has CustID 9999, then an order with CustID 9999 should be rejected. The database enforces this rule through a foreign key constraint. This is stronger than relying only on application code because the rule applies no matter how data enters the database.

Referential integrity also affects update and delete behavior. If a parent row is referenced by child rows, the database needs a rule for what happens when someone attempts to delete the parent or change its key. Common referential actions include RESTRICT or NO ACTION, CASCADE, SET NULL, and SET DEFAULT. The correct choice depends on the business rule. In a customer-order database, deleting a customer should usually be restricted if order history must be preserved.

Referential integrity is related to joins, but it is not the same thing as a join. A join is a query operation that retrieves related rows. A foreign key constraint is a schema rule that keeps the relationship valid. The join uses matching values. The constraint ensures that those matching values remain meaningful.

How the Module Fits Together

The concepts in this module are not isolated vocabulary terms. They describe a connected design system. The database life cycle tells you where relational design fits in the development process. Relational notation gives you a compact way to represent the design. Table rules and column rules ensure that the structure is valid. Domains restrict permitted values. Null rules clarify how missing information should be represented. Key columns identify rows. Descriptor columns describe rows. Primary keys provide row identity. Surrogate keys provide stable system-generated identifiers when natural keys are unsuitable. Composite keys identify rows when one column is not enough. All-key relations represent pure relationship facts. Foreign keys implement relationships. Referential integrity protects those relationships.

These ideas prepare you for normalization. Normalization evaluates whether table structures correctly represent facts and dependencies. A table should store facts about one subject. Non-key attributes should depend on the key, the whole key, and nothing but the key. If a table mixes facts about multiple subjects, repeats groups of values, or stores values that depend on something other than the key, the design may produce update, insert, and delete anomalies. The next module develops those ideas formally.

A reliable relational database depends on both structure and constraint. Structure organizes the data. Constraints protect the data. A schema that has tables but weak constraints may still allow duplicate rows, invalid values, missing required values, and broken relationships. A well-designed schema uses constraints as part of the design, not as afterthoughts.

Summary of Learning Objectives

After completing this module, you should be able to:

Describe the purpose of the database life cycle and identify where logical relational design occurs.
Explain how business objects and business rules are translated into relational constructs.
Read and write relational notation for tables, attributes, primary keys, and foreign keys.
Describe the characteristics of relational tables and distinguish tables, relations, rows, tuples, columns, and attributes.
Apply rules for table columns, including naming, data types, nullability, keys, defaults, and constraints.
Apply rules for table rows, including uniqueness, atomic values, row identity, and consistent structure.
Explain how columns represent entity attributes and why each column should describe the table subject.
Define data domains and identify domains that support calculations, comparisons, and validation.
Differentiate null values from blank strings and numeric zero values.
Explain the difference between key columns and descriptor columns.
Define candidate keys and primary keys and explain how primary keys identify rows.
Explain when surrogate keys are useful and why natural candidate keys may still require UNIQUE constraints.
Define concatenated primary keys and explain when a composite key is required.
Describe all-key relations and explain their role in associative tables.
Define foreign keys and explain how they implement parent-child table relationships.
Define referential integrity and explain how foreign key constraints prevent orphan records.

Glossary Terms

This module introduced the following relational database design terms:

attribute: A characteristic of an entity. In a relational table, an attribute is represented as a column.
base table: A table that stores persistent data as part of the database schema.
business object: A real-world subject about which an organization stores data, such as a customer, product, order, student, or course.
candidate key: A minimal column or group of columns that can uniquely identify each row in a table.
column: A named attribute in a relational table. A column has a domain or data type that restricts the values it can store.
composite primary key: A primary key made from more than one column. Also called a concatenated primary key.
constraint: A schema rule that restricts values or relationships. Common constraints include PRIMARY KEY, FOREIGN KEY, UNIQUE, NOT NULL, and CHECK.
data type: A classification that defines the kind of data a column can store, such as INTEGER, VARCHAR, DECIMAL, DATE, TIMESTAMP, or BOOLEAN.
descriptor column: A non-key column that describes the subject represented by a row.
domain: The set of permitted values for an attribute or column.
entity: A business object or subject about which data must be stored.
entity-relationship diagram: A design diagram that shows entities, attributes, and relationships before they are translated into relational tables.
foreign key: A column or group of columns in a child table that references a candidate key, usually a primary key, in a parent table.
key column: A column that helps identify a row or distinguish one instance of an entity from another.
null value: A special marker indicating that a value is missing, unknown, inapplicable, or not yet supplied.
one-to-many relationship: A relationship in which one parent row may be associated with many child rows, while each child row references at most one parent row.
primary key: The selected candidate key used to uniquely identify each row in a table.
referential integrity: The rule that every non-null foreign key value in a child table must match an existing referenced key value in the parent table.
relation: The relational model term commonly associated with a table.
relational notation: A compact textual notation used to describe relations, attributes, primary keys, and foreign keys.
row: One occurrence or instance in a table. In formal relational terminology, a row is called a tuple.
surrogate key: A system-generated identifier used as a primary key when a natural key is unstable, complex, or unsuitable.
table: A relational structure made of rows and columns that stores data about one subject or relationship.
tuple: The relational model term for a row in a relation.
view: A named query definition that presents query results as a virtual table.

Module Transition

The next module builds on these relational constructs by examining normalization. Normalization explains how to evaluate table structure, identify dependency problems, reduce redundancy, and design relations that avoid update, insert, and delete anomalies. The constructs in this module give you the vocabulary and structural foundation needed to understand those normalization rules.

At this point, you should be able to look at a table and ask the central design questions: What subject does this table represent? What does each column mean? What values are permitted? Which column or columns identify each row? Which columns describe the row? Which columns reference other tables? What constraints protect the values? What relationships must remain valid? These questions are the beginning of disciplined relational database design.

Relational Constructs - Quiz

Before moving to the next module, click the Quiz link below to check your knowledge of relational constructs.
Relational Constructs - Quiz