Lesson 4	Entity Identifier Rules
Objective	List two rules for creating entity identifiers.

Two Rules for Creating Entity Identifiers

An entity identifier is the attribute (or minimal set of attributes) that uniquely distinguishes each instance of an entity. In a relational schema, the chosen identifier becomes the table’s primary key and anchors foreign keys in related tables. Strong identifiers keep data accurate, joins fast, and relationships reliable.

The Two Rules

Keep the identifier meaningless.
Do not embed business meaning (like phone numbers, office codes, or dates) in key values. “Meaningful” keys tend to change, leak sensitive info, and constrain future design.
Why: Business facts evolve (new area codes, reorganizations), creating rekey cascades and broken references. Opaque keys (e.g., numeric IDs or UUIDs) remain stable as the business changes.
Keep the identifier unrestricted.
Avoid values tied to a subset of users (e.g., U.S.-only SSNs) or formats that cap growth. Choose keys that scale across regions, products, and time.
Why: Keys should not block expansion, reveal PII, or force format migrations.

Good Identifier Checklist

Unique & minimal: No superfluous columns in a composite key.
Stable: Very unlikely to change after creation.
Non-null & immutable: Enforce with PRIMARY KEY and NOT NULL.
Opaque: Conveys no business meaning or secrets.

Natural vs. Surrogate Identifiers

Prefer surrogate keys (e.g., CustomerID integer or UUID) when natural candidates are volatile, bulky, regional, or sensitive. If a truly stable natural key exists (e.g., an authoritative code with governance), you may use it, but still document/enforce its uniqueness.

-- Surrogate key as PK; preserve business rule with UNIQUE
CREATE TABLE Customers (
  CustomerID    BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
  FullName      VARCHAR(200) NOT NULL,
  Email         VARCHAR(255) NOT NULL,
  -- Natural rule: no two customers share the same email
  CONSTRAINT uk_customers_email UNIQUE (Email)
);

Associative (Junction) Entities & Composite Keys

Many-to-many (M:N) relationships require an associative (bridge/junction) entity. It does have an identifier-either a composite of its foreign keys or its own surrogate key plus a uniqueness rule:

Composite PK (compact, natural):

CREATE TABLE Enrollment (
  StudentID  INT NOT NULL,
  CourseID   INT NOT NULL,
  EnrolledOn DATE NOT NULL,
  PRIMARY KEY (StudentID, CourseID),
  FOREIGN KEY (StudentID) REFERENCES Students(StudentID),
  FOREIGN KEY (CourseID)  REFERENCES Courses(CourseID)
);

Surrogate PK (flexible for extra relationships/attributes):

CREATE TABLE Enrollment (
  EnrollmentID BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
  StudentID    INT NOT NULL,
  CourseID     INT NOT NULL,
  EnrolledOn   DATE NOT NULL,
  CONSTRAINT uk_enrollment UNIQUE (StudentID, CourseID),
  FOREIGN KEY (StudentID) REFERENCES Students(StudentID),
  FOREIGN KEY (CourseID)  REFERENCES Courses(CourseID)
);

Choose based on stability, indexing needs, and how often you reference the row directly.

Naming vs. Key Values (Don’t Confuse Them)

Column names should be meaningful and consistent (CustomerID, OrderID). The values stored in identifiers should be meaningless and unrestricted. This resolves the common confusion between readable schema names and opaque key values.

Common Pitfalls to Avoid

Encoding meaning in keys (department codes, years, regions).
Using PII (SSN, phone) as keys-volatile, sensitive, region-bound.
Keys with formatting limits (leading zeros lost, fixed-width parts).
Recycling keys after deletions or merges-breaks historical integrity.

Conclusion

Follow the two rules-meaningless and unrestricted-to create stable identifiers. Use surrogate PKs for flexibility, natural keys only when they’re truly stable, and enforce business uniqueness with constraints. This approach preserves integrity, simplifies joins, and future-proofs your schema.