Lesson 12	Surrogate Keys
Objective	Describe the various types of Surrogate Keys

What is Surrogate key?

Surrogate keys are almost never defined in the logical model. They either do not map to real data elements or are derivable, both of which are reasons why you avoided these keys in the Logical model. Until the physical model has been created, you never considered these as another type of candidate key. Three types of surrogate keys exist.

Concatenated surrogate keys:^[1] These are compressed, multicolumn data strings.
Semisurrogate keys: These replace a portion of a multicolumn key with a system-generated number.
Surrogate keys: These are completely system generated.

Let us look at Game in Figure 2-12 to review the choices.

Figure 2-12: Choosing the primary key of Game

Let us look at the instance for Game, as shown in Table 2-13.

Rule out making Natural Key the Primary Key

These rows tell you that Person 1 has played two games; they won one and lost one. Sometimes looking at sample data helps make the primary key choices clearer than the logical modeling did. But in this case, we still do not see any natural keys, other than the one we already found. However, our experience has told us that you should avoid having dates and times in a primary key if possible. This is mostly because of the way DBMSs want to format 1) date and 2) time data for you. Using dates and times in a primary key can become extremely complicated and can make using the key programmatically awkward. So let us decisively rule out making the natural key the primary key in this case.

Surrogate Key

Surrogate keys are also known as system-assigned keys (SAKs). To implement these, you need the system to generate a unique number each time a record is inserted into the table and use that as the primary key. Remember to put a unique index on any natural keys to prevent inadvertently duplicating any rows. Surrogate keys are a great addition to your physical design arsenal. In fact, they are a good choice if you need to set up a large table that might have performance issues, since DBMSs generally find numeric keys much easier to sort and locate than text. But you should never apply surrogate keys as a default key design.
Note: Surrogate keys are often defaulted as the primary key of everything, without a single thought about what other choices might be made. This can promote a kind of narrow view of the design and lack of concern about the quality of data.

Guidelines for Surrogate Keys

Surrogate keys are often defaulted as the primary key of everything, without a single thought about what other choices might be made. This can promote a kind of narrow view of the design and lack of concern about the data’s quality.

Description of types key: Using surrogate keys requires that certain issues be addressed, just like using concatenated and semisurrogate keys. None of these primary keys means anything to anybody. To fulfill all the roles of a primary key, you must give at least one natural key a unique index to maintain data set member uniqueness. When you are using surrogate keys, you may also have to deal with the issue of concurrency (handling concurrent requests for new keys) and the reusability of key values. Oracle and DB2 UDB deal with this problem by using "sequence generator" structures, which are independent of tables. wIn SQL Server, you specify a column as being an identity column. This makes reusability of the surrogate key in SQL Server difficult between different tables. Before deciding on surrogate keys, make sure the RDBMS handles them in a way that will be helpful.
The good news with using surrogate keys is that they are generally fast, simple, and small. The design looks exactly like the concatenated choice, as shown in Figure 2-18.

Figure 2-18: Choosing to use a surrogate key as the primary key of Game

Ways to prevent "duplicate values" in a table without using a "surrogate key"

To prevent duplicate values in a database table without relying on a surrogate key (an artificially generated key, typically an auto-incrementing integer), there are several effective methods you can use. These methods focus on enforcing data integrity and uniqueness directly on the natural data itself:

Unique Constraints: Implement unique constraints on one or more columns to ensure that no two rows in the table can have the same value in those columns. This is perhaps the most straightforward method to prevent duplicates. For example, if you have a table of user data, you might enforce a unique constraint on the email column to ensure that no two users can register with the same email address.
```
CREATE TABLE Users (
   UserID int,
   Email varchar(255),
   Username varchar(255),
   PRIMARY KEY (UserID),
   UNIQUE (Email)
);
```
Primary Key on Natural Data: Instead of using a surrogate key, define the primary key of the table using natural data columns that uniquely identify each row. For instance, a combination of first name, last name, and birthdate could uniquely identify individuals in a certain context.
```
CREATE TABLE Employees (
   FirstName varchar(255),
   LastName varchar(255),
   BirthDate date,
   PRIMARY KEY (FirstName, LastName, BirthDate)
);
```
Composite Unique Index: If no single column is unique by itself but a combination of columns is, you can create a composite unique index on these columns. This is similar to using them as a composite primary key but allows you to keep a surrogate key as the primary key if needed.
```
CREATE UNIQUE INDEX idx_person_unique ON Persons (FirstName, LastName, BirthDate);
```
Although not directly preventing duplicates, check constraints can be used to enforce certain rules that indirectly prevent duplicates. For example, you can ensure that some data follows a specific format or meets a certain condition, reducing the likelihood of erroneous duplicates.
```
CREATE TABLE Orders (
   OrderID int,
   OrderNumber varchar(255),
   OrderDate date,
   CHECK (OrderNumber LIKE 'ORD-%')
);
```

Trigger-based Validation: For more complex scenarios where duplicates are not straightforward to define using keys or constraints, you can use database triggers to check for duplicates before inserting or updating rows. This allows you to run custom logic to validate data uniqueness based on your specific rules.

CREATE TRIGGER CheckDuplicateBeforeInsert
BEFORE INSERT ON Employees
FOR EACH ROW
BEGIN
  DECLARE duplicate_count INT;

  SELECT COUNT(*) INTO duplicate_count
  FROM Employees
  WHERE FirstName = NEW.FirstName AND 
  LastName = NEW.LastName AND BirthDate = NEW.BirthDate;

  IF duplicate_count > 0 THEN
   SIGNAL SQLSTATE '45000'
	   SET MESSAGE_TEXT = 'Duplicate employee cannot be added';
  END IF;
 END;

Using these methods helps maintain data integrity and prevent duplicates in your database tables effectively without relying on surrogate keys.

[1]concatenated surrogate key: In database design, a concatenated surrogate key is a primary key that's created by combining two or more existing columns to form a unique identifier for each row.