RelationalDBDesignRelationalDBDesign





Attributes-Entities   «Prev  Next»
Lesson 4Entity identifier rules
ObjectiveList two rules for creating entity identifiers.

Two Rules for Creating Entity Identifiers

The two most important rules for creating an entity identifier are:
  1. Keep the identifier “meaningless”
  2. Keep the identifier “unrestricted”
When choosing an entity identifier, it is important to use meaningless values, that is, identifiers that will not change. For example, combining the first five letters of an employee’s last name with his/her phone number is a “meaningful” identifier that could present problems.
Question: What if the employee’s phone number changes?
It is very important when designing a database to make sure you choose identifiers that will never change.
Social security numbers are popular “people” identifiers, but they are restricted to U.S. citizens. Companies with overseas customers or universities where foreign students make up a percentage of the student population must be especially wary of using social security numbers as identifiers. In other circumstances, however, a social security number can be a good entity identifier. Whenever possible, meaningless values should be assigned as key attributes. The next lesson describes instances of entities.


Entity Identifiers

The only purpose for putting the data that describe an entity into a database is to retrieve the data at some later date. This means that we must have some way of distinguishing one entity from another so that we can always be certain that we are retrieving the precise entity we want. We do this by ensuring that each entity has some attribute values that distinguish it from every other entity in the database (an entity identifier).
Assume, for example, that DistributedNetworks has only two customers named John Smith. If an employee searches for the items John Smith has ordered, which John Smith will the DBMS retrieve? In this case, the answer is both of them. Because there is no way to distinguish between the two customers, the result of the query will be inaccurate. DistributedNetworks solved the problem by creating customer numbers that were unique. That is indeed a common solution to identifying instances of entities where there is no simple unique identifier suggested by the data itself.
Another solution would be to pair the customer's first name and last name with his or her telephone number. This combination of data values (a concatenated identifier) would also uniquely identify each customer. There are, however, two drawbacks to doing so this. First, the identifier is long and clumsy; it would be easy to make mistakes when entering any of the parts.
Second, if the phone number of the customer changes, then the identifier must also change.
Changes made in an entity identifier can cause serious problems in a database. Some entities, such as invoices, come with natural identifiers (the invoice number). We assign unique, meaningless numbers to others, especially accounts, people, places and things. Still others require concatenated identifiers.
When we store an instance of an entity in a database, we want the DBMS to ensure that the new instance has a unique identifier. This is an example of a constraint on a database, a rule to which data must adhere. The enforcement of a variety of database constraints helps us to maintain data consistency and accuracy.