SQL DISTINCT Keyword and specifying Only Unique Records Be Retrieved
Many times you simply need to filter out duplicated information. For example, with a PUBS query that limits rows to California, you may want to only list each publisher once.
You can do this using the
DISTINCT keyword, which lets you limit rows returned to only those rows that do not have duplicate values for columns you indicate.
SELECT Title FROM Titles
IN (SELECT DISTINCT Pub_ID
WHERE State= 'CA')
This would filter out any duplicate Publisher rows in California.
Specifying Only Unique Records Be Retrieved
SELECT DISTINCT city
Retrieve specific rows
The other thing that we want to do with a single table is to retrieve just some of the rows. For example, we may want to retrieve information about those students who are doing a science degree or just those students who first enrolled in 2011.
Retrieving a subset of the rows is known as a select operation. We need to specify how we will determine which rows we want.
We do this by specifying a test condition that, for each row, is either true or false. To find all the science students, we would specify the condition degree = 'Science', while to find all the students entering the university in 2011, the condition would be year = 2011.
The condition is checked for each row in turn, and if it is true, then that row is included in the set being retrieved. We can build up more complicated conditions by using operators such as AND, OR, and NOT. For example, if we want just the science students enrolling in 2011, the condition would be
degree = 'Science' and year = 2011.
If we wanted a list of all commerce and arts students (but not any other degree), the condition would be
degree = 'Arts'
OR degree = 'Commerce'.
A select operation is specified in an SQL statement, by using the keyword WHERE followed by the appropriate condition as shown below.
The * in the first line means retrieve all the columns or fields for the selected rows.
WHERE degree = 'Science' and year = 2011
One small but important point to bear in mind is that if a field (i.e., degree) has no value, then the truth of a statement such as degree = 'Science' is unknown.
SQL queries only return those rows for which the condition statement is known to be true.
If we retrieve rows for degree = 'Science' and then retrieve rows for degree < > 'Science' we will miss the rows that have no value in degree because we do not know the value (it might be Science and it might not).
To find those fields which are empty we can use the expression where degree is NULL.
Most queries will require a combination of the select and project operations. In this case, the rows are first selected according to the condition, and then the specified columns are retrieved.
Rather than seeing all the information about each of our selected students, we may just want to see their ID numbers and names.