Advanced SQL  «Prev  Next»
Lesson 5 Introduction to Subselect statements
ObjectiveSub-SELECT or Subquery Statements

SubSelect Statement in SQL

Throughout the remainder of this module, the terms sub-select and subquery will be used interchangeably. It has been my experience that the term subquery appears more often than the term sub-select/
There are times when you may want to select values from one table and use values from another table as a guide.
For example, you may want to select values from a customer table when the customer ID exists in another table. In this case, you could use a sub-SELECT statement.
This may sound a bit familiar because the concept resembles joins, which were covered in Intro to SQL Part I,
Sub-SELECTS are very similar, but they are more flexible than joins.
To see this, take a look at the following statement:

What is the purpose of Sub-Select Statements?

SELECT Title FROM Titles 
WHERE pub_id IN 
(SELECT Pub_ID FROM  Publishers 
WHERE State='CA')
This uses the sub-SELECT statement to relate the Titles table to the Publishers table.
The difference is that you only want to retrieve the rows in which the State column is "CA".
This filter determines what publisher IDs will be used to select rows from the Titles table.
Remember, with a join, you end up pulling all matching rows from two or more tables. You do not filter rows on the fly as easily with the join.

Translating SQL Queries into Relational Algebra

In practice, SQL is the query language that is used in most commercial relational database management systems. The SQL query is first translated into an equivalent extended relational algebra expression, represented as a query tree data structure, that is then optimized. SQL queries are decomposed into query blocks, which form the basic units that can be translated into the algebraic operators and optimized. A query block contains a single SELECT-FROM-WHERE expression, as well as GROUP BY and HAVING clauses if these are part of the block. Hence, nested queries within a query are identified as separate query blocks.
Because SQL includes aggregate operators such as MAX, MIN, SUM, and COUNT these operators must also be included in the extended algebra.
Consider the following SQL query on the EMPLOYEE relation
SELECT Lname, Fname
WHERE Salary > ( SELECT MAX (Salary)
WHERE Dno=5 );
This query retrieves the names of employees (from any department in the company) who earn a salary that is greater than the highest salary in department 5. The query includes a nested subquery and hence would be decomposed into two blocks.
The inner block is:
(SELECT MAX (Salary)
WHERE Dno=5 )
This retrieves the highest salary in department 5. The outer query block is:
SELECT Lname, Fname
WHERE Salary > c

where c represents the result returned from the inner block. The inner block could be translated into the following extended relational algebra expression:
ℑMAX Salary(σ Dno=5(EMPLOYEE))
and the outer block into the expression:
π Lname,Fname(σ Salary>c(EMPLOYEE))

The query optimizer would then choose an execution plan for each query block. Notice that in the above example, the inner block needs to be evaluated only once to produce the maximum salary of employees in department 5, which is then used as the constant c by the outer block.
It is much harder to optimize the more complex correlated nested queries, where a tuple variable from the outer query block appears in the WHERE-clause of the inner query block.

Variables T and S used below are known as tuples

Tuples are useful for saving typing, but there are other reasons to use them:
If you join a table to itself you must give it two different names otherwise referencing the table would be ambiguous. Tuples can be useful to give names to derived tables, and in some database systems it is required.
The name, "tuple" comes from the idea that a row is a tuple of values
(13, 'Karl', 1434).
The SQL Standards refers to the variables S and T as "correlation names".
All commerical databases use a name other than tuple.
In the MySQL and PostgreSQL manual, tuples are refered to as aliases rather than tuple variables.
The Oracle manual describes them as table aliases[1] or correlation names.
The SQL Server manual uses the term table alias.
[1] ALIAS: An alias is a temporary name assigned to a table or a column within a SQL statement and is used to refer to it elsewhere in the same statement (if a table) or in a SQL*Plus command (if a column). You can use the AS keyword to separate the column definition from its alias. When used for a table, the alias is referred to as a correlation name.