Throughout the remainder of this module, the terms sub-select and subquery will be used interchangeably.
It has been my experience that the term subquery
appears more often than the term sub-select/
There are times when you may want to select values from one table and use values from another table as a guide.
For example, you may want to select values from a customer table when the customer ID exists in another table. In this case, you could use a sub-SELECT statement.
This may sound a bit familiar because the concept resembles joins, which were covered in
Intro to SQL Part I
Sub-SELECTS are very similar, but they are more flexible than joins.
To see this, take a look at the following statement:
This uses the sub-SELECT statement to relate the Titles table to the Publishers table.
In practice, SQL is the query language that is used in most commercial relational database management systems.
The SQL query is first translated into an equivalent extended relational algebra expression,
represented as a query tree data structure, that is then optimized.
SQL queries are decomposed into query blocks, which form the basic units that
can be translated into the algebraic operators and optimized.
A query block contains a single SELECT-FROM-WHERE expression, as well as GROUP BY and
HAVING clauses if these are part of the block. Hence, nested queries within a query are identified as
separate query blocks.
The difference is that you only want to retrieve the rows in which the State column is "CA".
This filter determines what publisher IDs will be used to select rows from the Titles table.
Remember, with a join, you end up pulling all matching rows from two or more tables.
You do not filter rows on the fly as easily with the join.
Because SQL includes aggregate operators such as MAX, MIN, SUM, and COUNT
these operators must also be included in the extended algebra.
Consider the following SQL query on the EMPLOYEE relation
SELECT Lname, Fname
WHERE Salary > ( SELECT MAX (Salary)
WHERE Dno=5 );
This query retrieves the names of employees (from any department in the company) who earn a salary that is greater than the highest salary in department 5.
The query includes a nested subquery and hence would be decomposed into two blocks.
The inner block is:
(SELECT MAX (Salary)
WHERE Dno=5 )
This retrieves the highest salary in department 5. The outer query block is:
SELECT Lname, Fname
WHERE Salary > c
where c represents the result returned from the inner block.
The inner block could be translated into the following extended relational algebra expression:
ℑMAX Salary(σ Dno=5(EMPLOYEE))
and the outer block into the expression:
π Lname,Fname(σ Salary>c(EMPLOYEE))
The query optimizer would then choose an execution plan for each query block.
Notice that in the above example, the inner block needs to be evaluated only once to produce the maximum salary of employees in department 5, which is then used as the constant c by the outer block.
It is much harder to optimize the more complex correlated nested queries,
where a tuple variable from the outer query block appears in the WHERE-clause of the inner query block.