Select Statement  «Prev  Next»
Lesson 5 Introduction to Subquery Statements
ObjectiveSubquery Statements Characteristics

Mastering SQL Subqueries: A Comprehensive Guide

SQL subqueries are a powerful tool for retrieving and manipulating data in relational databases. They allow you to perform complex operations and combine data from multiple tables in a single query. In this guide, we will dive deep into the world of SQL subqueries, exploring their various forms, uses, and best practices.

What are SQL Subqueries?

SQL subqueries are queries embedded within another query, often used to filter, compare, or compute values based on the results of the inner query. They can be utilized in various parts of a larger query, including the SELECT, FROM, WHERE, and HAVING clauses.

Types of Subqueries

There are three primary types of SQL subqueries:
  1. Scalar Subqueries: These return a single value and are often used in the SELECT, WHERE, and HAVING clauses.
  2. Row Subqueries: These return a single row of data and can be used in the WHERE clause to compare against other rows.
  3. Table Subqueries: These return a table of data and are typically used in the FROM clause as a derived table.

Using SQL Subqueries in SELECT Clauses

Subqueries in SELECT clauses allow you to compute values based on the results of another query. Here is an example:
SELECT e.EmployeeID, e.FirstName, e.LastName,
       (SELECT COUNT(o.OrderID)
        FROM Orders o
        WHERE o.EmployeeID = e.EmployeeID) AS OrderCount
FROM Employees e;

This query retrieves employee information and calculates the number of orders each employee has processed.

Implementing SQL Subqueries in FROM Clauses

You can use subqueries in the FROM clause to create derived tables for further manipulation. Consider the following example:
SELECT d.DepartmentID, d.DepartmentName, e.EmployeeCount
FROM Departments d
JOIN (SELECT DepartmentID, COUNT(EmployeeID) AS EmployeeCount
      FROM Employees
      GROUP BY DepartmentID) e ON d.DepartmentID = e.DepartmentID;

In this query, the subquery generates a derived table with the total number of employees per department, which is then joined to the Departments table.

Applying SQL Subqueries in WHERE Clauses

Subqueries in WHERE clauses allow you to filter results based on the outcome of another query. Here is an example:
SELECT p.ProductID, p.ProductName, p.UnitPrice
FROM Products p
WHERE p.UnitPrice > (SELECT AVG(UnitPrice) FROM Products);

This query retrieves products with a price greater than the average product price.

Leveraging SQL Subqueries in HAVING Clauses

You can use subqueries in HAVING clauses to filter grouped results based on the outcome of another query. Consider this example:
SELECT c.Country, COUNT(o.OrderID) AS OrderCount
FROM Customers c
JOIN Orders o ON c.CustomerID = o.CustomerID
GROUP BY c.Country
HAVING COUNT(o.OrderID) > (SELECT AVG(OrderCount)
                         FROM (SELECT COUNT(OrderID) AS OrderCount
                               FROM Orders
                               GROUP BY CustomerID) AS AverageOrders);

In this query, we retrieve the total number of orders per country, filtering the results to include only countries with order counts greater than the average order count per customer.

Best Practices for SQL Subqueries

  1. Use Subqueries Judiciously: While subqueries offer powerful functionality, they can also impact performance. Use them when necessary but consider alternative solutions, such as JOINs, when possible.
  2. Keep Subqueries Readable: Complex subqueries can become difficult to understand. Use proper formatting, indentation, and comments to maintain readability.
  3. Use Aliases: Assign


Query Contained within a Query

Throughout the remainder of this module, the term subquery will be used to describe a query contained within a query. There are times when you may want to select values from one table and use values from another table as a guide. For example, you may want to select values from a customer table when the customer ID exists in another table. In this case, you could use a sub-SELECT statement. This may sound a bit familiar because the concept resembles joins, which were covered in Intro to SQL Part I, Sub-SELECTS are very similar, but they are more flexible than joins.
To see this, take a look at the following statement:

SELECT Title FROM Titles 
WHERE pub_id IN 
(SELECT Pub_ID FROM  Publishers 
WHERE State='CA')

The SQL above uses the Subquery statement to relate the Titles table to the Publishers table. The difference is that you only want to retrieve the rows in which the State column is "CA". This filter determines what publisher IDs will be used to select rows from the Titles table.
Remember, with a join, you end up pulling all matching rows from two or more tables. You do not filter rows on the fly as easily with the join.

Nested Queries within a Query are identified as separate Query Blocks

In practice, SQL is the query language that is used in most commercial relational database management systems. The SQL query is first translated into an equivalent extended relational algebra expression, represented as a query tree data structure, that is then optimized. SQL queries are decomposed into query blocks, which form the basic units that can be translated into the algebraic operators and optimized. A query block contains a single SELECT-FROM-WHERE expression, as well as GROUP BY and HAVING clauses if these are part of the block. Hence, nested queries within a query are identified as separate query blocks. Because SQL includes aggregate operators such as MAX, MIN, SUM, and COUNT these operators must also be included in the extended algebra. Consider the following SQL query on the EMPLOYEE relation
SELECT Lname, Fname
FROM EMPLOYEE
WHERE Salary > ( SELECT MAX (Salary)
FROM EMPLOYEE
WHERE Dno=5 );

This query retrieves the names of employees (from any department in the company) who earn a salary that is greater than the highest salary in department 5. The query includes a nested subquery and hence would be decomposed into two blocks.
The inner block is:
(SELECT MAX (Salary)
FROM EMPLOYEE
WHERE Dno=5 )

This retrieves the highest salary in department 5. The outer query block is:
SELECT Lname, Fname
FROM EMPLOYEE
WHERE Salary > c

where c represents the result returned from the inner block. The inner block could be translated into the following extended relational algebra expression:

ℑMAX Salary(σ Dno=5(EMPLOYEE))

and the outer block into the expression:
π Lname,Fname(σ Salary > (EMPLOYEE))

The query optimizer would then choose an execution plan for each query block. Notice that in the above example, the inner block needs to be evaluated only once to produce the maximum salary of employees in department 5, which is then used as the constant c by the outer block. It is much harder to optimize the more complex correlated nested queries, where a tuple variable from the outer query block appears in the WHERE-clause of the inner query block.

Variables T and S used below are known as tuples

SELECT CUSTOMER_NAME, T.LOAN_NUMBER, S.AMOUNT
FROM BORROWER AS T, LOAN AS S
WHERE T.LOAN_NUMBER = S.LOAN_NUMBER

Tuples are useful for saving typing, but there are other reasons to use them: If you join a table to itself you must give it two different names otherwise referencing the table would be ambiguous. Tuples can be useful to give names to derived tables, and in some database systems it is required.
The name, "tuple" comes from the idea that a row is a tuple of values
(13, 'Karl', 1434).

The SQL Standards refers to the variables S and T as "correlation names". All commerical databases use a name other than tuple. In the MySQL and PostgreSQL manual, tuples are refered to as aliases rather than tuple variables.
The Oracle manual describes them as table aliases[1] or correlation names.
The SQL Server manual uses the term table alias.
[1] ALIAS: An alias is a temporary name assigned to a table or a column within a SQL statement and is used to refer to it elsewhere in the same statement (if a table) or in a SQL*Plus command (if a column). You can use the AS keyword to separate the column definition from its alias. When used for a table, the alias is referred to as a correlation name.