| Lesson 5 | Introduction to Subquery Statements |
| Objective | Describe the Characteristics of Subquery Statements in SQL |
A subquery is a query nested inside another query. Subqueries are a core SQL mechanism for expressing “compute something, then use it” logic in a single statement—often to filter rows, compare against an aggregate value, or supply a derived data set to the outer query. Conceptually, each subquery forms its own query block (a SELECT-FROM-WHERE unit, optionally with GROUP BY and HAVING) that the optimizer can transform and reorder.
Subqueries can be used in several parts of a SQL statement:
You will commonly see subqueries discussed by the shape of the result they return:
The most common use is filtering the outer query based on the set of values produced by the inner query. For example, select book titles published by publishers located in California:
SELECT Title
FROM Titles
WHERE pub_id IN
(SELECT Pub_ID
FROM Publishers
WHERE State = 'CA');
This is similar to a join, but the mental model is different: the inner query produces the qualifying publisher IDs,
and the outer query uses that set to filter Titles. The optimizer may still transform it into a join-like plan,
but the subquery is often the clearer expression when your goal is “filter by membership in a derived set.”
A scalar subquery is frequently used when you need a single computed value such as an average, max, or min. The following query returns products whose price is greater than the average price:
SELECT p.ProductID, p.ProductName, p.UnitPrice
FROM Products p
WHERE p.UnitPrice > (SELECT AVG(UnitPrice) FROM Products);
Here the inner query is evaluated as a single value, and the outer query compares each row against that value.
Subqueries can be placed in the SELECT list to compute a value for each outer row. This pattern is useful for per-row metrics:
SELECT e.EmployeeID, e.FirstName, e.LastName,
(SELECT COUNT(o.OrderID)
FROM Orders o
WHERE o.EmployeeID = e.EmployeeID) AS OrderCount
FROM Employees e;
Note that this example is a correlated subquery: the inner query references e.EmployeeID from the outer query.
Correlated subqueries are expressive, but can be harder to optimize than non-correlated subqueries because the inner block may be evaluated
in a row-by-row fashion unless the optimizer rewrites it (for example, into a join with aggregation).
A table subquery in the FROM clause creates a derived table (sometimes called an inline view). This is a clean approach when you need to aggregate or filter first, then join the result to other tables:
SELECT d.DepartmentID, d.DepartmentName, e.EmployeeCount
FROM Departments d
JOIN (
SELECT DepartmentID, COUNT(EmployeeID) AS EmployeeCount
FROM Employees
GROUP BY DepartmentID
) e
ON d.DepartmentID = e.DepartmentID;
In modern SQL, this approach is often paired with CTEs (WITH clauses) to improve readability. The underlying idea is the same: create an intermediate result set, then consume it.
HAVING filters groups after aggregation. A subquery inside HAVING is useful when you want to compare one aggregate result to another derived aggregate:
SELECT c.Country, COUNT(o.OrderID) AS OrderCount
FROM Customers c
JOIN Orders o ON c.CustomerID = o.CustomerID
GROUP BY c.Country
HAVING COUNT(o.OrderID) > (
SELECT AVG(OrderCount)
FROM (
SELECT COUNT(OrderID) AS OrderCount
FROM Orders
GROUP BY CustomerID
) AS AverageOrders
);
Optimizers decompose SQL into query blocks. Each block is a SELECT-FROM-WHERE unit (optionally GROUP BY / HAVING). For example:
SELECT Lname, Fname
FROM EMPLOYEE
WHERE Salary > (
SELECT MAX(Salary)
FROM EMPLOYEE
WHERE Dno = 5
);
This contains two blocks. The inner block computes a constant value (the max salary for department 5). In many engines, that inner block can be evaluated once and reused. In contrast, correlated subqueries introduce dependencies between blocks and can be more difficult to optimize because the inner query may depend on each outer row.
You will sometimes see the term tuple variable in textbooks. In practice, database products almost always use table alias (or correlation name). Aliases reduce typing, disambiguate self-joins, and name derived tables.
SELECT CUSTOMER_NAME, T.LOAN_NUMBER, S.AMOUNT
FROM BORROWER AS T, LOAN AS S
WHERE T.LOAN_NUMBER = S.LOAN_NUMBER;
Think of a row as a tuple of values (for example, (13, 'Karl', 1434)), and think of aliases as the names you assign so you can
refer to those tuples unambiguously inside a single SQL statement.
EXISTS can be more semantically direct than IN when you only care
whether a match exists.
IN and NOT IN have tricky semantics with NULLs. For anti-joins,
NOT EXISTS is often the safer expression.