Join Queries  «Prev  Next»

Lesson 4 Correlated subqueries
Objective Identify the Connection between a Query and a Subquery.

Relationship which exists between queries and subqueries in Oracle SQL

Both inner joins and outer joins are used in SQL to combine data from two tables based on a shared field, but they differ in how they handle rows that don't have a match in the other table.
Inner Join:
  • Focuses on Matching Records: An inner join only returns rows where there's a match between the two tables on the join condition (usually common columns).
  • Excludes Unmatched Rows: If a row in one table doesn't have a corresponding match in the other table based on the join condition, it's excluded from the result set.
  • Simpler and Faster: Generally faster to execute compared to outer joins because it deals with a smaller dataset (matching rows only).

Outer Join:
  • Preserves All Rows (or Specific Table's Rows): There are three types of outer joins (left, right, and full) that determine which rows are included in the result set, even if there's no match in the other table.
    • Left outer join: Includes all rows from the left table and matching rows from the right table. Unmatched rows in the right table have null values for columns from the right table.
    • Right outer join: Includes all rows from the right table and matching rows from the left table. Unmatched rows in the left table have null values for columns from the left table.
    • Full outer join: Includes all rows from both tables, regardless of whether there's a match. Unmatched rows in either table have null values for the corresponding columns.
  • More Complex: Outer joins can be slightly slower than inner joins because they need to handle unmatched rows and potentially fill them with null values.

Here's an analogy: Imagine matching students with their test scores from two separate lists.
  • Inner join: Only students with scores on both lists are included.
  • Left outer join: All students are included, even if their score is missing (null value) on the list.
  • Right outer join: Includes all scores, even if the student is missing (null value) on the student list.
  • Full outer join: Includes all students and all scores, filling in null values where there's no match.

Choosing the right join type depends on what information you need in your final result set. Inner joins are ideal when you only care about data with matches in both tables. Outer joins are useful when you want to see all data from one or both tables, even if there's no corresponding match in the other table.

Identify Connection between Query and Subquery

You have seen plenty of subqueries in this module so far. Now you will learn about an advanced form of subquery called the correlated subquery[1]. A "correlated subquery" is a subquery in which part of the query refers back to data in the outer query. The advantage of using a correlated subquery is that you can write a query that isolates the subquery portion while still combining the subquery with the main query as if it were a join. This form of query is sometimes easier to understand than a complicated multi-table join.
Correlated subquery syntax
Here is the general syntax:
SELECT col1, col2, ...FROM table alias1WHERE col3 = 
(SELECT col4 FROM table alias2
WHERE alias1.col5 = alias2.col6)

Notice that alias1 (the alias for the table in the outer query) appears in the WHERE clause of the subquery. This connection between a column FROM the outer query and a column FROM the subquery makes this a correlated subquery. Here is an example, using the Pet Store schema. The query in the following diagram answers the question:
Question: Which PRODUCT record has been updated after the last time an entry was added to the Pet Care Log for that PRODUCT?


Example of a Correlated Subquery

Example of correlated subquery with individual components explained.
SELECT PRODUCT_NAME
FROM PRODUCT P
WHERE LAST_UPDATE_DATE > 
  (SELECT MAX(LOG_DATETIME)
  FROM PET_CARE_LOG PCL
  WHERE P.PRODUCT_ID=PCL.PRODUCT_ID)  

Example of correlated subquery with individual components explained.

  1. Use an alias for your table name so that you can easily refer to the table in the subquery. Here, the letter P is the alias for the PRODUCT table.
  2. In this example, the greater than symbol (>) is the operator that compares the outer query data (LAST_UPDATE_DATE) with the subquery data (MAX(LOG_DATETIME)). You can also use other operators such as equal, IN, LIKE, or NOT IN.
  3. The subquery uses the PET_CARE_LOG table and assigns PCL as the alias for this table. This makes it more convenient to write the WHERE clause.
  4. Here is the comparison that makes this a correlated subquery. The data FROM the outer query is used as criteria for the subquery. In this example, the criteria restricts the rows in the subquery to those rows with a PRODUCT_ID that matches the PRODUCT_ID of the current row of the outer query.
Example of a Correlated subquery using Oracle SQL.
A correlated subquery is a subquery that relies on values from the outer query to execute. In Oracle SQL, a correlated subquery can be used to filter or manipulate data based on values in the outer query. Here's an example:
Consider two tables: employees and departments.
employees table:
+-----------+---------+----------+-------------+
| employee_id | name   | salary  | department_id |
+-----------+---------+----------+-------------+
| 1          | John    | 5000     | 10          |
| 2          | Jane    | 6000     | 10          |
| 3          | Mark    | 5500     | 20          |
| 4          | Lucy    | 4500     | 20          |
+-----------+---------+----------+-------------+

Departments table:
+---------------+----------+
| department_id | name     |
+---------------+----------+
| 10            | HR       |
| 20            | IT       |
+---------------+----------+

Now, let's say we want to find all employees who earn the maximum salary in their department. We can use a correlated subquery to achieve this:
SELECT e1.employee_id, e1.name, e1.salary, e1.department_id
FROM employees e1
WHERE e1.salary = (
  SELECT MAX(e2.salary)
  FROM employees e2
  WHERE e1.department_id = e2.department_id
);

In this query, the outer query is iterating over each employee (aliased as e1), and the correlated subquery (aliased as e2) calculates the maximum salary for each department. The outer query then filters the results by matching the employee's salary with the maximum salary in their department. The output will be:
+-----------+------+--------+--------------+
| employee_id | name | salary | department_id |
+-----------+------+--------+--------------+
| 2         | Jane | 6000   | 10           |
| 3         | Mark | 5500   | 20           |
+-----------+------+--------+--------------+

In the next lesson, you will learn how to combine the correlated subquery with another command to create more efficient queries.
[1]Correlated subquery: A subquery in which part of the subquery refers back to data in the outer query. Correlated subqueries are a part of standard SQL.

SEMrush Software