Join Queries  «Prev  Next»

Lesson 5 Oracle EXISTS clause
Objective Use EXISTS instead of IN for a Subquery

Use EXISTS instead of IN for Subquery

Here is a breakdown of when to favor EXISTS over IN for subqueries in Oracle, along with the key differences to consider:
When to use EXISTS
  • Checking for Existence: If you primarily care about whether the subquery returns "any" rows at all, EXISTS is generally the more efficient choice. It stops processing as soon as a single matching row is found.
  • Correlated Subqueries: EXISTS is often necessary in correlated subqueries (where the subquery references values from the outer query).
  • Potential NULL Values: EXISTS handles NULL values more gracefully than IN.

When to use IN
  • Retrieving Specific Values: If you need the actual values returned by the subquery for use in the outer query, IN is the way to go.
  • Small Result Sets: When the subquery is expected to return a very small and well-defined set of values, the performance difference between EXISTS and IN might be negligible.

Performance Considerations
  • In many cases, EXISTS will outperform IN, especially for larger subquery result sets. This is because IN may process the entire subquery result before evaluating the condition.
  • However, the Oracle optimizer is intelligent and might rewrite some queries internally. The best way to be certain for a specific scenario is to examine the execution plans.

Example: Find customers who have placed at least one order:
  • EXISTS:
    SELECT * FROM Customers
    WHERE EXISTS (SELECT 1 FROM Orders WHERE Orders.CustomerID = Customers.CustomerID); 
    
  • IN:
    SELECT * FROM Customers
    WHERE CustomerID IN (SELECT CustomerID FROM Orders);
    

Correlated Subquery can produce more efficient Code

The correlated subquery seen in the previous lesson may seem like a clumsy substitute for a JOIN query. In some cases this is true. In fact, Oracle's SQL parser resolves the correlated subquery into a JOIN query with no subquery whenever it can.
This lesson shows you how to use the correlated subquery in a way that really pays for itself in more efficient code. The EXISTS clause combined with a correlated subquery executes faster (less CPU time, less I/O time, and less overall response time) than an equivalent subquery using the IN clause in a normal subquery.
EXISTS clause syntax
Here is the general syntax of a query that uses the EXISTS clause:
SELECT col1, col2, ...FROM table alias1
WHERE EXISTS (SELECT 'x'
FROM table alias2
WHERE alias1.col5 = alias2.col6)

The syntax is very similar to the syntax for a regular correlated subquery. The differences are:
  1. Rather than comparing a column with a result FROM the subquery, you simply use EXISTS in the outer query
  2. Rather than returning a data value FROM the subquery, the subquery's SELECT clause always selects one literal: 'X'

Your query will run efficiently when you follow this syntax.

How the EXISTS clause works

The literal in the subquery SELECT clause can be any literal you choose and it makes no difference. You use a literal because this reduces the need for retrieving actual data FROM the database when executing the subquery.
The EXISTS clause works like a switch: as soon as one row is found that satisfies the subquery's WHERE clause, the EXISTS clause sets itself to TRUE and stops. If no row is found, the EXISTS clause sets itself to FALSE. You save time because the database does not go and find every single row that satisfies the subquery. It stops as soon as one row is found. In addition, since the subquery is not retrieving data FROM the table, the execution of the subquery stops after evaluating the WHERE clause and does not continue by retrieving the associated row data. Very often, Oracle has an index to use when evaluating the WHERE clause. An index stores the indexed data and the row ID of the associated table row. Accessing an indexed column is one of the fastest access methods available in an Oracle database.

Example 5-5 semi-join Using WHERE EXISTS
The following query uses a WHERE EXISTS clause to list only the departments that contain employees
SELECT department_id, department_name
FROM departments
WHERE EXISTS (SELECT 1
FROM employees
WHERE employees.department_id = departments.department_id)

The execution plan reveals a NESTED LOOPS SEMI operation in Step 1:
Example 5-5
Example 5-5

For each row in departments, which forms the outer loop, the database obtains the department ID, and then probes the employees.department_id index for matching entries.
Conceptually, the index looks as follows:
10,rowid
10,rowid
10,rowid
10,rowid
30,rowid
30,rowid
30,rowid

If the first entry in the departments table is department 30, then the database performs a range scan of the index until it finds the first 30 entry, at which point it stops reading the index and returns the matching row from departments. If the next row in the outer loop is department 20, then the database scans the index for a 20 entry, and not finding any matches, performs the next iteration of the outer loop. The database proceeds in this way until all matching rows are returned.

Using the EXISTS clause

The best way to use the EXISTS clause is to substitute it for an IN clause.
The link below converts a query with an IN clause to a query with an EXISTS clause.
Change in to exists

Interpret Correlated - Subqueries

Click the link below to try out your skills in interpreting correlated sub-queries.
Interpret Correlated - Subqueries

Using EXISTS and Its Correlated Subquery

EXISTS is a test for existence. It is placed the way IN might be placed with a subquery, but it differs in that it is a logical test for the return of rows from a query, not for the rows themselves. How many authors have written more than one book on the bookshelf?
select AuthorName, COUNT(*)
from BOOKSHELF_AUTHOR
group by AuthorName
having COUNT(*) > 1;

group by AuthorName
group by AuthorName

Attempting to find both AuthorName and Title fails, however, because the group by made necessary by the COUNT(*) is on the primary key of the BOOKSHELF_AUTHOR table (AuthorName, Title). Because each primary key, by definition, uniquely identifies only one row, the count of titles for that one row can never be greater than 1, so the having clause always tests false since it does not find any rows:
select AuthorName, Title, COUNT(*)
from BOOKSHELF_AUTHOR
group by AuthorName, Title
having COUNT(*) > 1;
no rows selected.

EXISTS provides a solution. The following subquery asks, for each AuthorName selected in the outer query, whether an AuthorName exists in the BOOKSHELF_AUTHOR table with a count of Titles greater than one. If the answer for a given name is yes, the EXISTS test is true, and the outer query selects an AuthorName and Title. The author names are correlated by the "BA" alias given to the first BOOKSHELF_AUTHOR table.
column AuthorName format a25
column Title format a30
select AuthorName, Title
  from BOOKSHELF_AUTHOR BA
  where EXISTS
  (select 'x'
    from BOOKSHELF_AUTHOR BA2
    where BA.AuthorName = BA2.AuthorName
    group by BA2.AuthorName
    having COUNT(BA2.Title) > 1)
 order by AuthorName, Title;
In the next lesson, you will learn the unusual case of embedding a subquery in the FROM clause of your query.

SEMrush Software