IN and EXISTS often produce the same result, but when negated and dealing with NULL values, they behave differently.

Let's see an example, assume we have these two tables:

orders

order_idcustomer_idamount
110050.00
210175.00
310230.00
4NULL20.00

customers

customer_idname
100Bob
101Alice
102Martha
NULLJohn

Ignore the fact that a well formed customers table would need to have the id always specified and ideally as a constraint. This is just an example.

Now, let’s say we want to find orders where the customer_id is not present in the customers table.

Using NOT EXISTS we would write something like:

SELECT *
FROM orders o
WHERE NOT EXISTS (SELECT 1 FROM customers c WHERE c.customer_id = o.customer_id);

The output, would probably be what we expect:

order_idcustomer_idamount
4NULL20.00

Using NOT IN we would write something like:

SELECT *
FROM orders
WHERE customer_id NOT IN (SELECT customer_id FROM customers);

The output here will be empty!

When we use NOT IN, SQL checks each value in the orders table against the list of values returned by the subquery. If any value in the subquery result is NULL, the entire NOT IN comparison will result in NULL for each row in the orders table. This is because any comparison with NULL yields NULL and NOT IN needs all comparisons to be TRUE for a row to be included in the result.

This is not a problem with NOT EXISTS, because the NOT EXISTS clause checks for the non-existence of rows that meet the criteria specified in the subquery. It does not perform direct comparisons with NULL in the same way NOT IN does. Instead, it simply checks if there are any rows that match the condition. If no such rows exist, the condition is true.