SQL Joins
Table of contents
CROSS JOIN
A CROSS JOIN
returns the Cartesian product of two tables, creating n * m
rows where n
and m
are the number of rows in each table.
SELECT *
FROM a CROSS JOIN b;
Or you could just use a comma ,
:
SELECT *
FROM a, b;
Inner vs Outer Joins
Theoretically, you could think of all joins starting from the Cartesian product.
Not exactly what the query optimizer does, but it helps to understand the concept.
After obtaining the Cartesian product, we filter out rows based on some conditions:
- Inner Join: Returns only the rows that fulfill the condition.
- Outer Join: Returns the rows that fulfill the condition and
- Left Outer Join: Also the rows with values from the left table
- Right Outer Join: Also the rows with values from the right table
- Full Outer Join: Also the rows with values from both tables
For example, left outer join has the following logic:
- We have a Cartesian product of two tables.
- The rows that fulfill the condition are kept.
- In addition, find all rows from the left table (not the Cartesian product) that did not make it to the final result.
- For each of those rows, fill in
NULL
values for the right table columns, and include them in the final result.
Join Conditions
ON
With ON
, you can use any comparison operators to join tables =
, >
, <
, >=
, <=
, <>
, !=
, etc.
Equi-Join
Equi-join is the most common type of join. It is when a join condition uses the =
operator.
SELECT s.store_id, a.address, district
FROM store s
JOIN address a ON s.address_id = a.address_id;
Range variable s
and a
are technically optional, but they are highly recommended and can become necessary to resolve ambiguity when referencing column names that exist in both tables.
You can equi-join on multiple columns as well with AND
:
SELECT *
FROM a
JOIN b ON a.key1 = b.key1 AND a.key2 = b.key2;
USING
When you want to equi-join on columns with the same name, you can use the shorthand USING
keyword:
SELECT s.store_id, a.address, district
FROM store s
JOIN address a USING (address_id);
Do not forget the parentheses ()
around the key name.
JOIN / INNER JOIN
In PostgreSQL, when you use JOIN
, it is equivalent to INNER JOIN
.
Only returns rows that fulfill the condition.
SELECT *
FROM a
[INNER] JOIN b ON condition;
Outer Joins
In keywords, the word OUTER
is omitted:
LEFT JOIN
RIGHT JOIN
FULL JOIN
Remember, for dangling left or right rows, NULL
values are filled in for the columns from the other table.
Exclusive Outer Joins
To filter out rows that have matching values, you can add a NULL
check in the WHERE
clause.
For exclusive left join:
SELECT *
FROM a
LEFT JOIN b USING (key)
WHERE b.key IS NULL;
For exclusive right join:
SELECT *
FROM a
RIGHT JOIN b USING (key)
WHERE a.key IS NULL;
For exclusive full join:
SELECT *
FROM a
FULL JOIN b USING (key)
WHERE a.key IS NULL OR b.key IS NULL;
NATURAL Joins
Natural join is an equi-join on columns with the same name.
NATURAL
is a keyword that can come before INNER
, LEFT
, or RIGHT
, or simply JOIN
.
SELECT *
FROM a
NATURAL [INNER | LEFT | RIGHT] JOIN b;
For NATURAL JOIN
, the columns with the same name are automatically matched. Joins with USING
can be simplified with NATURAL JOIN
.
The problem is, if there are multiple columns with the same name, natural join will try to find a match across all those columns.
In other words, SELECT * FROM a NATURAL JOIN b
with two matching column names key1
and key2
is equivalent to:
SELECT *
FROM a
JOIN b
ON a.key1 = b.key1 AND a.key2 = b.key2;
Commonly used column names such as created_time
or last_updated
can easily mess up natural join and lead to empty results. It is not recommended to use NATURAL JOIN
because it can lead to ambiguity.
SELF JOIN
Self join is simply joining a table with itself.
The only caveat is that aliasing becomes mandatory.
SELECT *
FROM tbl t1
JOIN tbl t2 ON t1.key = t2.key;