Back to Basics: SQL Joins

I recently had to explain to some people the difference between the different kinds of SQL joins. Now I have been using SQL for well over 15 years and feel myself quite the expert, however this stumped me like a number 10 batsmen facing an Alan Donald yorker.

The problem is more about terminology and theory, even though I knew exactly what the different joins were, when I would use each one and what the benefits and drawbacks were, it was not that simple to explain (for me anyway).

Thus, a few blog posts to cover some of the OLD theory and refresh the brain.

Basic SQL Join Types

There are four basic types of SQL joins: inner, left, right, and full. The easiest and most intuitive way to explain the difference between these four types is by using a Venn diagram, which shows all possible logical relations between data sets. 

Let’s say we have two sets of data in our relational database: table A and table B, with some sort of relation specified by primary and foreign keys. The result of joining these tables together can be visually represented by the following diagram: 

The extent of the overlap, if any, is determined by how many records in Table A match the records in Table B. Depending on what subset of data we would like to select from the two tables, the four join types can be visualized by highlighting the corresponding sections of the Venn diagram:

Select all records from Table A and Table B, where the join condition is met.
Select all records from Table A and Table B, where the join condition is met.
Select all records from Table A, along with records from Table B for which the join condition is met (if at all).
Select all records from Table A, along with records from Table B for which the join condition is met (if at all).
Select all records from Table B, along with records from Table A for which the join condition is met (if at all).
Select all records from Table B, along with records from Table A for which the join condition is met (if at all).
Select all records from Table A and Table B, regardless of whether the join condition is met or not.
Select all records from Table A and Table B, regardless of whether the join condition is met or not.

Examples of SQL Join Types

Let’s use the tables we introduced in the “What is a SQL join?” section to show examples of these joins in action. The relationship between the two tables is specified by the customer_id key, which is the “primary key” in customers table and a “foreign key” in the orders table:

customer_idfirst_namelast_nameemailaddresscitystatezipcode
1GeorgeWashingtongwashington@usa.gov3200 Mt Vernon HwyMount VernonVA22121
2JohnAdamsjadams@usa.gov1250 Hancock StQuincyMA02169
3ThomasJeffersontjefferson@usa.gov931 Thomas Jefferson PkwyCharlottesvilleVA22902
4JamesMadisonjmadison@usa.gov11350 Constitution HwyOrangeVA22960
5JamesMonroejmonroe@usa.gov2050 James Monroe ParkwayCharlottesvilleVA22902
order_idorder_dateamountcustomer_id
107/04/1776$234.561
203/14/1760$78.503
305/23/1784$124.002
409/03/1790$65.503
507/21/1795$25.5010
611/27/1787$14.409

Note that (1) not every customer in our customers table has placed an order and (2) there are a few orders for which no customer record exists in our customers table.

Inner Join

Let’s say we wanted to get a list of those customers who placed an order and the details of the order they placed. This would be a perfect fit for an inner join, since an inner join returns records at the intersection of the two tables.

select first_name, last_name, order_date, order_amount
from customers c
inner join orders o
on c.customer_id = o.customer_id

view rawinner-join.sql hosted with ❤ by GitHub

first_namelast_nameorder_dateorder_amount
GeorgeWashington07/4/1776$234.56
JohnAdams05/23/1784$124.00
ThomasJefferson03/14/1760$78.50
ThomasJefferson09/03/1790$65.50

Note that only George Washington, John Adams and Thomas Jefferson placed orders, with Thomas Jefferson placing two separate orders on 3/14/1760 and 9/03/1790.

Left Join

If we wanted to simply append information about orders to our customers table, regardless of whether a customer placed an order or not, we would use a left join. A left join returns all records from table A and any matching records from table B.

select first_name, last_name, order_date, order_amount
from customers c
left join orders o
on c.customer_id = o.customer_id

view rawleft-join.sql hosted with ❤ by GitHub

first_namelast_nameorder_dateorder_amount
GeorgeWashington07/04/1776$234.56
JohnAdams05/23/1784$124.00
ThomasJefferson03/14/1760$78.50
ThomasJefferson09/03/1790$65.50
JamesMadisonNULLNULL
JamesMonroeNULLNULL

Note that since there were no matching records for James Madison and James Monroe in our orders table, the order_date and order_amount are NULL, which simply means there is no data for these fields.

So why would this be useful? By simply adding a “where order_date is NULL” line to our SQL query, it returns a list of all customers who have not placed an order:

select first_name, last_name, order_date, order_amount
from customers c
left join orders o
on c.customer_id = o.customer_id
where order_date is NULL

view rawleft-join-alt.sql hosted with ❤ by GitHub

Right Join

Right join is a mirror version of the left join and allows to get a list of all orders, appended with customer information.

select first_name, last_name, order_date, order_amount
from customers c
right join orders o
on c.customer_id = o.customer_id

view rawright-join.sql hosted with ❤ by GitHub

first_namelast_nameorder_dateorder_amount
GeorgeWashington07/04/1776$234.56
ThomasJefferson03/14/1760$78.50
JohnAdams05/23/1784$124.00
ThomasJefferson09/03/1790$65.50
NULLNULL07/21/1795$25.50
NULLNULL11/27/1787$14.40

Note that since there were no matching customer records for orders placed in 1795 and 1787, the first_name and last_name fields are NULL in the resulting set.

Also note that the order in which the tables are joined is important. We are right joining the orders table to the customers table. If we were to right join the customers table to the orders table, the result would be the same as left joining the orders table to the customers table.

Why is this useful? Simply adding a “where first_name is NULL” line to our SQL query returns a list of all orders for which we failed to record information about the customers who placed them:

select first_name, last_name, order_date, order_amount
from customers c
right join orders o
on c.customer_id = o.customer_id
where first_name is NULL

view rawright-join-alt.sql hosted with ❤ by GitHub

Full Join

Finally, for a list of all records from both tables, we can use a full join.

select first_name, last_name, order_date, order_amount
from customers c
full join orders o
on c.customer_id = o.customer_id

view rawfull-join.sql hosted with ❤ by GitHub

first_namelast_nameorder_dateorder_amount
GeorgeWashington07/04/1776$234.56
ThomasJefferson03/14/1760$78.50
JohnAdams05/23/1784$124.00
ThomasJefferson09/03/1790$65.50
NULLNULL07/21/1795$25.50
NULLNULL11/27/1787$14.40
JamesMadisonNULLNULL
JamesMonroeNULLNULL

Leave a Reply

Your email address will not be published. Required fields are marked *