In SQL, what’s the difference between an inner and outer join?
Joins are used to combine the data from two tables, with the result being a new, temporary table. The temporary table is created based on column(s) that the two tables share, which represent meaningful column(s) of comparison. The goal is to extract meaningful data from the resulting temporary table. Joins are performed based on something called a predicate, which specifies the condition to use in order to perform a join. A join can be either an inner join or an outer join, depending on how one wants the resulting table to look.
It is best to illustrate the differences between inner and outer joins by use of an example. Here we have 2 tables that we will use for our example:
It’s important to note that the very last row in the Employee table does not exist in the Employee Location table. Also, the very last row in the Employee Location table does not exist in the Employee table. These facts will prove to be significant in the discussion that follows.
Let’s start the explanation with outer joins. Outer joins can be be further divided into left outer joins, right outer joins, and full outer joins. Here is what the SQL for a left outer join would look like, using the tables above:
In this SQL we are joining on the condition that the employee ID’s match in the rows tables. So, we will be essentially combining 2 tables into 1, based on the condition that the employee ID’s match. Note that we can get rid of the "outer" in left outer join, which will give us the SQL below. This is equivalent to what we have above.
A left outer join retains all of the rows of the left table, regardless of whether there is a row that matches on the right table. The SQL above will give us the result set shown below.
The Join Predicate – a geeky term you should know
Earlier we had mentioned something called a join predicate. In the SQL above, the join predicate is "on employee.empID = location.empID". This is the heart of any type of join, because it determines what common column between the 2 tables will be used to "join" the 2 tables. As you can see from the result set, all of the rows from the left table are returned when we do a left outer join. The last row of the Employee table (which contains the "Johson" entry) is displayed in the results even though there is no matching row in the Location table. As you can see, the non-matching columns in the last row are filled with a "NULL". So, we have "NULL" as the entry wherever there is no match.
A right outer join is pretty much the same thing as a left outer join, except that the rows that are retained are from the right table. This is what the SQL looks like:
Using the tables presented above, we can show what the result set of a right outer join would look like:
We can see that the last row returned in the result set contains the row that was in the Location table, but not in the Employee table (the "Bangalore, India" entry). Because there is no matching row in the Employee table that has an employee ID of "39", we have NULL’s in the result set for the Employee columns.
Now that we’ve gone over outer joins, we can contrast those with the inner join. The difference between an inner join and an outer join is that an inner join will return only the rows that actually match based on the join predicate. Once again, this is best illustrated via an example. Here’s what the SQL for an inner join will look like:
This can also be written as:
Now, here is what the result of running that SQL would look like:
Inner vs Outer Joins
We can see that an inner join will only return rows in which there is a match based on the join predicate. In this case, what that means is anytime the Employee and Location table share an Employee ID, a row will be generated in the results to show the match. Looking at the original tables, one can see that those Employee ID’s that are shared by those tables are displayed in the results. But, with a left or right outer join, the result set will retain all of the rows from either the left or right table.