What is a self join? Explain it with an example

 

Let’s illustrate the need for a self join with an example. Suppose we have the following table called emp_location, with both the employee name and their location:

employee
employee_name employee_location
Joe New York
Sunil India
Alex Russia
Albert Canada
Jack New York

Now, suppose we want to find out which employees are from the same location as the employee named Joe. What we could do is write a nested SQL query (basically a query within another query) like this:

SELECT employee_name
FROM employee
WHERE employee_location in
( SELECT employee_location
FROM employee
WHERE employee_name = "Joe")
              

A nested subquery for such a simple question is inefficient. Is there a more efficient and elegant solution to this problem?

It turns out that there is – we can use something called a self join. A self join is basically when a join is done on the same table – the best way to think of it is that we have 2 identical copies of the table, and we want to join them based on some predicate. If you need a refresher on join predicates (or just joins in general) then check this link out: Inner vs. Outer joins

Now, the key question is what would be our join predicate in this example? Well, we want to find all the employees who have the same location as Joe – so if we are doing a join we would want to make sure that the location is the same and that the employee name is Joe. So, our join predicate would be where e1.employee_location = e2.employee_location AND employee_name = "Joe". Note that e1 and e2 will represnt the 2 employee tables that we are doing a self join on. Now, here is what the SQL for a self join would look like to solve this problem:

SELECT e1.employee_name
FROM employee e1, employee e2
WHERE e1.employee_location = e2.employee_location
AND e2.employee_name="Joe";

This query will return the names Joe and Jack – since Jack is the only other person who lives in New York like Joe.

Generally, queries that refer to the same table can be greatly simplified by re-writing the queries as self joins. And, there is definitely a performance benefit for this as well.