What is a self join? Explain it with an example |
||||||||||||||||
|
Let’s illustrate the need for a self join with an example. Suppose we have the following table called emp_location, with both the employee name and their location:
Now, suppose we want to find out which employees are from the same location as the employee named Joe. What we could do is write a nested SQL query (basically a query within another query) like this:
A nested subquery for such a simple question is inefficient. Is there a more efficient and elegant solution to this problem? It turns out that there is – we can use something called a self join. A self join is basically when a join is done on the same table – the best way to think of it is that we have 2 identical copies of the table, and we want to join them based on some predicate. If you need a refresher on join predicates (or just joins in general) then check this link out: Inner vs. Outer joins Now, the key question is what would be our join predicate in this example? Well, we want to find all the employees who have the same location as Joe – so if we are doing a join we would want to make sure that the location is the same and that the employee name is Joe. So, our join predicate would be where e1.employee_location = e2.employee_location AND employee_name = "Joe". Note that e1 and e2 will represnt the 2 employee tables that we are doing a self join on. Now, here is what the SQL for a self join would look like to solve this problem:
This query will return the names Joe and Jack – since Jack is the only other person who lives in New York like Joe. Generally, queries that refer to the same table can be greatly simplified by re-writing the queries as self joins. And, there is definitely a performance benefit for this as well. |

