What is a self join? Explain it with an example and tutorial.

Let’s illustrate the need for a self join with an example. Suppose we have the following table – that is called employee. The employee table has 2 columns – one for the employee name (called employee_name), and one for the employee location (called employee_location):

employee
employee_name employee_location
Joe New York
Sunil India
Alex Russia
Albert Canada
Jack New York

Now, suppose we want to find out which employees are from the same location as the employee named Joe. In this example, that location would be New York. Let’s assume – for the sake of our example – that we can not just directly search the table for people who live in New York with a simple query like this (maybe because we don’t want to hardcode the city name) in the SQL query:

SELECT employee_name
FROM employee
WHERE employee_location = "New York"

So, instead of a query like that what we could do is write a nested SQL query (basically a query within another query – which is more commonly called a subquery) like this:

SELECT employee_name
FROM employee
WHERE employee_location in
( SELECT employee_location
FROM employee
WHERE employee_name = "Joe")
              

A subquery is inefficient

Using a subquery for such a simple question is inefficient. Is there a more efficient and elegant solution to this problem?

It turns out that there is a more efficient solution – we can use something called a self join. A self join is basically when a table is joined to itself. The way you should visualize a self join for a given table is by imagining that a join is performed between two identical copies of that table. And that is exactly why it is called a self join – because of the fact that it’s just the same table being joined to another copy of itself rather than being joined with a different table.

How does a self join work

Before we come up with a solution for this problem using a self join, we should go over some concepts so that you can fully understand how a self join works. This will also make the SQL in our self join tutorial a lot easier to understand, which you will see further below.

A self join must have aliases




In a self join we are joining the same table to itself by essentially creating two copies of that table. But, how do we distinguish between the two different copies of the table – because there is only one table name after all? Well, when we do a self join, the table names absolutely must use aliases otherwise the column names would be ambiguous. In other words, we would not know which of the two copies of the table’s columns is being referenced without using an alias for each copy of the table. If you don’t already know what an alias is, it’s simply another name given to a table – think of an alias as a nickname – and that nickname is then used in the SQL query to reference the table. Because we need two copies of the employee table, we will just use the aliases e1 and e2 for the employee table when we do a self join.

Self join predicate

As with any join there must be a condition upon which a self join is performed – we can not just arbitrarily say “do a self join”, without specifying some condition. That condition will be our join predicate. If you need a refresher on join predicates (or just joins in general) then check this link out: Inner vs. Outer joins.

Now, let’s come up with a solution to the original problem using a self join instead of a subquery. This will help illustrate how exactly a self join works. The key question that we must ask ourselves is what should our join predicate be in this example? Well, we want to find all the employees who have the same location as Joe.

Because we want to match between our two tables (both of which are the same table – employee – aliased as e1 and e2) on location our join predicate should clearly be “WHERE e1.employee_location = e2.employee_location”. But is that enough to give us what we want? No, it’s not, because we also want to filter the rows returned since we only want people who are from the same location as Joe.

So, how can we filter the rows returned so that only people from Joe’s location are returned? Well, what we can do is simply add a condition on one of the tables (e2 in our example) so that it only returns the row where the name is Joe. Then, the other table (e1) will match up all the names that have the same location in e2, because of our join predicate – which is “WHERE e1.employee_location = e2.employee_location”. We will then just select the names from e1, and not e2 because e2 will only have Joe’s name. If that’s confusing then keep reading further to understand more about how the query will work.

So, the self join query that we come up with looks like this:

Self Join SQL Example

SELECT e1.employee_name
FROM employee e1, employee e2
WHERE e1.employee_location = e2.employee_location
AND e2.employee_name="Joe";

This query will return the names Joe and Jack – since Jack is the only other person who lives in New York like Joe.

Generally, queries that refer to the same table can be greatly simplified by re-writing the queries as self joins. And, there is definitely a performance benefit for this as well.

What does a self join look like?

It will help tremendously to actually visualize the actual results of a self join internally. Remember that a self join is just like any other join, where the two tables are merged into one temporary table. First off, you should visualize that we have two separate copies of the employee table, which are given aliases of e1 and e2. These copies would simply look like this – note that we shortened the column names from employee_name and employee_location to just Name and Location for convenience:

e1 e2
Name Location
Joe New York
Sunil India
Alex Russia
Albert Canada
Jack New York
Name Location
Joe New York
Sunil India
Alex Russia
Albert Canada
Jack New York

And the final results of running the self join query above – the actual joined table – would look like this:

e1.employee_name e1.employee_location e2.employee_name e2.employee_location
Joe New York Joe New York
Jack New York Joe New York

Self joins versus inner joins




Are self joins and inner joins the same? You might be wondering if all self joins are also inner joins. After all, in our example above our self join uses an inner join because only the rows that match based on the join predicate are returned – non-matching rows are not returned. Well, it turns out that a self join and inner join are completely different concepts. A self join could just as well be an outer join or an inner join – it just depends on how the query is written. We could easily change the query we used above to do a LEFT OUTER JOIN – while the query still remains a self join – but that wouldn’t give us the results we want in our example. So, we use an implied inner join instead because that gives us the correct results. Remember that a query is a self join as long as the two tables being joined are exactly the same table, but whether it’s an inner join or outer join depends on what is specified in the SQL. And, inner/outer joins are separate concepts entirely from a self join.

Self joins manager employee example


The most commonly used example for self joins is the classic employee manager table. The table is called Employee, but holds all employees – including their managers. Every employee has an ID, and there is also a column for the manager ID. So, for example, let’s say we have a table that looks like this – and we call it Employee:

EmployeeID Name ManagerID
1 Sam 10
2 Harry 4
4 Manager NULL
10 AnotherManager NULL

Notice that in the table above there are two managers, conveniently named “Manager” and “AnotherManager”. And, those managers don’t have managers of their own – as noted by the NULL value in their Manager column.

Now, given the table above, how can we return results that will show each employee’s name, and his/her manager’s name in nicely arranged results – with the employee in one column and his/her manager’s name in the other column. Well, it turns out we can use a self join to do this. Try to come up with the SQL on your own before reading our answer.

Self join manager employee answer

In order to come up with a correct answer for this problem, our goal should be to perform a self join that will have both the employee information and manager information in one row. First off, since we are doing a self join, it helps to visualize the one table as two tables – let’s give them aliases of e1 and e2. Now, with that in mind, we want the employee’s information on one side of the joined table and the manager’s information on the other side of the joined table. So, let’s just say that we want e1 to hold the employee information and e2 to hold the corresponding manager’s information. What should our join predicate be in that case?

Well, the join predicate should look like “ON e1.ManagerID = e2.EmployeeID” – this basically says that we should join the two tables (a self join) based on the condition that the manager ID in e1 is equal to the employee ID in e2. In other words, an employee’s manager in e1 should have the manager’s information in e2. An illustration will help clarify this. Suppose we use that predicate and just select everything after we join the tables. So, our SQL would look like this:

SELECT *
FROM Employee e1
INNER JOIN Employee e2
ON e1.ManagerID = e2.EmployeeID

The results of running the query above would look like this:

e1.EmployeeID e1.Name e1.ManagerID e2.EmployeeID e2.Name e2.ManagerID
1 Sam 10 10 AnotherManager NULL
2 Harry 4 4 Manager NULL

Note that there are only 2 rows returned – this is because an inner join is performed, which means that only when there is a match between employee ID’s and manager ID’s will there be a result returned. And since there are 2 people without managers (who have a manager ID of NULL), they will not be returned as part of table e1, because no employees have a matching ID of NULL.

Now, remember that we only want to return the names of the employee and corresponding manager as a pair. So, we can fine-tune the SQL as follows:

SELECT e1.Name, e2.Name
FROM Employee e1
INNER JOIN Employee e2
ON e1.ManagerID = e2.EmployeeID

Running the SQL above would return:

Sam    AnotherManager
Harry  Manager

And that is the answer to the employee manager problem using a self join! Feel free to post any comments.

Hiring? Job Hunting? Post a JOB or your RESUME on our JOB BOARD >>

Subscribe to our newsletter for more free interview questions.

  • Kelly

    select e1.Name, e2.Name as manager_name from employee e1 left join employee e2
    on e1.ManagerID = e2.employeeID gives the same results as the solution provided. shouldn’t it be different?

  • Kelly

    Why do I get the same result if I use “LEFT JOIN” instead of inner join? Also why do I get 0 rows if I put RIGHT JOIN?

  • Md Farooq

    awesome explanation, I am enjoying reading this, like I enjoy reading Novel.

  • Satish Kumar Lakshmanan

    Very good explanation of self join illustrating the inner details. This is what is missed in some of the other tutorials that explain self join. For a beginner, this would be of great help.

  • Shivanand Havannavar

    one of the finest explained regarding self join with very nice and simple example

  • Anirudh Garg

    Best explanation so far on Self join I have read.. Thanks a lot !! Looking forward to reading more posts.

  • Swapna

    So much informative. I really had a very good understanding on self join.Thank you so much

  • Marco Peters

    Excellent explanation.

  • venkatesh kannan

    I have the same doubt as RAMAA.
    Also for the employee manager example, is the below SQl same as the one mentioned in the tutorial.

    SELECT e1.Name, e2.Name
    FROM Employee e1
    WHERE Employee e1 = Employee e2
    ON e1.ManagerID = e2.EmployeeID

    Please comment. Thank you for your time.

  • Emma Abbas

    the best explanation for self joins!
    Thankyou.

  • Sandeep

    I was fortunate to find this web site!! Thanks for this great stuff.

  • titas bhattacharya

    most nice and detailed explanation that I have ever come across.. OSM OSM WEBSITE , PLEASE KEEP UP THE GOOD WORK.. and continue to feed us with your invaluable content. BIG THANKS

  • Joseph

    Should this query in the article be using inner join instead?

    SELECT e1.employee_name
    FROM employee e1, employee e2
    WHERE e1.employee_location = e2.employee_location
    AND e2.employee_name=”Joe”;

    According to http://stackoverflow.com/questions/894490/sql-left-join-vs-multiple-tables-on-from-line

    it is old syntax.

  • kumari k

    very nice explanation

  • Bala

    good explanation but still need more clear answer about self join

  • Anjali

    so confused before regarding selfjoin, now i get it. Thanks a lot…

  • Md Farooq

    what an explantion, hats off to you guys.

  • RAMAA

    SELECT e1.employee_name
    FROM employee e1, employee e2
    WHERE e1.employee_location = e2.employee_location
    AND e2.employee_name=”Joe”;

    In this SQL self-join example, the query don’t have any form of JOIN keyword,
    is this’s still a JOIN??

    whether a query meant for join must have JOIN keyword in it or it’s optional ??

  • baidpunit1017

    it is already solved using aliasing.

  • kanu

    one of the best explanation

  • Smiji John

    Good one .Simple and clear.

    if an employee is manager for more than one employees, that manager name appear in multiple rows.

    If I want to get the list of only managers from the table and I don’t want employee name , we may have to use either distinct or a sub query.

    Is there any better approach to avoid duplicates in such cases.

    eg:

    select Name from Employee where EmployeeID in (select ManagerID from Employee);
    or
    select distinct(e2.Name) “Manager Name” from Employee e1 inner join Employee e2 on e1.ManagerID=e2.EmployeeID;

  • Bhavin Rathod

    Very handy tool for last minute preparation. Explains every topic perfectly and deeply.

  • sunil tewatia

    propert_details table name or property_id,property_name two columns then how toget all data

  • G

    Very clear explanation! Thanks

  • Ashi

    explained in easy way… thumbsup

  • Nishtha

    Excellent explanation. Thank You

  • Cici

    awesome!

  • raj

    great explanation. Thank you

  • From India

    thank u soo much !!!! 🙂

  • how can we solve employee-manager problem using aliasing?

  • Shubham

    Awesome explanation..!! Thanks..!!

  • juhi

    Best explanation available on internet..:)

  • Ranju

    Great explanation. Thank you!

  • Amit

    really superb usefull explaination…Thank you very much

  • cromateu

    seems to b easy yet complicated

  • Steve-0

    The first explanation I grokked. You’re awesome!

  • alia

    THanx..for explanation,

  • Sampath

    Thanks for the excellent post Varoon..It is really really helpful for people who want to understand the basics. May God Bless you.

  • in very simple words, the very useful explanation. Thanks

  • allwyn mascarenhas

    i tried the same example . .this is what i get. some assistance please!
    http://postimg.org/image/ntukt234b/

  • allwyn mascarenhas

    I got lil confused in the last example. Is it that only 1 table holds both employees and managers then what is the meaning of manager and anothermanager. Also the name column is common for both managers and employees? I’m also thinking maybe something was left out somewhere. .please comment someone. And great answers, you are adding so much value to our lives! THANKS!

  • harami

    brilliant don

  • Fai

    Super tutu thank you you Ace It : )

  • Eswaran Thirumalairaj

    very usefull

  • nitin chauhan

    I would like to assert that this is tutorial is amongst the top of java and database tutorials available on other web sites and I sincerely thanks for providing such an excellent quality of technical material on subjects like database and Java. The main thing that I have observed is the elaborate, efficient and indepth explanation of the key concepts with examples that have helped me a lot in further sharpening my skills and knowledge.
    I hope that you will continue the legacy of providing the sameon other hot subjects related to java and database in the future.
    Thanks,
    Nitin Chauhan

  • abhynay

    Very insightful explanation.Really loved the way you handled the topic.

  • Shwetha

    Excellent explanation . One of the best tutorial websites I have visited so far. Everything is explained in such a simple way. great work !!

  • anonymous

    one of the best explanations… great work !

  • Joe Mann

    Great Example! Keep it up!