Suppose that you are given the following simple database table called Employee that has 2 columns named Employee ID and Salary:
Write a SQL query to get the second highest salary from the table above.
The easiest way to start with a problem like this is to ask yourself a simpler question first. With that in mind, how would you find the highest salary in a table? Well, as you probably know that is actually really easy – just use the MAX aggregate function:
select MAX(Salary) from Employee;
Remember that SQL is based on set theory
You should remember that SQL uses sets as the foundation for most of its queries. So, the question is how can we use set theory to find the 2nd highest salary in the table above? Think about it on your own for a bit – even if you do not remember much about sets, the answer is very easy to understand and something that you might be able to come up with on your own.
How to get the answer
What if we try to exclude the highest salary value from the result set returned by the SQL that we run? Then we will be left with a new set of values where the highest value is actually the 2nd highest value in the original Employee table. So, if we can somehow select the highest value from a result set that excludes the highest value, then we would actually be selecting the 2nd highest salary value. Think about that carefully and see if you can come up with the actual SQL yourself before you read the answer that we provide below. Here is a small hint to help you get started: you will have to use the “NOT IN” SQL operator.
Solution to finding the 2nd highest salary
Now, here is what the SQL will look like:
select MAX(Salary) from Employee WHERE Salary NOT IN (select MAX(Salary) from Employee )
Running the SQL above would return us “450″, which is of course the 2nd highest salary in the Employee table.
An explanation of the solution
The SQL above first finds the highest salary value in the Employee table using “(select MAX(Salary) from Employee)”. Then, adding the “WHERE Salary NOT IN” in front basically creates a new set of Salary values that does not include the highest Salary value. For instance, if the highest salary in the Employee table is 200,000 then that value will be excluded from the results using the “NOT IN” operator, and all values except for 200,000 will be retained in the results.
This now means that the highest value in this new set will actually be the 2nd highest value in the Employee table. So, we then select the max Salary from the new result set, and that gives us 2nd highest Salary in the Employee table. That’s pretty interesting right?
An alternative solution using the not equals SQL operator
We can actually use the not equals operator – the “<>” – instead of the NOT IN operator as an alternative solution to this problem. This is what the SQL would look like:
select MAX(Salary) from Employee WHERE Salary <> (select MAX(Salary) from Employee )
How would you write a SQL query to find the Nth highest salary?
What we did above was find the 2nd highest Salary value in the Employee table. But, what if we want to find the Nth highest salary, where N can be any number whether it’s the 3rd highest, 4th highest, 5th highest, 10th highest, etc? This is also an interesting question – try to come up with an answer yourself before reading the one below to see what you come up with.
The answer and explanation to finding the nth highest salary
The SQL below will give you the correct answer – but you will have to plug in an actual value for N of course. This SQL to find the Nth highest salary should work in SQL Server, MySQL, DB2, Oracle, and almost any other RDBMS:
SELECT * FROM Employee Emp1 WHERE (N-1) = ( SELECT COUNT(DISTINCT(Emp2.Salary)) FROM Employee Emp2 WHERE Emp2.Salary > Emp1.Salary)
How does the query above work?
The query above can be quite confusing if you have not seen anything like it before – the inner query is what’s called a correlated sub-query because the inner query (the subquery) uses a value from the outer query (in this case the Emp1 table) in it’s WHERE clause. If you want to read more about the differences between correlated and uncorrelated subqueries you can go here: Correlated vs Uncorrelated Subqueries.
The most important thing to understand in the query above is that the sub-query is evaluated each and every time a row is processed by the outer query. In other words, the inner query can not be processed independently of the outer query since the inner query uses the Emp1 value as well.
Finding nth highest salary example and explanation
Let’s step through an actual example to see how the query above will actually execute step by step. Suppose we are looking for the 2nd highest Salary value in our table above, so our N is 2. This means that the query will look like this:
SELECT * FROM Employee Emp1 WHERE (1) = ( SELECT COUNT(DISTINCT(Emp2.Salary)) FROM Employee Emp2 WHERE Emp2.Salary > Emp1.Salary)
You can probably see that Emp1 and Emp2 are just aliases for the same Employee table – it’s like we just created 2 separate clones of the Employee table and gave them different names.
From a very high level, the way the query above works is by finding the number of salaries that are greater than the current salary being searched for. If that number is exactly equal to N-1, then we know that we have the Nth highest salary, otherwise we continue searching through the table. If this description does not make sense, then read on for a more detailed description, which should make sense to you.
The way to visualize this query running with our sample data above (and assuming N is 2), is that the first row of the Emp1 table will be processed in the outer query and the value used for Emp1.Salary in the inner query will be 200. This means that the inner query will look for the number of distinct salaries (using “COUNT(DISTINCT(Emp2.Salary))”) in the Emp2 table that are greater than 200. Remember that Emp2 is just a clone (an alias) of the Employee table, as is Emp1. Since 450 and 800 are both greater than 200, the result returned from the inner query will be 2, since that is the count of salaries greater than 200. And there will be no match with the “WHERE (1) = ….” statement, so the SQL processor will move on to the next row in the Emp1 table.
Now, the next row in Emp1 will be processed, which will be for the salary of 800 and Employee ID of 4. Since there are no salaries in Emp2 greater than 800, the count will be 0, which does not equal 1 so the outer query will move on to the next row. Finally, it reaches the last row, which has a salary of 450. Since 800 is greater than 450, the inner query will return a 1, which matches and then the outer query will execute and select everything from that particular row in table Emp1. This means that these will be the results returned:
Hopefully now you understand how to solve a problem like this, and you have improved your SQL skills in the process!