This is part 2 of our advanced practice SQL interview questions and answers. We highly suggest that you read part 1 of our Advanced SQL interview Questions before reading this, since a lot of the concepts presented in this portion are discussed in more depth in part 1.

The problem is based on the tables presented below where salespeople have orders with certain customers that are in the Customers table.

Salesperson Customer
ID Name Age Salary
1 Abe 61 140000
2 Bob 34 44000
5 Chris 34 40000
7 Dan 41 52000
8 Ken 57 115000
11 Joe 38 38000
ID Name City Industry Type
4 Samsonic pleasant J
6 Panasung oaktown J
7 Samony jackson B
9 Orange Jackson B
Orders
Number order_date cust_id salesperson_id Amount
10 8/2/96 4 2 2400
20 1/30/99 4 8 1800
30 7/14/95 9 1 460
40 1/29/98 7 2 540
50 2/3/98 6 7 600
60 3/2/98 6 7 720
70 5/6/98 9 7 150

In the tables above, each order in the Orders table is associated with a given Customer through the cust_id foreign key column that references the ID column in the Customer table.

Here is the problem: find the largest order amount for each salesperson and the associated order number, along with the customer to whom that order belongs to. You can present your answer in any database’s SQL – MySQL, Microsoft SQL Server, Oracle, etc.

The answer to the problem and explanation

This question seems to be quite simple – but as you will soon find out it is deceptively complex. For each salesperson, all we need to retrieve is the largest order amount, and the associated order number. In order to retrieve that information we should be able to simply do a join between the Orders and Salesperson tables wherever the Salesperson.ID is equal to the Orders.salesperson_id (this would be our join predicate). Then, we could group the results of that join by the Orders.salesperson_id column and retrieve both the highest valued order (by using max(Amount)), and the associated Order number.

Let’s say that we choose to write our answer in MySQL. In MySQL we could legally write some code that looks like this:

SELECT Orders.Number, max(Amount) 
FROM Orders JOIN Salesperson 
ON Salesperson.ID = Orders.salesperson_id 
GROUP BY Orders.salesperson_id

And, if we run that code above it will return this as a result:

Number	      max(Amount)
30	        460
10	        2400
50	        720
20	        1800

The problem with the data returned in MySQL

But, there’s a problem with the results returned from running the SQL above, and it should be fairly obvious once you actually look at the data in the tables we have above. Here is the problem: Order number 50 does not have an amount of “720″ – that amount actually belongs to order number 60. So, what is going on here? Why do the results return an order number that is not even in the same row as the max(Amount) of 720? And why are all of the other results correct?

Understanding the group by statement is critical

Well, we will have to explain a bit more about what’s going on with the group by. If you already read Part 1 of the advanced SQL interview questions then you should understand exactly what the problem is with the SQL above, and you can safely skip down to the section that says “New approach to the problem – start with a subquery”. Unless you want to reinforce the concepts presented in part 1, in which case we highly recommend that you read this entire explanation to this rather difficult interview question.

When we group by the salesperson ID, there will be one group created for each and every salesperson ID. So, there will be 6 groups created – 1 for ID of 1, another for ID of 2, and others for ID’s 5, 7, 8, and 11. Inside those groups will be any rows that share the same salesperson ID values.

When we select the max(Amount), MySQL will simply look for the highest value for Amount within each group and return that value. And when we select Orders.Number, MySQL is not going to return every Orders.Number value from each group – it is only going to select one value from each group.

Subscribe to our newsletter for more free interview questions.

Our SQL is not specific enough

But, the question is which order number should be returned from each group? Each group can potentially have more than just one order number as long as there are more than one rows belonging to the group. And that is the exact problem – the SQL that we wrote is not specific enough, and MySQL will just arbitrarily/randomly return one of the values of the Orders.Number within each group. In this case, because order number 50 is part of the group created by the salesperson_id’s of 7, it will return 50. MySQL could just as well have returned order numbers 60 or 70 – the point is that it just randomly chooses one order number from each group. For the group created by salesperson ID of 2, the fact that the order number 10 is chosen (order number 10 corresponds to the largest order amount of 2400) is just pure coincidence – MySQL could have returned us order number 40, which is also a part of the same group as salesperson ID of 2.

Most relational database implementations would have thrown an error if we tried to run the SQL above because the results are potentially arbitrary, as we just illustrated. MySQL is the exception, because it allows us to run the SQL above error-free, but as we illustrated the data returned could potentially not make any sense. Be sure to read Part 1 of the advanced SQL interview questions for more details on why.

Well, now we know that there is definitely an issue with the SQL above, so how can we write a good query that would give us exactly what we want – along with the correct order number?

New approach to the problem – start with a subquery

Now let’s instead just try to break the problem down into more manageable pieces – starting with a simple subquery. Here is a subquery to get the highest valued order for each salesperson:

SELECT salesperson_id, MAX(Amount) AS MaxOrder
FROM Orders
GROUP BY salesperson_id

Running the query above will return this:


salesperson_id MaxOrder
1 460
2 2400
7 720
8 1800

The query above gives us the salesperson_id and that salesperson’s associated highest order amount – but it still does not give us the order number associated with the highest order amount. So, how can we find the order number as well?

Clearly we need to do something else with the subquery we have above that will also give us the correct order number. What are our options? Try to come up with an answer on your own before reading on.

Well, we can do a join with the results of the subquery above. But, on what condition should our join be done and what exactly should we be joining the subquery above with?

What if we join our subquery above with data from the Orders table, where the join is done on the basis that the salesperson_id matches, AND that the value in the Order table’s Amount column is equal to the amount (MaxOrder) returned from the subquery? This way, we can match up the correct Order Number with the correct corresponding value for the maximum Order Amount for a given salesperson_id.

With that in mind, we can write this query:

select salesperson_id, Number as OrderNum, Amount from Orders 
JOIN (  -- this is our subquery from above:
SELECT salesperson_id, MAX(Amount) AS MaxOrder
FROM Orders
GROUP BY salesperson_id
) as TopOrderAmountsPerSalesperson
USING (salesperson_id)
 where Amount = MaxOrder

Running the query above returns us this:

salesperson_id OrderNum Amount
8 20 1800
1 30 460
2 10 2400
7 60 720

How does the query work exactly?

How does the query above work exactly? It’s actually pretty simple. First, the subquery (which is basically a derived table here, named TopOrderAmountsPerSalesperson) returns the orders with the highest dollar amounts per salesperson, and the associated salesperson ID. So, now we have each salesperson’s highest valued order and his/her ID in a derived table. That derived table (the results from the subquery) is then joined with the entire Orders table on the condition that the salesperson ID matches and that the Amount from the Orders table matches the MaxOrder amount returned from the derived table. What’s the point of this? Well, that join will give us the correct OrderNumber since it is matching on both the salesperson ID and the amount. Even if there are 2 rows with the same exact salesperson ID and amount it will not even matter because no matter which ordernumber is associated with that row, the result set will be exactly the same.

And remember that the whole reason we are doing this is to avoid the original problem with not being able to select a non-aggregated column with a group by.

Now, retrieving the salesperson name is simple. Try to figure it out on your own.

Here is how we retrieve the salesperson name – we just use another join with the Salesperson table and select the Name:

SELECT salesperson_id, Name, 
Orders.Number AS OrderNumber, Orders.Amount
FROM Orders
JOIN Salesperson 
ON Salesperson.ID = Orders.salesperson_id
JOIN (
SELECT salesperson_id, MAX( Amount ) AS MaxOrder
FROM Orders
GROUP BY salesperson_id
) AS TopOrderAmountsPerSalesperson
USING ( salesperson_id ) 
WHERE Amount = MaxOrder

Running the query above returns this:

salesperson_id Name OrderNumber Amount
1 Abe 30 460
2 Bob 10 2400
7 Dan 60 720
8 Ken 20 1800

And, finally we have our answer! But one last thing – let’s check for corner cases. What would happen if we add one more row to the table where a given salesperson has 2 or more orders that have the same value for the highest amount? For example, let’s add this row to the Orders table:


Number order_date cust_id salesperson_id Amount
80 02/19/94 7 2 2400

This now means that the salesperson with an ID of 2 has 2 orders with an amount of 2400 in the Orders table. And, if we run the SQL above again, we will get this as a result (note the extra row for Bob):


salesperson_id Name OrderNumber Amount
1 Abe 30 460
2 Bob 40 2400
7 Dan 60 720
8 Ken 20 1800
2 Bob 80 2400

Now, the question is if we only want one of Bob’s orders to show up, how can we eliminate the duplicate? Again, try to figure this out on your own before reading our answer.

Well, we could add a GROUP BY salesperson_id, Amount to the end of the query, which would create separate groups for each unique combination of the salesperson ID and the Amount. This would give us a query that looks like this:

SELECT salesperson_id, Salesperson.Name, 
Number AS OrderNumber, Amount
FROM Orders
JOIN Salesperson 
ON Salesperson.ID = Orders.salesperson_id
JOIN (
SELECT salesperson_id, MAX( Amount ) AS MaxOrder
FROM Orders
GROUP BY salesperson_id
) AS TopOrderAmountsPerSalesperson
USING ( salesperson_id ) 
WHERE Amount = MaxOrder
GROUP BY salesperson_id, Amount

Now, running this query even with the duplicate row in the Orders table would return us this:


salesperson_id Name OrderNumber Amount
1 Abe 30 460
2 Bob 40 2400
7 Dan 60 720
8 Ken 20 1800

And that’s it – we are now good to go, and we have a final answer to this difficult interview question! This concludes our series of complex SQL interview questions – hopefully you found them challenging!

Hiring? Job Hunting? Post a JOB or your RESUME on our JOB BOARD >>

Subscribe to our newsletter for more free interview questions.



FOLLOW Varoon Sahgal, Author of ProgrammerInterviewon

  • omar salem

    how is the very last result different from the one before it?
    (Bob) still appears 2 times

    • varoon10

      My mistake! I've removed the last entry for Bob in the last result. Thanks for pointing this out!

  • Yogesh Singh

    Please correct the last result.

  • JerryMie

    I believe the original question asked for the customer to whom the largest order belongs to and not the salesperson's name and therefore the query needs to be modified and maybe look something like this:
    SELECT salesperson_id, Customer.Name AS Customer,
    Number AS OrderNumber, Amount
    FROM Orders
    JOIN Customer
    ON Customer.ID = Orders.cust_id
    JOIN (
    SELECT salesperson_id AS sid, MAX( Amount ) AS MaxOrder
    FROM Orders
    GROUP BY salesperson_id
    ) AS TopOrders
    ON TopOrders.sid = Orders.salesperson_id
    WHERE Orders.Amount = TopOrders.MaxOrder

  • student

    Hi,
    The final query doesn't run in MS SQL Server because not all of the select fields are in the group by clause.
    How can I do the same in MS SQL Server?
    Thank for the helpful site.

  • vhf

    SELECT sp.name, MAX( amount ) , od.number, cst.name

    FROM salesperson AS sp
    INNER JOIN order AS od ON od.salesperson_id = sp.id
    INNER JOIN customer AS cst ON od.cust_id = cst.id
    GROUP BY sp.name

    this would work too, or am I wrong?

    • vhf

      never mind, it's explained in the first section why this would not work, sorry!