Advanced SQL Interview Questions and Answers Part 2 - Programmer and Software Interview Questions and Answers

This is part 2 of our advanced practice SQL interview questions and answers. We highly suggest that you read part 1 of our Advanced SQL interview Questions before reading this, since a lot of the concepts presented in this portion are discussed in more depth in part 1.

The problem is based on the tables presented below where salespeople have orders with certain customers that are in the Customers table.

Salesperson

Customer

ID	Name	Age	Salary
1	Abe	61	140000
2	Bob	34	44000
5	Chris	34	40000
7	Dan	41	52000
8	Ken	57	115000
11	Joe	38	38000

ID	Name	City	Industry Type
4	Samsonic	pleasant	J
6	Panasung	oaktown	J
7	Samony	jackson	B
9	Orange	Jackson	B

Orders

Number	order_date	cust_id	salesperson_id	Amount
10	8/2/96	4	2	2400
20	1/30/99	4	8	1800
30	7/14/95	9	1	460
40	1/29/98	7	2	540
50	2/3/98	6	7	600
60	3/2/98	6	7	720
70	5/6/98	9	7	150

In the tables above, each order in the Orders table is associated with a given Customer through the cust_id foreign key column that references the ID column in the Customer table.

Here is the problem: find the largest order amount for each salesperson and the associated order number, along with the customer to whom that order belongs to. You can present your answer in any database’s SQL – MySQL, Microsoft SQL Server, Oracle, etc.

The answer to the problem and explanation

This question seems to be quite simple – but as you will soon find out it is deceptively complex. For each salesperson, all we need to retrieve is the largest order amount, and the associated order number. In order to retrieve that information we should be able to simply do a join between the Orders and Salesperson tables wherever the Salesperson.ID is equal to the Orders.salesperson_id (this would be our join predicate). Then, we could group the results of that join by the Orders.salesperson_id column and retrieve both the highest valued order (by using max(Amount)), and the associated Order number.

Let’s say that we choose to write our answer in MySQL. In MySQL we could legally write some code that looks like this:

SELECT Orders.Number, max(Amount) 
FROM Orders JOIN Salesperson 
ON Salesperson.ID = Orders.salesperson_id 
GROUP BY Orders.salesperson_id

And, if we run that code above it will return this as a result:

Number	      max(Amount)
30	        460
10	        2400
50	        720
20	        1800

The problem with the data returned in MySQL

But, there’s a problem with the results returned from running the SQL above, and it should be fairly obvious once you actually look at the data in the tables we have above. Here is the problem: Order number 50 does not have an amount of “720” – that amount actually belongs to order number 60. So, what is going on here? Why do the results return an order number that is not even in the same row as the max(Amount) of 720? And why are all of the other results correct?

Understanding the group by statement is critical

Well, we will have to explain a bit more about what’s going on with the group by. If you already read Part 1 of the advanced SQL interview questions then you should understand exactly what the problem is with the SQL above, and you can safely skip down to the section that says “New approach to the problem – start with a subquery”. Unless you want to reinforce the concepts presented in part 1, in which case we highly recommend that you read this entire explanation to this rather difficult interview question.

When we group by the salesperson ID, there will be one group created for each and every salesperson ID. So, there will be 6 groups created – 1 for ID of 1, another for ID of 2, and others for ID’s 5, 7, 8, and 11. Inside those groups will be any rows that share the same salesperson ID values.

When we select the max(Amount), MySQL will simply look for the highest value for Amount within each group and return that value. And when we select Orders.Number, MySQL is not going to return every Orders.Number value from each group – it is only going to select one value from each group.

Subscribe to our newsletter for more free interview questions.

Our SQL is not specific enough

But, the question is which order number should be returned from each group? Each group can potentially have more than just one order number as long as there are more than one rows belonging to the group. And that is the exact problem – the SQL that we wrote is not specific enough, and MySQL will just arbitrarily/randomly return one of the values of the Orders.Number within each group. In this case, because order number 50 is part of the group created by the salesperson_id’s of 7, it will return 50. MySQL could just as well have returned order numbers 60 or 70 – the point is that it just randomly chooses one order number from each group. For the group created by salesperson ID of 2, the fact that the order number 10 is chosen (order number 10 corresponds to the largest order amount of 2400) is just pure coincidence – MySQL could have returned us order number 40, which is also a part of the same group as salesperson ID of 2.

Most relational database implementations would have thrown an error if we tried to run the SQL above because the results are potentially arbitrary, as we just illustrated. MySQL is the exception, because it allows us to run the SQL above error-free, but as we illustrated the data returned could potentially not make any sense. Be sure to read Part 1 of the advanced SQL interview questions for more details on why.

Well, now we know that there is definitely an issue with the SQL above, so how can we write a good query that would give us exactly what we want – along with the correct order number?

New approach to the problem – start with a subquery

Now let’s instead just try to break the problem down into more manageable pieces – starting with a simple subquery. Here is a subquery to get the highest valued order for each salesperson:

SELECT salesperson_id, MAX(Amount) AS MaxOrder
FROM Orders
GROUP BY salesperson_id

Running the query above will return this:



                
                   salesperson_id 
                  MaxOrder 
                
                
                   1
                   460
                
                
                   2 
                   2400 
                  
                
                
                   7 
                   720 
                
                
                   8
                   1800

The query above gives us the salesperson_id and that salesperson’s associated highest order amount – but it still does not give us the order number associated with the highest order amount. So, how can we find the order number as well?

Clearly we need to do something else with the subquery we have above that will also give us the correct order number. What are our options? Try to come up with an answer on your own before reading on.

Well, we can do a join with the results of the subquery above. But, on what condition should our join be done and what exactly should we be joining the subquery above with?

What if we join our subquery above with data from the Orders table, where the join is done on the basis that the salesperson_id matches, AND that the value in the Order table’s Amount column is equal to the amount (MaxOrder) returned from the subquery? This way, we can match up the correct Order Number with the correct corresponding value for the maximum Order Amount for a given salesperson_id.

With that in mind, we can write this query:

select salesperson_id, Number as OrderNum, Amount from Orders 
JOIN (  -- this is our subquery from above:
SELECT salesperson_id, MAX(Amount) AS MaxOrder
FROM Orders
GROUP BY salesperson_id
) as TopOrderAmountsPerSalesperson
USING (salesperson_id)
 where Amount = MaxOrder

Running the query above returns us this:


                
                   salesperson_id 
                   OrderNum 
                   Amount 
                
                
                   8
                   20
                   1800
                 
                
                
                   1 
                   30 
                   460 
                
                
                   2 
                   10 
                   2400 
                
                
                   7
                   60 
                   720

How does the query work exactly?

How does the query above work exactly? It’s actually pretty simple. First, the subquery (which is basically a derived table here, named TopOrderAmountsPerSalesperson) returns the orders with the highest dollar amounts per salesperson, and the associated salesperson ID. So, now we have each salesperson’s highest valued order and his/her ID in a derived table. That derived table (the results from the subquery) is then joined with the entire Orders table on the condition that the salesperson ID matches and that the Amount from the Orders table matches the MaxOrder amount returned from the derived table. What’s the point of this? Well, that join will give us the correct OrderNumber since it is matching on both the salesperson ID and the amount. Even if there are 2 rows with the same exact salesperson ID and amount it will not even matter because no matter which ordernumber is associated with that row, the result set will be exactly the same.

And remember that the whole reason we are doing this is to avoid the original problem with not being able to select a non-aggregated column with a group by.

Now, retrieving the salesperson name is simple. Try to figure it out on your own.

Here is how we retrieve the salesperson name – we just use another join with the Salesperson table and select the Name:

SELECT salesperson_id, Name, 
Orders.Number AS OrderNumber, Orders.Amount
FROM Orders
JOIN Salesperson 
ON Salesperson.ID = Orders.salesperson_id
JOIN (
SELECT salesperson_id, MAX( Amount ) AS MaxOrder
FROM Orders
GROUP BY salesperson_id
) AS TopOrderAmountsPerSalesperson
USING ( salesperson_id ) 
WHERE Amount = MaxOrder

Running the query above returns this:


                
                   salesperson_id 
                   Name

                   OrderNumber 
                   Amount 
                
              
                
                   1 
                   Abe 

                   30 
                   460 
                
                
                   2 
                   Bob 

                   10 
                   2400 
                
                
                   7
                   Dan 

                   60 
                   720 
                
 
                   8
                   Ken 

                   20 
                   1800

And, finally we have our answer! But one last thing – let’s check for corner cases. What would happen if we add one more row to the table where a given salesperson has 2 or more orders that have the same value for the highest amount? For example, let’s add this row to the Orders table:



                
                   Number 
                   order_date
                 cust_id
                   salesperson_id 
                   Amount

                
                
                   80
                   02/19/94 
                   7
                   2
                   2400

This now means that the salesperson with an ID of 2 has 2 orders with an amount of 2400 in the Orders table. And, if we run the SQL above again, we will get this as a result (note the extra row for Bob):



                
                   salesperson_id 
                   Name
                   OrderNumber 
                   Amount 
                
            
                
                   1 
                   Abe 

                   30 
                   460 
                
                
                   2 
                   Bob 

                   40 
                   2400 
                
                
                   7
                   Dan 

                   60 
                   720 
                
 
                   8
                   Ken 

                   20 
                   1800 
                

                   2
                   Bob 
                   80 
                   2400

Now, the question is if we only want one of Bob’s orders to show up, how can we eliminate the duplicate?
Again, try to figure this out on your own before reading our answer.

Well, we could add a GROUP BY salesperson_id, Amount to the end of the query, which would create separate groups for each unique combination of the salesperson ID and the Amount. This would give us a query that looks like this:

SELECT salesperson_id, Salesperson.Name, 
Number AS OrderNumber, Amount
FROM Orders
JOIN Salesperson 
ON Salesperson.ID = Orders.salesperson_id
JOIN (
SELECT salesperson_id, MAX( Amount ) AS MaxOrder
FROM Orders
GROUP BY salesperson_id
) AS TopOrderAmountsPerSalesperson
USING ( salesperson_id ) 
WHERE Amount = MaxOrder
GROUP BY salesperson_id, Amount

Now, running this query even with the duplicate row in the Orders table would return us this:



                
                   salesperson_id 
                   Name
                   OrderNumber 
                   Amount 
                
               
                
                   1 
                   Abe 

                   30 
                   460 
                
                
                   2 
                   Bob 

                   40 
                   2400 
                
                
                   7
                   Dan 

                   60 
                   720 
                
 
                   8
                   Ken 

                   20 
                   1800

And that’s it – we are now good to go, and we have a final answer to this difficult interview question! This concludes our series of complex SQL interview questions – hopefully you found them challenging!

Hiring? Job Hunting? Post a JOB or your RESUME on our JOB BOARD >>

Subscribe to our newsletter for more free interview questions.

Follow @programmerintvw

Follow Us

Pages

DFP-300×250-1

Newsletter Subscription

Jobboard