Practice SQL Interview Question #2
This question was asked in a Google interview: Given the 2 tables below, User and UserHistory:
1. Write a SQL query that returns the name, phone number and most recent date for any user that has logged in over the last 30 days (you can tell a user has logged in if the action field in UserHistory is set to "logged_on").
Every time a user logs in a new row is inserted into the UserHistory table with user_id, current date and action (where action = "logged_on").
2. Write a SQL query to determine which user_ids in the User table are not contained in the UserHistory table (assume the UserHistory table has a subset of the user_ids in User table). Do not use the SQL MINUS statement. Note: the UserHistory table can have multiple entries for each user_id.
Note that your SQL should be compatible with MySQL 5.0, and avoid using subqueries.
Let’s start with #1 by breaking down the problem into smaller, more manageable problems. Then we can take the pieces and combine them to provide a solution to the overall problem.
Figuring out how to tell whether a user has logged on in the past 30 days seems like a good place to start. We want to see how we can express this in MySQL. You can look online for some Mysql functions that will help with this calculation. MySQL has a "date_sub" function, in which we can pass the current date (as in today’s date) and an interval of 30 days, and it will return us the date 30 days ago from today. Once we have that date, we can compare it with the date in the UserHistory table to see if it falls within the last 30 days. One question that remains is how we will retrieve the current date. This is simple, because MySQL comes built in with a function called curdate() that will return the current date.
So, using the date_sub function, we can come up with this piece of SQL:
This will check to see that the date in the UserHistory table falls within the last 30 days. Note that we use the ">=" operator to compare dates – in this case, we are simply saying that the date in the UserHistory table is greater than or equal to the date returned from the date_sub function. A date is "greater" than another date when it occurs further in the future than the other date. So, 2007-9-07 will be considered "greater" than 2006-08-19, because 2007-9-07 occurs further in the future than 2006-08-19.
Now, that’s only one piece of the overall problem, so let’s continue. The problem asks us to retrieve the name, phone number, and the most recent date for any user that’s logged in over the last 30 days. We have one table with the user_id and the phone number, but only the other table contains the actual date. Clearly, we will have to do a join on the 2 tables in order to combine the data into a form that will allow us to solve this problem. And since the 2 tables only share one column – the user_id column – it’s clear what common column we will use to join the 2 tables. Doing a join, selecting the required fields, and using the date condition will look like this:
So far, we are selecting the name, phone number, and the date for any user that’s logged in over the last 30 days. But, wait a minute – the problem specifically asks for "the most recent date for any user that’s logged in over the last 30 days." The problem with this is that we could get multiple entries for a user that logged on more than once in the last 30 days. That is not what we want – we want to see the most recent date that someone logged on in the last 30 days – this will return a maximum of 1 entry per user.
Now, the question is how do we get the most recent date? This is quite simple again, as MySQL provides a MAX aggregate function that we can use to find the most recent date. Given a group of dates, the MAX function will return the "maximum" date – which is basically just the most recent date (the one furthest in the future). Because this is an aggregate function, we will have to provide the GROUP BY clause in order to specify what column we would like to use as a ‘container’ of the group of dates. So, now our SQL looks like this:
Now all we need is to add the condition that checks to see that the user’s action equals "logged_on". So, the final SQL, and the answer to the problem looks like this:
Phew! We are finally done with question 1, click next to check out the answer to question #2.