Practice SQL Interview Question #2

 

This question was asked in a Google interview: Given the 2 tables below, User and UserHistory:

User
user_id
name
phone_num
UserHistory
user_id
date
action

1. Write a SQL query that returns the name, phone number and most recent date for any user that has logged in over the last 30 days (you can tell a user has logged in if the action field in UserHistory is set to "logged_on").

Every time a user logs in a new row is inserted into the UserHistory table with user_id, current date and action (where action = "logged_on").

2. Write a SQL query to determine which user_ids in the User table are not contained in the UserHistory table (assume the UserHistory table has a subset of the user_ids in User table). Do not use the SQL MINUS statement. Note: the UserHistory table can have multiple entries for each user_id.

Note that your SQL should be compatible with MySQL 5.0, and avoid using subqueries.

Let’s start with #1 by breaking down the problem into smaller, more manageable problems. Then we can take the pieces and combine them to provide a solution to the overall problem.




Figuring out how to tell whether a user has logged on in the past 30 days seems like a good place to start. We want to see how we can express this in MySQL. You can look online for some Mysql functions that will help with this calculation. MySQL has a "date_sub" function, in which we can pass the current date (as in today’s date) and an interval of 30 days, and it will return us the date 30 days ago from today. Once we have that date, we can compare it with the date in the UserHistory table to see if it falls within the last 30 days. One question that remains is how we will retrieve the current date. This is simple, because MySQL comes built in with a function called curdate() that will return the current date.

So, using the date_sub function, we can come up with this piece of SQL:

UserHistory.date >= date_sub(curdate(), interval 30 day)  

This will check to see that the date in the UserHistory table falls within the last 30 days. Note that we use the ">=" operator to compare dates – in this case, we are simply saying that the date in the UserHistory table is greater than or equal to the date returned from the date_sub function. A date is "greater" than another date when it occurs further in the future than the other date. So, 2007-9-07 will be considered "greater" than 2006-08-19, because 2007-9-07 occurs further in the future than 2006-08-19.

Now, that’s only one piece of the overall problem, so let’s continue. The problem asks us to retrieve the name, phone number, and the most recent date for any user that’s logged in over the last 30 days. We have one table with the user_id and the phone number, but only the other table contains the actual date. Clearly, we will have to do a join on the 2 tables in order to combine the data into a form that will allow us to solve this problem. And since the 2 tables only share one column – the user_id column – it’s clear what common column we will use to join the 2 tables. Doing a join, selecting the required fields, and using the date condition will look like this:

select name, phone_num, date from User, UserHistory 
where User.user_id=UserHistory.user_id 
and UserHistory.date >= date_sub(curdate(), interval 30 day) 




So far, we are selecting the name, phone number, and the date for any user that’s logged in over the last 30 days. But, wait a minute – the problem specifically asks for "the most recent date for any user that’s logged in over the last 30 days." The problem with this is that we could get multiple entries for a user that logged on more than once in the last 30 days. That is not what we want – we want to see the most recent date that someone logged on in the last 30 days – this will return a maximum of 1 entry per user.

Now, the question is how do we get the most recent date? This is quite simple again, as MySQL provides a MAX aggregate function that we can use to find the most recent date. Given a group of dates, the MAX function will return the "maximum" date – which is basically just the most recent date (the one furthest in the future). Because this is an aggregate function, we will have to provide the GROUP BY clause in order to specify what column we would like to use as a ‘container’ of the group of dates. So, now our SQL looks like this:

select User.name, User.phone_num, max(UserHistory.date) 
from User, UserHistory 
where User.user_id = UserHistory.user_id and 
UserHistory.date >= date_sub(curdate(), interval 30 day) 
group by (User.user_id);

Now all we need is to add the condition that checks to see that the user’s action equals "logged_on". So, the final SQL, and the answer to the problem looks like this:

select User.name, User.phone_num, max(UserHistory.date) 
from User, UserHistory 
where User.user_id = UserHistory.user_id 
and UserHistory.action = 'logged_on' 
and UserHistory.date >= date_sub(curdate(), interval 30 day) 
group by (User.user_id);

Phew! We are finally done with question 1, click next to check out the answer to question #2.

Hiring? Job Hunting? Post a JOB or your RESUME on our JOB BOARD >>

Subscribe to our newsletter for more free interview questions.

  • Angela

    why “group by (User.user_id)”? what does this line help with the overall results???

  • Ajit Kumar

    I think, It should be grouped by name and phone number instead of user_id.

  • Sandeep Singh Thukral

    I have a doubt. Are we looking for the latest date when the user logged on, or the latest date or any action of the user? I fear that the SQL given as above will find the latest date when the user logged on. I understand that usually a person would not be logged on for multiple days at a stretch, but its possible. Also, the solution should be correct and not nearly-correct, as a user who logs in usually does at least a few more actions after logging in.

    The solution I Came up was to first identify (via a sub-query) the users who have logged on in the last 30 days. I joined this sub-query to Users and UserHistory to find the max date for any action of such a user.

  • Amr

    Solution in SQL SERVER:

    select U.[name],U.phone_num , max(H.[date]) from [user] U, UserHistory H where U.[user_ID] = H.[user_ID]
    and H.[Action] = ‘logged_on’
    and H.[date]> (GETDATE()-30)
    group by (U.[name]) , (U.Phone_Num)

  • vins

    Solution in SQL:

    SELECT USER1.name,USER1.phone_num,max(UserHistory.date1)
    FROM USER1,UserHistory
    WHERE user1.user_id =UserHistory.user_id
    AND
    userhistory.action=’logged_on’
    AND
    UserHistory.date1>=dateadd(day,-30,getdate())
    GROUP BY user1.name,user1.phone_num

  • nalini

    user is a keyword change the tablename it will work

  • Phoenix

    Hi!!! First and foremost, nice work. Explained in a real lucid manner. Keep it up.
    But I have a question about the columns in select and group by. As far as I know, result set of a select query with group by clause must be:

    an expression used as one of the group by criteria , or …

    an aggregate function , or …

    a literal value

    Most of the databases do not support the syntax that you use (pretty often in your tutorial).
    Is there something that I am missing here, can we do what you have written in your SQL statements?? Please help.

  • deepa

    how to fetch the deptno 10 without using where condition?

  • cromateu

    I am using Microsoft Sql Server Management Studio Express.
    I am getting this error:

    Incorrect syntax near the keyword ‘user’.

  • deepesh

    This SQL will throw error because grouped by columns do not exist in SQL.

    An expression starting with specified in a SELECT clause, HAVING clause, or ORDER BY clause is not specified in the GROUP BY clause or it is in a SELECT clause, HAVING clause, or ORDER BY clause with a column function and no GROUP BY clause is specified. SQLSTATE=42803

  • narendra

    select name, phone_num, date from User, UserHistory
    where User.user_id=UserHistory.user_id
    and UserHistory.date>=DATEADD(DAY,-30, GETDATE())

    this works fine:)

  • Simi

    Can you please provide the solution in SQL

  • Anonymous

    Solution in SQL SERVER:

    select users.name,users.phone_num, MAX(UserHistory.date)
    from
    UserHistory,Users
    where
    UserHistory.User_id=Users.User_id
    group by users.name,users.phone_num
    having MAX(UserHistory.date)>=DATEADD(dd,-30,CONVERT(date,GETDATE()))

  • Kris

    can you please check whether the below solutions are correct or not.

    1. SELECT U.NAME,U.PHONE_NUM,max(H.DATE) AS MOST_RECENT_DATE
    FROM USER U,USERHISTORY H
    WHERE U.USER_ID=H.USER_ID
    AND H.ACTION = 'logged_on'
    AND TRUNC(H.DATE) BETWEEN TO_DATE('SYSDATE-30','DD-MM-YYYY') AND TO_DATE('SYSDATE','DD-MM-YYYY')
    GROUP BY U.NAME,U.PHONE_NUM

    INSERT INTO USERHISTORY(USER_ID,DATE,ACTION) (SELECT USER_ID,TO_CHAR(DATE,'YYYY-DD'), ACTION FROM USERHISTORY WHEN H.ACTION = 'logged_on')

    2. SELECT DISTINCT U.USER_ID FROM USER U WHERE NOT EXIST ( SELECT 1 FROM H.USERHISTORY H WHERE U.USER=H.USER );