Jazzmine Jazzmine - 3 months ago 6
SQL Question

R how to find how many users only have one value

I have a dataframe called user_role where there is a key value (userid). I want to see how many of these user ids only have one value for a categorical variable called role.

Here's some sample data:

userid role
ame1 programmer
ame1 programmer
ame1 analyst
cal5 programmer
cal5 programmer
up2 sales


I would like cal5 to be in the output dataset because there was only one role but ame1 should not be in the output data frame since it had more than one role value.

I've tried something like this:

("select userid, role, count(*) from user_role group by userid, role having count(*) == 1")


but only get users with 1 row or role value.

("select userid, role, count(*) from user_role group by userid, role having count(*) > 1")


doesn't get me only those with one role value even though they have multiple rows

I want to have both cal5 and up2 show up in the output. Ideally, I would like to be able to have a clause where I only include users with 2 or more rows.

Thanks

Answer

Use count with distinct to limit the results to only those users with a single role. Then to check for multiple rows, use count(*) > 1:

select userid, min(role) as role
from user_role  
group by userid
having count(*) > 1 and count(distinct role) = 1