Rob Rob - 10 months ago 54
R Question

Looping through a dataset, comparing values to another dataset

I have two datasets. One(DFlogin) contains a set of user ids, and two digit "zip codes" they attempted to log in from. The other(Dfrecords) contains a list of acceptable "zip codes" for the user to log in from.

For a user, they could have any number of attempted log in zip codes, and any number of acceptable, record zip codes.

The goal is to loop through the rows of the DFlogin dataset, and compare each users attempted logins to all of the acceptable logins for the specific user.

So user 1 can login in zipcode 34 only, but has logged in from zip code 21. This user should be flagged in a new column (bad_login).

zipcode2<-c(NA, 34, 32)


recordzipcode2<-c(NA, 34, 32)
recordzipcode3<-c(NA, 21,61)

DFrecords<-data.frame(userid, recordzipcode1,recordzipcode2, recordzipcode3)

I would guess the solution could work with a couple loops and an if statement, but I am not really sure where to start.

Answer Source

You can apply row-wise on DFlogin.

Find the matching ID in DFrecords table. Select only the columns that have zip code values (2:4). Remove the NA values. Check if all the values i.e logins are from the acceptable zip codes.

DFlogin$bad_login <- apply(DFlogin, 1, function(x)  {
      x1 = DFrecords[match(x[1], DFrecords$userid),2:4] 
#2:4 are the columns having zipcodes in DFrecords
      y = x1[!]
      as.integer(!all(x[2:3] %in% y))
#2:3 are the columns having zipcodes in DFlogin

#[1] 1 0 1