Rob Rob - 2 months ago 7
R Question

Loop through df column, comparing to list and creating new column

I have a column of numbers, like social security numbers for example. I would like to compare this column to a list of unacceptable values (like 11111111 or 12345678, for example). There are also some grepl operations I would like to perform, like checking that the first 3 digits can't be 000. Below is a skeleton of what I think the code could look like; I prefer a for loop logic.

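# Values are stored as character so that leading zeros are not dropped
ssns <- data.frame(num = c("12343210", "23454321", "34565432", "11111111"),
                   stringsAsFactors = FALSE)
badssns <- c("11111111", "22222222")

# Create the new column up front, then fill it in row by row
ssns$newcolumn <- NA_character_
for (i in seq_len(nrow(ssns))) {
  if (ssns$num[i] %in% badssns) {
    # value is on the list of unacceptable numbers
    ssns$newcolumn[i] <- "BADSSN"
  } else if (grepl("^000", ssns$num[i])) {
    # first 3 digits are 000
    ssns$newcolumn[i] <- "BADSSN"
  } else {
    ssns$newcolumn[i] <- "GOODSSN"
  }
}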

Answer

Just using a nested ifelse should do the job imo:

ssns$newcolumn <- ifelse(ssns$num %in% badssns, 'BADSSN',
                         ifelse(substr(ssns$num, 1, 3) == '000', 'BADSSN', 'GOODSSN'))

or shorter, using the OR operator (|):

ssns$newcolumn <- ifelse(ssns$num %in% badssns | substr(ssns$num, 1, 3) == '000', 'BADSSN', 'GOODSSN')

which gives:

> ssns
       num newcolumn
1 12343210   GOODSSN
2 23454321   GOODSSN
3 34565432   GOODSSN
4 11111111    BADSSN
5 00065432    BADSSN

Used data:

ssns <- data.frame(num = c('12343210','23454321','34565432','11111111','00065432'), stringsAsFactors = FALSE)
badssns <- c('11111111','22222222')
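
Since the question mentions grepl and a preference for a for loop, here is a rough sketch of both side by side, using the same sample data. It swaps the substr comparison for grepl with the pattern "^000" (an alternative way to test the first 3 digits, not what the answer above uses). The names is_bad and loop_result are made up for this illustration.

```r
# Same sample data as above; character values keep their leading zeros
ssns <- data.frame(num = c("12343210", "23454321", "34565432", "11111111", "00065432"),
                   stringsAsFactors = FALSE)
badssns <- c("11111111", "22222222")

# Vectorised version: grepl("^000", x) is TRUE where x starts with three zeros
# (is_bad is a hypothetical helper name)
is_bad <- ssns$num %in% badssns | grepl("^000", ssns$num)
ssns$newcolumn <- ifelse(is_bad, "BADSSN", "GOODSSN")

# Loop version of the same checks, written to a separate vector
# (loop_result is a hypothetical name, used only for the comparison below)
loop_result <- character(nrow(ssns))
for (i in seq_len(nrow(ssns))) {
  if (ssns$num[i] %in% badssns) {
    loop_result[i] <- "BADSSN"
  } else if (grepl("^000", ssns$num[i])) {
    loop_result[i] <- "BADSSN"
  } else {
    loop_result[i] <- "GOODSSN"
  }
}

# Check that the two approaches label every row the same way
identical(ssns$newcolumn, loop_result)
```

Both give the same labels; the vectorised form is usually the more idiomatic choice in R, while the loop may be easier to extend if more per-row rules are added later.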