user1883050 user1883050 - 1 month ago 8
R Question

consolidate columns that may or may not have overlapping data, using R

Let's say you have a data set that looks like this:

Vietnam Gulf War Iraq War
veteran1 1 0 0
veteran2 0 1 0
veteran3 0 0 1
veteran4 0 1 1 # <---- Note this row


You want to consolidate these columns without affecting other columns in the dataframe like so:

Service
veteran1 1
veteran2 2
veteran3 3
veteran4 2 # <---- Note this row


Where


  • 1 = Vietnam
    ,
    2 = Gulf War
    ,
    3 = Iraq War

  • If a veteran has served 2 or more it should pick only one (as is the case with
    veteran4
    where it picked their left-most column)

  • there are many other columns in the dataframe, and they shouldn't be affected by any of this



Questions:

How would you do this in
R
?


(Note: if it's easier to do in some other free open source program, please feel free to share which program and how you would do it. This is a massive dataset: 3 million rows, the American Community Survey.)

dmg dmg
Answer

Looking at your data, it seems a simple issue of:

if Vietnam > 0, then use 1, otherwise if Gulf war > 0 then 2, otherwise if Iraq > 0 then 3, else 0

vietnam = c(1, 0, 0,0) 
gulfwar = c(0,1,0,1)
iraq = c(0,0,1,1)
df = data.frame(vietnam, gulfwar, iraq) 
df$service <- ifelse(df$vietnam > 0,1,ifelse(df$gulfwar>0,2,ifelse(df$iraq>0,3,0)))
df

Result:

       vietnam gulfwar iraq service
  1       1       0    0       1
  2       0       1    0       2
  3       0       0    1       3
  4       0       1    1       2