madmaxthc - 2 days ago
R Question

"for" loop not working

I am trying to isolate some values from a data frame.
Example:

test_df0 <- data.frame('col1' = c('string1', 'string2', 'string1'),
                       'col2' = c('value1', 'value2', 'value3'),
                       'col3' = c('string3', 'string4', 'string3'))


I want to obtain a new data frame with only the unique strings from col1 and the corresponding strings from col3 (which will be identical for rows with identical col1).
This is the loop I wrote, but I must be making some obvious mistake:

test_df1 <- as.data.frame(matrix(ncol = 2, nrow = 0))
colnames(test_df1) <- c('col1', 'col3')
for (i in unique(test_df0$col1)) {
  first_matching_row <- match(x = i, table = test_df0$col1)
  temp_df <- data.frame('col1' = i,
                        'col3' = test_df0[first_matching_row, 'col3'])
  rbind(test_df1, temp_df)
}


The resulting test_df1 is empty, though. I cannot spot the mistake in the loop and would be grateful for any suggestions.

Edit: the for loop itself is working. If its last line is
print(temp_df)
instead of the rbind command, I get the correct results. I am not sure why the rbind is not working.

Answer

An easier and faster way to do this is with the duplicated() function. duplicated() looks through an input vector and returns TRUE for each element whose value has already been seen at an earlier index in the vector. For example:

> duplicated(c(0,0,0,1,2,3,0,3))
[1] FALSE  TRUE  TRUE FALSE FALSE FALSE  TRUE  TRUE

The first 0 returns FALSE because that value hadn't been seen before, but the next two 0s return TRUE because it had. Then for 1, 2, and the first 3, it hadn't seen those numbers before, but it had already seen the last two numbers, 0 and 3. This means that !duplicated() will return TRUE for the first occurrence of each distinct value in the data.
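As a small illustration of that last point, here is a sketch using the same example vector. The names x and first_seen are only placeholders for this example:

```r
# Example vector from the duplicated() call above.
x <- c(0, 0, 0, 1, 2, 3, 0, 3)

# Negating duplicated() marks the first occurrence of each value.
first_seen <- !duplicated(x)

# Subsetting with that logical mask keeps one copy of each value,
# in the order the values first appear.
x[first_seen]
```

For a plain vector like this, the subset should match what unique(x) returns.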

We can use this to index into the data frame to get the rows of test_df0 with unique values of col1 as follows:

test_df0[!duplicated(test_df0[["col1"]]), ]

But this returns all columns of the data frame. If we just want col1 and col3 we can index into the columns as well using:

test_df0[!duplicated(test_df0[["col1"]]), c("col1", "col3")]

As for why the loop isn't working: as @Jacob mentions, you aren't assigning the result of rbind() to a variable. rbind() returns a new data frame rather than modifying test_df1 in place, so the value you create is discarded as soon as the call finishes.
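If you do want to keep the loop, a minimal sketch of the fix is to assign the rbind() result back to test_df1 on each iteration. Everything else below is copied from the question:

```r
# Example data from the question.
test_df0 <- data.frame('col1' = c('string1', 'string2', 'string1'),
                       'col2' = c('value1', 'value2', 'value3'),
                       'col3' = c('string3', 'string4', 'string3'))

# Empty data frame to accumulate the results, as in the question.
test_df1 <- as.data.frame(matrix(ncol = 2, nrow = 0))
colnames(test_df1) <- c('col1', 'col3')

for (i in unique(test_df0$col1)) {
  first_matching_row <- match(x = i, table = test_df0$col1)
  temp_df <- data.frame('col1' = i,
                        'col3' = test_df0[first_matching_row, 'col3'])
  # The only change: store the data frame that rbind() returns.
  test_df1 <- rbind(test_df1, temp_df)
}

test_df1
```

With this change, test_df1 should end up with one row per unique col1 value. Growing a data frame with rbind() inside a loop copies it on every iteration, which is one reason the duplicated() approach is usually preferable for larger data.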
