Meesha - 1 year ago 118

# Remove rows in R using data.table

The data frame `df`:

```
Date    Name  Salary
Q1 2015 ABC   $10
Q2 2015 ABC   $11
Q3 2015 ABC   $15
Q1 2015 XYZ   $25
Q2 2015 XYZ   $20
```

I want to remove the rows whose `Name` occurs fewer than 3 times in the data. For example, XYZ has a frequency of 2, so I want to remove rows 4 and 5.

```r
test <- setDT(df)[, .I[.N > 2], by = Name]
```

Output:

```
> test
   Name V1
1:  ABC  1
2:  ABC  2
3:  ABC  3
```

The filtering is done correctly, but I don't get the whole data set: the output contains only the `Name` column and the row indices (`V1`).

We need to extract the `V1` column, which holds the row indices returned by `.I`, and use it as the row index in `i` to subset the rows.

```r
setDT(df)[df[, .I[.N > 2], by = Name]$V1]
#       Date Name Salary
#1: Q1 2015  ABC    $10
#2: Q2 2015  ABC    $11
#3: Q3 2015  ABC    $15
```
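To make the two steps easier to follow, here is a minimal sketch that rebuilds the example data and separates the index computation from the subsetting. The `data.frame()` call is an assumption about how `df` was created (the question shows only the printed table), and the names `idx` and `res` are introduced here purely for illustration.

```r
library(data.table)

# Assumed construction of the example data; the question shows only the printed table
df <- data.frame(
  Date   = c("Q1 2015", "Q2 2015", "Q3 2015", "Q1 2015", "Q2 2015"),
  Name   = c("ABC", "ABC", "ABC", "XYZ", "XYZ"),
  Salary = c("$10", "$11", "$15", "$25", "$20"),
  stringsAsFactors = FALSE
)
setDT(df)  # convert to a data.table by reference

# Step 1: within each group, .I holds that group's row numbers in the full table.
# .I[.N > 2] keeps them only when the group has more than 2 rows.
idx <- df[, .I[.N > 2], by = Name]$V1

# Step 2: use the row numbers in `i`, which returns every column for those rows
res <- df[idx]
```

Because the subsetting happens on the original table, the column order of `df` is preserved in `res`.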

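Or a more concise option with `if` and `.SD`, which returns the group's rows only when the group has more than 2 of them (note that the grouping column `Name` comes first in the result):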

```r
setDT(df)[, if (.N > 2) .SD, by = Name]
#    Name    Date Salary
#1:  ABC Q1 2015    $10
#2:  ABC Q2 2015    $11
#3:  ABC Q3 2015    $15
```

In case we need a `dplyr` method:

```r
library(dplyr)
df %>%
  group_by(Name) %>%
  filter(n() > 2)
#      Date  Name Salary
#     <chr> <chr>  <chr>
#1 Q1 2015   ABC    $10
#2 Q2 2015   ABC    $11
#3 Q3 2015   ABC    $15
```

With base R there are several options. One uses `ave`:

```r
df[with(df, ave(seq_along(Name), Name, FUN = length) > 2), ]
```
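The `ave` one-liner can be unpacked as follows. This is only a sketch: the `data.frame()` construction is assumed from the printed table in the question, and the names `counts`, `keep` and `res_ave` are illustrative.

```r
# Assumed construction of the example data; the question shows only the printed table
df <- data.frame(
  Date   = c("Q1 2015", "Q2 2015", "Q3 2015", "Q1 2015", "Q2 2015"),
  Name   = c("ABC", "ABC", "ABC", "XYZ", "XYZ"),
  Salary = c("$10", "$11", "$15", "$25", "$20"),
  stringsAsFactors = FALSE
)

# ave() applies FUN to each group and writes the result back onto every row
# of that group, so `counts` holds each row's group size
counts <- with(df, ave(seq_along(Name), Name, FUN = length))

# Logical mask: TRUE for rows whose Name occurs more than 2 times
keep <- counts > 2

res_ave <- df[keep, ]
```

The mask has one entry per row, so it can be used directly as a row index without a join or a merge.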

Or using `table`, keeping only the names whose count exceeds 2:

```r
tbl <- table(df$Name) > 2
subset(df, Name %in% names(tbl)[tbl])
```