Lisa Lisa - 2 months ago 14
R Question

Subsetting data frame based on columns

I would like to remove certain rows from data based on values in one column. I have tried a few approaches:

#reads in data
sbc016formants.df <- read.table("file path", sep = "\t", header = FALSE, strip.white = TRUE)

# names columns
names(sbc016formants.df) <- c("fileName", "start", "end", "vowelLabel")

# list of values I want to remove
list16 <- c(615.162, 775.885)

# produces a subset of the data: removes rows whose start value appears in list16
sbc016formants.df <- subset(sbc016formants.df, !start %in% list16)


which produces this error message for some, but not all of my data files:

Error in match(x, table, nomatch = 0L) :
'match' requires vector arguments
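One possible cause, offered as a guess rather than a diagnosis: match() raises this error when either side of %in% is not a vector. If a particular data frame has no column named start, subset() falls back to the function stats::start, which is not a vector. The sketch below reproduces that situation with a made-up data frame (the names df, V1 and V2 are hypothetical) and shows a simple guard.

```r
# Hypothetical data frame that lacks a "start" column (V1, V2 are made-up names)
df <- data.frame(V1 = c("sbc016", "sbc016"), V2 = c(1.345, 615.162))
list16 <- c(615.162, 775.885)

# subset() cannot find a column called "start", so the name resolves to the
# function stats::start, and match() refuses a function as its argument
msg <- tryCatch(
  {
    subset(df, !start %in% list16)
    "no error"
  },
  error = function(e) conditionMessage(e)
)
print(msg)

# Guard: confirm the column exists before filtering on it
has_start <- "start" %in% names(df)
print(has_start)
```

If this is what is happening, checking names() on the files that fail should show that the start column is missing or named differently.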


I also tried this, based on the second answer in this topic:

sbc002formants.df <- sbc002formants.df[ apply(sbc002formants.df, 1 , function(x) any(unlist(x) %in% list2) ) , ]


And this gets rid of some of the items on the list (list16), but not all. I wanted to use the first answer, but I don't understand the code (I'm not sure what bl is in the example).
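A likely reason the apply() attempt catches only some values: apply() converts the data frame to a character matrix first, so the numbers are compared as formatted strings rather than as numbers. Note also that the line as written has no negation, so it keeps the matching rows instead of dropping them. The sketch below uses made-up values (df and to_drop are hypothetical) to contrast the row-wise test with a test on the numeric column itself.

```r
# Hypothetical values chosen so the numbers need different print widths
df <- data.frame(fileName = c("sbc016", "sbc016", "sbc016"),
                 start = c(5.5, 615.162, 2.25))
to_drop <- c(5.5, 615.162)

# apply() works on as.matrix(df), which is a character matrix here
m <- as.matrix(df)
print(m[, "start"])

# Row-wise test on the character matrix (what the apply() attempt does)
hit_rowwise <- apply(df, 1, function(x) any(unlist(x) %in% to_drop))

# Column-wise test on the numeric column itself
hit_column <- df$start %in% to_drop

print(unname(hit_rowwise))
print(hit_column)

# Drop the matching rows using the column-wise test
cleaned <- df[!hit_column, ]
print(cleaned)
```

Testing the column directly avoids the string conversion and only looks at the column that matters.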

Here is the code to make a reproducible example:

# creates dataframe
fileName <- c("sbc016", "sbc016", "sbc016", "sbc016")
start <- c(1.345, 2.345, 615.162, 775.885)
end <- c(100.345, 200.345, 715.162, 875.885)
sbc016formants.df <- data.frame(fileName, start, end)

# list of what I want to get rid of
list16 <- c(615.162, 775.885)

Answer

Presuming I understand the question correctly, dplyr should be able to do this easily and efficiently.

fileName <- c("sbc016", "sbc016", "sbc016", "sbc016")
start <- c(1.345, 2.345, 615.162, 775.885)
end <- c(100.345, 200.345, 715.162, 875.885)
sbc016formants.df <- data.frame(fileName, start, end)

# list of what I want to get rid of
list16 <- c(615.162, 775.885)

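# one-time install; skip this line if dplyr is already installed
install.packages("dplyr", dependencies = TRUE)
library(dplyr)
# keep only the rows whose start value is not in list16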
sbc016formants.df %>% filter(!start %in% list16)

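Be careful with the != form below. It compares element by element and recycles list16, so it only gives the right rows here because the two values to remove happen to sit in matching positions; prefer the %in% version above.

sbc016formants.df %>% filter(start != list16)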
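To see how != and %in% differ when several values are to be removed, here is a small base R sketch. It uses the same four start values as the example but in a different, made-up row order, and it needs no packages.

```r
# Same values as the example, in a hypothetical different row order
start  <- c(775.885, 1.345, 615.162, 2.345)
list16 <- c(615.162, 775.885)

# != recycles list16 to 615.162, 775.885, 615.162, 775.885 and compares
# position by position, so 775.885 in the first position is not caught
keep_neq <- start != list16

# %in% checks each start value against the whole of list16
keep_in <- !(start %in% list16)

print(start[keep_neq])
print(start[keep_in])

# Base R equivalent of the dplyr filter, on a data frame
df <- data.frame(start = start)
cleaned <- df[!(df$start %in% list16), , drop = FALSE]
print(cleaned)
```

The %in% test does not depend on row order or on the length of the list, which is why it is the safer choice.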