PigWolf - 3 months ago
R Question

String matching in R to return "does not contain" results

I would like to drop all rows in my data frame where column A contains a "_", "(" or ")", and all rows where column D does not contain "InService".

I believe you can use grep or grepl for the "contains" part, but I don't know how to tie it into my expression below.

Atoll <- read.csv("ATOLL_TABLE20160803_084232.csv")
AtollInService <- Atoll[(Atoll$MILESTONE=="InService" & grep(???)),]


Below is an example of the Excel file I am importing. Please note that I've hidden a few columns, since the data is spread across many fields.

NOMINAL_ID    MILESTONE
WW_4752 (MD)  Planned
WW_4752 (MD)  Planned
WW_4752 (MD)  Planned
LX0022 (OZ)   Planned
LX0022 (OZ)   Planned
LX0023        InService
LX0023        InService
LX0023        InService
LX0023        InService
LX0023        InService
LX0023        InService


And below is what I am looking to achieve:

NOMINAL_ID    MILESTONE
LX0023        InService
LX0023        InService
LX0023        InService
LX0023        InService
LX0023        InService
LX0023        InService

Answer

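Use grepl; see ?grepl. It returns a logical vector, so it can be combined with other conditions using & and !. Note that the expression below selects the rows that match both drop conditions (a "_", "(" or ")" in NOMINAL_ID and a MILESTONE that is not "InService"), as the output shows.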

Atoll <- read.csv("ATOLL.csv")

Atoll_filtered <- 
  with(Atoll, Atoll[grepl("[_()]", NOMINAL_ID) & 
                      !grepl("InService", MILESTONE), ])

nrow(Atoll)
# [1] 65

nrow(Atoll_filtered)
# [1] 36

head(Atoll_filtered)
#      NOMINAL_ID MILESTONE
# 1 WW_4664 (KNP)   Planned
# 2 WW_4664 (KNP)   Planned
# 3 WW_4664 (KNP)   Planned
# 4 WW_4664 (KNP)   Planned
# 5 WW_4664 (KNP)   Planned
# 9 WW_4665 (KNP)   Planned
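
The filter above returns rows with a special character in NOMINAL_ID and a MILESTONE other than "InService". To get the table shown in the question instead, the two conditions can be flipped. Here is a minimal sketch, assuming a small hand-built data frame in place of the CSV (the rows are taken from the question's sample; has_special and in_service are names made up for illustration):

```r
# Small stand-in for the CSV, built from the sample rows in the question.
Atoll <- data.frame(
  NOMINAL_ID = c("WW_4752 (MD)", "WW_4752 (MD)", "LX0022 (OZ)", "LX0023", "LX0023"),
  MILESTONE  = c("Planned", "Planned", "Planned", "InService", "InService"),
  stringsAsFactors = FALSE
)

# TRUE where NOMINAL_ID contains "_", "(" or ")".
# Inside a character class these characters are matched literally.
has_special <- grepl("[_()]", Atoll$NOMINAL_ID)

# TRUE where MILESTONE contains the literal string "InService".
in_service <- grepl("InService", Atoll$MILESTONE, fixed = TRUE)

# Keep rows that are in service AND have no special characters.
AtollInService <- Atoll[in_service & !has_special, ]

nrow(AtollInService)
# [1] 2
```

grepl is used rather than grep because grep returns the positions of the matches by default, which cannot be combined with & or negated with ! in the same way.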