giacomoV - 2 months ago
R Question

R - grep remove UPPER case rows

I would like to remove all the rows that are written entirely in UPPERCASE.

My data looks like this:

dt
1 TRAVEL AND UNSPECIFIED TIME USE
2 TRAVEL BY PURPOSE
3 Travel related to unspecified time use
4 Travel related to personal business


I don't understand why this isn't working:

dt[-c(grep('[A-Z]', dt$dt)) , ]


Strangely, it works when I generate random data on mtcars like this:

l = sample( c(letters[1:16], LETTERS[1:16]) )
mtcars$code = l
mtcars[-c( grep('[A-Z]', mtcars$code) ) , ]


Can someone help me?

dt = c("TRAVEL AND UNSPECIFIED TIME USE",
"TRAVEL BY PURPOSE",
"Travel related to unspecified time use",
"Travel related to personal business")
dt = as.data.frame(dt)
dt$dt = as.character(dt$dt)

Answer

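The pattern [A-Z] matches any string that contains at least one capital letter. Rows 3 and 4 start with a capital "T", so every row is matched and all of them are removed. The fully upper-case rows contain spaces as well as capital letters, so we can match one or more capital letters or spaces ([A-Z ]+) from the start (^) to the end ($) of the string with grepl, and negate (!) the result to keep the elements that contain lower-case letters (alone or mixed with upper case) or any other character.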

dt[!grepl("^[A-Z ]+$",dt$dt),, drop = FALSE]
#                                   dt
#3 Travel related to unspecified time use
#4    Travel related to personal business
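As a quick check of that explanation, here is a minimal sketch that rebuilds the OP's data (using data.frame(..., stringsAsFactors = FALSE) in one step instead of the two-step conversion) and compares the unanchored and anchored patterns side by side.

```r
# Rebuild the OP's data with character (not factor) strings.
dt <- data.frame(
  dt = c("TRAVEL AND UNSPECIFIED TIME USE",
         "TRAVEL BY PURPOSE",
         "Travel related to unspecified time use",
         "Travel related to personal business"),
  stringsAsFactors = FALSE
)

# Unanchored '[A-Z]' matches any string with at least one capital letter,
# so the "Travel ..." rows match too.
any_upper <- grep("[A-Z]", dt$dt)
any_upper
# [1] 1 2 3 4

# Dropping all four rows leaves nothing; with a single column and the
# default drop = TRUE, the result is an empty character vector.
dt[-any_upper, ]
# character(0)

# Anchored pattern: the whole string consists of capitals and spaces.
all_upper <- grepl("^[A-Z ]+$", dt$dt)
all_upper
# [1]  TRUE  TRUE FALSE FALSE
```

This also shows why the answer passes drop = FALSE: without it, subsetting a one-column data frame by rows returns a plain vector.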

In the OP's other example, l, each string is a single character, so [A-Z] works there. However, it is better not to subset with -. For example, consider a vector in which no element is entirely upper-case:

v1 <- c('a', 'aB', 'b')
v1[-grep("^[A-Z]+$", v1)]
#character(0)

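This happens because nothing matches, so grep returns an empty integer vector, and indexing with -integer(0) selects no elements at all: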

grep("^[A-Z]+$", v1)
#integer(0)

Negating (!) the logical vector returned by grepl gives the expected output instead:

v1[!grepl("^[A-Z]+$", v1)]
#[1] "a"  "aB" "b"
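Another base R option, sketched here as an alternative to grepl: grep() has an invert argument that returns the indices (or, with value = TRUE, the elements) that do not match. Because the indices are positive, an empty match set keeps every element instead of dropping them all. The OP's data frame is rebuilt in the block so it runs on its own.

```r
v1 <- c("a", "aB", "b")

# invert = TRUE returns the indices of the elements that do NOT match,
# so when nothing is fully upper-case, every index is kept.
keep <- grep("^[A-Z]+$", v1, invert = TRUE)
v1[keep]
# [1] "a"  "aB" "b"

# value = TRUE returns the non-matching elements themselves.
grep("^[A-Z]+$", v1, invert = TRUE, value = TRUE)
# [1] "a"  "aB" "b"

# The same idea applied to the OP's data frame.
dt <- data.frame(
  dt = c("TRAVEL AND UNSPECIFIED TIME USE",
         "TRAVEL BY PURPOSE",
         "Travel related to unspecified time use",
         "Travel related to personal business"),
  stringsAsFactors = FALSE
)
res <- dt[grep("^[A-Z ]+$", dt$dt, invert = TRUE), , drop = FALSE]
res
```

Either form works; grepl with ! reads naturally when combining several conditions with & or |, while grep(..., invert = TRUE) stays close to the OP's original grep call.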