gabx gabx - 1 month ago 8
R Question

R: get dataframe row with specific characters

I need to detect rows of a df/tibble containing a specific sequence of characters.

seq <- "RT @AventusSystems"
is my sequence

df <- structure(list(text = c("@AventusSystems Wow, what a upgrade from help of investor",
"RT @AventusSystems: A recent article about our investors as shown in Forbes! t.co/n8oGwiEDpu #Aventus #GlobalAdvisors #4thefans #Ti…",
"@AventusSystems Very nice to have this project", "RT @AventusSystems: Join the #TicketRevolution with #Aventus today! #Aventus #TicketRevolution #AventCoin #4thefans t.co/OPlyCFmW4a"
), Tweet_Id = c("898359464444559360", "898359342952439809", "898359326552633345",
"898359268226736128"), created_at = structure(c(17396, 17396,
17396, 17396), class = "Date")), .Names = c("text", "Tweet_Id",
"created_at"), row.names = c(NA, -4L), class = c("tbl_df", "tbl",
"data.frame"))

select(df, contains(seq))
# A tibble: 4 x 0


sapply(df$text, grepl, seq)
returns only four FALSE values.

What am I doing wrong? What is the correct solution?
Thank you for your help.

Answer

First, grepl is already vectorized over its argument x, so you don't need sapply. You could just do grepl(seq, df$text).
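A minimal sketch of that vectorized call, using shortened copies of the tweet texts from the question (the `texts` vector is an abbreviation for illustration only). Adding `fixed = TRUE` is an assumption on my part: it treats the sequence as a literal string and not as a regular expression, which seems to be what is wanted here.

```r
# Shortened versions of the four tweet texts from the question
texts <- c(
  "@AventusSystems Wow, what a upgrade from help of investor",
  "RT @AventusSystems: A recent article about our investors",
  "@AventusSystems Very nice to have this project",
  "RT @AventusSystems: Join the #TicketRevolution"
)
seq <- "RT @AventusSystems"

# grepl(pattern, x) returns one logical value per element of x
hits <- grepl(seq, texts, fixed = TRUE)
hits
# [1] FALSE  TRUE FALSE  TRUE
```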

Your code doesn't work because sapply passes each element of its X argument to the function given in FUN as that function's first argument. For grepl the first argument is the pattern, so you are searching for the patterns "@AventusSystems Wow, what a upgrade from help of investor", etc. inside your seq object, not the other way around.
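The argument order can be seen in a small sketch with two of the texts (shortened for illustration). The second call is only there to show the mechanism: naming `pattern` leaves `x` as the first free slot for each element. It is not a recommendation over the plain vectorized call.

```r
texts <- c(
  "@AventusSystems Very nice to have this project",
  "RT @AventusSystems: Join the #TicketRevolution"
)
seq <- "RT @AventusSystems"

# What the original call does: grepl(texts[i], seq),
# i.e. each tweet is the pattern and seq is the string being searched
wrong <- sapply(texts, grepl, seq, USE.NAMES = FALSE)
wrong
# [1] FALSE FALSE

# With pattern named, each tweet is matched to x instead: grepl(seq, texts[i])
right <- sapply(texts, grepl, pattern = seq, USE.NAMES = FALSE)
right
# [1] FALSE  TRUE
```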

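Lastly, dplyr::select selects columns, whereas you want dplyr::filter, which filters rows. contains() matches against column names, and no column name contains "RT @AventusSystems", which is why you got a tibble with 4 rows and 0 columns.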
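Putting the pieces together, a sketch of the row filter might look like this. It assumes dplyr is installed, rebuilds only the text and Tweet_Id columns of the question's data (with the texts shortened), and again uses `fixed = TRUE` as an optional choice for literal matching.

```r
library(dplyr)

# Reduced copy of the question's data: shortened texts, original IDs
df <- tibble(
  text = c(
    "@AventusSystems Wow, what a upgrade from help of investor",
    "RT @AventusSystems: A recent article about our investors",
    "@AventusSystems Very nice to have this project",
    "RT @AventusSystems: Join the #TicketRevolution"
  ),
  Tweet_Id = c("898359464444559360", "898359342952439809",
               "898359326552633345", "898359268226736128")
)
seq <- "RT @AventusSystems"

# Keep only the rows whose text contains the sequence
retweets <- filter(df, grepl(seq, text, fixed = TRUE))
retweets$Tweet_Id
# [1] "898359342952439809" "898359268226736128"
```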