Rprogr - 3 months ago
R Question

How to extract everything before the first space?

charvct <- c("amc rebel sst", "amc ambassador dpl", "amc hornet", "amc gremlin", "amc 1212")


is my vector.

I want the result to be

"amc","amc","amc","amc","amc".


My code is:

y <- gsub("amc*[A-z][0-9]","amc",charvct)


But the output is the same as the input.

Answer

We can match 'amc' followed by a word boundary (\\b), then zero or more (*) alphanumeric characters or spaces ([[:alnum:] ]), and replace the whole match with "amc":

sub("amc\\b[[:alnum:] ]*","amc", charvct)
#[1] "amc" "amc" "amc" "amc" "amc"

Or capture 'amc' as a group ((amc)) and use the backreference (\\1) in the replacement:

sub("(amc)\\b[[:alnum:] ]*","\\1", charvct)
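As a quick check, here is a minimal sketch, assuming the charvct vector from the question (res1 and res2 are just illustrative names). It confirms that the literal replacement and the backreference version give the same result, and it suggests why the original gsub() call returned the input unchanged: that pattern never matches any element.

```r
# Vector from the question
charvct <- c("amc rebel sst", "amc ambassador dpl", "amc hornet",
             "amc gremlin", "amc 1212")

# Literal replacement vs. backreference: both keep only "amc"
res1 <- sub("amc\\b[[:alnum:] ]*", "amc", charvct)
res2 <- sub("(amc)\\b[[:alnum:] ]*", "\\1", charvct)
identical(res1, res2)

# The pattern from the question needs "am", zero or more "c", one
# character from the range A-z, and then one digit. No element
# contains such a sequence, so gsub() has nothing to replace.
grepl("amc*[A-z][0-9]", charvct)
```

Note that the `*` in the original pattern applies only to the preceding "c", not to the character classes after it.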

Based on the vector in the comments, we match one or more punctuation or space characters ([[:punct:] ]+) followed by any characters up to the end of the string (.*), and replace the match with an empty string ("").

sub("[[:punct:] ]+.*", '', v1)
#[1] "amc" "bcd" "xyz" "amc" "amc" "dfz"
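If the goal is literally "everything before the first space", as in the question title, a simpler pattern may be enough. The sketch below is one possible approach, not part of the original answer; first_word and first_piece are illustrative names. It assumes the separator is a single literal space, so an element without a space, such as "dfz+2", is left unchanged (unlike the [[:punct:] ] pattern above).

```r
# Vector from the "data" section
v1 <- c("amc rebel sst", "bcd ambassador dpl", "xyz hornet",
        "amc gremlin", "amc 1212(a)", "dfz+2")

# Drop the first space and everything after it
first_word <- sub(" .*", "", v1)

# Alternative without a regular expression: split on a literal space
# and keep the first piece of each element
first_piece <- vapply(strsplit(v1, " ", fixed = TRUE), `[`, character(1), 1)

# Both approaches agree on this vector
identical(first_word, first_piece)
```

The strsplit() version avoids regex escaping entirely, which can be convenient when the separator is a fixed character.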

data

v1 <- c("amc rebel sst", "bcd ambassador dpl", "xyz hornet",
        "amc gremlin", "amc 1212(a)", "dfz+2")