I have several years' worth of data on individuals, but their names are formatted differently each year. Half of the names are already in "First Last" order, but I can't figure out how to successfully edit the other half ("Last, First").
Here's a sample df:
name <- c("First1 Last1","Last2, First2", "Last3, First3", "First4 Last4", "First5 Last5")
salary <- c(51000, 72000, 125000, 67000, 155000)
year <- c(2012, 2014, 2013, 2013, 2014)
df <- data.frame(name, salary, year, stringsAsFactors=FALSE)
df$name2 <- strsplit(df$name, ", ") #to split the character string by comma
df$name3 <-paste(df$name2, collapse=" ") #to collapse the newly created vectors back into a string
df$name4 <-paste(rev(df$name2)) #to try pasting each vector in reverse order
df$name5 <-paste(rev(df$name2)[2:1]) #trying again...
You can use a regular expression:
df$name <- sub("(L[A-Za-z0-9]+).*\\s+(F[A-Za-z0-9]+).*","\\2 \\1",df$name)
# df
#           name salary year
# 1 First1 Last1  51000 2012
# 2 First2 Last2  72000 2014
# 3 First3 Last3 125000 2013
# 4 First4 Last4  67000 2013
# 5 First5 Last5 155000 2014
The code looks for a word beginning with an uppercase L followed by some letters / digits, then any characters, some whitespace, and a word beginning with an uppercase F followed by some letters / digits, then any remaining characters.
It then reorders the two words by putting first the one beginning with an F (that is, (F[A-Za-z0-9]+)), then the one beginning with an L (that is, (L[A-Za-z0-9]+)).
As you can see, the code removes the comma (it seems to be your desired output).
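As a quick check, here is a minimal sketch that runs the same pattern on the sample names from the question. The vector name `nm` is only an illustrative placeholder. Because the pattern is anchored on words starting with L and F, it is tailored to the placeholder data and leaves anything else untouched:

```r
# Sample names from the question (nm is a hypothetical variable name)
nm <- c("First1 Last1", "Last2, First2", "Last3, First3",
        "First4 Last4", "First5 Last5")

# Same pattern as above: swap "L..., F..." into "F... L..."
pattern <- "(L[A-Za-z0-9]+).*\\s+(F[A-Za-z0-9]+).*"
fixed <- sub(pattern, "\\2 \\1", nm)
print(fixed)

# A name that does not follow the L... / F... convention does not match,
# so sub() returns it unchanged
real_name <- sub(pattern, "\\2 \\1", "Smith, John")
print(real_name)
```

sub() returns its input unchanged when the pattern does not match, which is why the names already in "First Last" order survive.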
With the new info, use this code:
df$name <- sub('(.*)\\,\\s+(.*)','\\2 \\1', df$name)
# sub('(.*)\\,\\s+(.*)','\\2 \\1',name)
#  "John Smith" "Marcus Green" "Mario Sanchez" "Jennifer Roberts" "Sammy Lee"
Here, we are looking for any characters before a comma, followed by one or more whitespace characters and then by other characters. We then swap the first and the second captured group to get the desired output.
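For illustration, here is a small self-contained sketch of that substitution. The input vector is an assumption (the original inputs are not shown, so I made up a mix of both formats), and the comma is written unescaped, which should behave the same as `\\,`:

```r
# Hypothetical input mixing both formats (assumed, not the asker's real data)
nm <- c("John Smith", "Green, Marcus", "Sanchez, Mario",
        "Jennifer Roberts", "Lee, Sammy")

# Group 1 = text before the comma, group 2 = text after the comma + whitespace
swapped <- sub("(.*),\\s+(.*)", "\\2 \\1", nm)
print(swapped)

# Flag which entries contained a comma and were therefore rewritten
had_comma <- grepl(",", nm, fixed = TRUE)
print(had_comma)
```

Names with more than one comma (a "Last, Jr., First" style, say) would need a more careful pattern; this sketch only covers the single-comma case.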
Note: I assumed that if there is no comma, then the names are already in the correct order (that seems to be the case in your comment).
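If you would rather stay with your strsplit() approach, it can be made to work too. The catch is that strsplit() returns a list with one character vector per name, so rev() and paste() have to be applied to each element rather than to the whole list. A minimal sketch, assuming the separator is always a comma followed by a single space (`nm`, `parts` and `swapped` are placeholder names):

```r
# A few of the sample names from the question
nm <- c("First1 Last1", "Last2, First2", "Last3, First3")

# One character vector per name; names without ", " stay as a single piece
parts <- strsplit(nm, ", ", fixed = TRUE)

# Reverse the pieces of each name and glue them back together with a space
swapped <- vapply(parts, function(p) paste(rev(p), collapse = " "),
                  character(1))
print(swapped)
```

vapply() is used instead of sapply() here only so that the result is guaranteed to be a character vector of the same length as the input.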