I have a string (FASTA format):
a = ">atttaggacctta\nattgtcggta\n>ccattnnnn\ncccatt\n>ttaggccta"
unlist(strsplit(a, "(?<=>)", perl=T))
Your regex only contains a lookbehind that matches any empty location right after a ">" (see your regex demo). The engine processes the string from left to right, checks whether there is a ">" immediately to the left of the current location, and returns a valid empty-string match if a ">" is found.
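To make the effect concrete, here is a minimal sketch of what the lookbehind-only split returns for the string from the question. The variable name `bad` is only for illustration. The result follows from how `strsplit` works: after each match it continues searching in the remainder of the string.

```r
a <- ">atttaggacctta\nattgtcggta\n>ccattnnnn\ncccatt\n>ttaggccta"

# Lookbehind only: every split point is the empty location right after a ">",
# so each ">" ends up at the END of the preceding piece instead of starting a record.
bad <- unlist(strsplit(a, "(?<=>)", perl = TRUE))
print(bad)
```

The first element is a lone ">", and every later ">" is attached to the end of the previous piece rather than to the start of its own record.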
You may use
> res <- unlist(strsplit(a, "(?<=[^>])(?=>)", perl=T))
> res
[1] ">atttaggacctta\nattgtcggta\n" ">ccattnnnn\ncccatt\n"
[3] ">ttaggccta"
> gsub("\n", "", res, fixed=TRUE)
[1] ">atttaggaccttaattgtcggta" ">ccattnnnncccatt"
[3] ">ttaggccta"
The pattern matches a location that is preceded by a non-">" char and is followed by a ">" char.
Note that using a lookbehind pattern only with strsplit often leads to unexpected behavior. See "Why does strsplit use positive lookahead and lookbehind assertion matches differently?"
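If you would rather avoid splitting on zero-width matches altogether, one alternative is to extract the records instead of splitting. This is only a sketch and not part of the solution above. It assumes every record starts with ">" and contains no other ">", and the name `records` is hypothetical.

```r
a <- ">atttaggacctta\nattgtcggta\n>ccattnnnn\ncccatt\n>ttaggccta"

# Extract each record directly: a ">" followed by any run of non-">" characters.
# In PCRE, [^>] also matches "\n", so multi-line records stay in one piece.
records <- regmatches(a, gregexpr(">[^>]*", a, perl = TRUE))[[1]]

# Drop the line breaks inside each record, as in the gsub() call above.
records <- gsub("\n", "", records, fixed = TRUE)
print(records)
```

Because nothing is split here, the result does not depend on how strsplit treats empty matches.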