nhoff nhoff - 1 month ago 19
R Question

Removing certain regular expression matches in R

I have a character string from which I would like to remove only the line breaks that are immediately followed by a lowercase letter. For example, my string might contain:


one line of text \r\n another line \r\nof text,


which would show up as:


one line of text

another line

of text,


In this example, I would only want to remove the second line break, so that the text would then read:


one line of text

another line of text


I know that the pattern is "\r\n[a-z]", and so the code should be something like

gsub("\r\n[a-z]", "", txt)


but I cannot come up with code that removes the line break while retaining the lowercase letter.

Thanks!

Answer

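We can use a regex lookahead (one kind of lookaround). The pattern (?=[a-z]) asserts that a lowercase letter follows the line break without making that letter part of the match, so only the \r\n is removed. Lookarounds need perl = TRUE in gsub.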

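txt <- "one line of text \r\n another line \r\nof text,"
# (?=[a-z]) checks for a lowercase letter after \r\n without consuming it
txtN <- gsub("\r\n(?=[a-z])", "", txt, perl = TRUE)
cat(txtN, sep = "\n")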
# one line of text 
# another line of text,
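
If you would rather not use perl = TRUE, a capture group with a backreference should give the same result: match the line break together with the letter, then put the letter back with \\1. This is only a sketch of an alternative, not part of the original answer; the variable names res_backref and res_lookahead are made up for illustration, and txt is assumed to be the example string from the question.

```r
# Example string from the question (assumed input)
txt <- "one line of text \r\n another line \r\nof text,"

# Capture the lowercase letter and re-insert it via the backreference \\1,
# so only the \r\n before it is dropped
res_backref <- gsub("\r\n([a-z])", "\\1", txt)

# Lookahead version from the answer, for comparison
res_lookahead <- gsub("\r\n(?=[a-z])", "", txt, perl = TRUE)

# Both approaches should leave the first line break (followed by a space) intact
identical(res_backref, res_lookahead)
```

The first \r\n is kept in both cases because it is followed by a space, not a lowercase letter.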