lll lll - 3 months ago 14
R Question

R: how to remove \n and <br /> from text data

I have text data like the following:

hold that\nagainst me. i spend most of my days trying to build cool stuff for\nmy company. <br />\n<br />\ni'm an entrepreneur (like everyone else in sf, it seems) and i love\nwhat i do.


I used the following command, but it only removed \n, and <br /> still remains.

gsub("\n <br />", " ", h)


When I tried this command instead, both were removed, but the "re" in the actual text data was removed as well. What is the proper way to remove both?

gsub("[\n <br />]", " ", h)

Answer
  text <- "hold that\nagainst me. i spend most of my days trying to build cool stuff for\nmy company. <br />\n<br />\ni'm an entrepreneur (like everyone else in sf, it seems) and i love\nwhat i do."

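Use (pat1|pat2) to match any one of several patterns. Square brackets do something different: [...] is a character class, which matches a single character from the listed set. So [\n <br />] replaces every newline, space, <, b, r, / and > on its own, which is why letters inside ordinary words disappear. The first attempt, "\n <br />", fails for another reason: it only matches the exact sequence newline, space, <br />.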
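To see the difference between a character class and an alternation, here is a small sketch that applies both patterns to one word from the question's text (the variable names are only for illustration):

```r
# A single word from the question's text is enough to show the difference
word <- "entrepreneur"

# [\n <br />] is a character class: it matches ONE character that is a
# newline, a space, "<", "b", "r", "/" or ">", so every "r" is replaced
class_result <- gsub("[\n <br />]", " ", word)

# (\n|<br />) is an alternation: it matches a whole newline or the whole
# string "<br />", so ordinary letters are left untouched
alt_result <- gsub("(\n|<br />)", " ", word)

print(class_result)  # "ent ep eneu "
print(alt_result)    # "entrepreneur"
```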

 gsub("(\n|<br />)"," ",text)
 ## [1] "hold that against me. i spend most of my days trying to build cool stuff for my company.     i'm an entrepreneur (like everyone else in sf, it seems) and i love what i do."
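The result above still has a run of five spaces where "<br />\n<br />\n" used to be. If single spaces are wanted, one option (a sketch, not the only way) is a second pass that collapses whitespace with "\\s+" and then trims the ends with trimws():

```r
text <- "hold that\nagainst me. i spend most of my days trying to build cool stuff for\nmy company. <br />\n<br />\ni'm an entrepreneur (like everyone else in sf, it seems) and i love\nwhat i do."

# Step 1: replace every newline or "<br />" with a space (as in the answer)
step1 <- gsub("(\n|<br />)", " ", text)

# Step 2: collapse any run of whitespace into one space, then trim the ends
clean <- trimws(gsub("\\s+", " ", step1))

print(clean)
```

Note that "\\s+" also collapses tabs and other whitespace characters, which is usually what you want when cleaning free text.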