Dmitry Leykin - 1 year ago
R Question

capture repetition of letters in a word with regex

I'm trying to detect words in which a letter is repeated, and I would like to replace each such run with a single occurrence of the repeated letter. The text is in Hebrew. For instance,

should just become
Basically, when a letter repeats 3 or more times in a row, it should be detected and replaced with a single occurrence.

I want to use the regular expression in R, something like:
df$text <- gsub

Thank you for all suggestions

Answer Source

You can use

> x = "שללללוווווםםםם"
> gsub("(.)\\1{2,}", "\\1", x)
#[1] "שלום"

Note: this replaces any character (not just Hebrew) that is repeated three or more times in a row. The group `(.)` captures one character and `\\1{2,}` requires at least two further copies of it.
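To make the threshold concrete, here is a small sketch on an ASCII string (the input is made up for illustration and is not from the question). Runs of three or more collapse to one character, while doubled letters are left alone:

```r
# Illustrative ASCII input; not taken from the original question
y <- "heeellooo!!!! book"

# (.) captures one character; \\1{2,} needs at least two more copies,
# so only runs of length 3 or more are replaced by a single character
collapsed <- gsub("(.)\\1{2,}", "\\1", y)
print(collapsed)
# [1] "hello! book"
```

The "ll" in "hello" and the "oo" in "book" survive because each run has only two characters.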

Or use the following to collapse only word characters (letters, digits, and underscore) from any language:

> gsub("(\\w)\\1{2,}", "\\1", x)
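Since `gsub` is vectorised, the same call can be applied to a whole data frame column, as the question intends with `df$text`. The sketch below assumes a made-up data frame; only the column name `text` comes from the question, and matching Hebrew letters with `\\w` assumes R is running in a UTF-8 locale:

```r
# Hypothetical data; only the column name `text` appears in the question
df <- data.frame(
  text = c("שללללוווווםםםם", "hellooo", "book", "wow!!!!"),
  stringsAsFactors = FALSE
)

# Collapse runs of 3+ identical word characters to a single character.
# Punctuation is not matched by \\w, so "!!!!" is left unchanged.
df$text <- gsub("(\\w)\\1{2,}", "\\1", df$text)
print(df$text)
```

Swapping `(\\w)` for `(.)` would also collapse the repeated exclamation marks in the last row.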