Dmitry Leykin - 3 months ago
R Question

capture repetition of letters in a word with regex

I'm trying to detect words in which a letter is repeated several times in a row, and I would like to replace each such run with a single copy of the repeated letter. The text is in Hebrew. For instance,

שללללוווווםםםם
should just become
שלום
Basically, when a letter repeats itself 3 or more times in a row, it should be detected and replaced with a single occurrence.

I want to use the regular expression with R's gsub:

df$text <- gsub("?", "?", df$text)

Thank you for any suggestions.

Answer

You can use

> x = "שללללוווווםםםם"
> gsub("(.)\\1{2,}", "\\1", x)
#[1] "שלום"

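NOTE: This replaces any character (not just Hebrew letters) that is repeated three or more times in a row. The group (.) captures one character, and \\1{2,} requires at least two more copies of it right after.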
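To connect this back to the question, here is a minimal sketch that applies the same pattern to a data frame column. The data frame and its contents are made up for illustration; only the `text` column name comes from the question. It also shows that a letter doubled only once (as in "book" or the "ll" of "hello") is left alone.

```r
# Hypothetical data standing in for the asker's `df`; only the
# `text` column name is taken from the question.
df <- data.frame(
  text = c("שללללוווווםםםם", "hellooo wooorld", "book"),
  stringsAsFactors = FALSE
)

# Collapse every run of three or more identical characters to one copy.
# gsub() is vectorised, so the whole column is processed in one call.
df$text <- gsub("(.)\\1{2,}", "\\1", df$text)

# "hellooo wooorld" becomes "hello world"; "book" is unchanged.
print(df$text)
```

The Hebrew literal assumes the script is read in a UTF-8 session; in other encodings the string may be handled byte by byte and the pattern may not match as intended.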

Or use the following to collapse only letters or digits from any language (note that \\w also matches the underscore):

> gsub("(\\w)\\1{2,}", "\\1", x)
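If digits should be left alone as well (so that a number such as 1000 is not shortened), one possible variation, not part of the original answer, is to use the Unicode letter class \p{L} with perl = TRUE. The sketch below assumes R is built with PCRE Unicode property support, which is the usual case; the sample strings are invented.

```r
# Invented sample input: repeated letters, repeated digits, repeated punctuation,
# plus the Hebrew example from the question written with \u escapes.
x <- c(
  "sooooo 1000 !!!!",
  "\u05e9\u05dc\u05dc\u05dc\u05dc\u05d5\u05d5\u05d5\u05d5\u05d5\u05dd\u05dd\u05dd\u05dd"
)

# \p{L} matches letters only, so runs of digits and punctuation are untouched.
letters_only <- gsub("(\\p{L})\\1{2,}", "\\1", x, perl = TRUE)

# For comparison, \w also collapses the run of zeros in "1000".
word_chars <- gsub("(\\w)\\1{2,}", "\\1", x[1])

print(letters_only[1])  # "so 1000 !!!!"
print(word_chars)       # "so 10 !!!!"
```

Which variant fits depends on the data: \w is fine when numbers do not matter, while \p{L} is the more conservative choice for text that mixes words and figures.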