johnny utah johnny utah - 1 month ago 4
Python Question

replacing the hyphenated words in a sentence created by new lines

reso- lution
sug- gest
evolu- tion


are all words that have contain hyphens due to limited space in a line in a piece of text. e.g.


Analysis of two high reso- lution nucleosome maps revealed strong
signals that even though they do not constitute a definite proof are
at least consistent with such a view. Taken together, all these
findings sug- gest the intriguing possibility that nucleosome
positions are the product of a mechanical evolu- tion of DNA
molecules.


I would like to replace with their natural forms i.e.

resolution
suggest
evolution


How can I do this in a text with python?

Answer

Make sure there is a lowercase letter before - and a lowercase letter after the -+space, capture the letters and use backreferences to get these letters back after replacement:

([a-z])- ([a-z])

See regex demo (replace with \1\2 backreference sequence). Note that you may adjust the number of spaces with {1,max} quantifier (say, if there is one or two spaces between the parts of the word, use ([a-z])- {1,2}([a-z])). If there can be any whitespace, use \s rather than .

Python code:

import re
s = 'Analysis of two high reso- lution nucleosome maps revealed strong signals that even though they do not constitute a definite proof are at least consistent with such a view. Taken together, all these findings sug- gest the intriguing possibility that nucleosome positions are the product of a mechanical evolu- tion of DNA molecules.'
s = re.sub(r'([a-z])- ([a-z])', r'\1\2', s)
print(s)
Comments