Defoe Defoe - 15 days ago 6x
Ruby Question

Ruby regex eliminate new line until . or ? or capital letter

I'd like to do the following with my strings:

line1= "You have a house\nnext to the corner."

If the newline does not come right after a dot or a question mark (or before a capital letter), I want it replaced with a space, so the desired output in this case is:

"You have a house next to the corner.\n"

Another example, this time with a question mark:

"You like baggy trousers,\ndon't you?"

should become:

"You like baggy trousers, don't you?\n".

I've tried:

line1.gsub!(/(?<!?|.)"\n"/, " ")

The idea is that immediately preceding the \n there must NOT be either a question mark (?) or a dot (.).

But I get the following syntax error:

SyntaxError: (eval):2: target of repeat operator is not specified: /(?<!?|.)"\n"/

And for sentences that have a capital letter in the middle, I'd like to insert a \n before that capital letter, so the sentence:

"We were winning The Home Secretary played a important role."

Should become:

"We were winning\nThe Home Secretary played a important role."


NOTE: This answer is not meant to provide a generic way to remove unnecessary newlines inside sentences. It only serves the OP's purpose of removing or inserting newlines at specific places in a string.

Since the two scenarios need different replacements, consider a 2-step approach. The first step is

.gsub(/(?<![?.])\n/, ' ')

This one replaces every newline that is not preceded by ? or . (the (?<![?.]) part is a negative lookbehind that fails the match if its subpattern matches immediately before the current location in the string).
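
As a quick check of this first step, here is a minimal sketch that applies the pattern to the two sample strings from the question. The method name join_broken_lines is made up for illustration and is not part of the original answer. Note that the pattern only removes newlines; a trailing \n survives only if the input already ends with one.

```ruby
# Hypothetical helper (the name is an assumption): replace every newline
# that is not immediately preceded by "?" or "." with a single space.
def join_broken_lines(text)
  text.gsub(/(?<![?.])\n/, ' ')
end

p join_broken_lines("You have a house\nnext to the corner.\n")
# => "You have a house next to the corner.\n"

p join_broken_lines("You like baggy trousers,\ndon't you?\n")
# => "You like baggy trousers, don't you?\n"
```

Inside a character class, ? and . are literal characters, so no escaping is needed. That is also why the original attempt failed: outside a class, the bare ? is read as a quantifier with nothing to repeat.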

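The second step is

.sub(/(?<!^) *+(?=[A-Z])/, "\n")

or, to match any Unicode uppercase letter rather than only ASCII A-Z,

.sub(/(?<!^) *+(?=\p{Lu})/, "\n")

Note the double quotes around the replacement: in Ruby, '\n' in single quotes is a literal backslash followed by n, not a newline.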

It matches zero or more spaces ( *+) possessively (no backtracking into the space pattern) that are not at the beginning of a line, thanks to the (?<!^) negative lookbehind (replace ^ with \A to exclude only the start of the whole string). The spaces must also be followed by an uppercase letter: (?=[A-Z]) or (?=\p{Lu}) is a positive lookahead that requires its pattern to match immediately to the right of the current location. Since sub is used rather than gsub, only the first such match is replaced.
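
The second step can be sketched the same way on the sample sentence from the question. The method name break_before_capital is an assumption made for illustration only.

```ruby
# Hypothetical helper (the name is an assumption): put a newline before the
# first uppercase letter that is not at the start of a line, swallowing any
# spaces in front of it. Double quotes make "\n" a real newline.
def break_before_capital(text)
  text.sub(/(?<!^) *+(?=\p{Lu})/, "\n")
end

p break_before_capital("We were winning The Home Secretary played a important role.")
# => "We were winning\nThe Home Secretary played a important role."

# A capital letter that already starts a line is left alone.
p break_before_capital("Hello\nWorld")
# => "Hello\nWorld"
```

Because sub stops after the first match, "Home" and "Secretary" keep their preceding spaces. Switching to gsub would break the line before every capitalized word, which is not what the question asks for.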