CodingMadeEasy - 1 month ago
JavaScript Question

Regex to gather any word in array of words continuously

I have some text with a series of keywords.

Ex:

Text: Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat.

FooKeyword: Foo
AnotherKeyword: Yay!


I need to be able to match the keyword as well as all the text leading up to the next keyword.

So something like:

Match 1:
Group[0] = FooKeyword
Group[1] = Foo


So far this is what I have:

[\s\S]?(Text:|FooKeyword:|AnotherKeyword:).*


It works for the most part, but it does not match across newlines. I need to capture everything between one keyword and the next. How do I go about that?

Answer

You can try this: /(Keyword\d+): ?(.+?)(?=\nKeyword|$)/gs

See it working here: https://regex101.com/r/zkLoYZ/1.

[EDIT] Added explanations:

  • The s flag is essential here: it lets . match newlines, so a section can span several lines.
  • I shortened your (Keyword1:|Keyword2:|Keyword3:) to (Keyword\d+).
  • Each section ends where the next 'Keyword' appears at the beginning of a new line, or at the end of the string ($).
  • (?=something) is a positive lookahead: it checks that something follows without consuming it.
  • In (.+?), the ? makes the quantifier lazy, so it matches as little as possible.
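
As a minimal sketch of how these pieces fit together in JavaScript (the sample text and the names text, sectionPattern and sections are made up for illustration; String.prototype.matchAll and the s flag need an ES2018+ engine):

```javascript
// Illustrative input: the second line of the first section has no keyword,
// so it must be captured as part of Keyword1's value.
const text =
  "Keyword1: first value\nspanning two lines\nKeyword2: second value\nKeyword3: third value";

// g: find every section; s: let "." match newlines as well.
const sectionPattern = /(Keyword\d+): ?(.+?)(?=\nKeyword|$)/gs;

// m[1] is the keyword, m[2] is everything up to the next keyword line.
const sections = [...text.matchAll(sectionPattern)].map((m) => ({
  keyword: m[1],
  value: m[2],
}));

console.log(sections);
```

Without the s flag, the first section could not be matched at all, because . would stop at the first newline while the lookahead still expects a keyword or the end of the string.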

[EDIT] After the question was edited:

If your keywords are all distinct, keep the same pattern but replace (Keyword\d+) with a list of the keywords, generated beforehand and separated by |, as you originally had.

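At worst, the generated /(Text|FooKeyword|AnotherKeyword): ?(.+?)(?=\n(?:Text|FooKeyword|AnotherKeyword)|$)/gs will work, as shown here: https://regex101.com/r/zkLoYZ/4. Note that the keywords inside the lookahead must be grouped so that \n applies to each of them, not only to Text.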

I tried to shorten the lookahead by reusing the first capture with \1, but that cannot work: \1 matches only the keyword that was just captured, whereas the next section starts with a different keyword.
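
A small sketch of what goes wrong with \1 (the keywords Foo and Bar and the variable names are invented for this illustration):

```javascript
// Two keywords, with Foo appearing twice.
const input = "Foo: a\nBar: b\nFoo: c";

// The lookahead uses \1, so a section only ends where the SAME keyword
// appears again at the start of a line, or at the end of the string.
const withBackreference = /(Foo|Bar): ?(.+?)(?=\n\1|$)/gs;

const found = [...input.matchAll(withBackreference)].map((m) => [m[1], m[2]]);

// The Bar section is swallowed into the first Foo section.
console.log(found);
```

Here the first match runs from the first Foo up to the second Foo, so Bar never shows up as a section of its own.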

So /(Text|FooKeyword|AnotherKeyword): ?(.+?)(?=\n(Text|FooKeyword|AnotherKeyword)|$)/gs is the best you can do with a regex.
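
If the keyword list is only known at runtime, the pattern can be assembled from an array. The following is a sketch under assumptions, not a definitive implementation: escapeRegExp and splitByKeywords are hypothetical helper names, the sample text is a shortened version of the one in the question, the lookahead uses a non-capturing group (?:...) so that only two groups are captured, and the trailing blank lines of each section are trimmed.

```javascript
// Hypothetical helper: escape regex metacharacters so a keyword is matched literally.
const escapeRegExp = (s) => s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");

// Hypothetical helper: returns [keyword, value] pairs in order of appearance.
function splitByKeywords(text, keywords) {
  const alternation = keywords.map(escapeRegExp).join("|");
  // Same shape as the pattern above: keyword, optional space, lazy body,
  // then a lookahead for "newline + any keyword" or the end of the string.
  const pattern = new RegExp(
    `(${alternation}): ?(.+?)(?=\\n(?:${alternation})|$)`,
    "gs"
  );
  return [...text.matchAll(pattern)].map((m) => [m[1], m[2].trim()]);
}

// Shortened sample based on the question's text.
const sample =
  "Text: Lorem ipsum dolor sit amet,\nconsectetur adipiscing elit.\n\nFooKeyword: Foo\nAnotherKeyword: Yay!";

const parsed = splitByKeywords(sample, ["Text", "FooKeyword", "AnotherKeyword"]);
console.log(parsed);
```

One limitation to keep in mind: a body line that happens to begin with one of the keywords ends the current section early, since the lookahead only checks for a newline followed by a keyword.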