InvokerLaw - 3 months ago
Markdown Question

markdown emph regex match

raw string:


These * should * not \*be\* selected. This* neither! *should be. This *neither should\* be* *this should* and*This*


expect:


These * should * not *be* selected. This* neither! *should be. This *neither should* be* <em>this should</em> ~~and<em>This</em>~~


old regex:


"(^|[\\W_])(?:(?!\\1)|(?=^))(\\*|_)(?=\\S)((?:(?!\\2).)*?\\S)\\2(?!\\2)(?=[\\W_]|$)"


The old regex does not handle this input correctly.

Could someone help? I am using regex in Swift.

The right answer is "\\.|(\B\*\b(?:(?!\\[*]).)*?\b\*\B)". – Wiktor Stribiżew

Answer

Be careful when parsing Markdown with a regex, since your data can contain escape sequences. That means you cannot simply use lookarounds to match something only when it is not preceded by a backslash, because that backslash may itself be escaped. What you can do with a regex is match the escape sequences that come before the Markdown into one group and the Markdown parts into another.

"(?u)(\\\\.)|(\\*\\b(?:(?!\\\\[*]).)*?\\b\\*)"

See this regex demo. In your code, handle the two groups differently, according to your requirements.
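As a rough illustration of handling the two groups, here is a minimal Swift sketch using `NSRegularExpression`. The helper name `renderEmphasis` and the decision to copy escape sequences through unchanged are assumptions made for the example, not part of the answer itself; adapt both to your requirements.

```swift
import Foundation

// Pattern from the answer: group 1 is an escape sequence, group 2 is an emphasis span.
let emphasisPattern = "(?u)(\\\\.)|(\\*\\b(?:(?!\\\\[*]).)*?\\b\\*)"

/// Hypothetical helper: wraps group-2 matches in <em> and keeps group-1 matches verbatim.
func renderEmphasis(_ input: String) -> String {
    guard let regex = try? NSRegularExpression(pattern: emphasisPattern) else {
        return input // Pattern failed to compile; return the text untouched.
    }
    let ns = NSString(string: input)
    var output = ""
    var cursor = 0
    let fullRange = NSRange(location: 0, length: ns.length)
    for match in regex.matches(in: input, range: fullRange) {
        // Copy the text between the previous match and this one.
        let gap = NSRange(location: cursor, length: match.range.location - cursor)
        output += ns.substring(with: gap)
        let emphasis = match.range(at: 2)
        if emphasis.location != NSNotFound {
            // Group 2: drop the surrounding asterisks and wrap the inner text.
            let inner = NSRange(location: emphasis.location + 1, length: emphasis.length - 2)
            output += "<em>" + ns.substring(with: inner) + "</em>"
        } else {
            // Group 1: an escape sequence such as \*; assumed to stay as written.
            output += ns.substring(with: match.range)
        }
        cursor = match.range.location + match.range.length
    }
    // Copy whatever follows the last match.
    output += ns.substring(from: cursor)
    return output
}
```

For example, `renderEmphasis("not \\*be\\* but *this should*")` yields `not \*be\* but <em>this should</em>`. Matching the escapes first means the regex consumes `\*` before it can be mistaken for an emphasis delimiter.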

Pattern details:

  • (?u) - make the word boundaries Unicode-aware in the pattern
  • (\\\\.) - Group 1 - an escape sequence
  • | - or
  • (\\*\\b(?:(?!\\\\[*]).)*?\\b\\*) - Group 2, which matches:
    • \\*\\b - a * that is followed by a word char
    • (?:(?!\\\\[*]).)*? - any char that does not start a \* sequence, as few times as possible
    • \\b\\* - a * that is preceded by a word char

A better option is custom parsing code.
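A hand-written scanner could look roughly like the sketch below. The function name `emphasize` and its rules are assumptions chosen to mirror the regex in the comment on the question (an opening * must follow a non-word char and precede a word char, a closing * the reverse, and a \* inside a span cancels it); this is not a full Markdown parser.

```swift
import Foundation

/// Hypothetical scanner; the opening and closing rules are assumptions, not a Markdown spec.
func emphasize(_ text: String) -> String {
    let chars = Array(text)

    // True when index i is in range and holds a letter, digit, or underscore.
    func isWord(_ i: Int) -> Bool {
        guard i >= 0 && i < chars.count else { return false }
        return chars[i].isLetter || chars[i].isNumber || chars[i] == "_"
    }

    var out = ""
    var i = 0
    while i < chars.count {
        let c = chars[i]
        if c == "\\", i + 1 < chars.count {
            // Escape sequence: copy the backslash and the escaped char verbatim.
            out.append(c)
            out.append(chars[i + 1])
            i += 2
            continue
        }
        if c == "*", !isWord(i - 1), isWord(i + 1) {
            // Candidate opener: search forward for a matching closer.
            var close: Int? = nil
            var j = i + 1
            while j < chars.count {
                // An escaped asterisk or a line break cancels the span.
                if chars[j].isNewline { break }
                if chars[j] == "\\" && j + 1 < chars.count && chars[j + 1] == "*" { break }
                if chars[j] == "*" && isWord(j - 1) && !isWord(j + 1) {
                    close = j
                    break
                }
                j += 1
            }
            if let close = close {
                out += "<em>" + String(chars[(i + 1)..<close]) + "</em>"
                i = close + 1
                continue
            }
        }
        // Ordinary character, or an asterisk that opens nothing.
        out.append(c)
        i += 1
    }
    return out
}
```

With these rules, `emphasize("*this should* and*This*")` returns `<em>this should</em> and*This*`. Keeping the rules in ordinary code makes each one easy to change independently, which is harder to do inside a single regex.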
