MohanL MohanL - 3 months ago 7
Javascript Question

Looking for a better regex solution

My input is:

<span question_number="18"> blah blah blah 1</span><span question_number="19"> blah blah blah 2</span>


I want my regex to match this pattern:

<span question_number="somenumber">xxxx</span>

The desired output is two captures: (1) somenumber and (2) xxxx.

I wrote a naive solution that handles this input:

<span question_number="18"> blah blah blah 1</span>


<span question_number="19"> blah blah blah 2</span>


Note that the two spans are on different lines.

The output is:

18, blah blah blah 1

and

19, blah blah blah 2



But when the input is all on one line:

<span question_number="18"> blah blah blah 1</span><span question_number="19"> blah blah blah 2</span>

my output is:

18, blah blah blah 1</span><span question_number="19"> blah blah blah 2


How can I work around this problem?

Update:
The regex I am using:
/\<span question_number=(?:\")*(\d*)(?:\")*>(.*)<\/span>/ig


Test input:

Case 1: two separate lines

<span question_number="54">often graces doorways tied into ropes called</span>


<span question_number="54">often graces doorways tied into ropes called <i>ristras</i>.</span>


Case 2: a single line

<span question_number="54">often graces doorways tied into ropes called</span><span question_number="54">often graces doorways tied into ropes called <i>ristras</i>.</span>


Update2:

This is not a DOM; it is just plain text that I want to process.

Update3:
My regex problem is solved. Now I have a question about comparing the processing speed of regex versus DOM operations: how could I implement such a test?

Answer

If it really isn't HTML (hmm?), you could do it with:

<span question_number="(\d+)">(.*?)<\/span>

See it here at regex101.
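In case it helps, here is a minimal sketch of how that pattern could be applied in JavaScript to pull out both captures. The helper name extractQuestions and the variable sameLine are made up for illustration; only the pattern itself comes from the answer, and the input is the one-line sample from the question.

```javascript
// Hypothetical helper: returns one { number, text } object per span found.
function extractQuestions(text) {
  // Pattern from the answer; the g flag is required by matchAll.
  const pattern = /<span question_number="(\d+)">(.*?)<\/span>/gi;
  return Array.from(text.matchAll(pattern), (m) => ({
    number: m[1], // first capture group: the digits in question_number
    text: m[2],   // second capture group: everything up to the first </span>
  }));
}

// The same-line input from the question.
const sameLine =
  '<span question_number="18"> blah blah blah 1</span>' +
  '<span question_number="19"> blah blah blah 2</span>';

console.log(extractQuestions(sameLine));
// [ { number: '18', text: ' blah blah blah 1' },
//   { number: '19', text: ' blah blah blah 2' } ]
```

Note that the captured text keeps the leading space from the input; call trim() on it if that is not wanted.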

The problem with your original regex is that it's greedy. The (.*) part matches as many characters as it can while still leaving a final <\/span> to match, so the match starts at the first <span... and runs up to the last </span>. My solution is non-greedy (the ? in (.*?)), so it matches only up to the first </span>.
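To see the difference side by side, here is a small sketch that runs a greedy and a non-greedy version of the same pattern over the one-line input from the question. The variable names are only illustrative.

```javascript
const input =
  '<span question_number="18"> blah blah blah 1</span>' +
  '<span question_number="19"> blah blah blah 2</span>';

// Greedy: (.*) grabs the rest of the line, then backs off to the LAST </span>.
const greedy = /<span question_number="(\d+)">(.*)<\/span>/gi;
// Non-greedy: (.*?) grows one character at a time and stops at the FIRST </span>.
const lazy = /<span question_number="(\d+)">(.*?)<\/span>/gi;

// Collect [number, text] pairs for every match of each pattern.
const greedyMatches = [...input.matchAll(greedy)].map((m) => [m[1], m[2]]);
const lazyMatches = [...input.matchAll(lazy)].map((m) => [m[1], m[2]]);

console.log(greedyMatches.length); // 1
console.log(greedyMatches[0][1]);
// " blah blah blah 1</span><span question_number="19"> blah blah blah 2"

console.log(lazyMatches.length); // 2
console.log(lazyMatches.map((pair) => pair[0])); // [ '18', '19' ]
```

Two caveats: without the s flag, . does not match line breaks, so a span whose text wraps onto another line would not match, and a span nested inside another span would end at the inner </span>. Inline tags such as <i>ristras</i> from your second test case are fine.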