jiajianrong - 12 days ago 5
Node.js Question

Comparing speed of regexp and split when parsing url query

I want to find out which is the fastest way in Node to parse a URL query value. For example, given

hello.org/post.html?action=newthread&fid=32&fpage=1
, if I want to get the fid value, I have 3 choices:

1
str.match(/[?/&]fid=(.*?)($|[&#])/)


2
req.query.fid
in Express, which I found actually calls
https://github.com/ljharb/qs/blob/master/lib/parse.js
, which in turn uses
str.split('&')
under the hood

3
str.split(/[&/#?]/)
and then using a
for
loop to find the piece that starts with
fid


I'm guessing the 1st is the slowest and the 2nd is the fastest, but I don't know if that's correct (though I can run a test). I'd also like to understand the underlying reason. Thanks.

Answer

1) A regexp is a general-purpose pattern matcher. At each position in the source string the engine tries to match the pattern's tokens in sequence, and it may backtrack when a partial match fails. The cost therefore depends on both the length of the source and the structure of the pattern, and in the worst case it grows faster than linearly.

2) A string tokenizer (split) on a single character has a much simpler job: it traverses the source string once, cuts off a token each time it meets the separator character, and moves on. The complexity is O(n), where n is the number of characters in the string.

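3) is a variant of (2) with a more complex separator: at each position the scan has to test for several characters instead of one, so there is more work per character and the cost moves towards that of (1). Note that the separator has to be a regexp literal, as in str.split(/[&/#?]/); a quoted string is treated as one literal multi-character separator and would not split this URL at all. Because split with a regexp separator also runs the regexp engine, and the for loop adds another pass over the pieces, whether it still beats (1) is best settled by measuring.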
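
If you would rather measure than guess, here is a rough sketch of how the three options could be compared in Node. The function names (byRegexp, bySplitChar, bySplitRegexp, time) are made up for this example, and bySplitChar is only a simplified stand-in for what a query parser does, not the actual qs code. Treat the timings as indicative only: they depend on the Node version, the input, and JIT warm-up.

```javascript
// URL taken from the question.
const url = 'hello.org/post.html?action=newthread&fid=32&fpage=1';

// Option 1: a single regexp match (pattern from the question).
function byRegexp(str) {
  const m = str.match(/[?/&]fid=(.*?)($|[&#])/);
  return m ? m[1] : undefined;
}

// Option 2: split the query string on a single character.
// Simplified stand-in for a query parser; not the real qs implementation.
function bySplitChar(str) {
  const q = str.indexOf('?');
  if (q === -1) return undefined;
  const hash = str.indexOf('#', q);
  const query = str.slice(q + 1, hash === -1 ? undefined : hash);
  for (const pair of query.split('&')) {
    const eq = pair.indexOf('=');
    if (eq !== -1 && pair.slice(0, eq) === 'fid') return pair.slice(eq + 1);
  }
  return undefined;
}

// Option 3: split on a regexp character class, then loop over the pieces.
// The separator is a regexp literal, not a quoted string.
function bySplitRegexp(str) {
  const parts = str.split(/[&/#?]/);
  for (let i = 0; i < parts.length; i++) {
    if (parts[i].startsWith('fid=')) return parts[i].slice('fid='.length);
  }
  return undefined;
}

// Crude timing loop: runs fn many times and prints the elapsed milliseconds.
function time(label, fn, iterations = 200000) {
  const start = process.hrtime.bigint();
  let last;
  for (let i = 0; i < iterations; i++) last = fn(url);
  const ms = Number(process.hrtime.bigint() - start) / 1e6;
  console.log(`${label}: ${ms.toFixed(1)} ms (result ${last})`);
  return last;
}

time('1) regexp match', byRegexp);
time('2) split on "&"', bySplitChar);
time('3) split on regexp', bySplitRegexp);
```

Run it a few times and with longer query strings before drawing conclusions; a single run on a short URL mostly measures engine overhead.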

Hope this helps.
