ooronning - 5 months ago
Node.js Question

Regular Expression for Route Params in Express.js

As per various docs and blogs, as well as this question and others, I'm aware that one can validate route parameters by using regular expressions. This, however, has had me searching for roughly an hour and a half:

app.get('/api/:url(/^http:\/\/(.+?)\.(.+?)$/)', (req, res) => {
// do stuff with req.params.url
});


Every time I run the server locally and enter localhost:8000/api/http://www.google.com, the response is:

Cannot GET /api/http://www.google.com


I know the regular expression does what I want it to do, because:

/^http:\/\/(.+?)\.(.+?)$/.test('http://www.google.com');


... returns true.

I've also tried changing the route string to look like...

'/api/:url(^http:\/\/(.+?)\.(.+?)$)'

'/api/:url(http:\/\/(.+?)\.(.+?))'

'/api/:url/^http:\/\/(.+?)\.(.+?)$/'


And several other formats that look anything remotely like the examples given in Stack Overflow threads and blogs (the Express docs have absolutely no examples of a Regular Expression used outside of the first route param). Perhaps there are some formatting constraints that I'm unaware of, or a limitation to RegExp support in Express? It'd be great if the docs said anything at all.

I appreciate any help I can get here.

Answer

Your regular expression is wrong for what you're trying to achieve.

TL;DR Use this one:

app.get('/api/:url(https?:\/\/?[\da-z\.-]+\.[a-z\.]{2,6}\/?)', (req, res) => {
});

Some explaining:

First, path-to-regexp, the module Express uses, does exactly what its name says: it converts a path to a regular expression. As such, the ^ and $ anchors are either illegal or interpreted differently when placed inside your :url() contents, because path-to-regexp already adds its own. It also escapes every . and / in the path, so (.+?) turns into (\.+?), which matches only literal dots. Finally, do not include the slashes that delimit RegExp literals in JavaScript; include only the expression content.
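As a rough illustration of why the backslashes never reach path-to-regexp: in a JavaScript string literal, \/ and \. are simply / and ., so the escaping is gone before the route is compiled. A minimal sketch in plain JavaScript (the variable names are made up for this example):

```javascript
// Route string as written in the question's third attempt.
const asTyped = '/api/:url(http:\/\/(.+?)\.(.+?))';

// What the string actually contains once JavaScript has parsed the literal.
console.log(asTyped); // /api/:url(http://(.+?).(.+?))

// To keep a backslash inside a string literal, it has to be doubled.
const withBackslash = '\\d';
console.log(withBackslash.length); // 2
console.log('\d' === 'd');         // true
```

The same applies to \d in a route string: written with a single backslash, it reaches the router as a plain d.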

Here's how Express sees your regular expressions:

path: '/api/:url(/^http://(.+?).(.+?)$/)'
regexp: /^\/api\/(?:(\/^http:\/\/(?:\.+?))\.(\.+?)$\/)\/?$/i

path: '/api/:url(^http://(.+?).(.+?)$)'
regexp: /^\/api\/(?:(^http:\/\/(?:\.+?))\.(\.+?)$)\/?$/i

path: '/api/:url(http://(.+?).(.+?))'
regexp: /^\/api\/(?:(http:\/\/(?:\.+?))\.(\.+?))\/?$/i

path: '/api/:url/^http://(.+?).(.+?)$/'
regexp: /^\/api\/(?:([^\/]+?))\/^http:\/\/(?:\.+?)\.(\.+?)$\/?$/i
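To check one of these by hand without starting a server, a generated expression can be pasted into plain JavaScript. The sketch below uses the third regexp from the list above; the two test paths are only examples chosen for illustration.

```javascript
// Third generated regexp from the list, copied verbatim.
const generated = /^\/api\/(?:(http:\/\/(?:\.+?))\.(\.+?))\/?$/i;

// Every "." was escaped, so "(.+?)" now means "one or more literal dots".
console.log(generated.test('/api/http://www.google.com')); // false
console.log(generated.test('/api/http://...'));            // true
```

The second path matches only because it consists of literal dots after the scheme, which shows what the escaped pattern is really looking for.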

And, using a code snippet like this, you can see that none match:

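// Assumes `app` is your Express 4 application; `app._router` is an internal property.
// Test against the full request path, since every generated regexp starts with ^\/api\/.
const path = '/api/http://www.google.com';
for (const layer of app._router.stack) {
  // Route layers are named 'bound dispatch'; middleware layers are skipped.
  if (layer.name === 'bound dispatch') {
    console.log(layer.regexp + ': matches', path, '=', layer.regexp.test(path));
  }
}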

It's also advisable to avoid using capturing parentheses in an already named capture.

Example (notice the extra parentheses around the scheme of the URI, and how they change the req.params.url value):

app.get('/api/:url((https?:\/\/)?[\da-z\.-]+\.[a-z\.]{2,6}\/?)', (req, res) => {})

> req.params: { '0': 'http://', url: 'http://' }

Later edit: Please note that this entire post is only about string routes that contain a regular expression, not routes defined with RegExp objects.
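For comparison, a route defined with a RegExp object skips the string conversion entirely: Express accepts a RegExp as the path and exposes unnamed capture groups as req.params[0], req.params[1], and so on. The following is only a sketch. The pattern and the helper extractUrl are hypothetical, and the app.get call is shown as a comment so the snippet runs without Express.

```javascript
// Hypothetical pattern: "/api/" followed by an http(s) URL with a simple host name.
const apiUrlRoute = /^\/api\/(https?:\/\/[\da-z.-]+\.[a-z.]{2,6}\/?)$/i;

// Stand-in for what a route handler would read from req.params[0].
function extractUrl(path) {
  const match = apiUrlRoute.exec(path);
  return match ? match[1] : null;
}

console.log(extractUrl('/api/http://www.google.com')); // http://www.google.com
console.log(extractUrl('/api/not-a-url'));             // null

// With Express this could be registered as:
// app.get(apiUrlRoute, (req, res) => res.send(req.params[0]));
```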
