PeregrineStudios - 1 month ago
JavaScript Question

RegEx for specific pattern, excluding URLs

Long story, but I need to take some fakey-HTML and replace it with real HTML using JavaScript. For example:

{span class:text-bold data:attribute}TITLE{/span}


Needs to change into:

<span class="text-bold" data="attribute">TITLE</span>


I'm using RegEx to do this because I can't anticipate every attribute that could be placed on every element. This expression more or less works to find every name:value pair such as data:attribute:

/(\w+\:)(.[^\s\}]*)/g


However, there is an issue; this expression also matches URLs, for example:

http://www.google.ca


In an attempt to exclude any URLs from matching, I changed the expression like so:

/(?!http)(\w+\:)(.[^\s\}]*)/g


However, this did not have the expected effect. The pattern still matches URLs, just without the leading 'h'. For example:

ttp://www.google.ca


I'll admit I haven't used RegEx in quite a while, so I'm probably misunderstanding something. How can I tell a RegEx pattern NOT to match anything that begins with a specific set of characters?

Answer

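You need a negative lookahead right before the possible //, i.e. immediately after the colon. A lookahead at the very start of the pattern only rejects a match that begins at that exact position; the engine then retries one character later, which is why your version still matched ttp://www.google.ca.

"foo://bar".match(/(\w+:)(?!\/\/)([^\s\}]*)/); // null: the colon is followed by //
"foo:bar".match(/(\w+:)(?!\/\/)([^\s\}]*)/);   // matches "foo:bar"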

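To make the difference concrete, here is a small sketch that runs both patterns over the same string. The sample string and the variable names are made up for illustration; only the two patterns come from this thread.

```javascript
// The pattern from the question: the lookahead only guards the start position.
const leading = /(?!http)(\w+\:)(.[^\s\}]*)/g;
// The lookahead moved to just after the colon.
const afterColon = /(\w+:)(?!\/\/)([^\s\}]*)/g;

// Hypothetical sample containing one attribute and one URL.
const sample = "class:text-bold http://www.google.ca";

const leadingMatches = sample.match(leading);
const afterColonMatches = sample.match(afterColon);

console.log(leadingMatches);    // [ 'class:text-bold', 'ttp://www.google.ca' ]
console.log(afterColonMatches); // [ 'class:text-bold' ]
```

With the leading lookahead, the attempt starting at 'h' is rejected, but the attempt starting at the following 't' is not, so the URL leaks through minus its first character.
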
Of course, this will also block any attribute values that legitimately begin with //, but I assume that's a risk worth taking.
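
If it helps, here is one way the whole conversion might be put together. This is only a sketch under a few assumptions: convertFakeHtml is a hypothetical helper name, tag names are assumed to be plain word characters, and attribute values are assumed not to contain spaces or }. It also moves the colon outside the first capture group so the replacement can drop it, and it only rewrites attributes inside the braces, so URLs in the surrounding text are never touched.

```javascript
// Hypothetical helper: converts {tag name:value ...}...{/tag} into real HTML.
function convertFakeHtml(input) {
  // Attribute pattern based on the answer above; the colon sits outside
  // the first group so that '$1="$2"' produces name="value".
  const attribute = /(\w+):(?!\/\/)([^\s\}]*)/g;

  return input
    // Closing tags: {/span} -> </span>
    .replace(/\{\/(\w+)\}/g, "</$1>")
    // Opening tags: {span a:b c:d} -> <span a="b" c="d">
    .replace(/\{(\w+)([^}]*)\}/g, (whole, tag, attrs) => {
      const converted = attrs.replace(attribute, '$1="$2"');
      return `<${tag}${converted}>`;
    });
}

console.log(convertFakeHtml("{span class:text-bold data:attribute}TITLE{/span}"));
// <span class="text-bold" data="attribute">TITLE</span>
```

Note that a value such as href:http://www.google.ca is still converted in one piece, because the whole value is consumed as part of the href match before the engine ever tries to start a match at http.
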