warkentien2 - 1 month ago
Javascript Question

RegEx leaves unwanted space behind

I have a list of words joined by single spaces, e.g.:

list.join(' ');


How do I remove a word (held in a variable) using RegEx, without leaving a space behind?

view example code:



var testClasses = document.getElementsByTagName("div")[0].className;
var classToRemove = "test3";

document.getElementsByTagName('p')[0].innerHTML = "Removing class ." + classToRemove + " from: <strong>" + testClasses + "</strong>";

var re = new RegExp(classToRemove + "\s?", "g");
testClasses = testClasses.replace(re, "");

// I ran into the same problem trying to be more specific
// var re = new RegExp("(\S+\s?)*(" + classToRemove + "\s?)(\S+\s?)*", "g");
// testClasses = testClasses.replace(re, "$1$3");


document.getElementsByTagName('p')[1].innerHTML = "becomes: <strong>" + testClasses + "</strong>" + " // which looks great on the DOM.";
console.log(testClasses);
console.log(testClasses.split(' '));

<div class="test1 test2 test3 test4 test5"></div>
<p></p>
<p></p>
<p>However, if you check console, the space is there. <br><strong>How do I remove this extra space?</strong> Without having to run a second replace.</p>





Restrictions:


  • I know this could be achieved with string or array manipulation. However, I'm trying to understand RegEx.

  • Only use one RegEx. Two replaces seem ugly and unnecessary.

  • I can't assume there'll always be an empty space before/after the given word.


Answer

May I interest you in Element.classList? This API lets you mutate the class attribute through convenient methods like .add(), .remove() and .toggle(). This is far superior to rolling your own RegExp solution.
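A minimal sketch of the classList route, assuming a browser environment. The helper name removeClass is hypothetical and only wraps the standard classList.remove() call; the usage part is guarded so the snippet also loads where no document exists.

```javascript
// Hypothetical helper: remove one class name from an element via classList.
function removeClass(element, className) {
  // classList.remove() silently does nothing if the class is absent.
  element.classList.remove(className);
  return element.className;
}

// Browser-only usage, skipped when there is no DOM (e.g. in Node).
if (typeof document !== 'undefined') {
  var div = document.createElement('div');
  div.className = 'test1 test2 test3 test4 test5';
  console.log(removeClass(div, 'test3')); // "test1 test2 test4 test5"
}
```

The browser rewrites the class attribute with single spaces between the remaining tokens, so there is no stray whitespace to clean up.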


If it doesn't have to be a RegExp solution, you could try Array.filter:

'alpha bravo charlie'
  .split(' ')
  .filter(function(token) { return token !== 'alpha' })
  .join(' ');

But let's get on with solving your RegExp riddle. In a string "alpha bravo charlie" you want to be able to remove any of the three tokens without leaving behind any unnecessary spaces before, after or between the remaining tokens. This can be done with the help of a negative look-ahead assertion (x(?!y)):

function removeToken(text, token) {
  var pattern = new RegExp('(\\s+(?!\\S+\\s+))?' + token + '\\s*');
  return text.replace(pattern, '');
}
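A side note on the doubled backslashes in the pattern above. Inside a JavaScript string literal, "\s" is not a recognised escape and collapses to a plain "s", so the expression built in the question ends up as test3s? rather than test3\s?. The following sketch (variable names are illustrative) shows the difference:

```javascript
var classes = 'test1 test2 test3 test4 test5';

// Single backslash: the literal '\s?' is just 's?', so the pattern is
// /test3s?/g and the space after the token is never consumed.
var single = new RegExp('test3' + '\s?', 'g');

// Doubled backslash: the RegExp receives \s? and can eat one trailing space.
var doubled = new RegExp('test3' + '\\s?', 'g');

console.log(single.source);  // test3s?
console.log(doubled.source); // test3\s?
console.log(JSON.stringify(classes.replace(single, '')));  // "test1 test2  test4 test5"
console.log(JSON.stringify(classes.replace(doubled, ''))); // "test1 test2 test4 test5"
```

Doubling the backslash fixes the middle-token case, but removing the last token would still leave a trailing space behind. That remaining case is what the look-ahead takes care of.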

The optional group (\s+(?!\S+\s+))? consumes the whitespace in front of your token only when no whitespace follows the token, i.e. when the token is the last one in the string. This way you avoid removing both spaces when the token sits in the middle. The expression reads "match one or more whitespace characters, unless they are followed by one or more non-whitespace characters that are in turn followed by one or more whitespace characters". The "non-whitespace characters" stand in for your token, so the token does not have to be injected into the look-ahead as well. As the leading whitespace is not always there, the capture group is made optional by a trailing ?.
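To see which side's whitespace the pattern takes in each position, it can help to print the matched substring instead of replacing it. This is only an illustrative sketch: matchedPart is a hypothetical helper that rebuilds the same pattern as removeToken so the snippet runs on its own.

```javascript
// Hypothetical helper: return the exact substring the pattern would remove.
function matchedPart(text, token) {
  var pattern = new RegExp('(\\s+(?!\\S+\\s+))?' + token + '\\s*');
  var m = text.match(pattern);
  return m ? m[0] : null;
}

var sample = 'alpha bravo charlie';
console.log(JSON.stringify(matchedPart(sample, 'alpha')));   // "alpha "   (trailing space)
console.log(JSON.stringify(matchedPart(sample, 'bravo')));   // "bravo "   (trailing space only)
console.log(JSON.stringify(matchedPart(sample, 'charlie'))); // " charlie" (leading space)
console.log(matchedPart(sample, 'delta'));                   // null       (no match)
```

For the middle token the look-ahead rejects the leading space, so only the trailing one is removed. For the last token nothing follows, so the leading space is taken instead.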

To test this code, we can run all four cases:

var text = 'alpha bravo charlie';
var tests = {
  // <token to remove>: <resulting string>
  'alpha': 'bravo charlie',
  'bravo': 'alpha charlie',
  'charlie': 'alpha bravo',
  'delta': 'alpha bravo charlie',
};

Object.keys(tests).forEach(function(token) {
  var expected = tests[token];
  var result = removeToken(text, token);
  console.log('removed "' + token + '" got "' + result + '" which is', expected === result ? 'correct' : 'WRONG');
});

and that should print

removed "alpha" got "bravo charlie" which is correct
removed "bravo" got "alpha charlie" which is correct
removed "charlie" got "alpha bravo" which is correct
removed "delta" got "alpha bravo charlie" which is correct

If you expect your tokens to contain characters that have a special meaning in a RegExp (such as ., + or parentheses), escape them before building the pattern.
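One way to do the escaping, as a sketch: escapeRegExp and removeTokenSafe are hypothetical names, and the character class is the commonly used set of RegExp metacharacters, not something taken from the question.

```javascript
// Hypothetical helper: backslash-escape every RegExp metacharacter in s.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Same pattern as removeToken, but with the token escaped first.
function removeTokenSafe(text, token) {
  var pattern = new RegExp('(\\s+(?!\\S+\\s+))?' + escapeRegExp(token) + '\\s*');
  return text.replace(pattern, '');
}

var tricky = 'axb a.b c++';
// Unescaped, 'a.b' would match 'axb' first, and 'c++' is not a valid pattern.
console.log(removeTokenSafe(tricky, 'a.b')); // "axb c++"
console.log(removeTokenSafe(tricky, 'c++')); // "axb a.b"
```

Note that the pattern still matches the token as a substring, so removing "test1" would also cut into "test10". Guarding against that needs extra boundary checks, which this sketch leaves out.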