ben_aaron
JavaScript Question

Cannot recognise substring in parent string (jQuery)

I have a parent string in which I want to replace certain entities:


var parent_string = "Steven Paul Steve Jobs (February 24, 1955 – October 5, 2011) was an American information technology entrepreneur and inventor who worked with Steve Wozniak.";

var entities = {
    PERSON: ['Steven Paul Steve Jobs', 'Steve Wozniak'],
    DATE: ['February 24, 1955', 'October 5, 2011']
};


I now loop through the entities and check whether the values are substrings of the parent string. If so, I replace them with the string "REPLACED".

var replacement = 'REPLACED';

$.each(entities, function(key, value) {
    // `this` is the current array of values (the same object as `value`)
    $.each(this, function(index, val) {
        console.log(val);
        var tester = parent_string.indexOf(val); // -1 means "not found"
        console.log(tester);
        var re = new RegExp(val);
        parent_string = parent_string.replace(re, replacement);
    });
    console.log(parent_string);
});


Now here is my problem: this works for every entity except 'Steven Paul Steve Jobs'.

The expected output would be this string:

"REPLACED (REPLACED – REPLACED) was an American information technology entrepreneur and inventor who worked with REPLACED."


If I do this manually, like this:

str = "Steven Paul Steve Jobs (February 24, 1955 – October 5, 2011)";
val = "Steven Paul Steve Jobs";
str.indexOf(val);


... it seems to work.

Why does this not work in my loop?
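One way to see the symptom in isolation is a minimal sketch in plain JavaScript (no jQuery needed). It assumes, as the answer below suspects, that the names in the parent string are separated by non-breaking spaces (U+00A0), while the search value was typed with ordinary spaces. The variable names `parentString`, `typedValue` and `manual` are illustrative only.

```javascript
// Assumption: the parent string contains non-breaking spaces (U+00A0)
// between the names, while the search value uses ordinary spaces (U+0020).
const parentString =
  "Steven\u00A0Paul\u00A0Steve\u00A0Jobs (February 24, 1955 \u2013 October 5, 2011)";
const typedValue = "Steven Paul Steve Jobs"; // ordinary spaces

// Both strings print identically, but their code points differ,
// so neither indexOf nor a literal regex finds a match.
console.log(parentString.indexOf(typedValue)); // -1
console.log(new RegExp(typedValue).test(parentString)); // false

// A string typed by hand uses ordinary spaces on both sides, so it matches.
const manual = "Steven Paul Steve Jobs (February 24, 1955 \u2013 October 5, 2011)";
console.log(manual.indexOf(typedValue)); // 0
```

This would explain why the manual test succeeds while the loop over the real text does not.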

Answer

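It seems the whitespace between those names is not an ordinary space (U+0020) but some other Unicode whitespace character, such as a non-breaking space (U+00A0), so a search value typed with ordinary spaces never matches. I suggest replacing each run of whitespace in the search value with a \s+ pattern when building the regex: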

var re = new RegExp(val.replace(/\s+/g, '\\s+')); // each whitespace run in val becomes \s+

The regex then looks like /Steven\s+Paul\s+Steve\s+Jobs/, and because \s in JavaScript matches any Unicode whitespace character (including U+00A0), it matches those separators as well.
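A self-contained sketch of the whole loop with this fix applied. So that it runs without a browser, it replaces jQuery's `$.each` with `Object.values` and `Array.prototype.forEach` (a swap, not the original code). It also assumes the Jobs name is separated by U+00A0 characters, and it adds a hypothetical `escapeRegExp` helper, which the answer does not use, so that regex metacharacters in other values (for example `.` or `(`) are matched literally.

```javascript
// Assumption: "Steven Paul Steve Jobs" is separated by U+00A0 in the text,
// while "Steve Wozniak" uses an ordinary space.
let parentString =
  "Steven\u00A0Paul\u00A0Steve\u00A0Jobs (February 24, 1955 \u2013 October 5, 2011) " +
  "was an American information technology entrepreneur and inventor " +
  "who worked with Steve Wozniak.";

const entities = {
  PERSON: ["Steven Paul Steve Jobs", "Steve Wozniak"],
  DATE: ["February 24, 1955", "October 5, 2011"],
};
const replacement = "REPLACED";

// Hypothetical helper (not in the answer): escape regex metacharacters
// so each value is matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Build the pattern: escape first, then turn each whitespace run into \s+
// so an ordinary space in the value also matches U+00A0 in the text.
function toPattern(value) {
  return new RegExp(escapeRegExp(value).replace(/\s+/g, "\\s+"));
}

// Plain-JavaScript stand-in for the two nested $.each calls.
Object.values(entities).forEach(function (values) {
  values.forEach(function (val) {
    parentString = parentString.replace(toPattern(val), replacement);
  });
});

// Logs: REPLACED (REPLACED – REPLACED) was an American information technology
// entrepreneur and inventor who worked with REPLACED.
console.log(parentString);
```

Like the answer's version, the regex has no `g` flag, so only the first occurrence of each value is replaced; adding `"g"` as the second argument to `new RegExp` would replace every occurrence.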

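To confirm the diagnosis rather than rely on "it seems", one could list the code points of any whitespace characters in the text that are not ordinary spaces. This is a minimal sketch: `findUnusualWhitespace` is a hypothetical helper name, and the sample string assumes U+00A0 separators.

```javascript
// Hypothetical helper: report every whitespace character in `text` other than
// an ordinary space (U+0020), including tabs and newlines,
// with its index and its code point in U+XXXX form.
function findUnusualWhitespace(text) {
  const hits = [];
  for (let i = 0; i < text.length; i++) {
    const ch = text[i];
    if (/\s/.test(ch) && ch !== " ") {
      const hex = ch.charCodeAt(0).toString(16).toUpperCase().padStart(4, "0");
      hits.push({ index: i, codePoint: "U+" + hex });
    }
  }
  return hits;
}

// Assumed sample: names separated by non-breaking spaces.
const sample = "Steven\u00A0Paul\u00A0Steve\u00A0Jobs (February 24, 1955)";

// Returns three hits, at indices 6, 11 and 17, each with codePoint "U+00A0".
console.log(findUnusualWhitespace(sample));
```

Running this on the real `parent_string` would show whether the separators are U+00A0 or some other whitespace character.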
