jitendrapurohit - 7 months ago
Javascript Question

Unicode characters give "unterminated string literal" in JS

This error occurs when my HTML contains an odd character that is displayed as whitespace.

<html xmlns="http://www.w3.org/1999/xhtml">
<head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
<title></title>
</head>
<body>
<p>Some
 Text</p>
</body>
</html>

Note that there is a character between "Some" and "Text", but it is not visible here. I need to pass this HTML to a function, toJson(), but it fails with the error "unterminated string literal".
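A plausible explanation, assuming the hidden character is U+2028 LINE SEPARATOR (as the answer below finds): before ES2019, JavaScript treated U+2028 and U+2029 as line terminators, so they were not allowed unescaped inside a string literal, and "unterminated string literal" is the wording Firefox uses for that syntax error. JSON.stringify (which angular.toJson uses internally) does not escape these characters, so the JSON text itself is valid but can break if it is later evaluated as script. The sketch below only shows that the raw character survives serialization; the sample string is an assumption.

```javascript
// Sample string, assumed to contain U+2028 LINE SEPARATOR between the words
const bodyHtml = "<p>Some\u2028Text</p>";
const json = JSON.stringify({ body_html: bodyHtml });

// JSON.stringify escapes quotes, backslashes and control characters below U+0020,
// but leaves U+2028 in its output as a raw character.
console.log(json.includes("\u2028"));  // true
console.log(json.includes("\\u2028")); // false
```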

Everything works when I use plain text instead. For example, this works fine:

Some<space>Text
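To find out which character actually sits between the two words, one option is to list the code points of all non-ASCII characters in the string. The helper name listNonAscii is hypothetical, and the sample reconstructs the string with an escape (assuming U+2028, since the real character cannot be pasted here); on real data you would pass params.body_html instead.

```javascript
// Hypothetical helper: report every non-ASCII character as "U+XXXX"
function listNonAscii(str) {
  const found = [];
  for (const ch of str) { // for...of iterates by code point, not by UTF-16 unit
    const cp = ch.codePointAt(0);
    if (cp > 0x7f) {
      found.push("U+" + cp.toString(16).toUpperCase().padStart(4, "0"));
    }
  }
  return found;
}

console.log(listNonAscii("<p>Some\u2028Text</p>")); // [ 'U+2028' ]
console.log(listNonAscii("<p>Some Text</p>"));      // []
```

Looking the reported code point up in a Unicode table then tells you exactly which character to target.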

I've tried all the string-replacement approaches I found while searching for this:

1) var re = /(?![\x00-\x7F]|[\xC0-\xDF][\x80-\xBF]|[\xE0-\xEF][\x80-\xBF]{2}|[\xF0-\xF7][\x80-\xBF]{3})./g;
params.body_html = html.replace(re, '');
angular.toJson(params); // gives error
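Why attempt 1 likely leaves the character in place, assuming it is U+2028: the pattern is written for UTF-8 byte strings (it is common in PHP code), but JavaScript strings are sequences of UTF-16 code units, so ranges like \x80-\xBF match the code points U+0080 to U+00BF rather than bytes. More importantly, without the s (dotAll) flag the final "." never matches a line terminator, and U+2028 counts as one. The sample string below is an assumption.

```javascript
// Attempt 1 from the question, run against a sample string that assumes the
// hidden character is U+2028 LINE SEPARATOR.
var re = /(?![\x00-\x7F]|[\xC0-\xDF][\x80-\xBF]|[\xE0-\xEF][\x80-\xBF]{2}|[\xF0-\xF7][\x80-\xBF]{3})./g;
var html = "<p>Some\u2028Text</p>";
var cleaned = html.replace(re, "");

// Nothing was removed: "." never matched U+2028.
console.log(cleaned === html);    // true

// Without the "s" (dotAll) flag, "." excludes \n, \r, U+2028 and U+2029.
console.log(/./.test("\u2028"));  // false
console.log(/./s.test("\u2028")); // true
```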


2) params.body_html.replace(/\uFFFD/g, '');
angular.toJson(params); // gives error
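Attempt 2 has two separate problems, sketched below with a hypothetical params object: String.prototype.replace returns a new string and never modifies the original (strings are immutable), so the result has to be assigned back; and /\uFFFD/ matches only U+FFFD REPLACEMENT CHARACTER, which is a different character from U+2028.

```javascript
// Hypothetical params object holding the problematic HTML
const params = { body_html: "<p>Some\u2028Text</p>" };

// 1) The return value is discarded, so params.body_html is unchanged.
params.body_html.replace(/\uFFFD/g, "");
console.log(params.body_html.includes("\u2028")); // true

// 2) Even when assigned, /\uFFFD/ targets U+FFFD, not U+2028.
params.body_html = params.body_html.replace(/\uFFFD/g, "");
console.log(params.body_html.includes("\u2028")); // true
```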


I don't know what this character is (it may be a Unicode character). When I paste it into an Emacs buffer, it is shown as

Answer

Got this working with:

params.body_html = params.body_html.replace(/\u2028/g, '');
angular.toJson(params); //works fine.
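If the error really comes from the JSON text later being evaluated as script (an assumption, see above), an alternative to deleting the character is to keep it and escape it in the serialized output, covering U+2029 PARAGRAPH SEPARATOR as well. The helper toScriptSafeJson is hypothetical and not part of AngularJS; it uses JSON.stringify directly, and in an AngularJS app the same two replace calls could be applied to the string returned by angular.toJson(params). Escaping after serialization is safe because JSON.stringify only ever emits these characters inside string values.

```javascript
// Hypothetical helper: serialize, then escape the two characters that were
// line terminators inside pre-ES2019 JavaScript string literals.
function toScriptSafeJson(value) {
  return JSON.stringify(value)
    .replace(/\u2028/g, "\\u2028")
    .replace(/\u2029/g, "\\u2029");
}

const params = { body_html: "<p>Some\u2028Text\u2029</p>" };
const json = toScriptSafeJson(params);

// prints {"body_html":"<p>Some\u2028Text\u2029</p>"} with backslash escapes, no raw separators
console.log(json);
console.log(JSON.parse(json).body_html === params.body_html); // true
```

Unlike stripping, this keeps the HTML content intact after parsing.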

Thanks to @Gothdo for providing the character link.

The problem is that this only removes this one particular Unicode character. Is there a function that replaces or trims all such Unicode characters?
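Two possible approaches, sketched with hypothetical helper names: stripping everything outside 7-bit ASCII (blunt, since it also destroys legitimate characters such as "é"), or replacing only the Unicode line and paragraph separators with a normal space. The second uses Unicode property escapes, which need the u flag and an ES2018-capable engine; \p{Zl} contains only U+2028 and \p{Zp} only U+2029.

```javascript
// Option A: drop every character outside 7-bit ASCII.
// Blunt: it also removes legitimate text such as accented letters or emoji.
function stripNonAscii(str) {
  return str.replace(/[^\x00-\x7F]/g, "");
}

// Option B: replace only Unicode line/paragraph separators with a plain space.
// \p{Zl} = U+2028 LINE SEPARATOR, \p{Zp} = U+2029 PARAGRAPH SEPARATOR.
function normalizeSeparators(str) {
  return str.replace(/[\p{Zl}\p{Zp}]/gu, " ");
}

const html = "<p>Caf\u00E9: Some\u2028Text</p>";
console.log(stripNonAscii(html));       // <p>Caf: SomeText</p>
console.log(normalizeSeparators(html)); // <p>Café: Some Text</p>
```

If other invisible characters turn up, a class such as \p{Cf} (format characters, e.g. U+200B ZERO WIDTH SPACE) could be handled the same way, though it also matches characters like the zero-width joiner used inside some emoji.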