Jimmy Mattsson - 3 months ago
Javascript Question

Convert Japanese to HTML entities

When I send a form containing Japanese characters through this AJAX function, the characters are sent to the server as raw Japanese text and the data is stored as ¿ in the database.

var strAction = "/_ajax/save/"+sSavePage+"?action=saveseo&intFolderID="+iFolderID+"&intPageID="+iPageID;
var frm = $("#frmSmartPage");
var data = frm.serialize();

$.ajax({
    type: frm.attr('method'),
    url: strAction,
    data: data,
    success: function (data) {
        alert('ok');
    }
});


On the same page, the form can also be posted through a regular submit. The Japanese characters are then converted to the &#<number> format.

<form method="post" target="ajax_save" autocomplete="off" name="frmSmartPage" id="frmSmartPage" action="<%=constBetaPath%>/_ajax/save/pages_save.asp?intPageID=<%=intPageID%>&intFolderID=<%=intFolderID%>&action=save" onSubmit="return validateSave()">


I would prefer to convert the Japanese characters to the &#<number> format in the AJAX call, but so far I haven't had any luck.

Things I've already tried:

var data = unescape(encodeURIComponent(frm.serialize()));
---
var data = escape(frm.serialize());
---
accepts: 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8'
---
contentType: 'application/x-www-form-urlencoded;'
---
contentType: 'application/x-www-form-urlencoded; charset=UTF-8'


EDIT:

HTML encoding:

<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />


EDIT 2:

The backend code decodes ISO-8859-1 to UTF-8:

'******************************************************************************************************************
'' @SDESCRIPTION: Decodes from ISO-8859-1 to UTF8
'' @PARAM: - s [string]: your string to be decoded
'' @RETURN: [string] decoded string
'' @DESCRIPTION: Useful when saving special chars from an ISO-8859-1 post to a UTF-8 page, for example via AJAX
'******************************************************************************************************************
public function DecodeUTF8(s)
    dim i
    dim c
    dim n

    ' Append a sentinel character so the scan below can inspect the last real character
    s = s + " "

    i = 1
    do while i <= len(s)
        c = asc(mid(s,i,1))
        if c and &H80 then
            ' Count the lead byte plus the continuation bytes (10xxxxxx) that follow it
            n = 1
            do while i + n < len(s)
                if (asc(mid(s,i+n,1)) and &HC0) <> &H80 then
                    exit do
                end if
                n = n + 1
            loop
            if n = 2 and ((c and &HE0) = &HC0) then
                ' Two-byte sequence: collapse it into a single character
                c = asc(mid(s,i+1,1)) + &H40 * (c and &H01)
            else
                ' Any other sequence (such as the three-byte sequences used for most
                ' Japanese characters) is replaced with chr(191), which is "¿"
                c = 191
            end if
            s = left(s,i-1) + chr(c) + mid(s,i+n)
        end if
        i = i + 1
    loop
    ' Strip the sentinel character again
    DecodeUTF8 = Left(s, Len(s)-1)
end function


SOLUTION
Thanks to Álvaro González's reply, I was able to create a workaround by building a temporary form to use for submitting.

var strAction = "/_ajax/save/"+sSavePage+"?action=saveseo&intFolderID="+iFolderID+"&intPageID="+iPageID;
var newForm = $('<form />');
var orginalForm = $("#frmSmartPage");

newForm.append(orginalForm.clone().children());
newForm.attr('method', 'post');
newForm.attr('target', 'ajax_save');
newForm.attr('action', strAction);
newForm.css('display', 'none');

orginalForm.parent().append(newForm);

var target = $("#ajax_save");

target.one('load', function () {
    newForm.remove();
});

newForm.submit();

Answer

You have a serious root problem: the ISO-8859-1 charset (also known as Latin-1, which should already give you a clue) is designed for the Latin script used by Western European languages and simply cannot encode Japanese characters. Everywhere else you are using UTF-8, which is the only sensible encoding choice today and has no restriction of this kind, but ISO-8859-1 is the weak link in the chain that makes it all terribly complicated.

To make it worse, I spot some details that worry me. You are using AJAX to send the information and, since AJAX mandates UTF-8, jQuery will take care of converting it to UTF-8 automatically. However, the server-side code incorrectly assumes ISO-8859-1 and will make a bogus conversion. If this code is already in production, it may have been corrupting the data you already have.
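To illustrate where the ¿ most likely comes from, here is a rough sketch in JavaScript. The first function shows what the serialized form data contains for a Japanese character. The second is a hypothetical port of the DecodeUTF8 loop from the question, written against an array of byte values. Both function names are made up for this example, and the port assumes the ASP code sees one character per UTF-8 byte, which I have not verified on your server.

```javascript
// jQuery's serialize() percent-encodes each value with encodeURIComponent,
// which always emits the UTF-8 bytes of the string.
function utf8PercentEncode(value) {
  return encodeURIComponent(value);
}

// Hypothetical JavaScript port of the DecodeUTF8 loop from the question.
// Input: an array of byte values (assumption: one character per byte server-side).
function decodeLikeBackend(bytes) {
  const out = [];
  let i = 0;
  while (i < bytes.length) {
    let c = bytes[i];
    let n = 1;
    if (c & 0x80) {
      // Count the continuation bytes (10xxxxxx) that follow the lead byte.
      while (i + n < bytes.length && (bytes[i + n] & 0xC0) === 0x80) {
        n++;
      }
      if (n === 2 && (c & 0xE0) === 0xC0) {
        // Two-byte sequence: collapses into a single Latin-1 character.
        c = bytes[i + 1] + 0x40 * (c & 0x01);
      } else {
        // Anything longer has no Latin-1 equivalent: replaced by 191, "¿".
        c = 191;
      }
    }
    out.push(c);
    i += n;
  }
  return String.fromCharCode(...out);
}

console.log(utf8PercentEncode('日'));              // %E6%97%A5
console.log(decodeLikeBackend([0xE6, 0x97, 0xA5])); // ¿
console.log(decodeLikeBackend([0xC3, 0xA9]));       // é
```

Under those assumptions, accented Western characters survive the round trip because they fit in two bytes, while every three-byte Japanese character is flattened to ¿ before it reaches the database.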

You basically have two choices:

  1. Switch everything to UTF-8. This will save you from all encoding issues in the future, but it requires a careful migration of the current codebase.

  2. Figure out a way to encode Japanese as ISO-8859-1 in client-side code and decode it properly in server-side code. Thankfully, browsers are already aware of the problem and (since HTML is their native language) they normally use HTML entities when they have to submit a form that contains characters not supported by the document encoding. That is what those &#<number> sequences are and where they come from.

    In this case, what you need to do is to change your server-side code to:

    1. Do not make any encoding conversion (data is already UTF-8)
    2. Decode HTML entities (taking into account that the string is UTF-8)
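If you go with the second option and want the AJAX call to mimic what the browser does on a regular submit, a minimal sketch could look like the following. The function names toNumericEntities and fromNumericEntities are hypothetical. The decoder is written in JavaScript only to show the logic, since your real decoder would live in the ASP code.

```javascript
// Replace every character outside ISO-8859-1 (code point above 0xFF) with a
// decimal numeric character reference such as &#26085;.
function toNumericEntities(text) {
  let out = '';
  // for...of iterates by code point, so surrogate pairs stay together.
  for (const ch of text) {
    const codePoint = ch.codePointAt(0);
    out += codePoint > 0xFF ? '&#' + codePoint + ';' : ch;
  }
  return out;
}

// Reverse step, shown here only to illustrate the logic of the decoder.
function fromNumericEntities(text) {
  return text.replace(/&#(\d+);/g, (match, digits) => {
    const codePoint = Number(digits);
    // Leave out-of-range references untouched instead of throwing.
    return codePoint <= 0x10FFFF ? String.fromCodePoint(codePoint) : match;
  });
}

console.log(toNumericEntities('日本'));                 // &#26085;&#26412;
console.log(fromNumericEntities('&#26085;&#26412;'));   // 日本
```

Note that you cannot apply this to the output of frm.serialize(), because that string is already percent-encoded. One way around it, assuming the rest of your code stays the same, is to take frm.serializeArray(), run each value through toNumericEntities, and pass the resulting array to $.param(). The entity text is plain ASCII, so a byte-level conversion on the server should leave it untouched. Be aware that this scheme is ambiguous: a user who literally types &#26085; into a field cannot be told apart from one who typed the Japanese character.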