wordermorr - 11 days ago
PHP Question

Character encoding error in idHTTP

I'm having a problem with TIdHTTP and TIdMultipartFormDataStream.

My code is:

FormPHP := TIdMultiPartFormDataStream.Create;
FormPHP.AddFile('imagem',AImagem,'image/jpeg');
FormPHP.AddFormField('iduser',AIDUser,'text/plain');
FormPHP.AddFormField('nome',ANome,'text/plain');
FormPHP.AddFormField('data',AData,'text/plain');
FormPHP.AddFormField('hora',AHora,'text/plain');
FormPHP.AddFormField('mensagem',AMensagem,'text/plain');
FormPHP.AddFormField('latitude','1','text/plain');
FormPHP.AddFormField('longitude','1','text/plain');

Response := TStringStream.Create('', TEncoding.ANSI);

HTTP:= TIdHTTP.Create(self);
HTTP.Request.CustomHeaders.Clear;
HTTP.Request.Clear;
HTTP.Request.ContentType:= 'multipart/form-data'; //application/x-www-form-urlencoded
HTTP.Request.ContentEncoding:= 'MeMIME';
HTTP.Request.CharSet:= 'utf-8';
HTTP.Request.Referer:= 'http://observadordecascavel.blog.br/cadastro.php';
HTTP.Post('http://observadordecascavel.blog.br/cadastro.php',FormPHP,Response);


This is the PHP script:

<?php
# cadastro.php - Inserts the submitted data into the online table.
$mysqli = new mysqli("mysqlhost","username","password","dbname");

$iduser = $_POST['iduser'];
$nome = $_POST['nome'];
$data = $_POST['data'];
$hora = $_POST['hora'];
$mensagem = $_POST['mensagem'];
$latitude = $_POST['latitude'];
$longitude = $_POST['longitude'];
$imagem = $_FILES["imagem"]['tmp_name'];
$tamanho = $_FILES['imagem']['size'];

if ( $imagem != "none" )
{
$fp = fopen($imagem, "rb");
$conteudo = fread($fp, $tamanho);
$conteudo = addslashes($conteudo);
fclose($fp);

$queryInsercao = "INSERT INTO tabpainel (iduser, nome, data, hora, mensagem, latitude, longitude, imagem) VALUES ('$iduser', '$nome', '$data','$hora','$mensagem', '$latitude', '$longitude', '$conteudo')";

mysqli_query($mysqli,$queryInsercao) or die("Algo deu errado ao inserir o registro. Tente novamente.");

if(mysqli_affected_rows($mysqli) > 0)
print "Sucesso!";
else
print "Não foi possível inserir o registro";
}
else
print "Não á foi possível carregar a imagem.";
?>


Explanation: my application posts these fields to the PHP script, which saves the data into a MySQL database and returns "Sucesso!" so the application can tell the user that the data was saved. The response text is ANSI-encoded; I discovered this when I had to change the TStringStream encoding to TEncoding.ANSI so that it would recognize the word "Não" when something goes wrong.

Up to the point of the post, the variable AMensagem is fine. However, when PHP receives the text, it is no longer correct: text like "á Á é É" arrives as "=E1 =C1 =E9 =C9", and that is what gets saved in the MySQL database.

I don't know whether the problem is in TIdHTTP, in TIdMultipartFormDataStream, or in the PHP code. Everything else works fine; it's only the encoding that I can't figure out.

Answer

The text that is transmitted to the server is not being encoded in UTF-8.

All of your AddFormField() calls are specifying the text/plain media type in the ACharset parameter instead of the AContentType parameter. Unlike AddFile(), the 3rd parameter of AddFormField() is the charset, and the 4th parameter is the media type.

function AddFormField(const AFieldName, AFieldValue: string; const ACharset: string = ''; const AContentType: string = ''; const AFileName: string = ''): TIdFormDataField; overload;

By passing an invalid charset, TIdMultipartFormDataStream ends up using Indy's built-in raw 8bit encoding instead, which encodes Unicode characters U+0000 - U+00FF as bytes $00 - $FF, respectively, and all other characters as byte $3F ('?'). The text you are sending happens to fall in that first range.
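
To see where "=E1 =C1 =E9 =C9" comes from, here is a minimal PHP sketch that reproduces the bytes by hand. It is only an illustration, not Indy's actual code: it assumes the raw 8bit output for this text equals its ISO-8859-1 bytes (which follows from the U+0000 - U+00FF mapping described above), and it relies on the quoted-printable transfer encoding that AddFormField() applies by default, as covered in the update later in this answer. The variable names are made up for the example.

```php
<?php
// "á Á é É" as the single bytes that the raw 8bit encoding would produce
// (assumption: identical to ISO-8859-1 for characters U+0000 - U+00FF).
$raw8bit = "\xE1 \xC1 \xE9 \xC9";

// The same text encoded as UTF-8 (two bytes per accented letter).
$utf8 = "\u{E1} \u{C1} \u{E9} \u{C9}";

// Quoted-printable escapes every byte above 0x7F as =XX.
$wrong = quoted_printable_encode($raw8bit);
$right = quoted_printable_encode($utf8);

echo $wrong, "\n"; // =E1 =C1 =E9 =C9  (what the question's server received)
echo $right, "\n"; // =C3=A1 =C3=81 =C3=A9 =C3=89  (what UTF-8 fields would look like)
```

The first line matches the text reported in the question, which is consistent with the fields going out as single-byte text rather than UTF-8.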

TIdFormDataField does not currently inherit a charset from TIdMultipartFormDataStream or TIdHTTP (work on that is in-progress), so you have to specify it on a per-field basis.

On a side note, MeMIME is not a valid ContentEncoding value. And you should not be setting any ContentEncoding value for a multipart/form-data post anyway.

Try something more like this instead:

FormPHP := TIdMultiPartFormDataStream.Create;

FormPHP.AddFile('imagem', AImagem, 'image/jpeg');
FormPHP.AddFormField('iduser', AIDUser, 'utf-8');
FormPHP.AddFormField('nome', ANome, 'utf-8');
FormPHP.AddFormField('data', AData, 'utf-8');
FormPHP.AddFormField('hora', AHora, 'utf-8');
FormPHP.AddFormField('mensagem', AMensagem, 'utf-8');
FormPHP.AddFormField('latitude', '1');
FormPHP.AddFormField('longitude', '1');

Response := TStringStream.Create('');

HTTP := TIdHTTP.Create(Self);
HTTP.Request.Referer := 'http://observadordecascavel.blog.br/cadastro.php';
HTTP.Post('http://observadordecascavel.blog.br/cadastro.php', FormPHP, Response);

Alternatively:

FormPHP := TIdMultiPartFormDataStream.Create;

FormPHP.AddFile('imagem', AImagem, 'image/jpeg');
FormPHP.AddFormField('iduser', AIDUser).Charset := 'utf-8';
FormPHP.AddFormField('nome', ANome).Charset := 'utf-8';
FormPHP.AddFormField('data', AData).Charset := 'utf-8';
FormPHP.AddFormField('hora', AHora).Charset := 'utf-8';
FormPHP.AddFormField('mensagem', AMensagem).Charset := 'utf-8';
FormPHP.AddFormField('latitude', '1');
FormPHP.AddFormField('longitude', '1');

Response := TStringStream.Create('');

HTTP := TIdHTTP.Create(Self);
HTTP.Request.Referer := 'http://observadordecascavel.blog.br/cadastro.php';
HTTP.Post('http://observadordecascavel.blog.br/cadastro.php', FormPHP, Response);

Either way, the field text will be encoded using UTF-8 instead of Ansi.


Update: With that said, AddFormField() sets the TIdFormDataField.ContentTransfer property to quoted-printable by default. PHP's $_POST does not decode quoted-printable automatically, so you would have to call quoted_printable_decode() manually:

$iduser         = quoted_printable_decode($_POST['iduser']);
$nome           = quoted_printable_decode($_POST['nome']);
$data           = quoted_printable_decode($_POST['data']);
$hora           = quoted_printable_decode($_POST['hora']);
$mensagem       = quoted_printable_decode($_POST['mensagem']);
$latitude       = quoted_printable_decode($_POST['latitude']);
$longitude      = quoted_printable_decode($_POST['longitude']);
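
If repeating the call for every field gets tedious, the same decoding could be done in a loop. The following is only a sketch: decode_qp_fields() is a hypothetical helper name, not part of PHP, and treating a missing field as an empty string is an assumption made for the example.

```php
<?php
// Hypothetical helper: decode a fixed list of quoted-printable form fields.
function decode_qp_fields(array $post, array $names): array
{
    $decoded = [];
    foreach ($names as $name) {
        // Assumption: a missing field is treated as an empty string.
        $decoded[$name] = quoted_printable_decode($post[$name] ?? '');
    }
    return $decoded;
}

// Field names taken from the question's form.
$fields = decode_qp_fields($_POST, [
    'iduser', 'nome', 'data', 'hora', 'mensagem', 'latitude', 'longitude',
]);
```

After this, $fields['mensagem'] would hold the decoded bytes of the posted message, in whatever charset the client used for that field.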

If you don't want TIdFormDataField to encode the UTF-8 text using quoted-printable, you can set the ContentTransfer property to 8bit instead:

FormPHP.AddFormField('iduser', AIDUser, 'utf-8').ContentTransfer := '8bit';
FormPHP.AddFormField('nome', ANome, 'utf-8').ContentTransfer := '8bit';
FormPHP.AddFormField('data', AData, 'utf-8').ContentTransfer := '8bit';
FormPHP.AddFormField('hora', AHora, 'utf-8').ContentTransfer := '8bit';
FormPHP.AddFormField('mensagem', AMensagem, 'utf-8').ContentTransfer := '8bit';
FormPHP.AddFormField('latitude', '1');
FormPHP.AddFormField('longitude', '1');

Alternatively:

with FormPHP.AddFormField('iduser', AIDUser) do begin
  Charset := 'utf-8';
  ContentTransfer := '8bit';
end;
with FormPHP.AddFormField('nome', ANome) do begin
  Charset := 'utf-8';
  ContentTransfer := '8bit';
end;
with FormPHP.AddFormField('data', AData) do begin
  Charset := 'utf-8';
  ContentTransfer := '8bit';
end;
with FormPHP.AddFormField('hora', AHora) do begin
  Charset := 'utf-8';
  ContentTransfer := '8bit';
end;
with FormPHP.AddFormField('mensagem', AMensagem) do begin
  Charset := 'utf-8';
  ContentTransfer := '8bit';
end;
FormPHP.AddFormField('latitude', '1');
FormPHP.AddFormField('longitude', '1');

Either way, you can then use your original PHP code again:

$iduser         = $_POST['iduser'];
$nome           = $_POST['nome'];
$data           = $_POST['data'];
$hora           = $_POST['hora'];
$mensagem       = $_POST['mensagem'];
$latitude       = $_POST['latitude'];
$longitude      = $_POST['longitude'];

Whether you use quoted-printable or not, the PHP variables will end up holding UTF-8 encoded text. If you need the variables to be in another encoding, you will have to convert them as needed, by using either:

  1. utf8_decode() (which decodes to ISO-8859-1; note that it is deprecated as of PHP 8.2):

    $iduser         = utf8_decode($iduser);
    $nome           = utf8_decode($nome);
    $data           = utf8_decode($data);
    $hora           = utf8_decode($hora);
    $mensagem       = utf8_decode($mensagem);
    $latitude       = utf8_decode($latitude);
    $longitude      = utf8_decode($longitude);
    
  2. mb_convert_encoding():

    $iduser         = mb_convert_encoding($iduser, 'desired charset', 'utf-8');
    $nome           = mb_convert_encoding($nome, 'desired charset', 'utf-8');
    $data           = mb_convert_encoding($data, 'desired charset', 'utf-8');
    $hora           = mb_convert_encoding($hora, 'desired charset', 'utf-8');
    $mensagem       = mb_convert_encoding($mensagem, 'desired charset', 'utf-8');
    $latitude       = mb_convert_encoding($latitude, 'desired charset', 'utf-8');
    $longitude      = mb_convert_encoding($longitude, 'desired charset', 'utf-8');
    
  3. iconv():

    $iduser         = iconv('utf-8', 'desired charset', $iduser);
    $nome           = iconv('utf-8', 'desired charset', $nome);
    $data           = iconv('utf-8', 'desired charset', $data);
    $hora           = iconv('utf-8', 'desired charset', $hora);
    $mensagem       = iconv('utf-8', 'desired charset', $mensagem);
    $latitude       = iconv('utf-8', 'desired charset', $latitude);
    $longitude      = iconv('utf-8', 'desired charset', $longitude);
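
One detail that is easy to get wrong in the conversions above is the argument order: mb_convert_encoding() takes the target charset before the source charset, while iconv() takes the source first. Here is a small sketch using ISO-8859-1 as an example target; that choice is an assumption for illustration, and the code assumes the mbstring and iconv extensions are both enabled.

```php
<?php
// "Não" as UTF-8: the "ã" takes two bytes (C3 A3).
$utf8 = "N\u{E3}o";

// mb_convert_encoding(): string, then target charset, then source charset.
$viaMb = mb_convert_encoding($utf8, 'ISO-8859-1', 'UTF-8');

// iconv(): source charset, then target charset, then string.
$viaIconv = iconv('UTF-8', 'ISO-8859-1', $utf8);

echo bin2hex($utf8), "\n";     // 4ec3a36f
echo bin2hex($viaMb), "\n";    // 4ee36f
echo bin2hex($viaIconv), "\n"; // 4ee36f
```

Both calls produce the same single-byte result here; they can differ for characters that the target charset cannot represent.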
    

Finally, when sending a response back to the client, you need to encode the text in a known charset whenever it contains non-ASCII characters. You should also use header() to tell the client which charset was used for that encoding:

header($_SERVER["SERVER_PROTOCOL"] . " 200 OK"); 
header('Content-Type: text/plain; charset="utf-8"');

if ( $imagem != "none" )
{
    ...
    if (mysqli_affected_rows($mysqli) > 0)
        print utf8_encode("Sucesso!");
    else
        print utf8_encode("Não foi possível inserir o registro");
}
else
    print utf8_encode("Não á foi possível carregar a imagem.");