Jake M - 4 months ago
PHP Question

Is the error on sending or receiving? Full message not sent, or message not decoded properly?

So I am posting some data from my C++ desktop application to my server (a PHP script).

Not all the post data is received by the server. Where do you think the error is happening? At server-side decoding (of UTF-8) or at client-side transmission?

C++ code (note: this is a Unicode build; if I send ASCII instead, the script receives and decodes the whole post data string):

static TCHAR hdrs[] =
    _T("Content-Type: application/x-www-form-urlencoded; charset=UTF-8\0\0");
static TCHAR frmdata[] =
    _T("name=John+Doe&auth=abc\0\0"); // use 2 null chars just in case
static LPSTR accept[2] = { "*/*", NULL };


HINTERNET hSession = InternetOpen(_T("MyAgent"),
    INTERNET_OPEN_TYPE_PRECONFIG, NULL, NULL, 0);
// error checking removed but none of these fail

HINTERNET hConnect = InternetConnect(hSession, _T("mydomain.com"),
    INTERNET_DEFAULT_HTTP_PORT, NULL, NULL, INTERNET_SERVICE_HTTP, 0, 1);

HINTERNET hRequest = HttpOpenRequest(hConnect, _T("POST"),
    _T("upload.php"), NULL, NULL, (LPCWSTR*)&accept, INTERNET_FLAG_NO_CACHE_WRITE, 1);


HttpSendRequest(hRequest, hdrs, _tcslen(hdrs), frmdata, _tcslen(frmdata));
// The above function returns true, and when I query the response code it's HTTP 200 OK, so sending is working


Simple PHP script:

<?php
$data = file_get_contents("php://input");
file_put_contents("post.txt", $data); // post.txt contains "name=John+D", so text is missing

// To make things even more confusing
echo mb_detect_encoding($data); // outputs ASCII!!!???


Weirdly, if I send it as ASCII, the script receives and decodes the whole post data:

// plain narrow literals: _T() would produce wide literals in a Unicode build
static char hdrs[] =
    "Content-Type: application/x-www-form-urlencoded; charset=UTF-8\0\0";
static char frmdata[] =
    "name=John+Doe&auth=abc\0\0";
static LPCSTR accept[2] = { "*/*", NULL };

...

HttpSendRequestA(hRequest, hdrs, strlen(hdrs), frmdata, strlen(frmdata));
// The above function returns true, and when I query the response code it's HTTP 200 OK, so sending is working


With ASCII, post.txt contains:

name=John+Doe&auth=abc

So where is the error occurring? Is the whole post string not being sent, or is the PHP script not handling Unicode correctly?

Answer

You don't send all of the data, and you also specify the encoding incorrectly.

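const wchar_t *s1 = L"abc"; is not UTF-8 encoded (on Windows, wchar_t strings are UTF-16). const char *s2 = "abc"; happens to be UTF-8 encoded, because every ASCII string is also valid UTF-8 (a nice property of UTF-8), but with this notation you are limited to basic Latin (ASCII) characters. See the examples below.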

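_tcslen(frmdata) returns the number of characters, not bytes, but the dwOptionalLength parameter of HttpSendRequest is a size in bytes. When UNICODE is defined, TCHAR is a 2-byte wchar_t, so your 22-character form string occupies 44 bytes and only the first 22 bytes (11 characters, "name=John+D") are sent. The headers are not affected, because for HttpSendRequestW the header length is counted in characters. Every byte that does arrive is either an ASCII letter or a zero byte, which is likely why mb_detect_encoding() reports ASCII. Your server expects a UTF-8 byte sequence, but what you actually send is UTF-16.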

A few examples of how to specify the encoding of string literals in C++11:

// Greek small letter tau
char const *tau8 = u8"\u03C4";  // UTF-8 (in C++20, u8 literals are char8_t, so use char8_t const *)
char16_t tau16 = u'\u03C4';     // UTF-16
char32_t tau32 = U'\U000003C4'; // UTF-32