Vidar S. Ramdal - 26 days ago
C++ Question

What is the safe way to create a v8::String from a wchar_t with non-ASCII characters?

I'm writing a Node.js frontend for a DAB development board, which will eventually run on a Raspberry Pi. I am a Java and web developer, and I'm struggling with C++ and converting between different types of strings.

The DAB board comes with a C++ SDK, with a number of handy functions. It allows me to get the number of available programs with `GetTotalProgram()`. For each program I can call `GetProgramName` to get the program's name:

```cpp
GetProgramName(char mode, long dabIndex, char namemode, wchar_t * programName)
```


... where `mode` means FM or DAB, and `namemode` means long or short name. The program's name is returned in `programName`.

In order to convert the `wchar_t *programName` into a `v8::String`, I found this snippet, which I'm using and understand the basics of:

```cpp
wchar_t buff[300];
char cbuff[600];
GetProgramName(0, i, 1, buff);
wcstombs( cbuff, buff, wcslen(buff) );
Local<String> str = String::NewFromUtf8(isolate, (const char *) cbuff, v8::String::kNormalString, wcslen(buff));
```


I iterate through the available programs and build up a `v8::Array`:

```cpp
void GetPrograms(const FunctionCallbackInfo<Value>& args) {
    Isolate* isolate = Isolate::GetCurrent();
    HandleScope scope(isolate);

    wchar_t buff[300];
    char cbuff[600];
    int numberOfPrograms, i;

    numberOfPrograms = GetTotalProgram();
    Local<v8::Array> ARRAY = Array::New(isolate, numberOfPrograms);

    for (i = 0; i < numberOfPrograms; i++) {
        if (GetProgramName(0, i, 1, buff)) {
            wcstombs( cbuff, buff, wcslen(buff) );
            Local<String> str = String::NewFromUtf8(isolate, (const char *) cbuff, v8::String::kNormalString, wcslen(buff));
            Local<Object> obj = Object::New(isolate);
            obj->Set(String::NewFromUtf8(isolate, "name"), str);
            ARRAY->Set(i, obj);
        }
    }
    args.GetReturnValue().Set(ARRAY);
}
```


I call the C++ method from my Node app:

```javascript
var programs = ext.getPrograms();
for (var i = 0; i < programs.length; i++) {
    console.log(programs[i].name);
}
```


This mostly works, but when a program's name contains a non-ASCII character such as Æ, Ø or Å, that name and the following names in `ARRAY` come out garbled.

Here's what the Node snippet actually outputs (`console.log`), compared to the expected output:

| ACTUAL | EXPECTED |
| --------- | ---------- |
| NRK SUPER | NRK SUPER |
| NRK VUPER | NRK VÆR |
| NRK P1 ER | NRK P1 |


It seems as though the non-ASCII character causes the next `wcstombs` call to quit early, without copying the later characters.

Why does this happen? Is there a better way to create a `v8::String` from my `wchar_t` string?

Note: I have now isolated this problem to the `wcstombs` function when running on the Raspberry Pi. The following code:

```cpp
#include <stdio.h>
#include <wchar.h>
#include <string>
#include <cstring>
#include <cstdlib>

char cbuff[600];
wchar_t buff[300] = L"ABCø123abc";

int main( int argc, const char* argv[] ) {
    wcstombs( cbuff, buff, wcslen(buff) );
    // %zu matches the size_t values returned by wcslen and strlen
    wprintf(L"wcslen of wchar_t array: %zu - strlen of char array: %zu\n", wcslen(buff), strlen(cbuff));
}
```


when run on a Mac, outputs

```
wcslen of wchar_t array: 10 - strlen of char array: 10
```

but when run on the Raspberry Pi, outputs

```
wcslen of wchar_t array: 10 - strlen of char array: 3
```

That is, it counts only the characters before the `ø` character.

This looks similar to this unanswered question.

Answer

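The problem was in the `wcstombs( cbuff, buff, wcslen(buff) )` call, which stopped converting when it encountered a non-ASCII character. The docs say: "The behavior of this function depends on the LC_CTYPE category of the selected C locale." A C or C++ program starts in the "C" locale until it calls `setlocale`, and when `wcstombs` meets a character that the current locale's encoding cannot represent (as the Raspberry Pi's "C" locale evidently cannot represent `ø`), it returns `(size_t)-1` instead of converting it.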
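Because `wcstombs` reports an unconvertible character through its return value, checking that value turns silent corruption into a detectable error. The following is a minimal sketch, not the code from the question: the helper name `WideToMultibyte` is hypothetical, and calling `wcstombs` with a null destination to measure the required size is POSIX (and glibc) behavior.

```cpp
#include <clocale>   // callers select LC_CTYPE with std::setlocale first
#include <cstddef>
#include <cstdlib>   // std::wcstombs
#include <string>

// Hypothetical helper (not part of V8 or the DAB SDK): converts a wide string
// to the multibyte encoding of the current LC_CTYPE locale.
// Returns false if some character cannot be represented in that encoding.
bool WideToMultibyte(const wchar_t* src, std::string& out) {
    // With a null destination, wcstombs (POSIX behavior) writes nothing and
    // returns the number of bytes needed, excluding the terminating null.
    std::size_t needed = std::wcstombs(nullptr, src, 0);
    if (needed == static_cast<std::size_t>(-1)) {
        return false;  // e.g. 'ø' under a locale whose encoding lacks it
    }
    std::string buffer(needed + 1, '\0');  // one extra byte for the terminator
    std::wcstombs(&buffer[0], src, buffer.size());
    buffer.resize(needed);  // drop the terminator; size() is now the byte count
    out = buffer;
    return true;
}
```

Under a UTF-8 `LC_CTYPE`, `out.size()` is the UTF-8 byte count, which is what the length argument of `String::NewFromUtf8` expects.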

So setting the locale to a UTF-8 variant solved the problem:

```cpp
setlocale(LC_CTYPE, "C.UTF-8");
```
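`setlocale` returns a null pointer when the requested locale is not available, so it may be worth checking the result rather than assuming `C.UTF-8` is installed on every system. A sketch, assuming a POSIX system for `nl_langinfo(CODESET)`; the helper name `EnsureUtf8Ctype` and the fallback to the environment's locale are illustrative choices, not part of the answer.

```cpp
#include <clocale>
#include <cstring>
#include <langinfo.h>  // POSIX nl_langinfo(CODESET)

// Hypothetical helper: switch LC_CTYPE to a UTF-8 locale so that wcstombs
// produces UTF-8. Tries "C.UTF-8" first, then the locale named by the
// environment (LC_ALL / LC_CTYPE / LANG). Note that setlocale changes the
// locale for the whole process, including the rest of Node.js.
bool EnsureUtf8Ctype() {
    const char* candidates[] = {"C.UTF-8", ""};
    for (const char* name : candidates) {
        // A null result means the requested locale could not be selected.
        if (std::setlocale(LC_CTYPE, name) != nullptr &&
            std::strcmp(nl_langinfo(CODESET), "UTF-8") == 0) {
            return true;
        }
    }
    return false;  // no UTF-8 LC_CTYPE found; conversions may still fail
}
```

Calling this once, for example when the addon is initialized, keeps the locale setup in one place.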

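Having done this, I can now create `v8::String`s this way. Note that the limit passed to `wcstombs` is the size of `cbuff` in bytes, and the length passed to `NewFromUtf8` is the byte count that `wcstombs` returns: in UTF-8, `Æ`, `Ø` and `Å` each take two bytes, so passing `wcslen(buff)` for either would cut the string short.

```cpp
wchar_t buff[300] = L"Something non-ASCII ÆØÅ here";
char cbuff[600];
// Limit is the capacity of cbuff in bytes; the result is the UTF-8 byte count
size_t len = wcstombs( cbuff, buff, sizeof(cbuff) );
if (len != (size_t) -1) {  // (size_t) -1 means a character could not be converted
    Local<String> str = String::NewFromUtf8(isolate, (const char *) cbuff, v8::String::kNormalString, (int) len);
}
```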
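If depending on the process-wide locale is undesirable (in a Node.js addon, `setlocale` affects the whole Node process), the conversion can also be done without the C locale machinery. This is a swapped-in technique rather than what the answer does: a minimal UTF-8 encoder, assuming `wchar_t` holds UTF-32 code points (true on Linux, including Raspbian, and on macOS, where `wchar_t` is 32 bits, but not on Windows) and assuming the DAB SDK fills the buffer with Unicode code points. `WideToUtf8` is a hypothetical name.

```cpp
#include <string>

// ASSUMPTION: wchar_t is 32 bits and holds a Unicode code point.
static_assert(sizeof(wchar_t) == 4, "this sketch assumes 32-bit wchar_t");

// Hypothetical helper: encodes a wide string as UTF-8 without consulting
// the C locale, replacing invalid code points with U+FFFD.
std::string WideToUtf8(const std::wstring& in) {
    std::string out;
    for (wchar_t wc : in) {
        unsigned long cp = static_cast<unsigned long>(wc);
        if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) {
            cp = 0xFFFD;  // out of range or a surrogate: not a valid scalar value
        }
        if (cp < 0x80) {            // 1 byte: 0xxxxxxx
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {    // 2 bytes: 110xxxxx 10xxxxxx
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {  // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {                    // 4 bytes: 11110xxx then three 10xxxxxx
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

For example, `std::string utf8 = WideToUtf8(buff);` followed by `String::NewFromUtf8(isolate, utf8.c_str(), v8::String::kNormalString, (int) utf8.size())`. Using `utf8.size()` rather than `wcslen(buff)` matters because each of Æ, Ø and Å becomes two bytes.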