jackjop - 5 months ago
Javascript Question

Substring of a Turkish String

I have a string like this

var element = "İstanbul";


and when I convert it to lower case like this:

var element = element.toLowerCase();


it becomes

"istanbul"


I need a substring of the lowercase string "istanbul".

When I do this before the lowercase conversion:

element.substr(0,2)


the output is correct: "İs".

But when I do the same after the lowercase conversion, the result is wrong: substr(0,2) should give "is", but I get "i" instead.

Why is it happening and how can I correct this?

Answer

This happens because lowercasing maps the single character İ (U+0130) to two code points: "i" (http://www.fileformat.info/info/unicode/char/0069/index.htm) followed by "̇" (the combining dot above, a diacritical mark: http://www.fileformat.info/info/unicode/char/0307/index.htm). As a result, substr(0,2) on the lowercased string returns "i" plus the combining dot rather than "is".
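The expansion can be checked directly by listing the code points of the lowercased string. This is only a small sketch; the variable names are made up for illustration, and İ is written as the escape \u0130 so the example does not depend on how the file is encoded.

```javascript
// "\u0130" is İ (LATIN CAPITAL LETTER I WITH DOT ABOVE).
const original = "\u0130stanbul";
const lower = original.toLowerCase();

// Format each code point of the lowercased string as U+XXXX.
const codePoints = [...lower].map(
  ch => "U+" + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0")
);

console.log(original.length, lower.length); // 8 9
console.log(codePoints.slice(0, 3));        // [ 'U+0069', 'U+0307', 'U+0073' ]

// The first two UTF-16 units are "i" and the combining dot, so "s" is not included.
console.log(lower.substr(0, 2) === "i\u0307"); // true
```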

To work around this, you can split the original string into characters using the ES2015 string iteration facilities and lowercase each character separately:

const arr_l_new = [...str].map(s => s.toLowerCase());

Then you can take the first N characters:

const first_2_chars = arr_l_new.slice(0, 2).join('');

Note that first_2_chars has a length of 3, not 2, because it still contains the diacritic character, which is not actually visible on the lowercase i.

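const str = "İstanbul";
// Split into characters first, then lowercase each one separately.
const arr_l = [...str].map(s => s.toLowerCase());
const first_2_l = arr_l.slice(0, 2).join('');

console.log(first_2_l, first_2_l.length); // length is 3: "i", U+0307, "s"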
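If the extra length mentioned in the note above is a problem, one possible variation (not part of the original answer) is to drop the combining dot after lowercasing. The helper name lowerWithoutExtraDot is hypothetical. The sketch assumes that an "i" followed by U+0307 only ever comes from a lowercased İ, which may not hold for text in other languages that use that sequence deliberately.

```javascript
// Hypothetical helper: lowercase, then collapse "i" + U+0307 back to a plain "i".
// Assumption: every "i\u0307" in the result came from lowercasing İ (U+0130).
function lowerWithoutExtraDot(text) {
  return text.toLowerCase().replace(/i\u0307/g, "i");
}

const lowered = lowerWithoutExtraDot("\u0130stanbul");

console.log(lowered, lowered.length); // istanbul 8
console.log(lowered.substr(0, 2));    // is
```

With the combining dot removed, ordinary index-based calls such as substr(0, 2) line up with the visible characters again.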
