SharonBL - 4 months ago
Java Question

How to convert a string with Unicode encoding to a string of letters

I have a string containing Unicode escape sequences, and I want to convert them to regular letters (UTF-8). For example:

String myString = "\u0048\u0065\u006C\u006C\u006F World";

should become

"Hello World"

I know that when I print the string it shows "Hello World". My problem is that I read file names from a file on a Unix machine and then search for them. The file names contain these Unicode escapes, so when I search for the files I can't find them: the search looks for a file with the literal escape sequences in its name.
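A likely explanation: in a Java string literal the compiler translates each \uXXXX escape, so printing it shows Hello World, but text read from a file at run time keeps the backslash, the u, and the four hex digits as ordinary characters. A minimal sketch, assuming a doubled backslash stands in for a line read from the file (the class name LiteralVsRaw is hypothetical):

```java
public class LiteralVsRaw {
    // The compiler decodes these escapes, so this constant is already "Hello World".
    static final String LITERAL = "\u0048\u0065\u006C\u006C\u006F World";
    // Doubled backslashes keep the escapes as plain text, like a line read from a file.
    static final String RAW = "\\u0048\\u0065\\u006C\\u006C\\u006F World";

    public static void main(String[] args) {
        System.out.println(LITERAL);             // prints: Hello World
        System.out.println(RAW);                 // prints the escapes verbatim, backslashes included
        System.out.println(LITERAL.length());    // 11 characters
        System.out.println(RAW.length());        // 36 characters: 5 escapes of 6 chars plus " World"
        System.out.println(LITERAL.equals(RAW)); // false: the two strings differ
    }
}
```

Because the two strings differ, a lookup done with the raw text will not match a file whose real name is Hello World.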


Technically, doing:

String myString = "\u0048\u0065\u006C\u006C\u006F World";

automatically converts it to "Hello World" at compile time, so I assume you are reading the string from a file. To convert that text to "Hello", you'll have to split it into the individual Unicode escapes (take each \uXXXX and keep just the XXXX), call Integer.parseInt(XXXX, 16) to parse the hex digits into an int, and then cast that int to char to get the actual character.
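For a single escape, those three steps (strip the \u, call Integer.parseInt with radix 16, cast to char) might look like the following minimal sketch; the class name DecodeOneEscape and the method decode are hypothetical, and the input is assumed to be exactly one six-character escape.

```java
public class DecodeOneEscape {
    // Decodes one escape given as raw text: a backslash, 'u', then four hex digits.
    static char decode(String escape) {
        String hex = escape.substring(2);         // drop the backslash and 'u', e.g. "0048"
        int codeUnit = Integer.parseInt(hex, 16); // base-16 parse, e.g. 0x48 = 72
        return (char) codeUnit;                   // cast the int to a char, e.g. 'H'
    }

    public static void main(String[] args) {
        String escape = "\\u0048";          // six characters, as they would appear in the file
        System.out.println(decode(escape)); // prints: H
    }
}
```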

Edit: Some code to accomplish this:

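// myString is assumed to hold the text as read from the file, with real
// backslashes (not a Java literal, which the compiler has already decoded)
String str = myString.split(" ")[0];   // keep only the escaped word before the first space
str = str.replace("\\", "");           // remove the backslashes, leaving u0048u0065...
String[] arr = str.split("u");         // ["", "0048", "0065", ...]
StringBuilder text = new StringBuilder();
for (int i = 1; i < arr.length; i++) { // arr[0] is the empty string before the first 'u'
    int hexVal = Integer.parseInt(arr[i], 16);
    text.append((char) hexVal);
}
// text.toString() is now "Hello"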
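The split-based loop above only decodes the part before the first space and assumes every 'u' starts an escape. A somewhat more robust sketch (a regex-based alternative, not the approach from the answer itself) finds each \uXXXX with java.util.regex and copies all other text unchanged, so a whole line like \u0048\u0065\u006C\u006C\u006F World comes back as Hello World. The class name UnicodeUnescaper and the method unescape are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class UnicodeUnescaper {
    // Matches a literal backslash, the letter 'u', and exactly four hex digits.
    private static final Pattern ESCAPE = Pattern.compile("\\\\u([0-9a-fA-F]{4})");

    // Replaces every backslash-u escape in s with the character it encodes;
    // all other text is copied through unchanged.
    static String unescape(String s) {
        Matcher m = ESCAPE.matcher(s);
        StringBuilder out = new StringBuilder(s.length());
        int last = 0;
        while (m.find()) {
            out.append(s, last, m.start());                      // text before this escape
            out.append((char) Integer.parseInt(m.group(1), 16)); // the decoded UTF-16 code unit
            last = m.end();
        }
        out.append(s, last, s.length());                         // text after the last escape
        return out.toString();
    }

    public static void main(String[] args) {
        // Doubled backslashes make this the raw text a file would contain.
        String raw = "\\u0048\\u0065\\u006C\\u006C\\u006F World";
        System.out.println(unescape(raw)); // prints: Hello World
    }
}
```

Each escape becomes a single UTF-16 char, so a character outside the Basic Multilingual Plane that is written as a surrogate pair (two consecutive escapes) should be reassembled correctly. For the file-name case in the question, decoding each line with unescape before building the Path (for example with Paths.get) should let the search match the real file names. This sketch does not treat a doubled backslash before the u as an escaped backslash.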