w.eric - 4 months ago
Java Question

UTF-8 encoded Japanese characters don't appear on Android display

I'm trying to write a vocabulary app with Android Studio. I have a txt file with the vocabulary in UTF-8 format, with lines like this:

akarui _ あかるい _ bright


The code that reads the file and fills the dictionaries looks as follows:

    public Map<String, String> adjectives_ej = new HashMap<String, String>();
    try {
        InputStream in = am.open("adjectives_utf8.txt");
        //BufferedReader reader = new BufferedReader(new InputStreamReader(in, "UTF-8"));
        BufferedReader br = new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8));
        StringBuilder sb = new StringBuilder();
        String line;
        while ((line = br.readLine()) != null){
            // printout first line
            if (line != ""){

                String[] parts = line.split("_");
                byte[] bytes = parts[1].getBytes("UTF-8");
                String japaneseString = new String(bytes, "UTF-8");
                Log.d("voc", japaneseString);
                adjectives_ej.put(parts[2].replaceAll(" ",""), new String(bytes, "UTF-8"));
                adjectives_je.put(new String(bytes, "UTF-8"), parts[2].replaceAll(" ",""));
            }

        }
        TextView textView = new TextView(this);
        textView.setText(adjectives_ej.get("bright"));
        ViewGroup layout = (ViewGroup) findViewById(R.id.activity_adjectives);
        layout.addView(textView);


If I try to log the result of
    Log.d("test", adjectives_ej.get("bright"));
I get the error message:

java.lang.RuntimeException: Unable to start activity ComponentInfo{ericwolf.genkiii/ericwolf.genkiii.Adjectives}: java.lang.NullPointerException: println needs a message


However, the call
    Log.d("voc", japaneseString);
inside the loop gives me the right output:
07-31 19:42:41.600 25439-25439/ericwolf.genkiii D/voc: くらい


Additionally, setting
    textView.setText(parts[1]);
inside the while loop works just fine, so I don't understand the difference here. Is there a problem with storing the strings in a dictionary?

Answer

Thanks for sharing the txt file. It looks fine. It does contain a BOM, but I don't think that would cause any issues.
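
On the BOM: as far as I know, the standard JDK's UTF-8 decoder does not drop it, so the first line read would begin with U+FEFF. That would only touch the first field (the romaji), which the maps above don't use as a key. If you ever do key on it, stripping it is cheap. Here is a minimal sketch in plain Java. The names `BomCheck`, `stripBom` and `readFirstLine` are made up for illustration, and I have not checked whether Android's decoder behaves the same way.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of the question's code.
final class BomCheck {
    // U+FEFF is what a UTF-8 BOM (bytes EF BB BF) decodes to.
    private static final char BOM = '\uFEFF';

    /** Returns the line without a leading BOM, if one is present. */
    public static String stripBom(String line) {
        if (line != null && !line.isEmpty() && line.charAt(0) == BOM) {
            return line.substring(1);
        }
        return line;
    }

    /** Reads the first line of a UTF-8 stream, the way the question's loop would see it. */
    public static String readFirstLine(InputStream in) throws IOException {
        try (BufferedReader br = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            return br.readLine();
        }
    }

    public static void main(String[] args) throws IOException {
        // Simulated file: BOM + the sample line (kana written as escapes for "akarui").
        byte[] content = "\uFEFFakarui _ \u3042\u304b\u308b\u3044 _ bright"
                .getBytes(StandardCharsets.UTF_8);
        String first = readFirstLine(new ByteArrayInputStream(content));
        String[] parts = stripBom(first).split("_");
        System.out.println("first field is 'akarui': " + parts[0].trim().equals("akarui"));
    }
}
```

Calling `stripBom` only on the first line is enough, since a BOM can only appear at the very start of the file.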

It's probably one of these problems:

  1. Font issue. Maybe the font you are using to display doesn't support Asian character sets.

  2. More likely, the repeated encode/decode round trip through UTF-8. Your loop body currently does this:

            String[] parts = line.split("_");
            byte[] bytes = parts[1].getBytes("UTF-8");
            String japaneseString = new String(bytes, "UTF-8");
            Log.d("voc", japaneseString);
            adjectives_ej.put(parts[2].replaceAll(" ",""), new String(bytes, "UTF-8"));
            adjectives_je.put(new String(bytes, "UTF-8"), parts[2].replaceAll(" ",""));
    

Recognize that line has already been decoded from UTF-8 by the InputStreamReader that the BufferedReader wraps. There's no reason to encode it back to UTF-8 only to decode it again. We can also clean up the replaceAll calls with a simple trim call.
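
To make both points concrete, here is a small plain-Java sketch. The class name `RoundTripCheck` and the sample strings are hypothetical. In particular, the trailing tab on the English field is an assumption chosen to show a case where the two cleanup calls differ; it is not something I found in your file.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the question's code.
final class RoundTripCheck {
    /** Encodes to UTF-8 and decodes again, as the original loop body does. */
    public static String roundTrip(String s) {
        return new String(s.getBytes(StandardCharsets.UTF_8), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // The kana field from the sample line, written as escapes, with its padding spaces.
        String field = " \u3042\u304b\u308b\u3044 ";
        // For well-formed text the round trip returns an equal string, so it adds nothing.
        System.out.println(roundTrip(field).equals(field));               // true

        // Assumed input: an English field padded with a space and a tab.
        String english = " bright\t";
        // replaceAll(" ", "") removes only space characters, so the tab stays in the key.
        System.out.println(english.replaceAll(" ", "").equals("bright")); // false
        // trim() removes leading and trailing whitespace, including tabs.
        System.out.println(english.trim().equals("bright"));              // true
    }
}
```

One more difference worth knowing: trim leaves spaces inside the string alone, so a multi-word entry such as "not bright" keeps its inner space, while replaceAll(" ", "") would turn it into "notbright".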

So change the loop body to this:

            String[] parts = line.split("_");
            String japaneseString = parts[1].trim();
            String englishString = parts[2].trim();

            Log.d("voc", japaneseString + " : " + englishString);

            adjectives_ej.put(englishString, japaneseString);
            adjectives_je.put(japaneseString, englishString);
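
Putting it together, here is a rough sketch of the whole parsing step in plain Java, so it can be run off-device. The class `VocabularyParser` and its methods are hypothetical names, not your code. It also covers two things in the original loop: `line != ""` compares references rather than contents, so it does not filter blank lines, and "println needs a message" is what Log.d throws when the message is null, which suggests the lookup for "bright" returned null. The sketch skips lines that do not have three fields and returns a fallback rather than null.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;

// Hypothetical class for illustration; on Android the reader would wrap am.open(...).
final class VocabularyParser {
    public final Map<String, String> englishToJapanese = new HashMap<>();
    public final Map<String, String> japaneseToEnglish = new HashMap<>();

    /** Parses lines of the form "romaji _ kana _ english" into both maps. */
    public void load(BufferedReader reader) throws IOException {
        String line;
        while ((line = reader.readLine()) != null) {
            String[] parts = line.split("_");
            if (parts.length < 3) {
                continue; // blank or malformed line: nothing to store
            }
            String japanese = parts[1].trim();
            String english = parts[2].trim();
            englishToJapanese.put(english, japanese);
            japaneseToEnglish.put(japanese, english);
        }
    }

    /** Looks up a translation, returning the fallback instead of null when the key is absent. */
    public String translate(String english, String fallback) {
        String japanese = englishToJapanese.get(english);
        return japanese != null ? japanese : fallback;
    }

    public static void main(String[] args) throws IOException {
        // Sample line from the question (kana as escapes), followed by a blank line.
        String sample = "akarui _ \u3042\u304b\u308b\u3044 _ bright\n\n";
        VocabularyParser parser = new VocabularyParser();
        parser.load(new BufferedReader(new StringReader(sample)));
        System.out.println(parser.translate("bright", "(missing)"));
        System.out.println(parser.translate("dark", "(missing)"));
    }
}
```

If the fallback shows up for a word you expect to be present, logging the map's keys (for example with their lengths) is a quick way to spot stray whitespace in them.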