MrJalapeno MrJalapeno - 3 months ago 16
Java Question

Interpret a string from one encoding to another in java

I've looked around for answers to this (I'm sure they're out there), and I'm not sure it's possible.

So, I got a HUGE file that contains the word "för". I'm using RandomAccessFile because I know where it is (kind of) and can therefore use the seek() function to get there.

To know that I've found it I have a String "för" in my program that I check for equality. Here's the problem, I ran the debugger and when I get to "för" what I get to compare is "för".

So my program terminates without finding any "för".

This is the code I use to get a word:

private static String getWord(RandomAccessFile file) throws IOException {
StringBuilder stb = new StringBuilder();
String word;
char c;
c = (char)file.read();
int end;
do {
stb.append(c);
end = file.read();
if(end==-1)
return "-1";
c = (char)end;

} while (c != ' ');
word = stb.toString();
word.trim();
return word;
}


So basically I return all the characters from the current point in the file to the first ' '-character. So basically I get the word, but since (char)file.read(); reads a byte (I think), UTF-8 'ö' becomes the two characters 'Ã' and '¶'?

One reason for this guess is that if I open my file with encoding UTF-8 it's "för" but if I open the file with ISO-8859-15 in the same place we now have exactly what my getWord method returns: "för"

So my question:

When I'm sitting with a "för" and a "för", is there any way to fix this? Like saying "read "för" as if it was an UTF-8 string" to get "för"?

Answer
import java.nio.charset.Charset;
String encodedString = new String(originalString.getBytes("ISO-8859-15"), Charset.forName("UTF-8"));