CrazyPig - 11 months ago
Java Question

How to use uniVocity-parsers to process non-printable characters

I would like to use Java with uniVocity-parsers to parse CSV data produced by MySQL's select into outfile.

I have run into a problem with non-printable characters. The MySQL table contains a bit(1) column, and when I use select into outfile to save its data to a file, the bit(1) column data becomes a non-printable character. When I use uniVocity-parsers to read each line, I get a null value for the bit(1) columns. I expect to get the real data of the bit(1) column. What should I do?

Answer Source

The problem here is that MySQL exports the bit(1) values as the characters \u0000 and \u0001, and the parser by default trims all values, removing any leading or trailing character <= ' '. Trimming wipes out \u0000 and \u0001 because their integer representations are 0 and 1 respectively, while the integer representation of the space character ' ' is 32. Nothing is left of the field after trimming, which is why you get null.
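To make the trimming rule concrete, here is a minimal sketch in plain Java. It does not use the parser itself; trimLikeParser is a hypothetical stand-in that only mimics the "drop characters <= ' '" rule described above, so treat it as an illustration rather than the library's actual code.

```java
class TrimDemo {
    // Hypothetical stand-in for the trimming rule: drop leading and trailing
    // characters whose code is <= ' ' (32). Not the parser's real implementation.
    static String trimLikeParser(String value) {
        int start = 0;
        int end = value.length();
        while (start < end && value.charAt(start) <= ' ') {
            start++;
        }
        while (end > start && value.charAt(end - 1) <= ' ') {
            end--;
        }
        return value.substring(start, end);
    }

    public static void main(String[] args) {
        String bitOne = "\u0001"; // what MySQL writes for a bit(1) value of 1

        System.out.println((int) bitOne.charAt(0));                 // 1
        System.out.println((int) ' ');                              // 32
        System.out.println(trimLikeParser(bitOne).isEmpty());       // true
        System.out.println(trimLikeParser(" abc ").equals("abc"));  // true
    }
}
```

Because 0 and 1 are both below 32, a field holding only one of those characters is indistinguishable from whitespace under this rule.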

You just need to configure the parser so that it does not trim the values:

    settings.trimValues(false);

Also, the file you provided has lines terminated with \r\n. If you parse it on OS X or Linux, you need to define the line separator explicitly:

    settings.getFormat().setLineSeparator("\r\n");

Or enable auto-detection with:

    settings.setLineSeparatorDetectionEnabled(true);
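With trimming disabled, each bit(1) field should reach your code as a one-character string holding \u0000 or \u0001. The sketch below shows one way to turn that into a Boolean. The class BitField and the method decodeBit are hypothetical names for illustration, not part of uniVocity-parsers, and the code assumes the field contains exactly the raw character MySQL wrote.

```java
class BitField {
    // Hypothetical helper: maps an untrimmed bit(1) field to a Boolean.
    // Returns null for a missing or empty field (for example, a NULL column).
    static Boolean decodeBit(String field) {
        if (field == null || field.isEmpty()) {
            return null;
        }
        if (field.length() != 1) {
            throw new IllegalArgumentException(
                "Expected a single character, got length " + field.length());
        }
        char c = field.charAt(0);
        if (c == 0) {
            return Boolean.FALSE;
        }
        if (c == 1) {
            return Boolean.TRUE;
        }
        throw new IllegalArgumentException("Unexpected bit value: " + (int) c);
    }

    public static void main(String[] args) {
        System.out.println(decodeBit("\u0000")); // false
        System.out.println(decodeBit("\u0001")); // true
        System.out.println(decodeBit(""));       // null
    }
}
```

You would call such a helper on the relevant column of each parsed row.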

Hope this helps