CrazyPig - 8 months ago
Java Question

How to use uniVocity-parsers to process non-printable characters

I would like to use Java with uniVocity-parsers to parse CSV data produced by MySQL's select into outfile.

I have run into a problem with non-printable characters. The MySQL table contains bit(1) columns, and when I use select into outfile to save its data to a file, the bit(1) column data becomes non-printable characters. When I use uniVocity-parsers to read each line, I do not get the values of the bit(1) columns. I expect to get the real data of the bit(1) columns. What should I do?


The problem here is that MySQL exports the bit(1) values as the characters \u0000 and \u0001, and the parser by default trims all values, treating any character <= ' ' as whitespace. Trimming wipes out \u0000 and \u0001 because their integer representations are 0 and 1 respectively, while the integer representation of the space character ' ' is 32.
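To see why these characters disappear, here is a small JDK-only sketch. It does not use uniVocity at all; it relies on the fact that Java's own String.trim() follows the same "<= ' '" rule. The class and method names (TrimDemo, isTrimmable, toBit) are made up for illustration.

```java
// JDK-only illustration of the "<= ' '" trimming rule; no parser classes involved.
class TrimDemo {

    // A character is treated as whitespace when its code is <= 32 (the space character).
    static boolean isTrimmable(char c) {
        return c <= ' ';
    }

    // Hypothetical helper: turns an untrimmed one-character bit(1) field into 0 or 1.
    static int toBit(String rawField) {
        return rawField.charAt(0);
    }

    public static void main(String[] args) {
        String bitZero = "\u0000";
        String bitOne = "\u0001";

        // String.trim() strips leading and trailing characters <= ' ',
        // so both exported bit values collapse to an empty string.
        System.out.println(bitZero.trim().isEmpty()); // true
        System.out.println(bitOne.trim().isEmpty());  // true

        // Without trimming, the original value is still recoverable.
        System.out.println(toBit(bitOne)); // 1
    }
}
```

Once trimming is disabled, a helper along the lines of toBit could be applied to the parsed field to recover the 0 or 1.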

You just need to configure the parser to prevent it from trimming the values:
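A minimal sketch of that configuration, assuming the uniVocity-parsers jar is on the classpath. The class name and the one-line input string are made up for illustration and stand in for your exported file.

```java
import com.univocity.parsers.csv.CsvParser;
import com.univocity.parsers.csv.CsvParserSettings;

import java.io.StringReader;
import java.util.List;

class NoTrimExample {
    public static void main(String[] args) {
        CsvParserSettings settings = new CsvParserSettings();
        // Keep leading and trailing characters <= ' ' so the bit(1) values survive.
        settings.trimValues(false);

        CsvParser parser = new CsvParser(settings);

        // Hypothetical single row: the second field holds a bit(1) value of 1.
        List<String[]> rows = parser.parseAll(new StringReader("1,\u0001,abc"));
        for (String[] row : rows) {
            // Print the numeric value of the untrimmed character.
            System.out.println((int) row[1].charAt(0));
        }
    }
}
```

If you only want to keep whitespace on one side, the settings also offer setIgnoreLeadingWhitespaces(false) and setIgnoreTrailingWhitespaces(false).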


Also, the file you gave has lines terminated with \r\n. If you parse this on OSX or Linux you need to define the line endings explicitly:
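Something along these lines should do it. This is a sketch assuming the same CsvParserSettings object as before; the class and method names are made up for illustration.

```java
import com.univocity.parsers.csv.CsvParserSettings;

class WindowsLineEndings {
    // Builds settings that expect \r\n line endings regardless of the host OS.
    static CsvParserSettings settings() {
        CsvParserSettings settings = new CsvParserSettings();
        settings.trimValues(false);
        // Override the default, which comes from the platform's line separator.
        settings.getFormat().setLineSeparator("\r\n");
        return settings;
    }
}
```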


Or enable auto-detection with:
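A sketch of the auto-detection variant, again with a made-up class and method name:

```java
import com.univocity.parsers.csv.CsvParserSettings;

class DetectLineEndings {
    // Builds settings that let the parser work out the line separator from the input.
    static CsvParserSettings settings() {
        CsvParserSettings settings = new CsvParserSettings();
        settings.trimValues(false);
        // No explicit separator needed; the parser inspects the input instead.
        settings.setLineSeparatorDetectionEnabled(true);
        return settings;
    }
}
```

Auto-detection is the more convenient choice if the same code has to read files produced on different platforms.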


Hope this helps