NexusDuck - 29 days ago
Java Question

Converting EBCDIC to ASCII in Java

I am supposed to convert an EBCDIC file to ASCII using Java. So far I have this code:

import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.Charset;

public class Migration {
    InputStreamReader reader;
    StringBuilder builder;

    public Migration() {
        try {
            // Decode every byte of the file as IBM500 (EBCDIC) text.
            reader = new InputStreamReader(
                    new FileInputStream("C:\\TI3\\Legacy Systemen\\Week 3\\Oefening 3\\inputfile.dat"),
                    Charset.forName("ibm500"));
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        }
        builder = new StringBuilder();
    }

    public void read() throws IOException {
        int theInt;
        while ((theInt = reader.read()) != -1) {
            char theChar = (char) theInt;
            builder.append(theChar);
        }
        reader.close();
    }

    @Override
    public String toString() {
        return builder.toString();
    }
}


The file description is the following:

02 KDGEX.
   05 B1-LENGTH PIC S9(04) USAGE IS COMP.
   05 B1-CODE PIC S9(04) USAGE IS COMP.
   05 B1-NUMBER PIC X(08).
   05 B1-PPR-NAME PIC X(06).
   05 B1-PPR-FED PIC 9(03).
   05 B1-PPR-RNR PIC S9(08) USAGE IS COMP.
   05 B1-DATA.
      10 B1-VBOND PIC 9(02).
      10 B1-KONST.
         20 B1-AFDEL PIC 9(03).
         20 B1-KASSIER PIC 9(03).
         20 B1-DATZIT-DM PIC 9(04).
      10 B1-BETWYZ PIC X(01).
      10 B1-RNR PIC X(13).
      10 B1-BETKOD PIC 9(02).
      10 B1-VOLGNR-INF PIC 9(02).
      10 B1-QUAL-PREST PIC 9(03).
      10 B1-REKNUM PIC 9(12).
      10 B1-REKNR REDEFINES B1-REKNUM.
         20 B1-REKNR-PART1 PIC 9(03).
         20 B1-REKNR-PART2 PIC 9(07).
         20 B1-REKNR-PART3 PIC 9(02).
      10 B1-VOLGNR-M30 PIC 9(03).
      10 B1-OMSCHR.
         15 B1-OMSCHR1 PIC X(14).
         15 B1-OMSCHR2 PIC X(14).
      10 B1-OMSCHR-INF REDEFINES B1-OMSCHR.
         15 B1-AANT-PREST PIC 9(02).
         15 B1-VERSTR PIC 9(01).
         15 B1-LASTDATE PIC 9(06).
         15 B1-HONOR PIC 9(06).
         15 B1-RIJKN PIC X(13).
      10 FILLER--1 PIC 9(02).
      10 B1-INFOREK PIC 9(01).
      10 B1-BEDRAG-EUR PIC 9(08).
      10 B1-BEDRAG-DV PIC X(01).
      10 B1-BEDRAG-RMG-DV REDEFINES B1-BEDRAG-DV PIC X(01).
   05 FILLER PIC X(5).


We can ignore the first 2 bytes of every line. The problem is the fields declared with USAGE IS COMP: the reader does not convert them properly. I think I am supposed to read these as raw bytes, but I have no idea how.

Answer

If I am interpreting this format correctly, you have a binary file format with fixed-length records. Some of the fields in these records are not character data (COBOL computational fields?), so decoding the whole file as text corrupts them.

So you will have to read the records with a lower-level approach, processing the individual fields of each record:

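import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

public class Record {
  private byte[] b1length = new byte[2]; // B1-LENGTH, COMP (binary)
  private byte[] b1code = new byte[2];   // B1-CODE, COMP (binary)
  private byte[] b1number = new byte[8]; // B1-NUMBER, DISPLAY (EBCDIC text)
  // other fields

  // Reads the raw bytes of each field, in record order.
  public void read(DataInput data) throws IOException {
    data.readFully(b1length);
    data.readFully(b1code);
    data.readFully(b1number);
    // other fields
  }

  // Writes the fields back out in the same order.
  public void write(DataOutput out) throws IOException {
    out.write(b1length);
    out.write(b1code);
    out.write(b1number);
    // other fields
  }
}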

Here I've used byte arrays for the first three fields of the record, but you could use more suitable types where appropriate (such as a short for the first field, read with readShort). Note: my interpretation of the field widths is likely wrong; it is just an example.
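
As a rough sketch of the typed alternative: assuming the file follows the usual mainframe convention, where PIC S9(04) COMP is a 2-byte and PIC S9(08) COMP a 4-byte big-endian two's-complement integer, readShort and readInt decode such fields directly, because DataInput always reads big-endian. The class name, method names and sample bytes below are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.IOException;

// Hypothetical helper; the names are illustrative, not part of the original answer.
class CompFieldDemo {

    // Assumption: PIC S9(04) COMP is stored as a 2-byte signed big-endian integer.
    static short readComp2(DataInput in) throws IOException {
        return in.readShort();
    }

    // Assumption: PIC S9(08) COMP is stored as a 4-byte signed big-endian integer.
    static int readComp4(DataInput in) throws IOException {
        return in.readInt();
    }

    public static void main(String[] args) throws IOException {
        // Made-up sample bytes: 0x0076 is 118, 0xFFFFFFFE is -2.
        byte[] sample = {0x00, 0x76, (byte) 0xFF, (byte) 0xFF, (byte) 0xFF, (byte) 0xFE};
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(sample))) {
            System.out.println(readComp2(in)); // prints 118
            System.out.println(readComp4(in)); // prints -2
        }
    }
}
```

If the file turns out to be little-endian or uses a different COMP width, these calls would give wrong values, so it is worth checking a few known records first.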

DataInputStream is the DataInput implementation generally used for this.
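
A minimal sketch of how that might look, assuming every record has the same length: wrap the input in a DataInputStream and call readFully until it throws EOFException. The class name, the readRecords method and the recordLength parameter are hypothetical; the real record length has to come from the file description.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; names and parameters are illustrative only.
class RecordReaderDemo {

    // Reads consecutive fixed-length records until the stream ends.
    // recordLength is an assumption supplied by the caller.
    static List<byte[]> readRecords(InputStream source, int recordLength) throws IOException {
        if (recordLength <= 0) {
            throw new IllegalArgumentException("recordLength must be positive");
        }
        List<byte[]> records = new ArrayList<>();
        try (DataInputStream in = new DataInputStream(source)) {
            while (true) {
                byte[] record = new byte[recordLength];
                try {
                    in.readFully(record); // throws EOFException if fewer bytes remain
                } catch (EOFException endOfFile) {
                    break; // a trailing partial record, if any, is dropped
                }
                records.add(record);
            }
        }
        return records;
    }

    public static void main(String[] args) throws IOException {
        byte[] sample = {1, 2, 3, 4, 5, 6, 7, 8}; // made-up data: two 4-byte records
        List<byte[]> records = readRecords(new ByteArrayInputStream(sample), 4);
        System.out.println(records.size());
    }
}
```

For a real file, the ByteArrayInputStream would be replaced by a FileInputStream (ideally wrapped in a BufferedInputStream), and each byte array could then be handed to per-field parsing.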

Since every character in both the source and target encodings occupies exactly one octet, you should be able to transcode the character data fields with a method like this:

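import java.nio.charset.Charset;

// Decodes the field with the source charset and re-encodes it with the target charset.
public static byte[] transcodeField(byte[] source, Charset from, Charset to) {
  byte[] result = new String(source, from).getBytes(to);
  // A fixed-width field must keep its length after transcoding.
  if (result.length != source.length) {
    throw new AssertionError(result.length + "!=" + source.length);
  }
  return result;
}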
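
A possible usage sketch, assuming the IBM500 code page from the question and a Java runtime that ships the extended charsets (Charset.forName throws UnsupportedCharsetException otherwise). The class name and sample bytes are made up; the method body is the one shown above.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical wrapper class so the example can run on its own.
class TranscodeDemo {

    // Same logic as the method above.
    static byte[] transcodeField(byte[] source, Charset from, Charset to) {
        byte[] result = new String(source, from).getBytes(to);
        if (result.length != source.length) {
            throw new AssertionError(result.length + "!=" + source.length);
        }
        return result;
    }

    public static void main(String[] args) {
        Charset ebcdic = Charset.forName("IBM500"); // code page taken from the question
        // Made-up sample field: the EBCDIC bytes for "HELLO".
        byte[] field = {(byte) 0xC8, (byte) 0xC5, (byte) 0xD3, (byte) 0xD3, (byte) 0xD6};
        byte[] ascii = transcodeField(field, ebcdic, StandardCharsets.US_ASCII);
        System.out.println(new String(ascii, StandardCharsets.US_ASCII)); // prints HELLO
    }
}
```

Note that String.getBytes silently substitutes '?' for characters that have no equivalent in the target charset, so the length check alone will not reveal lossy conversions. This approach only suits the character fields; the COMP fields must stay out of it.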

I suggest tagging your question with COBOL (assuming that is the source of this format) so that someone else can speak with more authority on the format of the data source.