Jean Henry - 6 days ago
Java Question

Java - Differentiate ZIP file from CSV file

I'm using a web service that always sends me a file served as text/plain. However, that file can be either a ZIP or a CSV, and I'm not told its type beforehand.

Is there a way to determine the file type programmatically by inspecting its content? One is binary data and the other is human-readable text.

I've already thought of looking for lots of commas in the file content, but that seems unreliable.

Answer

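You could make use of the ZIP file structure. The local file header signature is 0x04034b50, stored little-endian, so a non-empty archive starts with the bytes 0x50 0x4b 0x03 0x04 (ASCII "PK" followed by 0x03 0x04). An empty archive starts with 0x50 0x4b 0x05 0x06 instead. A CSV file is plain text and will not normally begin with either sequence.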
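As a rough sketch of that signature check, something like the following could work. The class name ZipSniffer and the method looksLikeZip are made up for illustration, and the stream overload assumes Java 9 or later for InputStream.readNBytes. It only inspects the first four bytes, so it tells you "probably ZIP" rather than validating the archive.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of any library.
final class ZipSniffer {

    /**
     * Returns true if the given bytes begin with a ZIP signature:
     * "PK" 03 04 (local file header) or "PK" 05 06 (empty archive).
     */
    public static boolean looksLikeZip(byte[] head) {
        if (head == null || head.length < 4) {
            return false;
        }
        if (head[0] != 0x50 || head[1] != 0x4b) { // 'P', 'K'
            return false;
        }
        return (head[2] == 0x03 && head[3] == 0x04)
            || (head[2] == 0x05 && head[3] == 0x06);
    }

    /**
     * Reads up to four bytes from the stream and checks them.
     * Note: this consumes those bytes from the stream.
     */
    public static boolean looksLikeZip(InputStream in) throws IOException {
        byte[] head = new byte[4];
        int read = in.readNBytes(head, 0, head.length); // Java 9+
        return read == head.length && looksLikeZip(head);
    }

    public static void main(String[] args) throws IOException {
        // Sample inputs invented for the demonstration.
        byte[] zip = {0x50, 0x4b, 0x03, 0x04, 0x14, 0x00};
        byte[] csv = "id,name\n1,Jean\n".getBytes(StandardCharsets.US_ASCII);

        System.out.println(looksLikeZip(new ByteArrayInputStream(zip)));
        System.out.println(looksLikeZip(new ByteArrayInputStream(csv)));
    }
}
```

Because the stream overload consumes the bytes it reads, you would probably want to buffer the web service response first (or wrap it in a stream that supports mark/reset) so the full content is still available for unzipping or CSV parsing afterwards.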

You could also use a MIME detection library such as Apache Tika:

import org.apache.tika.Tika;
import org.apache.tika.mime.MediaType;

import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class Detect {

    /**
     * Resolves the MediaType using Tika and prints it to the standard output.
     * @param file the path of the file to probe.
     * @throws IOException whenever an I/O exception occurs.
     */
    private void detect(Path file) throws IOException {
        Tika tika = new Tika();
        try(InputStream is = Files.newInputStream(file)){
            MediaType mediaType = MediaType.parse(tika.detect(is));
            System.out.println(mediaType);
        }
    }

    public static void main(String[] args) throws IOException {
        Detect d = new Detect();
        d.detect(Paths.get("zip_file"));
        d.detect(Paths.get("csv_file"));
    }
}