DoubleOseven - 2 months ago
Java Question

Splitting up a text file into two files (java)

I need some help figuring out how to split a text file into two files in Java.

I have a text file, sorted alphabetically, in which each line contains a word, a space, and its index, e.g.:

...
stand 345
stand 498
stare 894
...


What I would like to do is read this file and write two separate files: one containing each word only once, and the other containing the positions (indices) of that word in the document.
The file is really big, and I was wondering whether I can use an array or a list to store the words and indices before writing the files, or whether there is a better way.
I don't really know how to approach this.

Answer

If your file is really large, you should consider using a database. If it is not too big, you can use a HashMap. You can also use a class like the following one. It requires the file to be sorted and writes the words to one file and the indices to another; because equal words are then on adjacent lines, it only has to keep the indices of one word in memory at a time:

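import java.io.BufferedReader;
import java.io.IOException;
import java.io.PrintWriter;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

public class Split {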
private String fileName;
private PrintWriter fileWords;
private PrintWriter fileIndices;

public Split(String fname) {
    fileName = fname;
    if (initFiles()) {
        writeList();
    }
    closeFiles();
}

private boolean initFiles() {
    boolean retval = false;
    try {
        fileWords = new PrintWriter("words-" + fileName, "UTF-8");
        fileIndices = new PrintWriter("indices-" + fileName, "UTF-8");
        retval = true;
    } catch (Exception e) {
        System.err.println(e.getMessage());
    }
    return retval;
}

private void closeFiles() {
    if (null != fileWords) {
        fileWords.close();
    }
    if (null != fileIndices) {
        fileIndices.close();
    }
}

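// Reads the sorted input line by line. Because equal words are adjacent,
// only the indices of the current word are kept in memory; they are
// written out as soon as a different word (or the end of file) is reached.
private void writeList() {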
    String lastWord = null;
    List<String> wordIndices = new ArrayList<String>();
    Path file = Paths.get(fileName);
    Charset charset = Charset.forName("UTF-8");
    try (BufferedReader reader = Files.newBufferedReader(file, charset)) {
        String line = null;
        while ((line = reader.readLine()) != null) {
            int len = line.length();
            if (len > 0) {
                int ind = line.indexOf(' ');
                if (ind > 0 && ind < (len - 1)) {
                    String word = line.substring(0, ind);
                    String indice = line.substring(ind + 1, len);
                    if (!word.equals(lastWord)) {
                        if (null != lastWord) {
                            writeToFiles(lastWord, wordIndices);
                        }
                        lastWord = word;
                        wordIndices = new ArrayList<String>();
                        wordIndices.add(indice);
                    } else {
                        wordIndices.add(indice);
                    }
                }
            }
        }
        // Write the indices collected for the last word in the file.
        if (null != lastWord) {
            writeToFiles(lastWord, wordIndices);
        }
    } catch (IOException x) {
        System.err.format("IOException: %s%n", x);
    }
}

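// Writes the word as one line of the words file and all of its indices,
// separated by spaces, as the matching line of the indices file.
private void writeToFiles(String word, List<String> list) {
    boolean first = true;
    fileWords.println(word);
    for (String elem : list) {
        if (first) {
            first = false;
        } else {
            fileIndices.print(" ");
        }
        fileIndices.print(elem);
    }
    fileIndices.println();
}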

}
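For comparison, here is a rough sketch of the in-memory Map approach mentioned at the start of the answer, for a file that fits in memory. It swaps the plain HashMap for a TreeMap (another Map implementation) so that the words come out in alphabetical order and the input does not have to be sorted. The class name SplitInMemory, its method names and the file names data.txt, words.txt and indices.txt are hypothetical, chosen only for illustration.

```java
import java.io.IOException;
import java.io.PrintWriter;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory variant: everything is loaded before writing.
public class SplitInMemory {

    // Groups "word index" lines by word. The TreeMap keeps the words sorted,
    // so the input lines may come in any order.
    public static Map<String, List<String>> group(List<String> lines) {
        Map<String, List<String>> indicesByWord = new TreeMap<>();
        for (String line : lines) {
            int space = line.indexOf(' ');
            // Skip blank or malformed lines (no word before the space,
            // no space at all, or nothing after the space).
            if (space <= 0 || space == line.length() - 1) {
                continue;
            }
            String word = line.substring(0, space);
            String index = line.substring(space + 1);
            indicesByWord.computeIfAbsent(word, k -> new ArrayList<>()).add(index);
        }
        return indicesByWord;
    }

    // Writes one word per line to wordsFile and, on the matching line of
    // indicesFile, that word's indices separated by spaces.
    public static void write(Map<String, List<String>> indicesByWord,
                             Path wordsFile, Path indicesFile) throws IOException {
        try (PrintWriter words = new PrintWriter(
                     Files.newBufferedWriter(wordsFile, StandardCharsets.UTF_8));
             PrintWriter indices = new PrintWriter(
                     Files.newBufferedWriter(indicesFile, StandardCharsets.UTF_8))) {
            for (Map.Entry<String, List<String>> e : indicesByWord.entrySet()) {
                words.println(e.getKey());
                indices.println(String.join(" ", e.getValue()));
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical default input name; pass a path as the first argument.
        Path input = Paths.get(args.length > 0 ? args[0] : "data.txt");
        List<String> lines = Files.readAllLines(input, StandardCharsets.UTF_8);
        write(group(lines), Paths.get("words.txt"), Paths.get("indices.txt"));
    }
}
```

For the three sample lines from the question, group returns {stand=[345, 498], stare=[894]}, so words.txt would contain stand and stare, and indices.txt would contain 345 498 and 894 on the matching lines. The trade-off is memory: the whole file is held at once, which is exactly what the streaming Split class avoids.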

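Be aware that the file name handling in Split is not very robust: for example, the output names are built by putting "words-" and "indices-" in front of the whole path you pass in, so an input such as "dir/data.txt" will not work as expected. You can use the class like this: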

Split split = new Split("data.txt");
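One possible way to make the file name handling more robust is to add the prefix to the file name only and keep the output files in the same directory as the input, using java.nio.file.Path.resolveSibling. The helper below is a sketch of that idea; OutputNames and withPrefix are hypothetical names. Its result could be converted with toString() and passed to the PrintWriter constructors in initFiles instead of the string concatenation.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper for building output file names next to the input file.
public class OutputNames {

    // Adds the prefix to the file name only, so "dir/data.txt" becomes
    // "dir/words-data.txt" rather than "words-dir/data.txt".
    public static Path withPrefix(Path input, String prefix) {
        Path name = input.getFileName();
        if (name == null) {
            // e.g. a root path such as "/" has no file name component
            throw new IllegalArgumentException("Input path has no file name: " + input);
        }
        // resolveSibling resolves against the parent directory; when there is
        // no parent (a bare "data.txt"), it simply returns the new name.
        return input.resolveSibling(prefix + name);
    }

    public static void main(String[] args) {
        Path input = Paths.get(args.length > 0 ? args[0] : "data.txt");
        System.out.println(withPrefix(input, "words-"));
        System.out.println(withPrefix(input, "indices-"));
    }
}
```

With an input of dir/data.txt this gives dir/words-data.txt and dir/indices-data.txt (using the platform's separator); with just data.txt it gives words-data.txt and indices-data.txt, the same names the original code produces.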