Rob Rob - 2 months ago 14
Java Question

Fast way to write millions of small text files in Java?

I have to dump 6 million files, each containing around 100-200 characters, and it's painfully slow. The slow part is the file writing: if I comment that part out (the call to the WriteSoveraFile method), the whole thing runs in 5-10 minutes. As it is, I ran it overnight (16 hours) and only got through 2 million records.

  1. Is there a faster method?

  2. Would I be better off creating an array of arrays and then dumping it all at once? (My system only has 4 GB of RAM; wouldn't it die from the 6 GB of data this would consume?)

Here is the procedure:

    public static void WriteSoveraFile(String fileName, String path, String contents) throws IOException {
        BufferedWriter bw = null;

        try {
            String outputFolderPath = cloGetAsFile( GenCCD.o_OutER7Folder ).getAbsolutePath();
            File folder = new File( String.format("%1$s/Sovera/%2$s/", outputFolderPath, path) );

            if (! folder.exists()) {
                folder.mkdirs(); // body missing from the post; presumably creates the folder
            }

            /* if (this.rcmdWriter != null) ... (rest of this commented-out block is missing from the post) */

            File file = new File( String.format("%1$s/%2$s", folder.getAbsolutePath(), fileName) );

            // if the file doesn't exist, then create it
            if (!file.exists()) {
                FileWriter fw = new FileWriter(file.getAbsoluteFile());
                bw = new BufferedWriter(fw);
                bw.write(contents); // missing from the post; presumably the actual write
            }
            /* else {
                file.delete(); // want to delete the file?? or just overwrite it??
            } */
        } catch (IOException e) {
            // handler body missing from the post
        } finally {
            try {
                if (bw != null) bw.close();
            } catch (IOException ex) {
                // handler body missing from the post
            }
        }
    }


This is almost certainly an OS filesystem issue; writing lots of small files is simply slow. I recommend writing a comparison test in shell and in C to get an idea of how much of the cost comes from the OS rather than from Java. Additionally, I would suggest two major tweaks:

  • Make sure the system this runs on uses an SSD. On a spinning disk, seek latency from filesystem journaling will be a major source of overhead.
  • Multithread your writing process. When writes are serialized, the OS can't apply optimizations such as batching operations, and the FileWriter may block on the close() operation.
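
The multithreading suggestion could be sketched roughly as below. This is only an illustration, not the asker's code: the class name ParallelSmallFileWriter, the writeAll method, and the choice of thread count are made up for the example, and Files.write is used purely for brevity (a BufferedWriter per file would work the same way). Whether it actually helps depends on the disk and filesystem, so measure before and after.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical helper: writes many small files using a fixed-size thread pool.
final class ParallelSmallFileWriter {

    /**
     * Writes each map entry (file name -> contents) as one file under folder.
     * Returns the number of files written.
     */
    static int writeAll(Path folder, Map<String, String> contentsByName, int threads)
            throws IOException, InterruptedException {
        Files.createDirectories(folder); // create the folder once, not once per file
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<?>> pending = new ArrayList<>();
            for (Map.Entry<String, String> entry : contentsByName.entrySet()) {
                pending.add(pool.submit(() -> {
                    try {
                        // Creates the file, or truncates and overwrites it if it exists.
                        Files.write(folder.resolve(entry.getKey()),
                                entry.getValue().getBytes(StandardCharsets.UTF_8));
                    } catch (IOException ex) {
                        throw new UncheckedIOException(ex);
                    }
                }));
            }
            // Wait for every write and surface the first failure, if any.
            for (Future<?> f : pending) {
                try {
                    f.get();
                } catch (ExecutionException ex) {
                    throw new IOException("write failed", ex.getCause());
                }
            }
            return pending.size();
        } finally {
            pool.shutdown();
        }
    }
}
```

With 6 million records you would probably call this in batches of a few thousand files rather than building one giant map, which also keeps memory use well under the 4 GB mentioned in the question.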

(I was going to suggest looking into NIO, but the APIs don't seem to offer much benefit for your situation, since setting up an mmapped buffer would probably introduce more overhead than it would save for this size.)
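
For the comparison test mentioned above, the Java side of the baseline might look something like this. It is a rough sketch under assumptions: the class name SmallFileWriteBenchmark, the file naming scheme, and the 10,000-file / 150-character sample size are all invented for the example. Running the equivalent loop in shell or C against the same directory should show how much of the time is spent in the OS.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical micro-benchmark: times sequential writes of many small files.
final class SmallFileWriteBenchmark {

    /** Writes count files of size ASCII characters into folder; returns elapsed nanoseconds. */
    static long timeWrites(Path folder, int count, int size) throws IOException {
        Files.createDirectories(folder);
        StringBuilder sb = new StringBuilder(size);
        for (int i = 0; i < size; i++) {
            sb.append('x');
        }
        byte[] payload = sb.toString().getBytes(StandardCharsets.US_ASCII);

        long start = System.nanoTime();
        for (int i = 0; i < count; i++) {
            Files.write(folder.resolve("rec" + i + ".txt"), payload);
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) throws IOException {
        Path folder = Files.createTempDirectory("write-bench");
        int count = 10_000; // sample size, an arbitrary choice for the example
        long nanos = timeWrites(folder, count, 150);
        double seconds = nanos / 1e9;
        System.out.printf("%d files in %.2f s (%.0f files/s)%n", count, seconds, count / seconds);
    }
}
```

At the rate reported in the question (2 million files in 16 hours) the figure would be roughly 35 files per second, so a sample of this size is enough to see whether plain Java writes are in the same range.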