mclopez - 3 months ago
Java Question

Fastest way to load a huge text file into an int array

I have a big text file (over 100 MB) with one integer per line, about 10 million numbers in total. Of course, the size and count may change, so I don't know them in advance.

I want to load the file into an int[], making the process as fast as possible. First I came up with this solution:

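import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

public int[] fileToArray(String fileName) throws IOException
{
    // Read the whole file into memory as a list of lines.
    List<String> list = Files.readAllLines(Paths.get(fileName));
    int[] res = new int[list.size()];
    int pos = 0;
    // Parse each line into the result array.
    for (String line : list)
    {
        res[pos++] = Integer.parseInt(line);
    }
    return res;
}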


It was pretty fast: 5.5 seconds, of which 5.1s goes to the readAllLines call and 0.4s to the loop.

But then I decided to try using BufferedReader, and came to this different solution:

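import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;

public int[] fileToArray(String fileName) throws IOException
{
    BufferedReader bufferedReader = new BufferedReader(new FileReader(new File(fileName)));
    ArrayList<Integer> ints = new ArrayList<Integer>();
    String line;
    // Parse line by line; each value is boxed into an Integer.
    while ((line = bufferedReader.readLine()) != null)
    {
        ints.add(Integer.parseInt(line));
    }
    bufferedReader.close();

    // Copy the boxed values into a primitive array.
    int[] res = new int[ints.size()];
    int pos = 0;
    for (Integer i : ints)
    {
        res[pos++] = i.intValue();
    }
    return res;
}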


This was even faster! 3.1 seconds: just 3s for the while loop and not even 0.1s for the for loop.

I know there is not much room for optimization here, at least in time, but building an ArrayList and then copying it into an int[] seems like too much memory to me.

Any ideas on how to make this faster, or avoid using the middle ArrayList?

Just for comparison, I do this same task with FreePascal in 1.9 seconds, using the TStringList class and the StrToInt function.

Answer

If you're using Java 8, you can eliminate the intermediate ArrayList by calling lines() on the BufferedReader, mapping each line to an int with mapToInt, and collecting the values with toArray().

You should also be using try-with-resources for proper exception handling and auto-closing.

try (BufferedReader br = new BufferedReader(new FileReader(fileName))) {
    return br.lines()
             .mapToInt(Integer::parseInt)
             .toArray();
}
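For anyone who wants to paste this into a file and try it, here is a minimal self-contained sketch of the same idea. The class name IntFileLoader is made up for illustration, and the trim/blank-line handling is an extra assumption that is not in the snippet above; drop those two lines if your file is strictly one clean integer per line.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

// Hypothetical wrapper class; the name is not from the original answer.
class IntFileLoader {

    // Reads one integer per line and returns the values as a primitive int[].
    // The values flow through an IntStream, so no boxed Integer objects
    // and no ArrayList<Integer> are created.
    static int[] fileToArray(String fileName) throws IOException {
        try (BufferedReader br = new BufferedReader(new FileReader(fileName))) {
            return br.lines()
                     .map(String::trim)           // assumption: tolerate stray whitespace
                     .filter(s -> !s.isEmpty())   // assumption: skip blank lines
                     .mapToInt(Integer::parseInt)
                     .toArray();
        } // the reader is closed here, even if parsing throws
    }

    public static void main(String[] args) throws IOException {
        // Small throwaway input file, just for demonstration.
        Path tmp = Files.createTempFile("ints", ".txt");
        Files.write(tmp, Arrays.asList("1", "-2", "", " 300 "));
        System.out.println(Arrays.toString(fileToArray(tmp.toString()))); // prints [1, -2, 300]
        Files.delete(tmp);
    }
}
```

A line that is not a valid integer still makes Integer.parseInt throw NumberFormatException, same as in the question's code.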

I'm not sure if this is faster, but it is certainly much easier to maintain.

Edit: It is apparently MUCH faster.
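If you want to check this on your own machine, a rough sketch like the following can compare the two approaches. The class name LoadTiming, the method names, and the 1,000,000-line test file are all arbitrary choices for illustration. It takes single wall-clock measurements with System.nanoTime, so JIT warm-up and disk caching will skew the numbers; treat the output as a ballpark and use a proper benchmark harness for anything serious.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.FileReader;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical timing harness; names and sizes are illustrative only.
class LoadTiming {

    // The question's approach: boxed ArrayList<Integer>, then copy to int[].
    static int[] viaList(String fileName) throws IOException {
        List<Integer> ints = new ArrayList<>();
        try (BufferedReader br = new BufferedReader(new FileReader(fileName))) {
            String line;
            while ((line = br.readLine()) != null) {
                ints.add(Integer.parseInt(line));
            }
        }
        int[] res = new int[ints.size()];
        int pos = 0;
        for (int v : ints) {
            res[pos++] = v;
        }
        return res;
    }

    // The answer's approach: lines() -> mapToInt -> toArray().
    static int[] viaStream(String fileName) throws IOException {
        try (BufferedReader br = new BufferedReader(new FileReader(fileName))) {
            return br.lines().mapToInt(Integer::parseInt).toArray();
        }
    }

    public static void main(String[] args) throws IOException {
        // Generate a test file with one integer per line.
        Path tmp = Files.createTempFile("ints", ".txt");
        try (BufferedWriter w = Files.newBufferedWriter(tmp)) {
            for (int i = 0; i < 1_000_000; i++) {
                w.write(Integer.toString(i));
                w.newLine();
            }
        }
        String name = tmp.toString();

        long t0 = System.nanoTime();
        int[] a = viaList(name);
        long t1 = System.nanoTime();
        int[] b = viaStream(name);
        long t2 = System.nanoTime();

        System.out.println("list:   " + (t1 - t0) / 1_000_000 + " ms");
        System.out.println("stream: " + (t2 - t1) / 1_000_000 + " ms");
        System.out.println("same result: " + Arrays.equals(a, b));
        Files.delete(tmp);
    }
}
```

Whichever method runs first pays for class loading and a cold file cache, so it is worth swapping the order or repeating the runs before drawing conclusions.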