Java Question

How can I create an index of byte positions in a file?

I have this large file with the following format:

Unique String


In my program I need to read this file to get the information through the Unique String key. Since performance is important, I can't scan every line looking for the key each time, and I can't load the file into memory because it is too large. So I'd like to read the file only once and build an index of each String key and its position (in bytes) in the file. This index would be something like a HashMap, with the Unique String as the key and the byte offset where that key appears in the file as the value.

It seems that RandomAccessFile could do this, but I don't know how.

So, how can I build this index and then access a specific line through it?

Answer

The approach I suggest is to read the file once and keep track of the position as you go. Store each line's starting position in a map so you can look it up later.

The first way to do this is to use your file as a DataInput and read it with RandomAccessFile#readLine:

```java
// Open the file read-only; the map will hold key -> byte offset of its line
RandomAccessFile raf = new RandomAccessFile("filename.txt", "r");
Map<String, Long> index = new HashMap<>();
```

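Now, how is your data stored? If it is stored one record per line, and the encoding is one that readLine() can handle (it turns each byte directly into a char, so only single-byte encodings such as ASCII or ISO-8859-1 come through intact), then you can use: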

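```java
// extractKeyFromLine is your own (hypothetical) parsing method
long start = raf.getFilePointer();          // byte offset of the line about to be read
String line;
while ((line = raf.readLine()) != null) {
    index.put(extractKeyFromLine(line), start);
    start = raf.getFilePointer();           // start of the next line
}
```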
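Put together into a single pass over the file, a minimal self-contained sketch of the indexing step might look like this. The question does not show the record format, so it assumes (hypothetically) that each line starts with the key followed by a tab; extractKeyFromLine is a stand-in you would replace with your own parsing.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.HashMap;
import java.util.Map;

class LineIndexer {

    // Hypothetical key extractor: assumes the key is everything before the first tab.
    // Replace this with whatever matches your real record format.
    static String extractKeyFromLine(String line) {
        int tab = line.indexOf('\t');
        return tab >= 0 ? line.substring(0, tab) : line;
    }

    // Reads the file once and maps each key to the byte offset where its line starts.
    static Map<String, Long> buildIndex(String path) throws IOException {
        Map<String, Long> index = new HashMap<>();
        try (RandomAccessFile raf = new RandomAccessFile(path, "r")) {
            long start = raf.getFilePointer(); // 0 at the beginning of the file
            String line;
            while ((line = raf.readLine()) != null) {
                index.put(extractKeyFromLine(line), start);
                // readLine leaves the pointer just past the line terminator (\n or \r\n)
                start = raf.getFilePointer();
            }
        }
        return index;
    }
}
```

Only the offsets are kept in memory, not the lines themselves, so the map's size grows with the number of keys rather than with the size of the records.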

Then, any time you need to go back and get the data, seek to the stored position and read the line:

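```java
long position = index.get(key);  // throws NullPointerException if key is not in the index
raf.seek(position);              // jump to the start of that line
String line = raf.readLine();
```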
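Because Map#get returns a boxed Long, assigning it straight to a long fails with a NullPointerException for an unknown key. A small helper that checks for that before seeking could look like the sketch below; the name lookup is just for illustration, and the map is assumed to have been built from the same, unchanged file.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.Map;

class LineLookup {

    // Returns the full line for key, or null if the key was never indexed.
    // raf must be opened on the same (unchanged) file the index was built from.
    static String lookup(RandomAccessFile raf, Map<String, Long> index, String key) throws IOException {
        Long position = index.get(key); // boxed Long: null when the key is absent
        if (position == null) {
            return null;
        }
        raf.seek(position);    // jump directly to the first byte of the line
        return raf.readLine(); // read from there up to the line terminator
    }
}
```

Note that seek() followed by readLine() is two separate operations on one shared file pointer, so a single RandomAccessFile should not be used for lookups from several threads at once without synchronization.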

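There are many caveats to this. For example, readLine() does not decode multi-byte encodings such as UTF-8. If you need a more robust encoding, then on that first pass you'll want to create a reader that can manage the encoding and use your RandomAccessFile only as the underlying input stream. Keep in mind that a buffered reader reads ahead, so getFilePointer() no longer tells you where the current line starts; you have to count the bytes of each line yourself. Also, readLine() builds each line entirely in memory, so it can fail if the lines are very large. In that case you would have to devise your own strategy for extracting the key/data pair.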
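For a UTF-8 file, one option that keeps the byte offsets exact is to skip the Reader entirely: read raw bytes, count them yourself, and decode each line separately. This is a sketch under a few assumptions: the file is UTF-8, lines end in \n or \r\n, and the key is again (hypothetically) the text before the first tab. It relies on the fact that in UTF-8 the byte 0x0A only ever appears as a newline, never inside a multi-byte character.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayOutputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

class Utf8LineIndex {

    // Decodes one line's raw bytes as UTF-8, dropping a trailing '\r' (Windows line endings).
    private static String decode(ByteArrayOutputStream buf) {
        byte[] bytes = buf.toByteArray();
        int len = bytes.length;
        if (len > 0 && bytes[len - 1] == '\r') {
            len--;
        }
        return new String(bytes, 0, len, StandardCharsets.UTF_8);
    }

    // Hypothetical key extractor: key = text before the first tab.
    private static String keyOf(String line) {
        int tab = line.indexOf('\t');
        return tab >= 0 ? line.substring(0, tab) : line;
    }

    // Single pass over the raw bytes. Offsets are counted by hand, so the
    // read-ahead done by BufferedInputStream does not distort them.
    static Map<String, Long> buildIndex(String path) throws IOException {
        Map<String, Long> index = new HashMap<>();
        try (InputStream in = new BufferedInputStream(new FileInputStream(path))) {
            ByteArrayOutputStream line = new ByteArrayOutputStream();
            long offset = 0;    // offset of the next byte to be read
            long lineStart = 0; // offset where the current line began
            int b;
            while ((b = in.read()) != -1) {
                offset++;
                if (b == '\n') {
                    index.put(keyOf(decode(line)), lineStart);
                    line.reset();
                    lineStart = offset;
                } else {
                    line.write(b);
                }
            }
            if (line.size() > 0) { // last line without a trailing newline
                index.put(keyOf(decode(line)), lineStart);
            }
        }
        return index;
    }

    // Seeks to a stored offset, reads raw bytes up to '\n' (or EOF), then decodes them.
    static String readLineAt(RandomAccessFile raf, long position) throws IOException {
        raf.seek(position);
        ByteArrayOutputStream line = new ByteArrayOutputStream();
        int b;
        while ((b = raf.read()) != -1 && b != '\n') {
            line.write(b);
        }
        return decode(line);
    }
}
```

readLineAt reads one byte per call through RandomAccessFile, which is unbuffered. That is acceptable for fetching a single line, but for long records you might prefer reading a block into a byte[] with raf.read(byte[]) and scanning it for the newline.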

