H.Burns H.Burns - 1 month ago 15
Perl Question

Deleting a line from a huge file in Perl

I have huge text file and first five lines of it reads as below :

This is fist line
This is second line
This is third line
This is fourth line
This is fifth line


Now, I want to write something at a random position of the third line of that file which will replace the characters in that line by the new string I am writing. I am able to achieve that with the below code :

use strict;
use warnings;

my @pos = (0);
open my $fh, "+<", "text.txt";

while(<$fh) {
push @pos, tell($fh);
}

seek $fh , $pos[2]+1, 0;
print $fh "HELLO";

close($fh);


However, I am not able to figure out with the same kind of approach how can I delete the entire third line from that file so that the texts reads below :

This is fist line
This is second line
This is fourth line
This is fifth line


I do not want to read the entire file into an array, neither do I want to use Tie::File. Is it possible to achieve my requirement using seek and tell ? A solution will be very helpful.

Answer

A file is a sequence of bytes. We can replace (overwrite) some of them, but how would we remove them? Once a file is written its bytes cannot be 'pulled out' of the sequence or 'blanked' in any way. (The ones at the end of the file can be dismissed, by indicating the end-of-file where we want.)

The rest of the content has to move 'down', so that what follows the text to be removed overwrites it. We have to rewrite the rest of the file. In practice it is often far simpler to rewrite the whole file.

As a very basic example

use warnings 'all';
use strict;
use File::Copy qw(mv);

my $file_in = '...';
my $file_out = '...';  # best use `File::Temp`

open my $fh_in,  '<', $file_in  or die "Can't open $file_in: $!";
open my $fh_out, '>', $file_out or die "Can't open $file_out: $!";

# Remove a line with $pattern
my $pattern = qr/this line goes/;

while (<$fh_in>) 
{
    print $fh_out $_  unless /$pattern/;
}
close $fh_in;
close $fh_out;

# Rename the new fie into the original one, thus replacing it
move ($file_out, $file_in) or die "Can't move $file_out to $file_in: $!";

This writes every line of input file into the output file, unless a line matches a given pattern. Then that file is renamed, replacing the original (what does not involve data copy). See this topic in perlfaq5.

Since we really use a temporary file I'd recommend the core module File::Temp for that.


This may be made more efficient, but far more complicated, by opening in update '+<' mode so to overwrite only a portion of the file. You iterate until the line with the pattern, record (tell) its position and the line length, then copy all remaining lines in memory. Then seek back to the position minus length of that line, and dump the copied rest of the file, overwriting what is on disk.

Note that now the data for the rest of the file is copied twice, albeit one copy is in memory. Going to this trouble may make sense if the line to be removed is far down the file. If there are more lines to remove this gets messier.

Comments