Fondah Fondah - 1 month ago 9
Perl Question

Delete subfolders and files by # of days using Perl

New to Perl; I've used it only three times. I need to delete files and subfolders from a parent directory once they are more than a week old. I've deleted files before using -M, but I've never worked with subfolders. When I run the code below, no files are deleted from the subfolders, even though they contain files more than a week old. The test messages show that 'myAge' is zero for ALL files in the subfolders. I'm not sure what I'm missing. Any assistance would be very much appreciated.

msg ("\n");
msg ("Start: \n");


my $parent = 'C:/temp/XYZ';
my ($par_dir, $sub_dir);

opendir($par_dir, $parent);
msg " parent is $parent \n";

while (my $sub_folders = readdir($par_dir)) {
next if ($sub_folders =~ /^..?$/); # skip . and ..

my $path = $parent . '/' . $sub_folders;

next unless (-d $path); # skip anything that isn't a directory
next unless ( -M $subfolder < 7 );

msg " subfolder is $sub_folders is old enough to delete \n";

opendir($sub_dir, $path);
while (my $file = readdir($sub_dir)) {

# for testing
my $myAge = (-M $file) ;
msg " age ... $myAge __ file ... $file\n" ;

if ( -M $file > 7 ) {
msg " going to delete this file... $file \n";
} else {
msg " will keep this file not old enough $file\n";
}

}
closedir($sub_dir);
}
closedir($par_dir);

Answer

If this is really only the third time you've used Perl, then congratulations! But there are some issues in your code:

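  • Always add use strict; and use warnings; to your code. use strict; turns undeclared variables into compile-time errors, and use warnings; reports (among other things) the use of uninitialized values. Together they eliminate many common errors.
  • You had a typo: $sub_folders vs. $subfolder. With use strict; the script would have refused to compile because $subfolder was never declared.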
  • The names returned by readdir do NOT include the directory they were read from. The docs say:

    If you're planning to filetest the return values out of a readdir, you'd better prepend the directory in question. Otherwise, because we didn't chdir there, it would have been testing the wrong file.

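    I assume that's exactly what happened here: -M $file looked for each file in the current working directory, didn't find it, and returned undef, which behaves as 0 in a numeric comparison.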
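To make the readdir point concrete, here is a minimal sketch that compares the two lookups. It is only an illustration: the scratch directory and the file name readdir_demo_old.txt are made up for the demo, and it assumes that no file of that name exists in your current working directory.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical scratch directory, used only for this demonstration.
my $dir = tempdir( CLEANUP => 1 );

# Create one file and backdate its modification time by 10 days.
my $name = 'readdir_demo_old.txt';
open( my $fh, '>', "$dir/$name" ) or die "cannot create $dir/$name: $!\n";
close($fh);
my $ten_days_ago = time - 10 * 24 * 60 * 60;
utime( $ten_days_ago, $ten_days_ago, "$dir/$name" ) or die "utime failed: $!\n";

# Read the bare entry names, skipping . and ..
opendir( my $dh, $dir ) or die "cannot opendir $dir: $!\n";
my @names = grep { !/^\.\.?$/ } readdir($dh);
closedir($dh);

for my $entry (@names) {
    my $bare = -M $entry;           # looked up relative to the current directory
    my $full = -M "$dir/$entry";    # looked up where the file really is
    printf "%s: bare=%s full=%.1f\n",
        $entry, ( defined($bare) ? $bare : 'undef' ), $full;
}
```

The bare lookup should come back undefined, while the lookup with the directory prepended reports an age of roughly ten days.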

I changed your code a bit by prepending the directory to the filenames and it seems to work now. I also wrote a msg function that simply prints the given parameters. You would omit that if you have a "real" msg function.

#!/usr/bin/env perl

use strict;
use warnings;

sub msg
{
    print @_;
}

msg("\n");
msg("Start: \n");

my $parent = 'C:/temp/XYZ';
my ( $par_dir, $sub_dir );

opendir( $par_dir, $parent ) or die "cannot opendir $parent: $!\n";
msg " parent is $parent \n";

while ( my $sub_folders = readdir($par_dir) ) {
    next if ( $sub_folders =~ /^\.\.?$/ );  # skip . and .. (dots escaped so they match literally)

    my $path = "$parent/$sub_folders";

    next unless ( -d $path );               # skip anything that isn't a directory
    next unless ( -M $path < 7 );

    msg " subfolder is $sub_folders is old enough to delete \n";

    opendir( $sub_dir, $path ) or die "cannot opendir $path: $!\n";
    while ( my $file = readdir($sub_dir) ) {
        next if ( $file =~ /^\.\.?$/ );     # skip . and .. in the subfolder as well

        # for testing
        my $myAge = ( -M "$path/$file" );
        msg " age ... $myAge __ file ...  $path/$file\n";

        if ( -M "$path/$file" > 7 ) {
            msg " going to delete this file...  $path/$file \n";
        } else {
            msg " will keep this file not old enough $path/$file\n";
        }

    }
    closedir($sub_dir);
}
closedir($par_dir);
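The script above only prints what it would do. If you want to actually remove the files, one possible sketch of the deletion step for a single directory looks like this. The helper name delete_old_files and its parameters are my own invention, not part of your code, and it deliberately touches plain files only.

```perl
use strict;
use warnings;

# Hypothetical helper: delete plain files directly inside $dir that were
# last modified more than $days days ago. Returns the number removed.
sub delete_old_files {
    my ( $dir, $days ) = @_;
    my $deleted = 0;

    opendir( my $dh, $dir ) or die "cannot opendir $dir: $!\n";
    while ( my $file = readdir($dh) ) {
        my $full = "$dir/$file";        # prepend the directory before testing
        next unless -f $full;           # plain files only; this also skips . and ..
        next unless ( -M $full ) > $days;

        if ( unlink $full ) {
            $deleted++;
        }
        else {
            warn "could not delete $full: $!\n";
        }
    }
    closedir($dh);
    return $deleted;
}
```

Calling delete_old_files( $path, 7 ) in place of the inner while loop would remove the week-old files of one subfolder. Try it on a copy of your data first, since unlink cannot be undone.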

There are some things that can be improved in this code.

  1. I wouldn't check the modification time of the directories at all; I would drop the line next unless ( -M $path < 7 );. On most filesystems a directory's modification time changes when entries are added, removed, or renamed in it, not when an existing file's content changes, so it tells you little about the age of the files inside.
  2. To speed things up, the -X file test operators (-d, -M, etc.) cache the stat information of the last file tested. So instead of writing

    next unless ( -d $path );
    next unless ( -M $path < 7 );
    

    you could write

    next unless ( -d $path );
    next unless ( -M _ < 7 ); # the '_' means: get '-M' of $path
    

    See the -X entry in perlfunc for details. Basically, a file test fetches all attributes of the given file (size, type, access time, modification time, etc.) at once. If you pass an underscore _ as the filename in subsequent tests, the results of the previous call (with the real filename) are reused, which saves another (expensive) system call.

  3. The algorithm only considers one directory below the starting directory, i.e. it doesn't work recursively. Depending on what you actually want, this might or might not be OK.
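If you do need the recursive behaviour, the core module File::Find can do the directory walking for you. The following is only a sketch under my own assumptions, not a drop-in solution: the helper name purge_old is hypothetical, $top is expected without a trailing slash, and any subfolder that ends up empty is removed regardless of its own age. It also reuses the cached stat via _ as described in point 2.

```perl
use strict;
use warnings;
use File::Find qw(finddepth);

# Hypothetical helper: remove files older than $days days anywhere below
# $top, then remove subdirectories that are empty afterwards.
# Returns the number of files removed.
sub purge_old {
    my ( $top, $days ) = @_;
    my $removed = 0;

    finddepth(
        {
            no_chdir => 1,    # $_ holds the full path of each entry
            wanted   => sub {
                my $path = $_;
                return if $path eq $top;    # never remove the starting directory

                if ( -d $path ) {
                    rmdir $path;            # succeeds only if the directory is empty
                }
                elsif ( -f _ && ( -M _ ) > $days ) {    # _ reuses the stat from -d
                    if ( unlink $path ) { $removed++ }
                    else                { warn "could not delete $path: $!\n" }
                }
            },
        },
        $top
    );
    return $removed;
}
```

finddepth visits the contents of a directory before the directory itself, which is why the rmdir can succeed once all old files below it are gone. A call such as purge_old( 'C:/temp/XYZ', 7 ) would then cover every level below the parent.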