user3781528 - 1 year ago
Perl Question

Adding custom header to specific files in a directory

I would like to add a unique one-line header to each FOCUS*.tsv file in a specified directory, and then combine all of these files into one file.

First, I tried:


    my $cmd9 = `sed -i '1i$SampleID[4]' $tsv_file`;
    print $cmd9;

It looked like it worked, but after I combined all of these files into one file in the next section of the code, the inserted row appeared four times for each file.

I then tried the following Perl script to do the same thing, but it deleted the content of the file and only printed the added header.

I'm looking for the simplest way to do this.
Here is what I've tried:

    use strict;
    use warnings;
    use Tie::File;

    my $home="/data/";
    my $tsv_directory = $home."test_all_runs/".$ARGV[0];
    my $tsvfiles = $home."test_all_runs/".$ARGV[0]."/tsv_files.txt";

    my @run_directory = ();
    @run_directory = split /\//, $tsv_directory;
    print "The run directory is #############".$run_directory[3]."\n";

    my $cmd = `ls $tsv_directory/FOCUS*\.tsv > $tsvfiles`; #print "$cmd";
    my $cmda = "ls $tsv_directory/FOCUS*\.tsv > $tsvfiles"; #print "$cmda";

    my @tsvfiles =();
    #this code opens the tsv_files.txt file and passes each line into an array for individual manipulation
    open(TXT2, "$tsvfiles");
    while (<TXT2>){
    push (@tsvfiles, $_);

    foreach (@tsvfiles){

    #this loop works fine
    for my $tsv_file (@tsvfiles){

    open my $in, '>', $tsv_file or die "Can't write new file: $!";
    open my $out, '>', "$" or die "Can't write new file: $!";

    $tsv_file =~ m|([^/]+)-oncomine.tsv$| or die "Can't extract Sample ID";
    my $sample_id = $1;
    #print "The sample ID is ############## $sample_id\n";
    my $headerline = $run_directory[3]."/".$sample_id;
    print $out $headerline;
    while( <$in> ) {
    print $out $_;

    close $out;
    close $in;

    rename("$", $tsv_file);


Thank you

Answer

Apparently, the problem was opening the input file with '>' (write mode, which truncates the file to zero length) instead of '<' for reading, and that has been solved.
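For reference, a minimal sketch of the corrected read-then-rewrite pattern, assuming the header is a plain string and that a scratch file named "$file.tmp" (a hypothetical name) can be created next to the original. The input is opened with '<', the header and the original lines are written to the scratch file, and move from the core module File::Copy then replaces the original.

```perl
use strict;
use warnings;
use File::Copy qw(move);

# Prepend $header as a new first line of $file.
# Assumption: "$file.tmp" is a hypothetical scratch name that is free to use.
sub prepend_header {
    my ($file, $header) = @_;
    my $tmp = "$file.tmp";

    open my $in,  '<', $file or die "Can't read $file: $!";  # '<' reads, leaving $file intact
    open my $out, '>', $tmp  or die "Can't write $tmp: $!";  # '>' truncates only the scratch file

    print $out "$header\n";     # the new first line
    print $out $_ while <$in>;  # copy the original content unchanged

    close $in;
    close $out or die "Can't close $tmp: $!";

    # Replace the original with the rewritten version.
    move($tmp, $file) or die "Can't move $tmp to $file: $!";
    return;
}
```

Inside the question's loop this would be called as prepend_header($tsv_file, $headerline); once per file. Calling it twice on the same file adds the header twice, so each file should be processed exactly once.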

However, I'd like to make a few comments on some of the rest of the code.

  • The list of files is built by running an external ls redirected to a file and then reading that file into an array. That is exactly the job of glob, and all of it can be replaced by

    my @tsvfiles = glob "$tsv_directory/FOCUS*.tsv";

    Then you don't need the chomp either, and the chop that is used would actually hurt, since it removes the last character whatever it is, not just a trailing newline (or, more precisely, $/).

  • Using chop is probably not what you want. To remove a trailing line ending (the input record separator $/), use chomp.

  • To extract a match and assign it, a common idiom is

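    my ($sample_id) = $tsv_file =~ m|([^/]+)-oncomine\.tsv$|
        or die "Can't extract Sample ID from $tsv_file";

    Note that I also put the file name in the message, so you can see which file failed, and escaped the dot in .tsv, which otherwise matches any character. Don't add $! here: it is set only by failed system calls such as open, not by a failed pattern match, so it would print a stale or empty value.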

  • The unlink and rename appear to be overwriting one file with another. You can do that with move from the core module File::Copy:

    use File::Copy qw(move);
    move($tsv_file_new, $tsv_file)
        or die "Can't move $tsv_file_new to $tsv_file: $!";

    It replaces the target file $tsv_file with $tsv_file_new, overwriting it. (move renames the file when it can; otherwise it copies it to the new location and deletes the original.)
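A small sketch tying the first points together, assuming the file names end in -oncomine.tsv as the question's regex expects. It lists the files with glob (no ls, no intermediate text file), extracts each sample ID with the list-assignment idiom, and shows why chop is risky on names that have no trailing newline. The sub name sample_ids is hypothetical.

```perl
use strict;
use warnings;

# Return the sample IDs of all FOCUS*.tsv files in $dir.
# Assumption: names look like FOCUS<...>-oncomine.tsv.
sub sample_ids {
    my ($dir) = @_;
    my @ids;
    # glob returns full paths with no trailing newline, so no chomp is needed.
    for my $tsv_file (sort { $a cmp $b } glob "$dir/FOCUS*.tsv") {
        my ($sample_id) = $tsv_file =~ m|([^/]+)-oncomine\.tsv$|
            or die "Can't extract Sample ID from $tsv_file";
        push @ids, $sample_id;
    }
    return @ids;
}

# chomp removes only a trailing $/ ("\n" by default); chop always removes the last character.
my $with_nl    = "FOCUS_A\n";  # like a line read from the ls output file
my $without_nl = "FOCUS_A";    # like a name returned by glob
chomp(my $c1 = $with_nl);      # "FOCUS_A"
chomp(my $c2 = $without_nl);   # "FOCUS_A" (nothing to remove)
chop(my $p1 = $with_nl);       # "FOCUS_A" (the newline happened to be last)
chop(my $p2 = $without_nl);    # "FOCUS_"  (a real character is lost)
print "$c1 $c2 $p1 $p2\n";     # prints "FOCUS_A FOCUS_A FOCUS_A FOCUS_"
```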

As for how the files should be combined, a more precise explanation would be needed.
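If "combined" simply means concatenating the files with each one's header line in front of its content (an assumption about the intended layout, not something the question confirms), the headers can be written straight into the combined file. The original .tsv files are never modified, so a header cannot end up inserted into the same file more than once. The sub name combine_with_headers and the header callback are hypothetical.

```perl
use strict;
use warnings;

# Concatenate @files into $combined, writing $header_for->($file) before each file's content.
# Assumptions: each input file ends with a newline, and $combined lies outside
# the FOCUS*.tsv pattern so it is not picked up as an input on a later run.
sub combine_with_headers {
    my ($combined, $header_for, @files) = @_;
    open my $out, '>', $combined or die "Can't write $combined: $!";
    for my $file (@files) {
        open my $in, '<', $file or die "Can't read $file: $!";
        print $out $header_for->($file), "\n";  # one header line per input file
        print $out $_ while <$in>;              # then that file's lines, unchanged
        close $in;
    }
    close $out or die "Can't close $combined: $!";
    return;
}
```

The header callback could build "$run_directory[3]/$sample_id" exactly as the question's $headerline does.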