Neha Neha - 1 year ago 57
Perl Question

Concatenate column values when header is Matching from multiple files

there can be n number of columns but the number of columns and header name will remain same in all 3 files. Files are tab Delimited.


Name 9/1 9/2
X 1 7
y 2 8
z 3 9
a 4 10
b 5 11
c 6 12


Name 9/1 9/2
X 13 19
y 14 20
z 15 21
a 16 22
b 17 23
c 18 24


Name 9/1 9/2
X 25 31
y 26 32
z 27 33
a 28 34
b 29 35
c 30 36

required output

Name 9/1 9/2
X 1/13/25 7/19/31
y 2/14/26 8/20/32
z 3/15/27 9/21/33
a 4/16/28 10/22/34
b 5/17/29 11/23/35
c 6/16/30 12/24/36

I Want to merge all three file on the basis of matching header and if values are same in the row then print one value
and if different then concatenate values from all three files under that column. like in file a.txt,b.txt,c.txt,
row 2 has same value under column name "NAME" so print only X but column "9/1" has different values so print one header " 9/1" and under it print all values from three files e.g. 1/13/25.

tried below

join <(sort a.txt) <(sort b.txt) <(sort c.txt) >out.txt

but it is printing all the rows from file a then starting file b then c
same thing with this

awk 'FNR==NR{a[$1]=$2;next} ($1 in a){ print $0, a[$1]}' a.txt b.txt c.txt

Answer Source

Update   Added a script as the solution, following a clarification.

This reads a line from all files at a time and joins the same field from each, for any number of files and columns. The first field of each line across all files is assumed the same. (Submit sorted files.)

use warnings 'all';
use strict;

# Implicit filehandles. Use readline() to read from them, not <>
my @fh = map { open my $fh, '<', $_  or die "Can't open $_: $!"; $fh } @ARGV;

while (1) 
    # Read the line from all files. Exit loop if any is undefined
    my @line = map { scalar readline $_ } @fh;
    last if  grep { not defined $line[$_] } 0..$#line;
    # Print first line (header) from one file and skip processing
    if ($. == 1) { 
        print $line[0];

    # Get the first column from one file, assumed the same for all  
    my @out_line = (split ' ', $line[0])[0];
    # Join same column from all files with '/', for all columns
    for my $i (1..$#line) {
        push @out_line, join '/', map { (split)[$i] } @line;
    print "@out_line\n";

Produces the required output with a.txt b.txt c.txt, given sorted input files.


  • Assumes the same first column. Exits as soon as one file is exhausted

  • Uses "implicit filehandles" -- read from it via readline, not <> operator. See in perlfaq5

  • The scalar readline $_ forces scalar context so one line is read

  • Leaves the newline. To remove it add chomp(@lines), after last if ...

  • The first two lines can be used to form the while (...) condition

Let me know if fuller explanations would be helpful.

Original post, condensed. Left here for possible usefulness of discussion of join

This gets you almost there with join, and completes the job with Perl.

First, join takes only two files at the time. So you can run it with the first two, then run it with the resulting file and the last one. We also need to tell it how to format output. This is done with its -o option, which allows us to list the elements of an output line as FileN.fieldN, ...

What you need is: 1.1 1.2 2.2 1.3 2.3 -- Fields 1,2 from File 1 then 2 from 2, etc.

join -o 1.1,1.2,2.2,1.3,2.3 a.txt b.txt  > ab.txt

File ab.txt has a different format so the -o format of lines changes for the next step

join -o 1.1,1.2,1.3,2.2,1.4,1.5,2.3 ab.txt c.txt  > abc.txt

This has two glitches -- headers are joined, fields merged with space. Fix with Perl one-liners.

Collect in a bash script. A working sketch, to run as a.txt b.txt c.txt


join -o 1.1,1.2,2.2,1.3,2.3         $1     $2  > tmp.$$
join -o 1.1,1.2,1.3,2.2,1.4,1.5,2.3 tmp.$$ $3  > abc.txt

# Remove extra fields in the header    
perl -i -wpe '$.==1 && s{(\d/\d) \1 \1}{$1}g'  abc.txt

# Replace space with `/` between every three numbers
perl -i -wpe 's{(\d+) (\d+) (\d+)}{$1/$2/$3}g' abc.txt

rm -f tmp.$$

Please choose temporary names more carefully and add error checking. The output abc.txt

Name 9/1 9/2
X 1/13/25 7/19/31
y 2/14/26 8/20/32
Recommended from our users: Dynamic Network Monitoring from WhatsUp Gold from IPSwitch. Free Download