Mike Brown - 4 months ago
Linux Question

How to print the lines that contain certain strings, in a given order?

I have two files.

indv:

COPDGene_P51515
COPDGene_V67803
COPDGene_Z75868
COPDGene_U48329
COPDGene_R08908
COPDGene_E34944


data:

COPDGene_Z75868 1
COPDGene_A12318 3
COPDGene_R08908 5
COPDGene_P51515 8
COPDGene_U48329 2
COPDGene_V67803 8
COPDGene_E34944 2
COPDGene_D29835 9


I want to print the lines of data that contain the strings in indv, in the order they appear in indv, like the following:

COPDGene_P51515 8
COPDGene_V67803 8
COPDGene_Z75868 1
COPDGene_U48329 2
COPDGene_R08908 5
COPDGene_E34944 2


I tried to use

awk 'NR==FNR{a[$1]++;next} ($1 in a)' indv data


But I got

COPDGene_Z75868 1
COPDGene_R08908 5
COPDGene_P51515 8
COPDGene_U48329 2
COPDGene_V67803 8
COPDGene_E34944 2


which is not the order of indv.

Answer
$ awk 'FNR==NR{a[$1]=$0;next;} {print a[$1]}' data indv
COPDGene_P51515  8
COPDGene_V67803  8
COPDGene_Z75868  1
COPDGene_U48329  2
COPDGene_R08908  5
COPDGene_E34944  2

How it works

  • FNR==NR{a[$1]=$0;next;}

    FNR==NR holds only while awk is reading the first file, data. For each line of that file, save the whole line in associative array a under the index of its first field, $1. Then skip the rest of the commands and start over on the next line.

  • print a[$1]

    If we get here, we are working on the second file, indv. For each line of indv, print the line from data that was saved under the same first field. In this way, the content of each printed line comes from data, but the order of printing is controlled by indv.
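
The two passes described above can be tried end to end with the sketch below. It is only an illustration under a few assumptions: the sample files are shortened versions of the ones in the question, COPDGene_X00000 is a made-up ID added to show what happens when an ID in indv is missing from data, and the ($1 in a) guard is an addition that is not part of the command above. Without that guard, print a[$1] prints an empty line for such an ID.

```shell
#!/bin/sh
# Work in a throwaway directory so the sample files do not clobber anything.
tmpdir=$(mktemp -d)
cd "$tmpdir" || exit 1

# indv: the IDs in the desired output order, plus one hypothetical ID
# (COPDGene_X00000) that does not occur in data.
printf '%s\n' COPDGene_P51515 COPDGene_V67803 COPDGene_X00000 COPDGene_Z75868 > indv

# data: ID/value pairs in a different order, with one ID not listed in indv.
printf '%s\n' 'COPDGene_Z75868 1' 'COPDGene_A12318 3' 'COPDGene_P51515 8' 'COPDGene_V67803 8' > data

# Pass 1 (data): remember each whole line under its ID.
# Pass 2 (indv): print the remembered line, but only if the ID was seen in data.
result=$(awk 'FNR==NR { a[$1] = $0; next } ($1 in a) { print a[$1] }' data indv)

printf '%s\n' "$result"
```

With these sample files the script prints the lines for P51515, V67803 and Z75868, in that order, and silently skips the made-up ID. Note that data must be given first on the command line, because the array has to be filled before indv is read.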
