snowy_squirrel snowy_squirrel - 7 months ago 7
Perl Question

Compare multiple columns from a single file an count the occurrence of a value in these columns

I have a file with 21 columns;

Name Grade1 Grade2 Grade3 Grade4 Grade5 .... Grade20
Zoe 60 70 NA NA NA 67


Now, I want to keep only the names that have more than 5 Grades that are NA. Some names contains NA.

I know that awk can do the job. But I'm not sure how to compare all of the columns without having to compare them individually.

I tried:

more input_file.txt | awk 'total = count20[$2,$3,$4,$5,$6,$7,$8,$9,$10,
$11,$12,$13,$14,$15,$16,$17,$18,$19,$20,$21]++, if (($2 == "NA" || $3 == "NA" ||
$4 == "NA" || $5 == "NA" || $6 == "NA" || $7 == "NA" || $8 == "NA" || $9 == "NA"
|| $10 == "NA" || $11 == "NA" || $12 == "NA" || $13 == "NA" || $14 == "NA" ||
$15 == "NA" || $16 == "NA" || $17 == "NA" || $18 == "NA" || $19 == "NA" ||
$20 == "NA" || $21 == "NA") && total > 4) { print $1"\t"$2"\t"$3"\t"$4"\t"$5"\t"
$6"\t"$7"\t"$8"\t"$9"\t"$10"\t"$11"\t"$12"\t"$13"\t"$14"\t"$15"\t"$16"\t"$17"\t"
$18"\t"$19"\t"$20"\t"$21 }' > test.txt


It doesn't work and I'm not sure why or how to do it more efficiently.

Edit: to be more precise, the wanted output is a file containing name and all columns for students having more than 5 columns containing NA.

Answer

Here's a fairly basic way to do it using awk:

awk '{ count = 0; for (i = 2; i <= NF; ++i) if ($i == "NA") ++count } count > 5' file

This loops through each field and checks whether it is equal to "NA". If so, it adds to the total count for that record. Records are printed when the count is greater than 5.