Paul Paul - 10 days ago 5
Linux Question

How to use uniq after printf

I have a lot of files, and I need to concatenate those that share the same prefix. I have an idea, but I do not know how to finish it:

files:

NAME1_C001_xxx.tsv
NAME1_C001_yyy.tsv
NAME2_C001_xxx.tsv
NAME2_C001_yyy.tsv


I want to print just the unique prefixes, NAME1 and NAME2. The lengths of the prefix and the suffix vary, but the prefix is always followed by _C001.


My solution is:

for i in *.tsv
do
    prefix=$(printf "%s\n" "${i%_C001*}")
    cat "${prefix}_C001_xxx.tsv" "${prefix}_C001_yyy.tsv" > "${prefix}.merged.tsv"
done


But this solution is not very good: the loop runs once per file, so I get each prefix twice.

Thank you for any help.

Answer

As your filenames don't contain any newlines, you can pipe the list to an awk command that prints the unique prefixes, using _C001 as the field separator:

printf "%s\n" *.tsv | awk -F '_C001' '!seen[$1]++{print $1}'
NAME1
NAME2
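
If you would rather stay with the parameter expansion from the question, the same list can probably be produced by piping the whole loop through sort -u (or sort | uniq). This is only a sketch: unique_prefixes is a made-up helper name, and it assumes every file of interest contains _C001 and ends in .tsv.

```shell
# Sketch only: unique_prefixes is a hypothetical helper, not a standard command.
# Prints each prefix (the text before _C001) once for the given directory.
unique_prefixes() {
    for f in "$1"/*_C001*.tsv; do
        [ -e "$f" ] || continue           # skip the literal pattern when nothing matches
        name=${f##*/}                     # drop the directory part
        printf '%s\n' "${name%_C001*}"    # same expansion as in the question
    done | sort -u                        # keep one copy of each prefix
}
```

Calling unique_prefixes . in the directory that holds the files should list each prefix once, in sorted order.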

You can also use _ as the field separator (FS) in awk, provided the prefixes themselves contain no underscore:

printf "%s\n" *.tsv | awk -F _ '!seen[$1]++{print $1}'
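
To get from the list of prefixes to the merged files the question asks for, the awk output can be fed into a while read loop. The following is a rough sketch under a few assumptions: merge_by_prefix is a made-up helper name, the output name NAME.merged.tsv is taken from the question, and every file matching NAME_C001_*.tsv is concatenated in glob (sorted) order rather than only the xxx and yyy files. The demo lines at the end create throwaway sample files in a temporary directory.

```shell
# Sketch only: merge_by_prefix is a hypothetical helper name.
# Concatenates every PREFIX_C001_*.tsv in a directory into PREFIX.merged.tsv.
merge_by_prefix() {
    (
        cd "$1" || exit 1
        # List each prefix once (assumes no newlines in the file names).
        printf '%s\n' *_C001_*.tsv |
            awk -F '_C001' '!seen[$1]++ { print $1 }' |
            while IFS= read -r prefix; do
                set -- "${prefix}"_C001_*.tsv
                [ -e "$1" ] || continue      # nothing matched, skip
                # The glob expands in sorted order, so xxx comes before yyy.
                cat "$@" > "${prefix}.merged.tsv"
            done
    )
}

# Demo with made-up sample data in a temporary directory.
demo=$(mktemp -d)
printf 'a\n' > "$demo/NAME1_C001_xxx.tsv"
printf 'b\n' > "$demo/NAME1_C001_yyy.tsv"
printf 'c\n' > "$demo/NAME2_C001_xxx.tsv"
printf 'd\n' > "$demo/NAME2_C001_yyy.tsv"
merge_by_prefix "$demo"
```

The merged files do not contain _C001_ in their names, so running the function again should not pick them up as input.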