Andreas Tosstorff - 4 months ago
Bash Question

csplit prefix as file context

I wrote a bash script in order to split a file. The file looks like this:

@<TRIPOS>MOLECULE
ZINC32514653
....
....

@<TRIPOS>MOLECULE
ZINC982347645
....
....


Here is the script I wrote:

#!/bin/bash
# Split the file into files named xx##.mol2
csplit -b %d.mol2 ./Zincpharmer_ligprep_1.mol2 '/@<TRIPOS>MOLECULE/' '{*}'
# Rename every xx##.mol2 file after its 2nd line, which is ZINC######
for filename in ./xx*.mol2
do
    # sed reads the file directly; no echo or pipe is needed
    newFilename=$(sed -n 2p "$filename")
    if [ ! -e "./$newFilename.mol2" ]; then
        mv -i "$filename" "./$newFilename.mol2"
    else
        # Name already taken: append _2, _3, ... until a free name is found
        num=2
        while [ -e "./${newFilename}_$num.mol2" ]; do
            num=$((num+1))
        done
        mv "$filename" "./${newFilename}_$num.mol2"
    fi
done


I have two questions:

1) Is there a way to use csplit's prefix option and tell csplit that the prefix is the line after the separator?

2) The first file created by csplit, xx00, is empty, because the separator is on the first line of the input. How can I avoid this?

The expected output would be files named ZINC32514653.mol2 and ZINC982347645.mol2. If there are two entries with the same ZINC###, the second one should be named ZINC982347645_2.mol2.

Answer

All you need to know is available from the csplit man page:-

To tell csplit to change the prefix:-

-f, --prefix=PREFIX
       use PREFIX instead of 'xx'
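
As a rough illustration of what -f does, here is a minimal sketch assuming GNU coreutils csplit (the -b option and the '{*}' repeat count are GNU extensions). The file name sample.mol2, the prefix mol_ and the scratch directory are hypothetical and only there for the demonstration. Note that PREFIX is a fixed string given on the command line; as far as the man page shows, csplit has no option to take it from the content of each piece.

```shell
#!/bin/bash
# Work in a scratch directory so the sketch does not touch real data.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical two-record sample in the format shown in the question.
printf '%s\n' '@<TRIPOS>MOLECULE' 'ZINC32514653' '....' \
              '@<TRIPOS>MOLECULE' 'ZINC982347645' '....' > sample.mol2

# -f replaces the default 'xx' prefix with a fixed string,
# -b sets the format of the numeric suffix, -s suppresses the byte counts.
csplit -s -f mol_ -b '%02d.mol2' sample.mol2 '/@<TRIPOS>MOLECULE/' '{*}'

# List the pieces; the first one is still empty because -z is not used here.
ls
```

The pieces are named after the fixed prefix only, so a rename step like the loop in the question is still needed to get the ZINC names.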

To exclude empty files:-

-z, --elide-empty-files
       remove empty output files
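
Putting both options together with the rename loop from the question might look like the sketch below. It assumes GNU coreutils csplit; the sample data, the part_ prefix and the .tmp suffix are made up for the example (the .tmp suffix just keeps the intermediate pieces from matching the final *.mol2 names).

```shell
#!/bin/bash
# Work in a scratch directory so the sketch does not touch real data.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical sample containing the same ZINC ID twice.
printf '%s\n' '@<TRIPOS>MOLECULE' 'ZINC32514653' '....' \
              '@<TRIPOS>MOLECULE' 'ZINC982347645' '....' \
              '@<TRIPOS>MOLECULE' 'ZINC982347645' '....' > sample.mol2

# -z drops the empty piece that would otherwise precede the first separator.
csplit -s -z -f part_ -b '%03d.tmp' sample.mol2 '/@<TRIPOS>MOLECULE/' '{*}'

# Rename each piece after its 2nd line; add _2, _3, ... for duplicates.
for piece in part_*.tmp; do
    name=$(sed -n 2p "$piece")
    target="$name.mol2"
    num=2
    while [ -e "$target" ]; do
        target="${name}_${num}.mol2"
        num=$((num + 1))
    done
    mv -- "$piece" "$target"
done

ls
```

With the sample above this should leave ZINC32514653.mol2, ZINC982347645.mol2 and ZINC982347645_2.mol2 next to sample.mol2, with no empty file to clean up.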