H.Sperling - 1 year ago
Bash Question

Merging fastq files by identifiers with a shell script

I have to merge files with the following naming pattern:


I need to merge all files with identical [SampleID] but different "Lanes" (L001-L004).
The following script works fine when run directly in the terminal:

wd="/path/to/script/" # was missing/ incorrect

# get ALL sample identifiers
touch temp1.txt
for line in "$wd"/*.fastq ; do
    fastq_identifier=$(echo "$line" | cut -d"_" -f1)
    echo "$fastq_identifier" >> temp1.txt
done

# get all unique sample identifiers
cat temp1.txt | uniq > temp2.txt
input_var=$(cat temp2.txt)

# concatenate all fastq (different lanes) with identical identifier
for line in $input_var; do
    cat "$line"*fastq >> "${line}_${custom_id}_ID${Run_ID}_L001_R1.fastq"
done
rm temp1.txt temp2.txt

But if I create a script file (concatenate_fastq.sh) and make it executable

$ chmod +x concatenate_fastq.sh

and run it

$ ./concatenate_fastq.sh

I get the following error:

$ concatenate_fastq.sh: line 17: /*.fastq_000_ID_L001_R1.fastq: Keine Berechtigung # = Permission denied

Thanks to your hints below, I solved the problem by fixing the wd assignment at the top of the script.



The immediate problem seems to be that wd is unset. If your script really does contain exactly the line


then I would suspect invisible control characters in the script file (using a Windows editor is a common way to shoot yourself in the foot).
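
One way to check for such characters, as a rough sketch: the snippet below writes a small throwaway script with Windows (CRLF) line endings, counts the lines that contain a carriage return, and strips them with tr. The temporary file stands in for your own script; the variable names are made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical demo file with CRLF line endings; substitute your own script.
script=$(mktemp)
printf 'wd="/path/to/script/"\r\necho "$wd"\r\n' > "$script"

# A literal carriage return to search for.
cr=$(printf '\r')

# Count lines that contain a carriage return (0 means the file is clean).
before=$(grep -c "$cr" "$script")

# Write a cleaned copy with every carriage return removed.
tr -d '\r' < "$script" > "$script.clean"
after=$(grep -c "$cr" "$script.clean")

echo "lines with CR before: $before, after: $after"
```

If the first count is not zero for your real script, saving it with Unix line endings (or running it through tr as above) is worth trying before anything else.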

More generally, your script should cope correctly when the wildcard does not match any files. A common way to do that is to use shopt -s nullglob, but the rest of the script would still need adapting.
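
As a minimal sketch of what that adaptation might look like: collect the matches into an array while nullglob is on, then bail out (or just report) when the array is empty. Here an empty temporary directory stands in for wd, and the files and status variables are my own names, not anything from your script.

```shell
#!/usr/bin/env bash
# Assumption: wd would normally point at the fastq directory.
# An empty temporary directory stands in for it here.
wd=$(mktemp -d)

shopt -s nullglob            # unmatched globs now expand to nothing
files=("$wd"/*.fastq)        # empty array when there is no match
shopt -u nullglob            # restore the default behaviour

if [ "${#files[@]}" -eq 0 ]; then
    status="no fastq files in $wd"
else
    status="found ${#files[@]} fastq file(s)"
fi
echo "$status"
```

Without nullglob, the unmatched pattern would be passed along literally, which is how a file name like /*.fastq_000_ID_L001_R1.fastq can end up in an error message.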

Refactoring the script to loop only over actual matches would help avoid trouble. Perhaps something like this:

printf '%s\n' "$wd"/*.fastq |
cut -d_ -f1 |
uniq |
while read -r line; do
    cat "$line"*fastq >> "${line}_${custom_id}_ID${Run_ID}_L001_R1.fastq"
done

You'll notice that this simplifies the script tremendously, and avoids the pesky temporary files.
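
For a self-contained trial run, here is a variant of the same pipeline on made-up data. The file names, custom_id and Run_ID values are all invented for illustration, and I have changed three details on purpose: basename is applied first so an underscore in the directory path cannot confuse cut, the glob is tightened to "${id}"_*.fastq so one sample ID cannot match another that merely starts with it, and the merged files go to a separate directory so they are not picked up by the glob on a second run.

```shell
#!/usr/bin/env bash
# Everything below is hypothetical test data, not the real naming scheme.
wd=$(mktemp -d)      # input directory
out=$(mktemp -d)     # separate output directory
custom_id="000"
Run_ID="42"

# Two samples with two lanes each.
printf 'A1\n' > "$wd/sampleA_L001_R1.fastq"
printf 'A2\n' > "$wd/sampleA_L002_R1.fastq"
printf 'B1\n' > "$wd/sampleB_L001_R1.fastq"
printf 'B2\n' > "$wd/sampleB_L002_R1.fastq"

# List bare file names, keep the part before the first underscore,
# collapse duplicates (glob output is already sorted), merge per sample.
for f in "$wd"/*.fastq; do
    basename "$f"
done |
cut -d_ -f1 |
uniq |
while read -r id; do
    cat "$wd/${id}"_*.fastq > "$out/${id}_${custom_id}_ID${Run_ID}_L001_R1.fastq"
done

merged_a=$(cat "$out/sampleA_000_ID42_L001_R1.fastq")
merged_count=$(ls "$out" | wc -l)
```

Each merged file should then contain the lanes of one sample in lane order, since the shell expands the glob in sorted order.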