Ramon - 4 months ago
Python Question

Error in converting multiple FASTA files to Nexus using Biopython

I want to convert multiple FASTA files (DNA sequences) to NEXUS format using the Bio.SeqIO module, but I get this error:

Traceback (most recent call last):
  File "fasta2nexus.py", line 28, in <module>
    print(process(fullpath))
  File "fasta2nexus.py", line 23, in process
    alphabet=IUPAC.ambiguous_dna)
  File "/Library/Python/2.7/site-packages/Bio/SeqIO/__init__.py", line 1003, in convert
    with as_handle(in_file, in_mode) as in_handle:
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/contextlib.py", line 17, in __enter__
    return self.gen.next()
  File "/Library/Python/2.7/site-packages/Bio/File.py", line 88, in as_handle
    with open(handleish, mode, **kwargs) as fp:
IOError: [Errno 2] No such file or directory: 'c'


What am I missing?

Here is my code:

#!/usr/bin/env python

from __future__ import print_function # or just use Python 3!

import fileinput
import os
import re
import sys

from Bio import SeqIO, Nexus
from Bio.Alphabet import IUPAC


test = "/Users/teton/Desktop/test"

files = os.listdir(os.curdir)

def process(filename):
    # returns ("basename", "extension"), so [0] picks "basename"
    base = os.path.splitext(filename)[0]
    return SeqIO.convert(filename, "fasta",
                         base + ".nex", "nexus",
                         alphabet=IUPAC.ambiguous_dna)

for files in os.listdir(test):
    for file in files:
        fullpath = os.path.join(file)
        print(process(fullpath))

Answer

This code should solve the majority of problems I can see.

from __future__ import print_function # or just use Python 3!

import fileinput
import os
import re
import sys

from Bio import SeqIO, Nexus
from Bio.Alphabet import IUPAC

test = "/Users/teton/Desktop"

def process(filename):
    # returns ("basename", "extension"), so [0] picks "basename"
    base = os.path.splitext(filename)[0]
    return SeqIO.convert(filename, "fasta",
                         base + ".nex", "nexus",
                         alphabet=IUPAC.ambiguous_dna)

for root, dirs, files in os.walk(test):
    for file in files:
        fullpath = os.path.join(root, file)
        print(process(fullpath))

I changed a few things. First, I ordered your imports (personal thing) and made sure to import IUPAC from Bio.Alphabet so you can actually assign the correct alphabet to your sequences. Next, in your process() function, I added a line to split the extension off the filename, then used the full filename for the first argument, and just the base (without the extension) for naming the Nexus output file. Speaking of which, I assume you'll be using the Nexus module in later code? If not, you should remove it from the imports.
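
To make the naming step concrete, here is a minimal sketch of what os.path.splitext() does with a path. The helper name nexus_name and the example paths are made up for illustration; only os.path.splitext() comes from the code above.

```python
import os


def nexus_name(path):
    """Return `path` with its last extension replaced by ".nex"."""
    # splitext() returns a (base, extension) tuple; it splits only at the
    # last dot of the final path component.
    base, _extension = os.path.splitext(path)
    return base + ".nex"


print(nexus_name("/data/run1/seqs.fasta"))  # /data/run1/seqs.nex
print(nexus_name("seqs.aligned.fa"))        # seqs.aligned.nex
```

Because the directory part of the path is kept, the .nex file ends up next to the FASTA file it was converted from.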

I wasn't sure what the point of the last snippet was, so I didn't include it. In it, though, you appear to be walking the file tree and process()ing each file again, then referencing some undefined variable named count. Instead, just run process() once, and do whatever count refers to within that loop.

You may want to add a check to your for loop that the file returned by os.path.join() actually is a FASTA file. Otherwise, any other file type sitting in one of the directories you search gets passed to process() as well, and the conversion will most likely either raise an error or produce meaningless output.
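
One way to do that check, as a rough sketch: filter on the file extension before converting. The extension list and the function name find_fasta_files are assumptions for illustration, not anything from your code, and this only looks at the name, not at the file contents.

```python
import os

# Assumed set of FASTA extensions; adjust to whatever your files use.
FASTA_EXTENSIONS = (".fasta", ".fa", ".fas", ".fna")


def find_fasta_files(top):
    """Yield full paths of files under `top` whose extension looks like FASTA."""
    for root, _dirs, files in os.walk(top):
        for name in sorted(files):
            # Compare case-insensitively so "SEQS.FA" is accepted too.
            extension = os.path.splitext(name)[1].lower()
            if extension in FASTA_EXTENSIONS:
                yield os.path.join(root, name)
```

The loop would then become "for fullpath in find_fasta_files(test): print(process(fullpath))". This also keeps the .nex files written by an earlier run from being fed back into process().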

EDIT

OK, based on your new code I have a few suggestions. First, the line

files = os.listdir(os.curdir)

is completely unnecessary, as below the definition of the process() function, you're redefining the files variable. The line itself runs fine (os.curdir is just the string ".", not a function), but its result is overwritten by the for loop before it is ever used.

The code at the bottom should simply be this:

for file in os.listdir(test):
    # os.listdir() returns bare names, so join them with the directory
    print(process(os.path.join(test, file)))

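for file in files is worse than redundant: at that point files is a single filename string, so the inner loop iterates over its characters, which is why the error complains about a file named 'c'. Calling os.path.join() with a single argument simply returns that argument unchanged.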
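
To see where the 'c' in the traceback comes from, here is a small stand-alone sketch of what the original nested loop does. The filename is made up (any name starting with "c" would give the same first error), and the function name is hypothetical.

```python
import os


def paths_from_nested_loop(names):
    """Mimic the original nested loop and collect what it would pass on."""
    collected = []
    for files in names:        # `files` is one filename string, not a list
        for file in files:     # so this walks over its characters
            collected.append(os.path.join(file))  # one argument: unchanged
    return collected


print(paths_from_nested_loop(["cat.fasta"])[:3])  # ['c', 'a', 't']
```

The first thing the loop hands to process() is the single character "c", and SeqIO.convert() then tries to open a file with that name.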