tumbleweed tumbleweed - 1 month ago 9
Python Question

How to assign the elements of a list as file names in python?

I am trying to use the elements of a list as names for some files that live in a directory. So far I have created a function that retrieves the name of each file in a directory and collects them in a list:

import glob
import os

def retrive(directory_path):
    path_names = []
    for filename in sorted(glob.glob(os.path.join(directory_path, '*.pdf'))):
        retrieved_files = filename.split('/')[-1]
        path_names.append(retrieved_files)
    print(path_names)


The function above collects the name of each file in a list. I then write the files into another directory as follows:

path = os.path.join(new_dir_path, "list%d.txt" % i)
#This is the path of each new file:
#print(path)
with codecs.open(path, "w", encoding='utf8') as filename:
    for item in [a_list]:
        filename.write(item+"\n")


Finally, my question is: how can I use each element of path_names as the name of the corresponding file? Something like this line:

path = os.path.join(new_dir_path, "list%d.txt" % i)


I also tried to use the format() function. However, I still can't assign the correct name to each file.

Here's the full script:

def transform_directoy(input_directory, output_directory):
    import codecs, glob, os
    from tika import parser
    all_texts = []
    for filename in sorted(glob.glob(os.path.join(input_directory, '*.pdf'))):
        parsed = parser.from_file(filename)
        texts = parsed['content']
        all_texts.append(texts)

    for i, a_list in enumerate(all_texts):
        new_dir_path = output_directory

        #print(new_dir_path)
        path = os.path.join(new_dir_path, "list%d.txt" % i)
        with codecs.open(path, "w", encoding='utf8') as filename:
            for item in [a_list]:
                filename.write(item+"\n")


In the desired output, each new file is named after the file it was produced from.

2ps 2ps
Answer

You’re almost there:

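# Assumes path_names and all_texts are lists in the same order,
# so each name is paired with its own content.
for path_name, a_list in zip(path_names, all_texts):
    path = os.path.join(new_dir_path, "list%s.txt" % path_name)
    #This is the path of each new file:
    #print(path)
    with codecs.open(path, "w", encoding='utf8') as f:
        for item in [a_list]:
            f.write(item+"\n")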

Update, based on the updated code sample: you are using two separate loops here, which is not ideal unless you are doing some processing between them. Since I am going to keep that structure, we have to make sure each block of content stays associated with its original filename. The best structure for that is a dict, and in case order is important, we use an OrderedDict. Now, when we’re looping over the (filename, content) pairs in the OrderedDict, we’ll want to change the extension of the file to match the new file type. Luckily, Python has some nice utilities for file/path manipulation in the os.path module: os.path.basename strips the directory from a path, and os.path.splitext splits a filename into its name and its extension. We use both to get just the filename without the extension, then append .txt to designate the new file type. Putting it all together, we get:

def transform_directoy(input_directory, output_directory):    
    import codecs, glob, os
    from collections import OrderedDict
    from tika import parser
    all_texts = OrderedDict()
    for filename in sorted(glob.glob(os.path.join(input_directory, '*.pdf'))):
        parsed = parser.from_file(filename)
        filename = os.path.basename(filename)
        texts = parsed['content']
        all_texts[filename] = texts

    for original_filename, text in all_texts.items():
        # Swap the .pdf extension for .txt
        new_filename, _ = os.path.splitext(original_filename)
        new_filename += '.txt'

        path = os.path.join(output_directory, new_filename)
        # Print out the name of the file we are processing
        print('Transforming %s => %s' % (original_filename, path))
        with codecs.open(path, "w", encoding='utf8') as out_file:
            out_file.write(text + "\n")
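
If you don't need any processing between the two loops, the same idea can be sketched as a single pass. This is only an illustration, not a drop-in replacement: txt_name_for and transform_directory_single_pass are hypothetical names, and extract_text is an assumed callable that takes a path and returns a string (with tika it could be something like lambda p: parser.from_file(p)['content']). Treating missing content as an empty string is also my assumption.

```python
import codecs
import glob
import os


def txt_name_for(pdf_path):
    """Return the output file name for a PDF path (hypothetical helper)."""
    base = os.path.basename(pdf_path)    # drop the directory part
    stem, _ext = os.path.splitext(base)  # split off the final extension
    return stem + '.txt'


def transform_directory_single_pass(input_directory, output_directory, extract_text):
    """Write one .txt file per PDF, named after the PDF.

    extract_text is an assumed callable: path -> string (or None).
    Returns the list of paths that were written, in sorted input order.
    """
    written = []
    for pdf_path in sorted(glob.glob(os.path.join(input_directory, '*.pdf'))):
        out_path = os.path.join(output_directory, txt_name_for(pdf_path))
        # Assumption: treat missing content as an empty string.
        text = extract_text(pdf_path) or ''
        with codecs.open(out_path, 'w', encoding='utf8') as out_file:
            out_file.write(text + '\n')
        written.append(out_path)
    return written
```

Because the name and the content are handled in the same iteration, there is nothing to keep in sync and no dict is needed. Passing the extractor in as a parameter also lets you try the naming logic without tika installed.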