N4v - 2 months ago
Python Question

Python - Gathering File Information Recursively Leads to Memory Error

I am writing a script to recurse over all directories and subdirectories of a starting folder, but I'm running into memory errors (the error is MemoryError). My guess is that my data_dicts list is getting too big, but I'm not sure. Any advice would be appreciated.

import os

# example data dictionary
data_dict = {
    'filename': 'data.csv',
    'folder': 'R:/',
    'size': 300000
}

def get_file_sizes_folder(data_dicts, starting_folder):
    # Given a list of file information dictionaries and a folder, iterate over the files
    # in the folder to get their information and append it to the list.
    # Also recurse through subdirectories
    for entry in os.scandir(starting_folder):
        if not entry.name.startswith('.'):
            if entry.is_file():
                size = entry.stat().st_size
                filename = entry.name
                folder = os.path.dirname(entry.path)
                temp_dict = {'filename': filename, 'size': size, 'folder': folder}
                data_dicts.append(temp_dict.copy())
            else:
                print(entry.path)
                data_dicts.extend(get_file_sizes_folder(data_dicts, entry.path))

    return data_dicts

d = get_file_sizes_folder([], 'R:/')

Answer

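You shouldn't supply data_dicts as an argument to your function get_file_sizes_folder(). Every recursive call appends to that same list and then returns it, so data_dicts.extend(...) extends the list with itself and doubles its length each time a subdirectory finishes. The number of entries therefore grows exponentially with the number of subdirectories rather than linearly with the number of files. No wonder your computer runs out of memory so quickly!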
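To see the duplication in isolation, here is a minimal sketch that needs no file system. The function buggy_collect is a hypothetical stand-in, not the code from the question: it models a chain of nested folders with one file each, and it passes the list down and extends it with the returned value in the same way.

```python
def buggy_collect(items, depth):
    # Hypothetical stand-in for the recursive function: one "file" per level.
    items.append(depth)
    if depth > 0:
        # The recursive call appends to, and returns, the very same list object.
        returned = buggy_collect(items, depth - 1)
        assert returned is items
        # Extending a list with itself doubles its length.
        items.extend(returned)
    return items

# Only depth + 1 distinct "files" exist at each depth, yet the list explodes.
lengths = [len(buggy_collect([], depth)) for depth in range(6)]
print(lengths)  # [1, 4, 12, 32, 80, 192]
```

With only six nested levels, the list already holds 192 entries for 6 distinct items, and every additional level more than doubles it.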

Instead, use only starting_folder as an argument, and simply create a new list data_dicts in the first line of your function, like so:

def get_file_sizes_folder(starting_folder):
    # Given a folder, iterate over its files, collect their information in a
    # new list, and recurse through subdirectories. Returns that list.
    data_dicts = []
    for entry in os.scandir(starting_folder):
        if not entry.name.startswith('.'):
            if entry.is_file():
                size = entry.stat().st_size
                filename = entry.name
                folder = os.path.dirname(entry.path)
                temp_dict = {'filename': filename, 'size': size, 'folder': folder}
                data_dicts.append(temp_dict)
            else:
                print(entry.path)
                data_dicts.extend(get_file_sizes_folder(entry.path))

    return data_dicts
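
If the list is still too large on a very big drive even without duplicates, one option is to yield entries one at a time instead of collecting them. The following is only a sketch of that idea, not part of the fix above: the names iter_file_sizes and total_size are hypothetical, and skipping symlinked directories (follow_symlinks=False) is an added assumption to avoid loops, which differs from the else branch in the original code.

```python
import os

def iter_file_sizes(starting_folder):
    # Hypothetical generator variant: yields one dict per file instead of
    # building a list, so nothing accumulates inside the function.
    with os.scandir(starting_folder) as entries:
        for entry in entries:
            if entry.name.startswith('.'):
                continue
            if entry.is_file():
                yield {
                    'filename': entry.name,
                    'size': entry.stat().st_size,
                    'folder': os.path.dirname(entry.path),
                }
            elif entry.is_dir(follow_symlinks=False):
                # Assumption: do not descend into symlinked directories.
                yield from iter_file_sizes(entry.path)

def total_size(starting_folder):
    # Example consumer: sums sizes without keeping the dicts around.
    return sum(info['size'] for info in iter_file_sizes(starting_folder))
```

Calling list(iter_file_sizes('R:/')) gives a list equivalent to the one built by the corrected function, while a loop over the generator can process each entry and discard it.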