TragicWhale - 2 months ago
Python Question

Traversing subfolder files?

I have written a script to erase a given word from docx files and am at the last hurdle: getting it to check subfolder items as well. Can someone help me figure out where my execution is failing? It works with all the files in the same directory, but it doesn't check subfolder items right now. Thanks for your help.

#!/usr/bin/env python3

# Search and Replace all docx

import os, docx

from docx import Document


findText = input("Type text to replace: ")

#replaceText = input('What text would you like to replace it with: ')


for dirs, folders, files in os.walk('.'):
    for subDirs in dirs:
        print('The Sub is ' + subDirs)
        for fileNames in files:
            print(subDirs + fileNames)
            if fileNames.endswith('.docx'):
                newDirName = os.path.abspath(subDirs)
                fileLocation = subDirs + '\\' + fileNames
                document = docx.Document(fileLocation)
                print('Document is:' + fileLocation)

                tables = document.tables
                for table in tables:
                    for row in table.rows:
                        for cell in row.cells:
                            for paragraph in cell.paragraphs:
                                if findText in paragraph.text:
                                    inline = paragraph.runs
                                    for i in range(len(inline)):
                                        if findText in inline[i].text:
                                            text = inline[i].text.replace(findText, '')
                                            inline[i].text = text

                for paragraph in document.paragraphs:
                    if findText in paragraph.text:
                        inline = paragraph.runs
                        for i in range(len(inline)):
                            if findText in inline[i].text:
                                text = inline[i].text.replace(findText, '')
                                inline[i].text = text

                document.save(fileLocation)

Answer

os.walk iterates through the directory tree, yielding a 3-tuple (dirpath, dirnames, filenames) for each directory visited. When you do:

for dirs, folders, files in os.walk('.'):
    for subDirs in dirs:

things go badly wrong. dirs is the path of the directory being visited in each iteration, and it is a string, which means that for subDirs in dirs: is really enumerating the characters in that path. It so happens that the first directory you visit is "." and, just by luck, that is a single-character name, so your for loop appears to work.

As soon as you walk into another subdirectory (let's call it 'foo'), the loop runs once per character, so your code builds file locations inside directories called f, o and o a second time instead of inside foo itself. That doesn't work.
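
To see the character-by-character iteration in isolation, here is a minimal sketch. The helper name pieces is hypothetical and only mimics the inner loop from the question; it is not part of os or of your script.

```python
def pieces(dirpath):
    """Collect what `for subDirs in dirpath:` would yield for a str."""
    result = []
    for sub in dirpath:  # iterating a string yields one character at a time
        result.append(sub)
    return result


print(pieces('.'))    # ['.']
print(pieces('foo'))  # ['f', 'o', 'o']
```

With "." the loop happens to yield the whole name, which is why the top-level directory seemed to work.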

But you shouldn't be re-enumerating the subdirectories yourself; os.walk already does that. Boiling your code down to the enumeration part, this will find all of the .docx files in the subtree:

#!/usr/bin/env python3

import os

for dirpath, dirnames, filenames in os.walk('.'):
    docx_files = [fn for fn in filenames if fn.endswith('.docx')]
    for docx_file in docx_files:
        filename = os.path.join(dirpath, docx_file)
        print(filename)
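
If it helps, here is one way the enumeration above might be combined with the replacement logic from the question. Treat it as a sketch under assumptions: the function names find_docx, strip_text and strip_text_in_tree are hypothetical, and it assumes the python-docx package is installed (it is imported lazily so the file search works without it).

```python
import os


def find_docx(root):
    """Return the paths of all .docx files under root, including subfolders."""
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        for fn in filenames:
            if fn.endswith('.docx'):
                # os.path.join picks the right separator for the platform
                found.append(os.path.join(dirpath, fn))
    return found


def strip_text(path, find_text):
    """Remove find_text from the body paragraphs and table cells of one file."""
    import docx  # assumption: python-docx is installed

    document = docx.Document(path)

    # Gather body paragraphs plus every paragraph inside table cells
    paragraphs = list(document.paragraphs)
    for table in document.tables:
        for row in table.rows:
            for cell in row.cells:
                paragraphs.extend(cell.paragraphs)

    for paragraph in paragraphs:
        for run in paragraph.runs:
            if find_text in run.text:
                run.text = run.text.replace(find_text, '')

    document.save(path)


def strip_text_in_tree(root, find_text):
    """Apply strip_text to every .docx file found under root."""
    for path in find_docx(root):
        print('Document is: ' + path)
        strip_text(path, find_text)
```

Calling strip_text_in_tree('.', findText) would then cover the current directory and everything below it. As in the original script, text is only removed when it sits entirely inside a single run.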