prizmracer11 - 2 months ago
Python Question

How to search for a different string in a different file using Python 3.x

I am trying to search a large group of text files (160K) for a specific string that is different for each file. I have a text file that lists every file in the directory along with the string I want to search for in it. Basically, I want to use Python to create a new text file that gives the file name, the string, and a 1 if the string is present or a 0 if it is not.

The approach I am using so far is to create a dictionary from a text file. From there I am stuck. Here is what I figure in pseudo-code:

import os

# assign dictionary
d = {}
with open('file.txt') as f:
    d = dict(x.rstrip().split(None, 1) for x in f)

# loop through directory
for filename in os.listdir(os.getcwd()):
    # here is where I get lost:
    # match file name to dictionary
    # look for string
    # write filename, string, 1 if found
    # write filename, string, 0 if not found
    pass


Thank you. It needs to be somewhat efficient, since it's a large amount of text to go through.

Here is what I ended up with:

import os

d = {}
with open('ibes.txt') as f:
    d = dict(x.rstrip().split(None, 1) for x in f)

for filename in os.listdir(os.getcwd()):
    # files missing from the dictionary get a placeholder that should never match
    string = d.get(filename, "!@#$%^&*")
    with open(filename, 'r') as infile:
        found = string in infile.read()
    with open("ibes_in.txt", 'a') as out:
        out.write("{} {} {}\n".format(filename, string, 1 if found else 0))

Answer

As I understand your question, the dictionary relates file names to strings:

d = {
 "file1.txt": "widget",
 "file2.txt": "sprocket", #etc
}

If each file is not too large, you can read each file into memory:

import os

for filename in os.listdir(os.getcwd()):
    if filename not in d:
        continue  # skip files with no search string, such as the list file itself
    string = d[filename]
    with open(filename, 'r') as f:
        found = string in f.read()
    print(filename, string, "1" if found else "0")

This example uses print, but you could write to a file instead. Open the output file before the loop with outfile = open("outfile.txt", 'w'), and instead of printing use:

outfile.write("{} {} {}\n".format(filename, string, 1))
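
Putting those pieces together, a minimal sketch might look like the following. The function name write_matches and its parameters are hypothetical, not something from the question, and errors="ignore" is an assumption that undecodable bytes can safely be skipped:

```python
import os


def write_matches(d, directory, out_path):
    """Write one 'filename string flag' line for every file listed in d.

    write_matches and its parameters are illustrative names only.
    """
    with open(out_path, "w") as outfile:
        # sorted() only makes the output order predictable
        for filename in sorted(os.listdir(directory)):
            if filename not in d:
                continue  # no search string for this file
            string = d[filename]
            path = os.path.join(directory, filename)
            # assumption: ignore undecodable bytes instead of raising
            with open(path, "r", errors="ignore") as infile:
                found = string in infile.read()
            outfile.write("{} {} {}\n".format(filename, string, 1 if found else 0))
```

Calling write_matches(d, os.getcwd(), "outfile.txt") would then produce the whole report in one pass, with the output file opened only once rather than once per input file.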

On the other hand, if the files are too large to fit easily into memory, you could use mmap, as described in the question "Search for string in txt file Python".
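
For the large-file case, here is a rough sketch of a memory-mapped search using the standard mmap module. The helper name file_contains is made up for illustration, and the UTF-8 default is an assumption about how the files are encoded:

```python
import mmap


def file_contains(path, string, encoding="utf-8"):
    """Return True if string occurs in the file at path, searching via mmap.

    file_contains is a hypothetical helper; the encoding is an assumption.
    """
    needle = string.encode(encoding)  # a mmap holds bytes, so search for bytes
    with open(path, "rb") as f:
        try:
            mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
        except ValueError:
            # an empty file cannot be mapped; it can only contain the empty string
            return needle == b""
        with mm:
            return mm.find(needle) != -1
```

In the loop above, the test would then become if file_contains(filename, string): so the operating system pages the file in as needed instead of Python reading it all at once.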
