LeibnizMan - 1 month ago
Python Question

Running grep through Python - doesn't work

I have some code like this:

f = open("words.txt", "w")
subprocess.call(["grep", p, "/usr/share/dict/words"], stdout=f)
f.close()


I want to grep the macOS dictionary for a certain pattern and write the results to words.txt. For example, if I want to do something like grep '\<a.\>' /usr/share/dict/words, I'd run the above code with p = "'\<a.\>'". However, the subprocess call doesn't seem to work properly and words.txt remains empty. Any thoughts on why that is? Also, is there a way to apply a regex to /usr/share/dict/words without calling a grep subprocess?

edit:
When I run grep '\<a.\>' /usr/share/dict/words in my terminal, I get words like aa, ad, ae, ah, ai, ak, al, am, an, ar, as, at, aw, ax, ay as results in the terminal (or in a file if I redirect them there). This is what I expect words.txt to have after I run the subprocess call.

Answer

Like @woockashek already commented, you are not getting any results because there are no hits on '\<a.\>' in your input file: without a shell in between, the single quotes are passed to grep as literal characters of the pattern. You are probably hoping to find hits for \<a.\> instead, so you need to omit the single quotes.

Of course, Python knows full well how to look for a regex in a file.

import re

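# \b is Python's word-boundary assertion (it stands in for grep's \< and \>)
rx = re.compile(r'\ba.\b')
# The default text mode already handles universal newlines; the old 'U' flag
# is deprecated and rejected by Python 3.11+.
with open('/usr/share/dict/words') as reader, open('words.txt', 'w') as writer:
    for line in reader:
        if rx.search(line):
            # line still ends with its newline, so suppress print's own
            print(line, file=writer, end='')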

The single quotes here are part of Python's string syntax, just like the single quotes on the command line are part of the shell's syntax. In neither case are they part of the actual regular expression you are searching for.
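One way to see what the shell does with those quotes is to let Python's shlex module split the command line the way a POSIX shell would. This is only an illustration of the quoting rules, not something the original code needs; the variable names are made up for the example.

```python
import shlex

# The command line exactly as it would be typed in the terminal.
command_line = r"grep '\<a.\>' /usr/share/dict/words"

# shlex.split applies POSIX shell quoting rules, so the single quotes are
# consumed here, just as the shell consumes them before grep is started.
argv = shlex.split(command_line)
print(argv)
```

The second element of argv is the pattern grep actually receives, and it contains no quote characters.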

The subprocess.Popen documentation vaguely alludes to the frequently overlooked fact that the shell's quoting is not necessary or useful when you don't have shell=True (which usually you should avoid anyway, for this and other reasons).
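A minimal sketch of the corrected subprocess call, assuming a grep that understands \< and \> (both GNU and BSD grep do) is on the PATH. The helper name grep_to_file is hypothetical; the only real change from the question's code is that the pattern no longer carries the shell's single quotes.

```python
import subprocess

def grep_to_file(pattern, source, dest):
    """Run grep without a shell and write the matching lines to dest."""
    with open(dest, "w") as out:
        # Each list element reaches grep unchanged because no shell is
        # involved, so the pattern must not be wrapped in shell quotes.
        return subprocess.call(["grep", pattern, source], stdout=out)

# Usage for the case in the question:
# grep_to_file(r"\<a.\>", "/usr/share/dict/words", "words.txt")
```

The raw string keeps Python from interpreting the backslashes, so grep receives \<a.\> exactly.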

Python's re module unfortunately doesn't support \< and \> as word boundary operators, so we have to use \b instead, which behaves the same way for lines made up only of word characters, as in this word list.
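To illustrate the difference, here is a small sketch comparing the two spellings in Python's re module. The sample lines are invented for the example; in re, \< and \> are simply escaped literal < and > characters rather than word-boundary assertions.

```python
import re

# In Python, \< and \> match the literal characters '<' and '>'.
grep_style = re.compile(r'\<a.\>')
# \b asserts a word boundary on either side of the two-character match.
python_style = re.compile(r'\ba.\b')

# Made-up sample lines, each ending in a newline as they would when read from a file.
samples = ['ab\n', 'abc\n', 'ba\n', '<ab>\n']
for line in samples:
    print(repr(line), bool(grep_style.search(line)), bool(python_style.search(line)))
```

Only the \b version finds the plain two-letter word, while the grep-style spelling matches just the line that contains real angle brackets.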