Josh Whittington - 2 months ago
Python Question

wget: How do I specify both --directory-prefix AND --output-document

When I use either -P or -O alone with wget, everything works as advertised.

$: wget -P "test" http://www.google.com/intl/en_com/images/srpr/logo3w.png
Saving to: `test/logo3w.png'


.

$: wget -O "google.png" http://www.google.com/intl/en_com/images/srpr/logo3w.png
2012-01-23 21:47:33 (1.20 MB/s) - `google.png' saved [7007/7007]


However, combining the two causes wget to ignore -P.

$: wget -P "test" -O "google.png" http://www.google.com/intl/en_com/images/srpr/logo3w.png
2012-01-23 21:47:51 (5.87 MB/s) - `google.png' saved [7007/7007]


I've set a variable for both the directory (taken from the last segment of the URL) and the filename (generated by a counting loop), so that http://www.google.com/aaa/bbb/ccc yields file = /directory/filename, or, for item 1, /ccc/000.jpg


When I substitute this into the code:

Popen(['wget', '-O', file, theImg], stdout=PIPE, stderr=STDOUT)


wget silently fails (on each iteration of the loop).

When I turn on debugging (-d) and logging (-a log.log), each iteration prints

DEBUG output created by Wget 1.13.4 on darwin10.8.0.


When I remove the -O and file, the operation proceeds normally.

My question is:
Is there a way to

A) Specify both -P AND -O in wget (preferred) or

B) Pass a string containing / characters to -O without causing it to fail?

Any help would be appreciated.

Answer

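wget treats -O like shell redirection, so -P has no effect on it. You should just pass the combined path, such as dir/000.jpg, to wget's -O option: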

import subprocess
import os.path

subprocess.Popen(['wget', '-O', os.path.join(directory, filename), theImg])

It's not completely clear from your question whether you were already doing something similar to this, but if you were and it still failed, I can think of two reasons:

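  • The argument to -O contains a leading /, so wget tries to write under / (root) and fails, either because the directory doesn't exist there or because it doesn't have permission to write to it. Use a relative path such as ccc/000.jpg instead.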

  • The directory you're telling wget to write to doesn't exist. You can make sure it exists by creating it first with os.mkdir from the Python standard library (or os.makedirs if intermediate directories may be missing).
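
Both causes can be ruled out before wget is started. The following is only a sketch, assuming Python 3 and assuming the directory name comes from the last URL segment as described in the question. The helper names target_path and build_wget_command and the base_dir parameter are hypothetical, not part of the original code:

```python
import os
import tempfile
from urllib.parse import urlparse


def target_path(page_url, index, base_dir="."):
    """Build base_dir/<last URL segment>/<index>.jpg and create the directory."""
    # Last non-empty path segment, e.g. "ccc" for http://host/aaa/bbb/ccc
    segment = urlparse(page_url).path.rstrip("/").split("/")[-1]
    # Joining onto base_dir keeps the result out of the filesystem root
    directory = os.path.join(base_dir, segment)
    # exist_ok avoids an error when an earlier iteration already created it
    os.makedirs(directory, exist_ok=True)
    return os.path.join(directory, "{:03d}.jpg".format(index))


def build_wget_command(output_path, image_url):
    """Argument list for wget; -O receives the whole path, so -P is not needed."""
    return ["wget", "-O", output_path, image_url]


if __name__ == "__main__":
    base = tempfile.mkdtemp()  # throwaway directory for the demonstration
    path = target_path("http://www.google.com/aaa/bbb/ccc", 0, base_dir=base)
    print(build_wget_command(path, "http://www.google.com/intl/en_com/images/srpr/logo3w.png"))
```

The returned list can be handed to subprocess.Popen in place of the hand-built one. Keeping the path construction in its own function makes it easy to print the path and check it for a stray leading / before any download is attempted.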

You can also try removing the arguments stdout= and stderr= from the Popen call so you can see the errors directly, or print them using Python.
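
If you would rather keep the pipes, the captured output can be read and printed from Python. A minimal sketch, assuming Python 3; run_and_capture is a hypothetical helper name, and the demonstration uses the Python interpreter as a stand-in command so that it runs without wget installed:

```python
import subprocess
import sys


def run_and_capture(command):
    """Run command with stderr merged into stdout; return (exit code, text)."""
    proc = subprocess.Popen(command, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
    # communicate() waits for the process to exit and drains the pipe
    output, _ = proc.communicate()
    return proc.returncode, output.decode(errors="replace")


if __name__ == "__main__":
    # Stand-in for a failing wget call: writes to stderr and exits non-zero
    demo = [sys.executable, "-c", "import sys; sys.stderr.write('boom'); sys.exit(3)"]
    code, text = run_and_capture(demo)
    if code != 0:
        print("command failed with exit code", code)
        print(text)
```

wget writes its messages to stderr, so merging stderr into stdout as above should surface the reason for a failed download. A non-zero exit code is the signal to print the captured text.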