Alex Alex - 4 months ago 64
Python Question

Uploading a large file is too slow

I have a 450 MB video that I would like to upload. In my script I use:

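```python
import re
from random import choice

import requests

# xv_login, password, headers, _print, make_title, make_tags and
# description are defined elsewhere in the script.
# The URLs were stripped from this post; '' marks where each one went.
xvideos_log_data = {'login': xv_login,
                    'password': password,
                    'referer': '',
                    'log': 'Login to your account'}


def xvideos(f_path):
    _print('xvideos started uploading...')
    try:
        s = requests.Session()
        s.post('', data=xvideos_log_data, headers=headers)  # log in
        rp = s.get('')  # fetch the upload page
        apc = re.search(r'onclick="launch_upload_basic\(\'(.*?)\'\)',
                        rp.text).group(1)

        payload = {'APC_UPLOAD_PROGRESS': apc,
                   'message': ''}
        # The with-block closes the file once the upload has finished
        with open(f_path, 'rb') as video:
            r = s.post('', data=payload,
                       files={'upload_file': video}, headers=headers)
        edt = re.search(r'<a href="(.*?)" target="_top"', r.text)
        if edt is None:
            _print(re.search(r'inlineError.*>(.*?)<', r.text).group(1))
            return  # no edit link, so there is nothing to update
        payload = {'title': make_title(),
                   'keywords': ' '.join(make_tags()),
                   'description': choice(description),
                   'hide': 0,
                   'update_video_information': 'Update information'}
        # URL prefix stripped from the post; edt.group(1) is the captured link
        r = s.post('' + edt.group(1), data=payload, headers=headers)

        _print('xvideos finished uploading')

    except Exception as error:
        _print(error)  # the handler body was missing from the post
```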


The problem is that the upload is very slow, although it does succeed. I run the script on my server. When I upload through the browser, it's fast.

What could be the problem?


The problem is very likely the Python httplib code (http.client in Python 3) underneath the requests library.

It was horrible for streaming with chunked encoding in older Python versions (2.2); now it is just pretty bad. By replacing it with a custom-built HTTP layer working directly on the socket and handling buffers better, I got an application to stream at about 2% CPU with nearly full link utilization on a fast network link. httplib could only manage about 1 MB/s at 50% or more CPU usage because of very inefficient buffering. httplib is fine for short requests, but not so good for huge uploads (without tweaking or hacking).
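A minimal sketch of what "an HTTP layer directly on the socket" can look like, assuming a plain-HTTP endpoint that accepts a raw `application/octet-stream` body. The function name `send_upload`, its parameters and the 4 MiB default chunk size are illustrative assumptions, not the answerer's code. The upload form in the question expects a `multipart/form-data` body, and an HTTPS endpoint would need the socket wrapped with the `ssl` module first.

```python
import socket
from typing import BinaryIO


def send_upload(sock: socket.socket, host: str, path: str,
                body: BinaryIO, length: int,
                chunk_size: int = 4 * 1024 * 1024) -> str:
    """Send a minimal HTTP/1.1 POST over an already-connected socket.

    The body is streamed with a few large sendall() calls instead of
    many small writes. Returns the server's status line.
    """
    head = (
        f"POST {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "Content-Type: application/octet-stream\r\n"
        f"Content-Length: {length}\r\n"
        "Connection: close\r\n"
        "\r\n"
    ).encode("ascii")
    sock.sendall(head)

    sent = 0
    while True:
        chunk = body.read(chunk_size)  # one large read per send
        if not chunk:
            break
        sock.sendall(chunk)
        sent += len(chunk)
    if sent != length:
        raise ValueError(f"body was {sent} bytes, Content-Length said {length}")

    # Read until the end of the status line, e.g. b"HTTP/1.1 200 OK\r\n".
    buf = b""
    while b"\r\n" not in buf:
        data = sock.recv(4096)
        if not data:
            break
        buf += data
    return buf.split(b"\r\n", 1)[0].decode("iso-8859-1")
```

With a real server you would open the connection with `socket.create_connection((host, 80))` and pass `os.path.getsize(f_path)` as `length`. The design choice that matters here is that each `sendall()` hands the kernel megabytes at a time rather than a few kilobytes.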

Depending on your network and OS setup, you can try a few things to improve this:

  1. Tune your socket buffers via setsockopt() with SO_SNDBUF. If you don't need many connections and have a fast network, something like 4 MB or more is possible; this reduces problems with send buffers that keep running empty on fast pipes (10GE and faster).

  2. Use a different HTTP library (for example pycurl, or Twisted with some patches) and use larger buffers for transfers, e.g. make every socket.send() call move a few MB of data rather than tiny 4 kB buffers.
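Both suggestions can be sketched with only the standard `socket` module. The helper names `tune_send_buffer` and `stream` are hypothetical, and the 4 MiB default simply follows the "4 MB or more" figure above. On Linux the kernel caps the requested `SO_SNDBUF` at `net.core.wmem_max` and reports the value doubled, so the sketch reads it back to show what was actually granted. It uses `sendall()` rather than bare `send()` so that a partial send cannot silently drop data.

```python
import socket
from typing import BinaryIO


def tune_send_buffer(sock: socket.socket, size: int = 4 * 1024 * 1024) -> int:
    """Ask the kernel for a larger send buffer; return the size it granted."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, size)
    # Linux caps the request at net.core.wmem_max and reports it doubled,
    # so the value read back can differ from `size`.
    return sock.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF)


def stream(sock: socket.socket, f: BinaryIO, chunk_size: int) -> int:
    """Copy f to sock in chunk_size pieces; return the number of sendall() calls."""
    calls = 0
    while True:
        chunk = f.read(chunk_size)
        if not chunk:
            break
        sock.sendall(chunk)  # hands the whole chunk to the kernel
        calls += 1
    return calls
```

For a 450 MB file, `stream(sock, f, 4096)` makes over 100,000 `sendall()` calls, while `stream(sock, f, 4 * 1024 * 1024)` makes roughly a hundred.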

Done right, Python can nearly saturate a 10GE link.
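Rough arithmetic behind the buffer-size advice. It assumes a fully used 10 Gbit/s link, and it reads "4 kB" as 4096 bytes and "a few MB" as 4 MiB (4,194,304 bytes). Both readings are assumptions made for illustration:

$$
10\ \text{Gbit/s} = \frac{10^{10}}{8}\ \text{B/s} = 1.25\times10^{9}\ \text{B/s}
$$

$$
\frac{1.25\times10^{9}}{4096} \approx 3.05\times10^{5}\ \text{send() calls/s},
\qquad
\frac{1.25\times10^{9}}{4\,194\,304} \approx 298\ \text{send() calls/s}
$$

At hundreds of thousands of calls per second, the per-call overhead of each Python-level send() is paid very often. At a few hundred calls per second, it becomes much less significant.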