CSJ CSJ - 3 months ago 15
Python Question

Using Django to serve a big zip file download with some data appended

I have a view like the one below, which gets a zip filename from a request. I want to append a string, sign, to the end of the zip file.

@require_GET
def download(request):
    ... skip
    response = HttpResponse(readFile(abs_path, sign), content_type='application/zip')
    response['Content-Length'] = os.path.getsize(abs_path) + len(sign)
    response['Content-Disposition'] = 'attachment; filename=%s' % filename
    return response


The readFile function is as below:

def readFile(fn, sign, buf_size=1024<<5):
    f = open(fn, "rb")
    logger.debug("started reading %s" % fn)
    while True:
        c = f.read(buf_size)
        if c:
            yield c
        else:
            break
    logger.debug("finished reading %s" % fn)
    f.close()
    yield sign


It works fine under runserver, but fails on big zip files when I use uwsgi + nginx or apache + mod_wsgi.

It seems to time out because reading a big file takes too long.

I don't understand why, even though I use yield, the browser only starts downloading after the whole file has been read. (I can see the browser waiting until the log line "finished reading %s" appears.)

Shouldn't it start downloading right after the first chunk is read?

Is there a better way to serve a file download when I need to append a dynamic string after the file?

Answer

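A regular HttpResponse does not stream. When you pass it an iterator, Django consumes the whole iterator up front and stores the result as the response content, which is why the browser only starts downloading once the file has been read completely. If it didn't buffer, middleware that reads or modifies the response content couldn't function the way it does right now.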
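To make the ordering in the question's log concrete, here is a small plain-Python sketch with no Django involved. The names read_chunks, eager_send and lazy_send are made up for illustration; eager_send only approximates what building the full body up front amounts to, it is not Django's actual code.

```python
def read_chunks(log, n_chunks=3):
    """Stand-in for readFile: logs each read, then logs when it is done."""
    for i in range(n_chunks):
        log.append("read chunk %d" % i)
        yield b"x"
    log.append("finished reading")


def eager_send(log):
    # Approximation of a buffered response: the whole generator is
    # drained before a single byte could be handed to the client.
    body = b"".join(read_chunks(log))
    log.append("first byte ready to send")
    return body


def lazy_send(log):
    # Approximation of a streaming response: the first chunk is
    # available as soon as it has been read.
    stream = read_chunks(log)
    first = next(stream)
    log.append("first byte ready to send")
    return first


eager_log, lazy_log = [], []
eager_send(eager_log)
lazy_send(lazy_log)
print(eager_log)
print(lazy_log)
```

In the eager case "finished reading" is logged before anything is ready to send, which matches what the question observed. In the lazy case the first byte is ready after a single read.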

To get the behaviour you are looking for, use StreamingHttpResponse instead.

Usage example from the docs:

import csv

from django.http import StreamingHttpResponse

class Echo(object):
    """An object that implements just the write method of the file-like
    interface.
    """
    def write(self, value):
        """Write the value by returning it, instead of storing in a buffer."""
        return value

def some_streaming_csv_view(request):
    """A view that streams a large CSV file."""
    # Generate a sequence of rows. The range is based on the maximum number of
    # rows that can be handled by a single sheet in most spreadsheet
    # applications.
    rows = (["Row {}".format(idx), str(idx)] for idx in range(65536))
    pseudo_buffer = Echo()
    writer = csv.writer(pseudo_buffer)
    response = StreamingHttpResponse((writer.writerow(row) for row in rows),
                                     content_type="text/csv")
    response['Content-Disposition'] = 'attachment; filename="somefilename.csv"'
    return response
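
Applied to the view in the question, a minimal sketch might look like the following. The names iter_file_with_suffix and build_download_response are hypothetical, and the sketch assumes sign is a bytes object so that len(sign) is its length on the wire. The Django import sits inside the helper only so the generator can be tried without Django installed; in a real views module it would normally go at the top.

```python
import os


def iter_file_with_suffix(path, suffix, chunk_size=1024 << 5):
    """Yield the file at `path` in chunks, then yield `suffix` (bytes)."""
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            yield chunk
    yield suffix


def build_download_response(abs_path, sign, filename):
    """Hypothetical helper: streaming replacement for the question's response."""
    # Imported here only so the generator above runs without Django installed.
    from django.http import StreamingHttpResponse

    response = StreamingHttpResponse(
        iter_file_with_suffix(abs_path, sign),
        content_type="application/zip",
    )
    # A streaming response has no body to measure, so set the length by hand.
    response["Content-Length"] = str(os.path.getsize(abs_path) + len(sign))
    response["Content-Disposition"] = 'attachment; filename="%s"' % filename
    return response
```

The download view from the question would then end with return build_download_response(abs_path, sign, filename). If sign is a text string, it would need to be encoded to bytes first, for example with sign.encode("utf-8").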