The Content-Disposition header contains a filename that is usually easy to extract, but sometimes the value is wrapped in double quotes, sometimes it has no quotes, and there are probably some other variants too. Can someone write a regex that works in all these cases?
Content-Disposition: attachment; filename=content.txt
attachment; filename="EURO rates"; filename*=utf-8''%e2%82%ac%20rates
Some other combinations might also occur.
You could try something in this spirit:
filename[^;=\n]*=((['"]).*?\2|[^;\n]*)

filename        # match filename, followed by
[^;=\n]*        # anything but a ;, a = or a newline
=
(               # first capturing group
    (['"])      # either single or double quote, put it in capturing group 2
    .*?         # anything up until the first...
    \2          # matching quote (single if we found single, double if we find double)
|               # OR
    [^;\n]*     # anything but a ; or a newline
)
Your filename is in the first capturing group (including the surrounding quotes when the value is quoted): http://regex101.com/r/hJ7tS6
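For anyone who wants to try the pattern in code, here is a minimal Python sketch. It only assumes the regex above; the helper name extract_filenames and the quote-stripping step are my own additions for illustration, not part of any library. Since the first capturing group keeps the surrounding quotes, the helper removes them afterwards.

```python
import re

# The regex from the answer, unchanged.
FILENAME_RE = re.compile(r"""filename[^;=\n]*=((['"]).*?\2|[^;\n]*)""")


def extract_filenames(header):
    """Return every filename value found in the header, in order.

    Hypothetical helper: quoted values have their surrounding quotes
    removed; unquoted values are returned with whitespace trimmed.
    """
    results = []
    for match in FILENAME_RE.finditer(header):
        value = match.group(1).strip()
        # Group 1 includes the quotes when the value was quoted, so drop them.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "'\"":
            value = value[1:-1]
        results.append(value)
    return results


print(extract_filenames("Content-Disposition: attachment; filename=content.txt"))
# ['content.txt']

print(extract_filenames(
    "attachment; filename=\"EURO rates\"; filename*=utf-8''%e2%82%ac%20rates"
))
# ['EURO rates', "utf-8''%e2%82%ac%20rates"]
```

Note that the filename*= value comes back exactly as sent, that is, still prefixed with the charset and still percent-encoded. Decoding it is a separate step that this sketch does not attempt.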