anatoly techtonik - 3 months ago
Python Question

Encoding issue with ASCII-safe file with codec header, depending on line count

Here is a magical bug in Python 3.5.2 on Windows that killed my day. The file below fails on this system:


C:\Python35\python.exe encoding-problem-cp1252.py
File "encoding-problem-cp1252.py", line 2
SyntaxError: encoding problem: cp1252



The file contains almost nothing: apart from the coding header there are a bunch of empty lines, but when any line is removed, even an empty one, it works again. I thought it was a local problem, so I set up a job on AppVeyor, which showed the same behavior.

What's going on with Python?

A binary-accurate version of the file is below:

#!/usr/bin/env python
# -*- coding: cp1252 -*-


"""
There is nothing in this file, except that it is more
than 50 lines long. Running it with Python 3.5.2 on
Windows gives the following error:

>python encoding-problem-cp1252.py
File "encoding-problem-cp1252.py", line 2
SyntaxError: encoding problem: cp1252

>python
Python 3.5.2 (v3.5.2:4def2a2901a5, Jun 25 2016, 22:01:18) [MSC v.1900 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.

If you remove any lines from this file, it will
execute successfully.
"""



def restore(dump):
    """













    """
    return



def main():
    print('ok')



if __name__ == '__main__':
    main()

Answer

This looks like a regression caused by issue #20731. I think your line count causes a full buffer to be filled and that throws off a seek made here:

fd = fileno(tok->fp);
/* Due to buffering the file offset for fd can be different from the file
 * position of tok->fp.  If tok->fp was opened in text mode on Windows,
 * its file position counts CRLF as one char and can't be directly mapped
 * to the file offset for fd.  Instead we step back one byte and read to
 * the end of line.*/
pos = ftell(tok->fp);
if (pos == -1 ||
    lseek(fd, (off_t)(pos > 0 ? pos - 1 : pos), SEEK_SET) == (off_t)-1) {
    PyErr_SetFromErrnoWithFilename(PyExc_OSError, NULL);
    goto cleanup;
}
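To make the comment in that snippet concrete, here is a toy model in Python of how a position that counts CRLF as one character drifts away from the raw byte offset. This is only an illustration of the mismatch the comment describes; the helper names `byte_offset_after` and `text_position_after` are made up for this sketch and it is not how CPython or the C runtime actually computes positions.

```python
def byte_offset_after(data: bytes, lines: int) -> int:
    """Raw byte offset just past the first `lines` lines of `data`."""
    offset = 0
    for _ in range(lines):
        # Every line, LF or CRLF, ends with a b"\n" byte.
        offset = data.index(b"\n", offset) + 1
    return offset


def text_position_after(data: bytes, lines: int) -> int:
    """Position past the same lines if each CRLF is counted as one character."""
    consumed = data[:byte_offset_after(data, lines)]
    return len(consumed.replace(b"\r\n", b"\n"))


crlf_source = b"# -*- coding: cp1252 -*-\r\n\r\nprint('ok')\r\n"
lf_source = crlf_source.replace(b"\r\n", b"\n")

if __name__ == "__main__":
    for name, data in (("CRLF", crlf_source), ("LF", lf_source)):
        raw = byte_offset_after(data, 2)
        text = text_position_after(data, 2)
        # With CRLF endings the two values differ by one per line read;
        # with LF endings they are identical.
        print(name, "byte offset:", raw, "text position:", text)
```

In this model the two numbers agree for an LF-only file and differ by one per line for a CRLF file, so any code that converts between them has to know which kind of line ending it is looking at.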

The problem disappears when you convert your file to use Windows (CRLF) line endings, but I can understand that for cross-platform scripts that's not a practical solution.
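If you want to try that workaround, a minimal sketch of the conversion could look like the following. The function names `to_crlf` and `convert_file` are just for illustration, and the sketch assumes the file only contains LF or CRLF line endings (a lone CR is left untouched).

```python
import re
import sys
from pathlib import Path


def to_crlf(data: bytes) -> bytes:
    """Rewrite every LF or CRLF line ending as CRLF."""
    # Matching an optional \r keeps existing CRLF endings from being doubled.
    return re.sub(rb"\r?\n", b"\r\n", data)


def convert_file(path: str) -> None:
    """Convert a file in place, working on bytes so nothing else is translated."""
    target = Path(path)
    target.write_bytes(to_crlf(target.read_bytes()))


if __name__ == "__main__":
    for name in sys.argv[1:]:
        convert_file(name)
```

Running it as `python to_crlf.py encoding-problem-cp1252.py` (the script name is hypothetical) rewrites the file in place, so keep a copy if the original bytes matter.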

I've filed issue #27797; this should be fixed in Python itself.