uzumaki - 5 months ago
Python Question

Unexpected design choice in Python built-in

I tried to read from and write to a config file like so:

with open(file, 'r') as ini:
    config = ConfigParser()
    config.read(ini)
config['section'] = {'foo':45}
with open(file, 'w') as ini:
    config.write(ini)


I just couldn't get it to read the already saved data. It took me quite a while to find out that I actually have to read from the file like this:

config.read(file)


Why is it not consistent? Why do I have to read and write in two different ways? Is there a good reason for this kind of design choice, something about I/O that I don't know yet?

And why did it not raise an exception when I tried to read from a file buffer?

Are there other built-ins that are inconsistent in file handling?

Answer

From the docs, there are actually two ways to do it (three, if you count the list argument):

import ConfigParser, os

config = ConfigParser.ConfigParser()
config.readfp(open('defaults.cfg'))
config.read(['site.cfg', os.path.expanduser('~/.myapp.cfg')])

The filename version also lets you pass in multiple filenames, and the ConfigParser will silently skip any that are missing. This is a common idiom for config files, where you might have a default config file and then a locally defined one.
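To illustrate the "missing files are skipped" behaviour, here is a minimal sketch. It assumes Python 3's configparser module (the snippet above uses the Python 2 spelling), and the file names and the load_layered helper are made up for the example:

```python
import configparser
import os
import tempfile


def load_layered(paths):
    """Read every path that exists; return the parser and the files actually read."""
    config = configparser.ConfigParser()
    read_ok = config.read(paths)  # missing files are skipped without an error
    return config, read_ok


# Hypothetical layout: one file that exists, one that does not.
tmpdir = tempfile.mkdtemp()
site_cfg = os.path.join(tmpdir, "site.cfg")
missing_cfg = os.path.join(tmpdir, "missing.cfg")
with open(site_cfg, "w") as f:
    f.write("[section]\nfoo = 45\n")

config, read_ok = load_layered([site_cfg, missing_cfg])
print(read_ok)                   # only the existing file is listed
print(config["section"]["foo"])  # values come back as strings
```

The return value of read() is the only signal you get about which files were actually loaded, so it is worth checking when a file is not optional.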

As for why it is inconsistent, with readfp for reading but write (rather than writefp) for writing: you could read up on the module's history.

In the end, designs (and designers) aren't always perfect and idiosyncrasies aren't always caught, but once something lands in the standard library, its interface is frozen.

We can look at the ConfigParser source to see why calling read() with a file object fails silently:

def read(self, filenames):
    """Read and parse a filename or a list of filenames.

    Files that cannot be opened are silently ignored; this is
    designed so that you can specify a list of potential
    configuration file locations (e.g. current directory, user's
    home directory, systemwide directory), and all existing
    configuration files in the list will be read.  A single
    filename may also be given.

    Return list of successfully read files.
    """
    if isinstance(filenames, basestring):
        filenames = [filenames]
    read_ok = []
    for filename in filenames:
        try:
            fp = open(filename)
        except IOError:
            continue
        self._read(fp, filename)
        fp.close()
        read_ok.append(filename)
    return read_ok

Oh -- wow -- this is going to be interesting!

Your file object isn't a basestring, so read() assumes it must be an iterable of filenames and iterates over it. Iterating over a file object yields its lines, which means each line of your config file is treated as a filename that read() then tries to open.

For example, I made a file f and filled it with the letters a-g, one letter per line. strace shows:

open("f", O_RDONLY)                     = 3
open("a\n", O_RDONLY)                   = -1 ENOENT (No such file or directory)
open("b\n", O_RDONLY)                   = -1 ENOENT (No such file or directory)
open("c\n", O_RDONLY)                   = -1 ENOENT (No such file or directory)
open("d\n", O_RDONLY)                   = -1 ENOENT (No such file or directory)
open("e\n", O_RDONLY)                   = -1 ENOENT (No such file or directory)
open("f\n", O_RDONLY)                   = -1 ENOENT (No such file or directory)
open("g\n", O_RDONLY)                   = -1 ENOENT (No such file or directory)

...and since the API is designed to ignore files it can't open, it just ignores the errors.
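The same effect can be reproduced without strace. This is a small sketch, assuming Python 3's configparser, whose read() (as far as I can tell) still treats anything that is not a string or path as an iterable of filenames, and where readfp() is spelled read_file(). demo.ini is just a throwaway name:

```python
import configparser
import os
import tempfile

tmpdir = tempfile.mkdtemp()
path = os.path.join(tmpdir, "demo.ini")
with open(path, "w") as f:
    f.write("[section]\nfoo = 45\n")

# Wrong: passing the open file to read(). Each *line* is treated as a
# filename, none of them can be opened, and every failure is swallowed.
wrong = configparser.ConfigParser()
with open(path) as ini:
    wrong_read = wrong.read(ini)

# Right: pass the filename to read() ...
by_name = configparser.ConfigParser()
by_name_read = by_name.read(path)

# ... or pass the open file to read_file() (readfp() in Python 2).
by_file = configparser.ConfigParser()
with open(path) as ini:
    by_file.read_file(ini)

print(wrong_read, wrong.sections())      # nothing was read
print(by_name_read, by_name.sections())  # the file and its section
print(by_file.sections())                # the section
```

An empty list coming back from read() is the cheapest hint that nothing was loaded.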

From the docs,

If none of the named files exist, the ConfigParser instance will contain an empty dataset. An application which requires initial values to be loaded from a file should load the required file or files using readfp() before calling read() for any optional files:
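The pattern the docs describe could look roughly like this. It is a sketch assuming Python 3, where readfp() is spelled read_file(); the load_config helper and the file names are invented for illustration:

```python
import configparser
import os
import tempfile


def load_config(required_path, optional_paths):
    """Load a mandatory file, then overlay whichever optional files exist."""
    config = configparser.ConfigParser()
    # open() raises FileNotFoundError if the required file is absent,
    # so a missing defaults file is not silently ignored.
    with open(required_path) as fp:
        config.read_file(fp)
    # Optional files: absent ones are skipped, present ones override defaults.
    config.read(optional_paths)
    return config


# Hypothetical files: defaults.cfg is required, site.cfg overrides one value.
tmpdir = tempfile.mkdtemp()
defaults = os.path.join(tmpdir, "defaults.cfg")
site = os.path.join(tmpdir, "site.cfg")
absent = os.path.join(tmpdir, "absent.cfg")
with open(defaults, "w") as f:
    f.write("[section]\nfoo = 1\nbar = 2\n")
with open(site, "w") as f:
    f.write("[section]\nfoo = 45\n")

config = load_config(defaults, [site, absent])
print(dict(config["section"]))
```

Here foo ends up with the value from site.cfg while bar keeps its default, and a missing defaults.cfg fails loudly instead of leaving you with an empty parser.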

This behavior is surprising enough that I've filed http://bugs.python.org/issue27351 to ensure they are aware of this edge case.