Segsfault Segsfault - 7 months ago 28
Python Question

Improper use of __new__ to generate classes in Python?

I'm creating some classes for dealing with filenames in various types of file shares (nfs, afp, s3, local disk) etc. I get as user input a string that identifies the data source (i.e. "nfs://192.168.1.3" or "s3://mybucket/data") etc.

I'm subclassing the specific filesystems from a base class that has common code. Where I'm confused is in the object creation. What I have is the following:

import os

class FileSystem(object):
class NoAccess(Exception):
pass

def __new__(cls,path):
if cls is FileSystem:
if path.upper().startswith('NFS://'):
return super(FileSystem,cls).__new__(Nfs)
else:
return super(FileSystem,cls).__new__(LocalDrive)
else:
return super(FileSystem,cls).__new__(cls,path)

def count_files(self):
raise NotImplementedError

class Nfs(FileSystem):
def __init__ (self,path):
pass

def count_files(self):
pass

class LocalDrive(FileSystem):
def __init__(self,path):
if not os.access(path, os.R_OK):
raise FileSystem.NoAccess('Cannot read directory')
self.path = path

def count_files(self):
return len([x for x in os.listdir(self.path) if os.path.isfile(os.path.join(self.path, x))])

data1 = FileSystem('nfs://192.168.1.18')
data2 = FileSystem('/var/log')

print type(data1)
print type(data2)

print data2.count_files()


I thought this would be a good use of
__new__
but most posts I read about it's use discourage it. Is there a more accepted way to approach this problem?

Answer

I don't think using __new__() to do what you want is improper. You may have misunderstood what's being said in that question you mentioned. It's was just that using it wasn't a good way to do what the OP (original poster) was trying to accomplish, not in general.

If you really want to avoid using it, then the only options are metaclasses or a separate factory function/method. Given the choices available, making the __new__() method one — since it's static by default — is a perfectly sensible approach.

That said, below is what I think is an improved version of your code. I've added a class method and a static utility method to assist in automatically finding all the subclasses. This supports the most important way in which it's better is that now adding subclasses doesn't require modifying the __new__() method. This means it's now easily extensible since it effectively has what you could call virtual constructors.

A similar implementation could also be used to move the creation of instances out of __new__ method into another if you really wanted to avoid using it — so in one sense the technique shown is just a relatively simple way of coding an extensible generic factory function regardless of what name it's assigned.

import os
import re

class FileSystem(object):
    class NoAccess(Exception): pass
    class Unknown(Exception): pass

    @classmethod
    def _get_all_subclasses(cls):
        """ Recursively generate of all the subclasses of class cls. """
        for subclass in cls.__subclasses__():
            yield subclass
            for subclass in subclass._get_all_subclasses():
                yield subclass

    @staticmethod
    def _get_prefix(s):
        """ Extract any file system prefix at beginning of string s and
            return a lowercase version of it or None when there isn't one.
        """
        match = re.match(r'\s*([^:]+)://', s)  # matches "xxx:" if x is any char
        return match.group(1).lower() if match else None

    def __new__(cls, path):
        path_prefix = cls._get_prefix(path)
        for subclass in cls._get_all_subclasses():
            if subclass.prefix == path_prefix:
                return object.__new__(subclass , path)  # using "object" avoids recursion
        else:  # no subclass with matching prefix found (or with prefix of None)
            raise FileSystem.Unknown(
                'path "{}" has no known file system prefix'.format(path))

    def count_files(self):
        raise NotImplementedError

class Nfs(FileSystem):
    prefix = 'nfs'

    def __init__ (self, path):
        pass

    def count_files(self):
        pass

class LocalDrive(FileSystem):
    prefix = None  # default when no file system prefix is found

    def __init__(self, path):
        if not os.access(path, os.R_OK):
            raise FileSystem.NoAccess('Cannot read directory')
        self.path = path

    def count_files(self):
        return len([x for x in os.listdir(self.path)
                        if os.path.isfile(os.path.join(self.path, x))])

data1 = FileSystem('nfs://192.168.1.18')
data2 = FileSystem('c:/')  # change as necessary for testing

print type(data1)
print type(data2)

print data2.count_files()
Comments