Brian - 9 months ago

How to organize multiple Python files into a single module without it behaving like a package?

Is there a way to use __init__.py to organize multiple files into a module?

Reason: Modules are easier to use than packages, because they don't have as many layers of namespace.

Normally this makes a package; I get that. The problem is that with a package, 'import thepackage' gives me an empty namespace. Users must then either use "from thepackage import *" (frowned upon) or know exactly what it contains and manually pull that into a usable namespace.

What I want is for the user to do 'import thepackage' and get a nice, clean namespace like this, exposing the functions and classes relevant to the project:

current_module
    \
    doit_tools/
        \
        - (class) _hidden_resource_pool
        - (class) JobInfo
        - (class) CachedLookup
        - (class) ThreadedWorker
        - (Fn) util_a
        - (Fn) util_b
        - (Fn) gather_stuff
        - (Fn) analyze_stuff


The maintainer's job would be to avoid defining the same name in different files, which should be easy when the project is small, like mine.

It would also be nice if people could do 'from doit_tools import JobInfo' and get the class, rather than a module containing the class.

This is easy if all my code is in one gigantic file, but I like to organize things once they start getting big. What I have on disk looks roughly like this:

place_in_my_python_path/
    doit_tools/
        __init__.py
        JobInfo.py
            - class JobInfo:
        NetworkAccessors.py
            - class _hidden_resource_pool:
            - class CachedLookup:
            - class ThreadedWorker:
        utility_functions.py
            - def util_a()
            - def util_b()
        data_functions.py
            - def gather_stuff()
            - def analyze_stuff()


I only separate them so my files aren't huge and unnavigable. They are all related, though someone (possibly me) may want to use the classes by themselves without importing everything.

I've read a number of suggestions in various threads. Here's what happens with each suggestion I could find for how to do this:

If I do not use an __init__.py, I cannot import anything, because Python doesn't descend into the folder from sys.path.

If I use a blank __init__.py, then when I 'import doit_tools' it's an empty namespace with nothing in it. None of my files are imported, which makes it more difficult to use.
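A minimal sketch of the blank-__init__.py behavior, assuming Python 3. It writes a throwaway package into a temporary directory and imports it; the package name blank_demo and the helper make_package are hypothetical, made up for this illustration. The package starts out without the submodule attribute, which only appears once the submodule itself is imported:

```python
import importlib
import sys
import tempfile
from pathlib import Path


def make_package(root, name, files):
    """Hypothetical helper: write package `name` under `root` from {filename: source}."""
    pkg_dir = Path(root) / name
    pkg_dir.mkdir()
    for filename, source in files.items():
        (pkg_dir / filename).write_text(source)


root = tempfile.mkdtemp()
make_package(root, "blank_demo", {
    "__init__.py": "",  # blank __init__.py
    "utility_functions.py": "def util_a():\n    return 'a'\n",
})
sys.path.insert(0, root)
importlib.invalidate_caches()  # make sure the import system sees the new files

import blank_demo

# The blank __init__.py ran, but nothing has imported the submodule yet.
had_submodule = hasattr(blank_demo, "utility_functions")
print(had_submodule)  # False

import blank_demo.utility_functions

# Importing the submodule binds it as an attribute of the package.
print(blank_demo.utility_functions.util_a())  # a
```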

If I list the submodules in __all__, I can use the (frowned upon?) 'from thing import *' syntax, but all of my classes are behind unnecessary namespace barriers again. The user has to (1) know they should use 'from x import *' instead of 'import x', and (2) manually reshuffle classes until they can reasonably obey line-width style constraints.
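The __all__ variant can be sketched the same way, again assuming Python 3 (all_demo and make_package are hypothetical names). With submodule names in __all__, a star import binds the submodules themselves, so the functions stay one level down:

```python
import importlib
import sys
import tempfile
from pathlib import Path


def make_package(root, name, files):
    """Hypothetical helper: write package `name` under `root` from {filename: source}."""
    pkg_dir = Path(root) / name
    pkg_dir.mkdir()
    for filename, source in files.items():
        (pkg_dir / filename).write_text(source)


root = tempfile.mkdtemp()
make_package(root, "all_demo", {
    # __all__ lists submodule names, not the functions inside them.
    "__init__.py": "__all__ = ['utility_functions', 'data_functions']\n",
    "utility_functions.py": "def util_a():\n    return 'a'\n",
    "data_functions.py": "def gather_stuff():\n    return ['stuff']\n",
})
sys.path.insert(0, root)
importlib.invalidate_caches()

# Star imports are only allowed at module level in Python 3.
from all_demo import *

# The star import bound the two submodules...
print(utility_functions.util_a())  # a
print(data_functions.gather_stuff())  # ['stuff']
# ...but not the functions defined inside them.
print("util_a" in globals())  # False
```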

If I add 'from thatfile import X' statements to __init__.py, I get closer, but I have namespace conflicts (?) and extra namespaces for things I didn't want in there. In the example below, you'll see that:


  1. The class JobInfo overwrote the module object named JobInfo because their names were the same. Somehow Python can figure this out, because JobInfo is of type <class 'doit_tools.JobInfo.JobInfo'>. (doit_tools.JobInfo is a class, but doit_tools.JobInfo.JobInfo is that same class... this is tangled and seems very bad, but doesn't seem to break anything.)

  2. Each filename made its way into the doit_tools namespace, which makes it more confusing to look through if anyone is looking at the contents of the module. I want doit_tools.utility_functions.py to hold some code, not define a new namespace.




current_module
    \
    doit_tools/
        \
        - (module) JobInfo
            \
            - (class) JobInfo
        - (class) JobInfo
        - (module) NetworkAccessors
            \
            - (class) CachedLookup
            - (class) ThreadedWorker
        - (class) CachedLookup
        - (class) ThreadedWorker
        - (module) utility_functions
            \
            - (Fn) util_a
            - (Fn) util_b
        - (Fn) util_a
        - (Fn) util_b
        - (module) data_functions
            \
            - (Fn) gather_stuff
            - (Fn) analyze_stuff
        - (Fn) gather_stuff
        - (Fn) analyze_stuff
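A minimal Python 3 sketch of what happens with a re-exporting __init__.py (reexport_demo and make_package are hypothetical names). The import system first binds the submodule JobInfo as an attribute of the package, and the from-import then rebinds that same name to the class. The <class '...JobInfo.JobInfo'> text is just the class's __module__ plus its name, so it does not mean the class has a JobInfo attribute:

```python
import importlib
import sys
import tempfile
from pathlib import Path


def make_package(root, name, files):
    """Hypothetical helper: write package `name` under `root` from {filename: source}."""
    pkg_dir = Path(root) / name
    pkg_dir.mkdir()
    for filename, source in files.items():
        (pkg_dir / filename).write_text(source)


root = tempfile.mkdtemp()
make_package(root, "reexport_demo", {
    # Re-export the class under the same name as its module.
    "__init__.py": "from .JobInfo import JobInfo\n",
    "JobInfo.py": "class JobInfo:\n    pass\n",
})
sys.path.insert(0, root)
importlib.invalidate_caches()

import reexport_demo

# The package attribute now refers to the class, not the submodule.
print(reexport_demo.JobInfo)  # <class 'reexport_demo.JobInfo.JobInfo'>
# The repr is built from __module__ (the submodule's dotted name) and the class name.
print(reexport_demo.JobInfo.__module__)  # reexport_demo.JobInfo
# There is no JobInfo attribute on the class itself.
print(hasattr(reexport_demo.JobInfo, "JobInfo"))  # False
# The submodule object still exists and is reachable through sys.modules.
submodule = sys.modules["reexport_demo.JobInfo"]
print(submodule.JobInfo is reexport_demo.JobInfo)  # True
```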


Also someone importing just the data abstraction class would get something different than they expect when they do 'from doit_tools import JobInfo':

current_namespace
    \
    JobInfo (module)
        \
        - JobInfo (class)

instead of:

current_namespace
    \
    - JobInfo (class)


So, is this just a wrong way to organize Python code? If not, what is a correct way to split related code up but still collect it in a module-like way?

Maybe the best case scenario is that doing 'from doit_tools import JobInfo' is a little confusing for someone using the package?

Maybe a Python file called 'api', so that people using the code do the following?

import doit_tools.api
from doit_tools.api import JobInfo
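One way to sketch the 'api' idea, assuming Python 3 (api_demo and make_package are hypothetical names): the package __init__.py stays blank and a separate api.py re-exports the public names. The submodule attributes on the package stay modules, while the api namespace holds the classes and functions:

```python
import importlib
import sys
import tempfile
from pathlib import Path


def make_package(root, name, files):
    """Hypothetical helper: write package `name` under `root` from {filename: source}."""
    pkg_dir = Path(root) / name
    pkg_dir.mkdir()
    for filename, source in files.items():
        (pkg_dir / filename).write_text(source)


root = tempfile.mkdtemp()
make_package(root, "api_demo", {
    "__init__.py": "",
    "JobInfo.py": "class JobInfo:\n    pass\n",
    "utility_functions.py": "def util_a():\n    return 'a'\n",
    # api.py collects the public names from the implementation modules.
    "api.py": (
        "from .JobInfo import JobInfo\n"
        "from .utility_functions import util_a\n"
        "__all__ = ['JobInfo', 'util_a']\n"
    ),
})
sys.path.insert(0, root)
importlib.invalidate_caches()

import api_demo.api
from api_demo.api import JobInfo

print(JobInfo)  # <class 'api_demo.JobInfo.JobInfo'>
print(api_demo.api.util_a())  # a
# Because __init__.py is blank, the package attribute is still the submodule.
print(api_demo.JobInfo is sys.modules["api_demo.JobInfo"])  # True
```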


============================================

Examples in response to comments:

Take the following package contents, inside a folder 'foo' that is on the Python path.

foo/__init__.py


# Explicit relative imports work in Python 2.5+ and in Python 3.
__all__ = ['doit','dataholder','getSomeStuff','hold_more_data','SpecialCase']
from .another_class import doit
from .another_class import dataholder
from .descriptive_name import getSomeStuff
from .descriptive_name import hold_more_data
from .specialcase import SpecialCase


foo/specialcase.py


class SpecialCase:
    pass


foo/descriptive_name.py


def getSomeStuff():
    pass

class hold_more_data(object):
    pass


foo/another_class.py


def doit():
    print("I'm a function.")

class dataholder(object):
    pass


Do this:

>>> import foo
>>> for thing in dir(foo): print(thing)
...
SpecialCase
__builtins__
__doc__
__file__
__name__
__package__
__path__
another_class
dataholder
descriptive_name
doit
getSomeStuff
hold_more_data
specialcase


another_class and descriptive_name are there cluttering things up, and they also hold extra references to, e.g., doit() under their own namespaces.
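A Python 3 sketch of this example, using the module names from the __init__.py above (another_class and descriptive_name) with explicit relative imports; foo_demo and make_package are hypothetical names. It shows that the submodule names do appear in the package namespace, and that doit under another_class is the same function object as foo_demo.doit, not a copy:

```python
import importlib
import sys
import tempfile
from pathlib import Path


def make_package(root, name, files):
    """Hypothetical helper: write package `name` under `root` from {filename: source}."""
    pkg_dir = Path(root) / name
    pkg_dir.mkdir()
    for filename, source in files.items():
        (pkg_dir / filename).write_text(source)


root = tempfile.mkdtemp()
make_package(root, "foo_demo", {
    "__init__.py": (
        "from .another_class import doit, dataholder\n"
        "from .descriptive_name import getSomeStuff, hold_more_data\n"
    ),
    "another_class.py": (
        "def doit():\n"
        "    return \"I'm a function.\"\n"
        "\n"
        "class dataholder(object):\n"
        "    pass\n"
    ),
    "descriptive_name.py": (
        "def getSomeStuff():\n"
        "    pass\n"
        "\n"
        "class hold_more_data(object):\n"
        "    pass\n"
    ),
})
sys.path.insert(0, root)
importlib.invalidate_caches()

import foo_demo

# Public names in the package: the re-exported objects plus the two submodules.
public_names = [name for name in dir(foo_demo) if not name.startswith("_")]
print(public_names)
# ['another_class', 'dataholder', 'descriptive_name', 'doit', 'getSomeStuff', 'hold_more_data']

# The "extra copy" is the same function object reached through two names.
print(foo_demo.another_class.doit is foo_demo.doit)  # True
```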

If I have a class named Data inside a file named Data.py and I do 'from Data import Data', I get a namespace conflict: the class Data, which lives inside the module Data, ends up in the current namespace under the same name as the module. (But Python seems to be able to handle this.)

Answer

You can sort of do it, but it's not really a good idea, and you're fighting against the way Python modules/packages are supposed to work. By importing appropriate names in __init__.py you can make them accessible in the package namespace. By deleting module names you can make them inaccessible. (For why you need to delete them, see this question.) So you can get close to what you want with something like this (in __init__.py):

from .another_class import doit
from .another_class import dataholder
from .descriptive_name import getSomeStuff
from .descriptive_name import hold_more_data
del another_class, descriptive_name
__all__ = ['doit', 'dataholder', 'getSomeStuff', 'hold_more_data']
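A Python 3 sketch of the del approach, with the package trimmed to two functions (del_demo and make_package are hypothetical names). After the del, the package namespace holds only the re-exported names, but the submodules were still imported and remain registered in sys.modules:

```python
import importlib
import sys
import tempfile
from pathlib import Path


def make_package(root, name, files):
    """Hypothetical helper: write package `name` under `root` from {filename: source}."""
    pkg_dir = Path(root) / name
    pkg_dir.mkdir()
    for filename, source in files.items():
        (pkg_dir / filename).write_text(source)


root = tempfile.mkdtemp()
make_package(root, "del_demo", {
    "__init__.py": (
        "from .another_class import doit\n"
        "from .descriptive_name import getSomeStuff\n"
        "# The imports above bound the submodules as package globals; drop them.\n"
        "del another_class, descriptive_name\n"
        "__all__ = ['doit', 'getSomeStuff']\n"
    ),
    "another_class.py": "def doit():\n    return 'done'\n",
    "descriptive_name.py": "def getSomeStuff():\n    return 'stuff'\n",
})
sys.path.insert(0, root)
importlib.invalidate_caches()

import del_demo

print([n for n in dir(del_demo) if not n.startswith("_")])  # ['doit', 'getSomeStuff']
print(del_demo.doit())  # done
# The submodule was imported all the same; only the package attribute is gone.
print("del_demo.another_class" in sys.modules)  # True
```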

However, this will break subsequent attempts to import package.another_class. In general, you can't import anything from package.module without making package.module accessible as an importable reference to that module (although with __all__ you can keep the module out of what 'from package import *' pulls in).
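What __all__ does control is the star import. A minimal Python 3 sketch (star_demo and make_package are hypothetical names): a name left out of __all__, here the helpers submodule, is not bound by 'from star_demo import *', while an explicit 'from star_demo import helpers' still works:

```python
import importlib
import sys
import tempfile
from pathlib import Path


def make_package(root, name, files):
    """Hypothetical helper: write package `name` under `root` from {filename: source}."""
    pkg_dir = Path(root) / name
    pkg_dir.mkdir()
    for filename, source in files.items():
        (pkg_dir / filename).write_text(source)


root = tempfile.mkdtemp()
make_package(root, "star_demo", {
    "__init__.py": "from .helpers import util_a\n__all__ = ['util_a']\n",
    "helpers.py": "def util_a():\n    return 'a'\n",
})
sys.path.insert(0, root)
importlib.invalidate_caches()

# A star import binds only the names listed in __all__.
from star_demo import *

print(util_a())  # a
helpers_bound_by_star = "helpers" in globals()
print(helpers_bound_by_star)  # False

# An explicit import of the submodule is not affected by __all__.
from star_demo import helpers

print(helpers.util_a is util_a)  # True
```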

More generally, by splitting up your code by class/function you are working against the Python package/module system. A Python module should generally contain stuff you want to import as a unit. It's not uncommon to import submodule components directly in the top-level package namespace for convenience, but the reverse --- trying to hide the submodules and allow access to their contents only through the top-level package namespace --- is going to lead to problems. In addition, there is nothing to be gained by trying to "cleanse" the package namespace of the modules. Those modules are supposed to be in the package namespace; that's where they belong.