vaultah - 3 months ago
Python Question

Is it possible to access inner functions and classes via code objects?

Say there's a function func

def func():
    class a:
        def method(self):
            return 'method'
    def a(): return 'function'
    lambda x: 'lambda'


that I need to examine.

As part of the examination I want to "retrieve" the source code or objects of all nested classes and functions (if any). However, I do realize that they don't exist yet and that there's no direct/clean way of accessing them without running func or defining them outside (before) func. Unfortunately, the most I can do is import a module containing func to obtain the func function object.

I discovered that functions have the __code__ attribute containing the code object, which has the co_consts attribute, so I wrote this:

In [11]: [x for x in func.__code__.co_consts if iscode(x) and x.co_name == 'a']
Out[11]:
[<code object a at 0x7fe246aa9810, file "<ipython-input-6-31c52097eb5f>", line 2>,
<code object a at 0x7fe246aa9030, file "<ipython-input-6-31c52097eb5f>", line 4>]


Those code objects look awfully similar, and I don't think they contain the data necessary to help me distinguish between the types of objects they represent (e.g. type and function).

Q1: Am I right?

Q2: Is there any way to access classes/functions (ordinary and lambdas) defined within the function body?

Answer

A1: Not quite. The following things can help you:

Constants of the code object

From the documentation:

If a code object represents a function, the first item in co_consts is the documentation string of the function, or None if undefined.

Also, if a code object represents a class, the first item of co_consts is always the qualified name of that class. You can try to use this information.

The following solution will correctly work in most cases, but you'll have to skip code objects Python creates for list/set/dict comprehensions and generator expressions:

from inspect import iscode

for x in func.__code__.co_consts:
    if iscode(x):
        # Skip <setcomp>, <dictcomp>, <listcomp> or <genexp>
        if x.co_name.startswith('<') and x.co_name != '<lambda>':
            continue
        firstconst = x.co_consts[0]
        # Compute the qualified name for the current code object
        # Note that we don't know its "type" yet
        qualname = '{func_name}.<locals>.{code_name}'.format(
                        func_name=func.__name__, code_name=x.co_name)
        if firstconst is None or firstconst != qualname:
            print(x, 'represents a function {!r}'.format(x.co_name))
        else:
            print(x, 'represents a class {!r}'.format(x.co_name))

prints

<code object a at 0x7fd149d1a9c0, file "<ipython-input>", line 2> represents a class 'a'
<code object a at 0x7fd149d1ab70, file "<ipython-input>", line 5> represents a function 'a'
<code object <lambda> at 0x7fd149d1aae0, file "<ipython-input>", line 6> represents a function '<lambda>'
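
The comprehension-skipping check recurs in every snippet of this answer, so it may be convenient to factor it out. Below is a minimal sketch; the names nested_definitions and sample are hypothetical and not part of the original answer. Note that on Python 3.12 and later, list, set and dict comprehensions are normally inlined and no longer get their own code objects, while generator expressions still do.

```python
from inspect import iscode


def nested_definitions(func):
    """Yield code objects of classes, functions and lambdas defined in func.

    Code objects created for comprehensions and generator expressions
    (names such as '<genexpr>') are skipped.
    """
    for const in func.__code__.co_consts:
        if not iscode(const):
            continue
        name = const.co_name
        if name.startswith('<') and name != '<lambda>':
            continue
        yield const


def sample():
    # Hypothetical function used only to exercise the helper
    def inner():
        pass
    total = sum(x for x in ())  # creates a '<genexpr>' code object
    return inner, total


names = [code.co_name for code in nested_definitions(sample)]
print(names)
```

Only the code object for inner survives the filter; the generator expression is dropped.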

Code flags

There's a way to get the required information from co_flags. Citing the documentation I linked above:

The following flag bits are defined for co_flags: bit 0x04 is set if the function uses the *arguments syntax to accept an arbitrary number of positional arguments; bit 0x08 is set if the function uses the **keywords syntax to accept arbitrary keyword arguments; bit 0x20 is set if the function is a generator.

Other bits in co_flags are reserved for internal use.
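
To make the documented bits concrete, here is a small sketch that decodes them with the named constants from the inspect module instead of raw hex values. The helper describe_flags and the two sample functions are hypothetical names introduced for illustration only.

```python
import inspect


def describe_flags(code):
    """Return the names of the documented co_flags bits set on a code object."""
    documented = {
        'CO_VARARGS': inspect.CO_VARARGS,          # 0x04, *args
        'CO_VARKEYWORDS': inspect.CO_VARKEYWORDS,  # 0x08, **kwargs
        'CO_GENERATOR': inspect.CO_GENERATOR,      # 0x20, generator function
    }
    return [name for name, bit in documented.items() if code.co_flags & bit]


def takes_anything(*args, **kwargs):
    # Generator that accepts arbitrary positional and keyword arguments
    yield args, kwargs


def plain(x):
    return x


print(describe_flags(takes_anything.__code__))
print(describe_flags(plain.__code__))
```

The first call reports all three flags, the second reports none.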

Flags are manipulated in compute_code_flags (Python/compile.c):

static int
compute_code_flags(struct compiler *c)
{
    PySTEntryObject *ste = c->u->u_ste;
    ...
    if (ste->ste_type == FunctionBlock) {
        flags |= CO_NEWLOCALS | CO_OPTIMIZED;
        if (ste->ste_nested)
            flags |= CO_NESTED;
        if (ste->ste_generator)
            flags |= CO_GENERATOR;
        if (ste->ste_varargs)
            flags |= CO_VARARGS;
        if (ste->ste_varkeywords)
            flags |= CO_VARKEYWORDS;
    }

    /* (Only) inherit compilerflags in PyCF_MASK */
    flags |= (c->c_flags->cf_flags & PyCF_MASK);

    n = PyDict_Size(c->u->u_freevars);
    ...
    if (n == 0) {
        n = PyDict_Size(c->u->u_cellvars);
        ...
        if (n == 0) {
            flags |= CO_NOFREE;
        }
    }
    ...
}

Two code flags (CO_NEWLOCALS and CO_OPTIMIZED) are set only for function blocks, so they won't be set for classes. You can use them to check the type, although that doesn't mean you should: poorly documented implementation details may change in the future.

from inspect import iscode

for x in func.__code__.co_consts:
    if iscode(x):
        # Skip <setcomp>, <dictcomp>, <listcomp> or <genexp>
        if x.co_name.startswith('<') and x.co_name != '<lambda>':
            continue
        flags = x.co_flags
        # CO_OPTIMIZED = 0x0001, CO_NEWLOCALS = 0x0002
        if flags & 0x0001 and flags & 0x0002:
            print(x, 'represents a function {!r}'.format(x.co_name))
        else:
            print(x, 'represents a class {!r}'.format(x.co_name))

The output is exactly the same.
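
The same check can be written with the named constants inspect.CO_OPTIMIZED and inspect.CO_NEWLOCALS rather than magic numbers. This is only a sketch under the same caveat (it relies on a CPython implementation detail); is_function_code and outer are hypothetical names.

```python
import inspect
from inspect import iscode


def is_function_code(code):
    """True if both CO_OPTIMIZED and CO_NEWLOCALS are set on the code object."""
    wanted = inspect.CO_OPTIMIZED | inspect.CO_NEWLOCALS
    return (code.co_flags & wanted) == wanted


def outer():
    # Hypothetical function with one nested class and one nested function
    class a:
        pass

    def b():
        pass

    return a, b


kinds = {code.co_name: is_function_code(code)
         for code in outer.__code__.co_consts if iscode(code)}
print(kinds)
```

The class body maps to False and the nested function to True.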

Bytecode of the outer function

It's also possible to get the object type by inspecting the bytecode of the outer function.

Search the bytecode instructions for blocks that contain LOAD_BUILD_CLASS, which signifies the creation of a class. (LOAD_BUILD_CLASS: Pushes builtins.__build_class__() onto the stack. It is later called by CALL_FUNCTION to construct a class.)

from dis import Bytecode
from inspect import iscode
from itertools import groupby

def _group(i):
    if i.starts_line is not None: _group.starts = i
    return _group.starts

bytecode = Bytecode(func)

for _, iset in groupby(bytecode, _group):
    iset = list(iset)
    try:
        code = next(arg.argval for arg in iset if iscode(arg.argval))
        # Skip <setcomp>, <dictcomp>, <listcomp> or <genexp>
        if code.co_name.startswith('<') and code.co_name != '<lambda>':
            raise TypeError
    except (StopIteration, TypeError):
        continue
    else:
        # LOAD_BUILD_CLASS on the same source line means a class is being built
        if any(x.opname == 'LOAD_BUILD_CLASS' for x in iset):
            print(code, 'represents a class {!r}'.format(code.co_name))
        else:
            print(code, 'represents a function {!r}'.format(code.co_name))

The output is the same (again).
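
Grouping by source line can misfire when several definitions share one line. A possible variation, sketched here with hypothetical names (classify_nested, sample), tracks LOAD_BUILD_CLASS directly: it assumes that the first code object loaded after LOAD_BUILD_CLASS is the class body, which is how CPython lays out ordinary class statements, but it remains an implementation detail.

```python
import dis
from inspect import iscode


def classify_nested(func):
    """Yield (code, kind) pairs, where kind is 'class' or 'function'."""
    pending_class = False
    for ins in dis.get_instructions(func):
        if ins.opname == 'LOAD_BUILD_CLASS':
            # The next code object loaded is assumed to be a class body
            pending_class = True
        elif iscode(ins.argval):
            name = ins.argval.co_name
            # Skip <setcomp>, <dictcomp>, <listcomp> or <genexpr>
            if name.startswith('<') and name != '<lambda>':
                continue
            yield ins.argval, 'class' if pending_class else 'function'
            pending_class = False


def sample():
    # Variation of func from the question; the lambda is returned
    class a:
        def method(self):
            return 'method'

    def a(): return 'function'
    return lambda x: 'lambda'


result = [(code.co_name, kind) for code, kind in classify_nested(sample)]
print(result)
```

Only the outer function's bytecode is walked, so the method inside class a is not reported.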

A2: Sure.

Source code

In order to get the source code for code objects, you'd use inspect.getsource or equivalent:

from inspect import iscode, ismethod, getsource
from textwrap import dedent


def nested_sources(ob):
    if ismethod(ob):
        ob = ob.__func__
    try:
        code = ob.__code__
    except AttributeError:
        raise TypeError('Can\'t inspect {!r}'.format(ob)) from None
    for c in code.co_consts:
        if not iscode(c):
            continue
        name = c.co_name
        # Skip <setcomp>, <dictcomp>, <listcomp> or <genexp>
        if not name.startswith('<') or name == '<lambda>':
            yield dedent(getsource(c))

For instance, calling nested_sources(complex_func) with complex_func defined as

import abc  # needed for abc.ABCMeta below

def complex_func():
    lambda x: 42

    def decorator(cls):
        return lambda: cls()

    @decorator
    class b():
        def method():
            pass

    class c(int, metaclass=abc.ABCMeta):
        def method():
            pass

    {x for x in ()}
    {x: x for x in ()}
    [x for x in ()]
    (x for x in ())

should yield the source code of the first lambda, decorator, b (including @decorator) and c:

In [41]: nested_sources(complex_func)
Out[41]: <generator object nested_sources at 0x7fd380781d58>

In [42]: for source in _:
   ....:     print(source, end='=' * 30 + '\n')
   ....:     
lambda x: 42
==============================
def decorator(cls):
    return lambda: cls()
==============================
@decorator
class b():
    def method():
        pass
==============================
class c(int, metaclass=abc.ABCMeta):
    def method():
        pass
==============================

Function and type objects

If you still need a function/class object, you can eval/exec the source code.

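Example (here sources is assumed to be list(nested_sources(complex_func)), and func.__globals__ is the namespace of the module that defines the inspected function):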

  • for lambda functions:

    In [39]: source = sources[0]
    
    In [40]: eval(source, func.__globals__)
    Out[40]: <function __main__.<lambda>>
    
  • for regular functions

    In [21]: source, local = sources[1], {}
    
    In [22]: exec(source, func.__globals__, local)
    
    In [23]: local.popitem()[1]
    Out[23]: <function __main__.decorator>
    
  • for classes

    In [24]: source, local = sources[3], {}
    
    In [25]: exec(source, func.__globals__, local)
    
    In [26]: local.popitem()[1] 
    Out[26]: __main__.c
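
The exec/eval step can be sketched without IPython as follows. The source strings here are written by hand instead of coming from nested_sources, and globals_ and namespace are hypothetical names; in real use you would pass the inspected function's __globals__ so that decorators, base classes and metaclasses can be resolved.

```python
from textwrap import dedent

# Hand-written stand-ins for the strings nested_sources would yield
class_source = dedent('''\
    class c(int):
        def method(self):
            return 'method'
''')
lambda_source = 'lambda x: 42'

# Stand-in for func.__globals__ (assumption: nothing else is needed here)
globals_ = {'__name__': 'rebuilt'}

# Statements (def/class) go through exec; the new object lands in namespace
namespace = {}
exec(class_source, globals_, namespace)
cls = namespace['c']

# A lambda is an expression, so eval returns the function directly
fn = eval(lambda_source, globals_)

print(cls(5).method(), cls(5) + 1, fn(None))
```

Looking the object up by name in namespace is more predictable than popitem() when the source defines more than one name. The rebuilt objects are new objects: closures over variables of the outer function are not restored.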