J Jones - 25 days ago

Testing if a pandas DataFrame exists

In my code, I have several variables which can either contain a pandas DataFrame or nothing at all. Let's say I want to test and see if a certain DataFrame has been created yet or not. My first thought would be to test for it like this:

if df1:
    # do something
    pass


However, that code fails in this way:

ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
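A minimal sketch that reproduces the error, using made-up sample data: "if df1:" implicitly calls bool(df1), which raises this ValueError, while the .empty attribute named in the message is a plain bool and is safe to use in an if statement.

```python
import pandas as pd

# Hypothetical sample data, only for illustration
df1 = pd.DataFrame({"a": [1, 2, 3]})

try:
    # "if df1:" calls bool(df1) under the hood
    bool(df1)
except ValueError as exc:
    print("ValueError:", exc)

# .empty returns a single bool, so it works in an if statement
if not df1.empty:
    print("df1 has data")
```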


Fair enough. Ideally, I would like to have a presence test that works for either a DataFrame or Python None.

Here is one way this can work:

if not isinstance(df1, type(None)):
    # do something
    pass


However, testing for type is really slow.

import timeit

t = timeit.Timer('if None: pass')
t.timeit()
# approximately 0.04 s (timeit's default is 1,000,000 runs)
t = timeit.Timer('if isinstance(x, type(None)): pass', setup='x=None')
t.timeit()
# approximately 0.4 s
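The first timing above evaluates "if None" without looking up any variable, so it is not quite a like-for-like baseline. A closer comparison is the identity test "x is not None" against the isinstance check on the same variable. This is a sketch only; no numbers are quoted because they depend on the machine and Python version.

```python
import timeit

setup = "x = None"

# Identity comparison: the usual way to test for None
t_is = timeit.timeit("if x is not None: pass", setup=setup, number=1_000_000)

# Type check, as in the question
t_isinstance = timeit.timeit(
    "if not isinstance(x, type(None)): pass", setup=setup, number=1_000_000
)

print(f"x is not None:               {t_is:.3f} s")
print(f"not isinstance(x, NoneType): {t_isinstance:.3f} s")
```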


Ouch. Along with being slow, testing for NoneType isn't very flexible, either.

A different solution would be to initialize df1 as an empty DataFrame, so that the type would be the same in both the null and non-null cases. I could then just test using len(), or any(), or something like that. Making an empty DataFrame seems kind of silly and wasteful, though.
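A short sketch of the empty-DataFrame approach, assuming df1 starts out as pd.DataFrame() and is later replaced by real (here made-up) data. len(), .empty and .shape all give a single unambiguous answer.

```python
import pandas as pd

# Initialize with an empty DataFrame instead of None
df1 = pd.DataFrame()

print(len(df1))   # 0 (number of rows)
print(df1.empty)  # True
print(df1.shape)  # (0, 0)

# Later, once real data arrives, the same checks flip
df1 = pd.DataFrame({"a": [1, 2]})
print(len(df1), df1.empty, df1.shape)  # 2 False (2, 1)
```

Note that any() is not a drop-in replacement here: df1.any() returns a Series with one value per column rather than a single bool, so putting it directly in an if statement raises the same kind of ValueError.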

Another solution would be to have an indicator variable, df1_exists, which is set to False until df1 is created. Then, instead of testing df1, I would be testing df1_exists. But this doesn't seem all that elegant, either.

Is there a better, more Pythonic way of handling this issue? Am I missing something, or is this just an awkward side effect of all the awesome things about pandas?

Answer

Option 1 (my preferred option)

This approach comes from @Ami Tavory.

Please accept his answer if you like this approach.

It is very idiomatic Python to initialize a variable with None and then check for None before doing anything with that variable.

df1 = None

if df1 is not None:
    print(df1.head())
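If "exists" should also mean "has rows", the None check can be combined with .empty in one helper. The name has_data is made up for this sketch. Because "and" short-circuits, .empty is only read when df is an actual object, never on None.

```python
from typing import Optional

import pandas as pd


def has_data(df: Optional[pd.DataFrame]) -> bool:
    """Return True only for a DataFrame that is not None and not empty."""
    # Short-circuit: .empty is never accessed when df is None
    return df is not None and not df.empty


print(has_data(None))                      # False
print(has_data(pd.DataFrame()))            # False
print(has_data(pd.DataFrame({"a": [1]})))  # True
```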

Option 2

That said, setting up an empty DataFrame isn't a bad idea at all.

import pandas as pd

df1 = pd.DataFrame()

if not df1.empty:
    print(df1.head())
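One detail worth knowing with this option: per the pandas documentation, .empty is True when either axis has length zero, and a DataFrame holding only NaN values is not considered empty. The frames below are made-up examples of each case.

```python
import numpy as np
import pandas as pd

# Columns but no rows: empty, because the row axis has length 0
cols_only = pd.DataFrame(columns=["a", "b"])

# Rows but no columns: also empty, because the column axis has length 0
index_only = pd.DataFrame(index=[0, 1, 2])

# A single NaN cell still counts as data for .empty
all_nan = pd.DataFrame({"a": [np.nan]})

print(cols_only.empty)   # True
print(index_only.empty)  # True
print(all_nan.empty)     # False

# To treat all-NaN frames as empty too, drop the NaN rows first
print(all_nan.dropna().empty)  # True
```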

Option 3

Just try it.

try:
    print(df1.head())
except AttributeError:
    # df1 is None, which has no .head()
    pass
except NameError:
    # df1 hasn't even been defined
    pass
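This is Python's "easier to ask forgiveness than permission" style. When the check lives inside a function that receives the object as a parameter, the name is always bound, so only the AttributeError branch can fire there; a NameError would be raised at the call site instead. A minimal sketch, with the function name head_or_none made up for illustration:

```python
from typing import Optional

import pandas as pd


def head_or_none(obj: Optional[pd.DataFrame], n: int = 5) -> Optional[pd.DataFrame]:
    """Return obj.head(n), or None when obj has no .head (e.g. it is None)."""
    try:
        return obj.head(n)
    except AttributeError:
        # None, or any other object without a .head method, ends up here
        return None


print(head_or_none(None))  # None
print(head_or_none(pd.DataFrame({"a": range(10)}), n=2))
```

Keeping the try body down to a single call matters here: a broad AttributeError handler would also swallow unrelated attribute errors raised by anything else placed inside it.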

Timing

When df1 is in its initialized state or doesn't exist at all


When df1 is a DataFrame with something in it

import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.arange(25).reshape(5, 5), list('ABCDE'), list('abcde'))
df1  # displays the 5x5 frame in IPython/Jupyter

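One way to run this comparison yourself is with timeit. In the sketch below, check_none, check_empty and check_try are hypothetical wrappers around Options 1 to 3 that only perform the check (no printing). The "doesn't exist at all" case is left out, because an unbound name cannot be passed to a function. Results vary by machine and pandas version, so none are quoted here.

```python
import timeit

import numpy as np
import pandas as pd


def check_none(df):
    # Option 1: identity test against None
    return df is not None


def check_empty(df):
    # Option 2: assumes df is always a DataFrame
    return not df.empty


def check_try(df):
    # Option 3: attempt the access and catch the failure for None
    try:
        return not df.empty
    except AttributeError:
        return False


def time_check(func, df, number=50_000):
    """Return total seconds for `number` calls of func(df)."""
    return timeit.timeit(lambda: func(df), number=number)


empty_df = pd.DataFrame()
full_df = pd.DataFrame(np.arange(25).reshape(5, 5), list("ABCDE"), list("abcde"))

# Initialized state: None for Options 1 and 3, empty DataFrame for Option 2
initialized = [
    ("Option 1", check_none, None),
    ("Option 2", check_empty, empty_df),
    ("Option 3", check_try, None),
]
for label, func, df in initialized:
    print(f"initialized {label}: {time_check(func, df):.4f} s")

# Populated state: every option sees the same DataFrame
for label, func in [("Option 1", check_none), ("Option 2", check_empty), ("Option 3", check_try)]:
    print(f"populated   {label}: {time_check(func, full_df):.4f} s")
```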
