jim mako - 8 days ago
Python Question

Python equivalent of R's list for pandas dataframe

I am trying to collect multiple data frames into a single variable, but I am having trouble doing this in Python.

The code I am trying to execute in R is as follows:

df1 <- data.frame()
df2 <- data.frame()
my_collection <- list(my_df1 = df1, my_df2 = df2)


This allows me to do nice things such as retrieving individual data frames by name (e.g. my_collection[["my_df1"]]).

The problem is that I cannot find a way in Python to combine them into a single variable whose elements I can look up by name. I am not sure what this structure is called in Python terminology, so I am struggling to search in the right direction.

Any help to be able to combine would be much appreciated! Thanks!

Answer

It sounds to me like you want a dict:

In [5]: import pandas as pd

In [6]: df1 = pd.DataFrame({'a':[1,2,3],'b':[4,5,6]})

In [7]: df2 = pd.DataFrame({'c':[7,8,9], 'd':[10,11,12]})

In [8]: df1
Out[8]:
   a  b
0  1  4
1  2  5
2  3  6

In [9]: df2
Out[9]:
   c   d
0  7  10
1  8  11
2  9  12

In [10]: frames = dict(my_df1=df1, my_df2=df2)

In [11]: frames['my_df1']
Out[11]:
   a  b
0  1  4
1  2  5
2  3  6

In [12]: frames['my_df2']
Out[12]:
   c   d
0  7  10
1  8  11
2  9  12

Notice that I'm using dict literals in the DataFrame constructor, but the dict constructor for frames, just so the syntax looks the same as R's.

You could have used literals too:

In [13]: frames2 = {'foo':df1, 'bar':df2}

In [14]: frames2['foo']
Out[14]:
   a  b
0  1  4
1  2  5
2  3  6

In [15]: frames2['bar']
Out[15]:
   c   d
0  7  10
1  8  11
2  9  12
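Once the frames are in a dict, the usual dict operations cover most of what a named R list gets used for. Here is a minimal sketch, assuming the same df1 and df2 as above; the third frame and the key my_df3 are made up purely for illustration:

```python
import pandas as pd

df1 = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
df2 = pd.DataFrame({"c": [7, 8, 9], "d": [10, 11, 12]})

frames = {"my_df1": df1, "my_df2": df2}

# Add another frame by name, much like my_collection[["my_df3"]] <- df3 in R.
# (my_df3 is a hypothetical extra frame, not part of the question.)
frames["my_df3"] = pd.DataFrame({"e": [13, 14]})

# .items() yields (name, DataFrame) pairs, so you can walk the whole collection.
shapes = {name: df.shape for name, df in frames.items()}
print(shapes)

# Membership test on the names, similar to "my_df1" %in% names(my_collection).
has_first = "my_df1" in frames
print(has_first)
```

Assigning to a key that already exists replaces the frame stored under it, so use distinct names if you want to keep both.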

Note that R lists are basically arrays that allow for labeling, so their complexity is the same as that of arrays (maybe array lists): looking up an element by name scans the names rather than hashing them. In that sense they are spruced-up Python lists. A dict is a hash table with very different runtime complexity, since lookup by key takes constant time on average. It is more the equivalent of an R environment (or rather, of what an R environment uses under the hood; I don't think R has a plain hash-map data structure).
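One practical consequence of that difference: an R list can also be indexed by position (my_collection[[1]]), while a dict is indexed only by key. Since Python 3.7 a dict does keep its insertion order, so positional access can be emulated if you really need it. A small sketch, assuming the same two frames as above:

```python
import pandas as pd

df1 = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
df2 = pd.DataFrame({"c": [7, 8, 9], "d": [10, 11, 12]})

frames = {"my_df1": df1, "my_df2": df2}

# Names in insertion order, roughly names(my_collection) in R.
names = list(frames)

# Positional access: materialize the values in insertion order, then index.
# This copies the references into a new list, so it is O(n), unlike frames[key].
first = list(frames.values())[0]

# Lookup by key returns the very same object; nothing is copied.
same_object = frames["my_df1"] is first
print(names, same_object)
```

If you mostly need positional access, a plain Python list of frames is probably the simpler choice. The dict pays off when you look things up by name.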