ceiling cat - 2 months ago 15

Python Question

How can I create a DataFrame from multiple `numpy` arrays, `Pandas` Series, or `Pandas` DataFrames while preserving the order of the columns?

For example, I have these two `numpy` arrays and I want to combine them into a `Pandas` DataFrame.

`foo = np.array( [ 1, 2, 3 ] )`

`bar = np.array( [ 4, 5, 6 ] )`

If I do this, the `bar` column comes first because `dict` doesn't preserve order.

`pd.DataFrame( { 'foo': pd.Series(foo), 'bar': pd.Series(bar) } )`

bar foo

0 4 1

1 5 2

2 6 3

I can do this, but it gets tedious when I need to combine many variables.

`pd.DataFrame( { 'foo': pd.Series(foo), 'bar': pd.Series(bar) }, columns = [ 'foo', 'bar' ] )`

EDIT: Is there a way to specify the variables to be joined and to organize the column order in one operation? That is, I don't mind using multiple lines to complete the entire operation, but I'd rather not have to specify the variables to be joined multiple times (since I will be changing the code a lot and this is pretty error-prone).

EDIT2: One more point. If I want to add or remove one of the variables to be joined, I only want to add/remove in one place.

Answer

`collections.OrderedDict`

In my original solution, I proposed using `OrderedDict` from the `collections` module in Python's standard library.

```
>>> import numpy as np
>>> import pandas as pd
>>> from collections import OrderedDict
>>>
>>> foo = np.array( [ 1, 2, 3 ] )
>>> bar = np.array( [ 4, 5, 6 ] )
>>>
>>> pd.DataFrame( OrderedDict( { 'foo': pd.Series(foo), 'bar': pd.Series(bar) } ) )
   foo  bar
0    1    4
1    2    5
2    3    6
```

However, as noted, if a normal dictionary is passed to `OrderedDict`, the order may still not be preserved, because the plain dictionary has already lost the insertion order by the time `OrderedDict` receives it. A workaround is to build the `OrderedDict` from a sequence of key-value tuple pairs instead, as suggested in this SO post:

```
>>> import numpy as np
>>> import pandas as pd
>>> from collections import OrderedDict
>>>
>>> a = np.array( [ 1, 2, 3 ] )
>>> b = np.array( [ 4, 5, 6 ] )
>>> c = np.array( [ 7, 8, 9 ] )
>>>
>>> pd.DataFrame( OrderedDict( { 'a': pd.Series(a), 'b': pd.Series(b), 'c': pd.Series(c) } ) )
   a  c  b
0  1  7  4
1  2  8  5
2  3  9  6
>>> pd.DataFrame( OrderedDict( (('a', pd.Series(a)), ('b', pd.Series(b)), ('c', pd.Series(c))) ) )
   a  b  c
0  1  4  7
1  2  5  8
2  3  6  9
```
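
To cover the "add/remove in one place" requirement from the question's edits, one option is to keep the name-array pairs in a single list and build the `OrderedDict` from that list. The following is only a minimal sketch: the third array `baz` and the list name `columns` are hypothetical names introduced for illustration, and the plain `numpy` arrays are passed directly rather than wrapped in `pd.Series`.

```python
from collections import OrderedDict

import numpy as np
import pandas as pd

foo = np.array([1, 2, 3])
bar = np.array([4, 5, 6])
baz = np.array([7, 8, 9])  # hypothetical third variable, for illustration only

# The only place that needs editing to add, remove, or reorder variables.
# The column order of the DataFrame follows the order of this list.
columns = [
    ("foo", foo),
    ("bar", bar),
    ("baz", baz),
]

# Building the OrderedDict from a list of pairs keeps the listed order.
df = pd.DataFrame(OrderedDict(columns))
print(df)
```

On Python 3.7 and later, plain dictionaries keep insertion order, so the `OrderedDict` wrapper may not be needed there; it is kept here to stay consistent with the answer above.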