grinsbaeckchen - 1 year ago 72

Python Question

I'm trying to implement an apply function that returns two values because the calculations are similar and pretty time consuming, so I don't want to do apply twice.

Below is a deliberately trivial MWE; I know there are easier ways to achieve what it does. My actual function is more complicated, but I already run into an error with the MWE.

So, I got this to work:

```
import numpy as np
import pandas as pd

def function(row):
    return [row.A, row.A/2]

df = pd.DataFrame({'A' : np.random.randn(8),
                   'B' : np.random.randn(8)})

df[['D','E']] = df.apply(lambda row: function(row), axis=1).apply(pd.Series)
```

However, this does not:

```
df2 = pd.DataFrame({'A' : np.random.randn(8),
                    'B' : pd.date_range('1/1/2011', periods=8, freq='H'),
                    'C' : np.random.randn(8)})

df2[['D','E']] = df2.apply(lambda row: function(row), axis=1).apply(pd.Series)
```

Instead, it gives me

ValueError: Shape of passed values is (8, 2), indices imply (8, 3)

I don't understand why changing the type of the B column would affect the outcome, since B is not even used in the applied function.

I guess I could avoid this issue in the example by temporarily excluding the date column. However, my real function will need to use the date later on.

Can someone explain why this example does not work? What changes when a timestamp column is included?

Answer

Have `function` return a `pd.Series` instead. Returning a list makes `apply` try to fit the list into the existing row. Returning a `pd.Series` tells pandas to treat the values as a new row of their own, so the result of `apply` is a DataFrame with one column per element.

```
import numpy as np
import pandas as pd

def function(row):
    # Return a Series so that apply expands the two values into two columns
    return pd.Series([row.A, row.A/2])

df2 = pd.DataFrame({'A' : np.random.randn(8),
                    'B' : pd.date_range('1/1/2011', periods=8, freq='H'),
                    'C' : np.random.randn(8)})
df2[['D','E']] = df2.apply(function, axis=1)
df2
```
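As a variation on the snippet above, here is a minimal sketch that gives the returned Series a named index, so the columns of the result are already called `D` and `E`. The name `halve_row` is a hypothetical stand-in for the real function, and the date column uses the default daily frequency only to keep the example simple. The `result_type='expand'` call at the end is an alternative that is not part of the original answer; it assumes pandas 0.23 or newer.

```python
import numpy as np
import pandas as pd

def halve_row(row):
    # Hypothetical stand-in for the expensive function: two results per row.
    # The dict keys become the column names of the DataFrame apply returns.
    return pd.Series({'D': row.A, 'E': row.A / 2})

df2 = pd.DataFrame({'A': np.random.randn(8),
                    'B': pd.date_range('2011-01-01', periods=8),
                    'C': np.random.randn(8)})

# Returning a Series: apply builds a DataFrame with columns D and E.
expanded = df2.apply(halve_row, axis=1)
df2[['D', 'E']] = expanded

# Alternative (assumes pandas >= 0.23): keep returning a list and ask
# apply to expand it into columns, which are then labelled 0 and 1.
as_list = df2[['A', 'B', 'C']].apply(
    lambda row: [row.A, row.A / 2], axis=1, result_type='expand')
```

Either way the function runs only once per row, which is the point when the calculation is expensive.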

*Attempt to explain*

```
s = pd.Series([1, 2, 3])
s
0    1
1    2
2    3
dtype: int64
```

```
s.loc[:] = [4, 5, 6]
s
0    4
1    5
2    6
dtype: int64
```

```
s.loc[:] = [7, 8]
```

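ValueError: cannot set using a slice indexer with a different length than the value

The same kind of mismatch seems to be what happens inside `apply`: the two-element list returned by `function` does not fit the three-column row of `df2`, hence the complaint about shape (8, 2) versus (8, 3).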
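To carry the Series analogy over to a DataFrame row, the following sketch assigns lists of different lengths to a row of a three-column frame. It is an illustration of the length mismatch only, not a trace of what `apply` does internally, and the name `frame` and its values are made up for the example.

```python
import pandas as pd

frame = pd.DataFrame({'A': [1, 2], 'B': [3, 4], 'C': [5, 6]})

# Three values for a three-column row: the assignment succeeds.
frame.loc[0] = [10, 20, 30]

# Two values for a three-column row: pandas raises instead of guessing
# which columns the values belong to.
try:
    frame.loc[0] = [7, 8]
    row_error = None
except ValueError as exc:
    row_error = exc

print(type(row_error).__name__)
```

The exact wording of the error message depends on the pandas version, so the sketch only checks the exception type.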