brittenb, 4 years ago
Python Question

How to use created variable in same assign function with pandas

Some simple data to get us started:

import pandas as pd
import numpy as np

df = pd.DataFrame({"x": np.random.normal(size=100), "y": np.random.normal(size=100)})


So, up until this point, I always thought that assign was the equivalent of mutate in the dplyr library. However, if I try to use a variable that I have created in an assign step in that same assign step, I get an error. Consider the following, which is acceptable in R:

df %>%
mutate(z = x * y, w = z + 10)


If I try the equivalent in pandas, I get an error:

df.assign(z = df.x * df.y, w = z + 10) # Error (NameError: z is not a Python variable)
df.assign(z = df.x * df.y, w = lambda d: d.z + 10) # Error


The only way I can think of to do this is to use two assign steps:

df.assign(z = df.x * df.y).assign(w = lambda d: d.z + 10)
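For reference, a self-contained version of the two-step approach might look like the sketch below. The fixed seed, the name result, and the final prints are additions for illustration and are not part of the original post.

```python
import numpy as np
import pandas as pd

# Fixed seed is an assumption, added only so the example is reproducible.
rng = np.random.default_rng(0)
df = pd.DataFrame({"x": rng.normal(size=100), "y": rng.normal(size=100)})

# The first assign creates z. The second assign passes the intermediate
# frame to the lambda as d, so d.z already exists when it is evaluated.
result = df.assign(z=df.x * df.y).assign(w=lambda d: d.z + 10)

# assign returns a new DataFrame; the original df keeps only x and y.
print(list(df.columns))
print(list(result.columns))
```

The lambda matters here: writing df.z in the second step would fail, because df itself never gains a z column.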


Is there something that I've missed? Or is there another function that is more appropriate?

Answer Source

These are not equivalent. From the docs for assign (emphasis mine):

Assigning multiple columns within the same assign is possible, but you cannot reference other columns created within the same assign call.

I imagine that this is just a little harder to implement with Python magic than with R magic. If you want to avoid calling assign twice, the obvious option is to store the common calculation beforehand, which sadly breaks up your chaining.

mul = df.x * df.y
df.assign(z = mul, w = mul+10)
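
Note that the documentation passage quoted above reflects the pandas release current when this answer was written. As far as I know, from pandas 0.23 onward (running on Python 3.6 or later), the keyword arguments to assign are evaluated in order, so a callable can refer to a column created earlier in the same call. A minimal sketch, assuming such a version is installed:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"x": np.random.normal(size=100), "y": np.random.normal(size=100)})

# Assumes pandas >= 0.23 on Python >= 3.6: z is assigned first, then the
# second lambda is called with a frame that already contains z.
result = df.assign(z=lambda d: d.x * d.y, w=lambda d: d.z + 10)

# A bare name still fails in any version: z is a column, not a Python variable.
try:
    df.assign(z=df.x * df.y, w=z + 10)
except NameError as exc:
    print("NameError:", exc)
```

On older versions the single-call form is not available, and the stored intermediate or the chained assign remains the way to go.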