Ramsey - 10 months ago
R Question

R formula equivalent in Pyspark

I am trying to find the PySpark equivalent of the R code below.

# generate lag variables
car <-
  car %>%
  group_by(Model) %>%
  mutate(Target.1 = lag(Target, 3), Sales.1 = lag(Sales, 3))


Any ideas?
Thanks

aku
Answer Source

I think using Window functions ought to work, though you would need something to order by:

import pyspark.sql.functions as func
from pyspark.sql.window import Window

window = Window.partitionBy("Model").orderBy( ??? )
car = car.withColumn("Target.1", func.lag("Target", 3).over(window))\
    .withColumn("Sales.1", func.lag("Sales", 3).over(window))
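To sanity-check the result, it may help to see what a lag of 3 over a partitioned, ordered window is expected to produce. Below is a rough plain-Python sketch of that behaviour, not Spark code. The "Month" ordering column, the sample rows, and the lag_by_group helper are all made up for illustration; the question does not say what the real ordering column is.

```python
from collections import defaultdict


def lag_by_group(rows, group_key, order_key, value_key, offset):
    """Hypothetical helper: add '<value_key>_lag' to each row, holding the
    value from `offset` rows earlier within the same group (None if absent)."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_key]].append(row)

    out = []
    for key in groups:
        # The ordering is what makes "3 rows earlier" well defined
        ordered = sorted(groups[key], key=lambda r: r[order_key])
        for i, row in enumerate(ordered):
            lagged = ordered[i - offset][value_key] if i >= offset else None
            out.append({**row, value_key + "_lag": lagged})
    return out


# Made-up data; "Month" is an assumed ordering column, not from the question
car = [
    {"Model": "A", "Month": m, "Target": 10 * m, "Sales": 100 * m}
    for m in range(1, 6)
] + [
    {"Model": "B", "Month": m, "Target": m, "Sales": 2 * m}
    for m in range(1, 4)
]

lagged = lag_by_group(car, "Model", "Month", "Target", 3)
for row in lagged:
    print(row["Model"], row["Month"], row["Target"], row["Target_lag"])
```

The first three rows of each Model come out as None, which corresponds to the nulls Spark (and NA in R) would give for rows that have no row three positions earlier in their group. Model "B" has only three rows, so every lagged value there is None.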