Ramsey, 3 years ago
Question

R formula equivalent in PySpark

I am trying to find the PySpark equivalent of the R (dplyr) code below.

# generate lag variables
car <- car %>%
  group_by(Model) %>%
  mutate(Target.1 = lag(Target, 3), Sales.1 = lag(Sales, 3))

Any ideas?
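To pin down the behavior being translated: within each Model group, lag(Target, 3) returns the Target value from three rows earlier in that same group, and NA for the group's first three rows. dplyr's lag() uses the existing row order unless order_by is supplied. The pure-Python sketch below illustrates that grouped-lag semantics; the helper grouped_lag and the sample rows are hypothetical, and the rows are assumed to already be in the intended order.

```python
from collections import defaultdict


def grouped_lag(rows, group_key, value_key, n):
    """For each row, return value_key from n rows earlier in the same group,
    or None (standing in for R's NA) when the group has no such row.
    Hypothetical helper; assumes rows are already in the intended order."""
    if n < 1:
        raise ValueError("n must be >= 1")
    history = defaultdict(list)  # group -> values seen so far, in row order
    lagged = []
    for row in rows:
        seen = history[row[group_key]]
        # seen[-n] is the value n positions back within this group
        lagged.append(seen[-n] if len(seen) >= n else None)
        seen.append(row[value_key])
    return lagged


# Hypothetical sample rows: two models interleaved, as in an unsorted table.
rows = [
    {"Model": "A", "Target": 1}, {"Model": "A", "Target": 2},
    {"Model": "B", "Target": 10}, {"Model": "A", "Target": 3},
    {"Model": "A", "Target": 4}, {"Model": "B", "Target": 20},
]
print(grouped_lag(rows, "Model", "Target", 3))  # [None, None, None, None, 1, None]
```

Only the fourth "A" row has three earlier "A" rows before it, so it is the only row with a non-missing lag. A Spark translation has to reproduce this per-group behavior.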

Answer

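I think window functions should work here, but you need a column to order by. dplyr's lag() relies on the existing row order within each group, whereas a Spark DataFrame has no guaranteed row order, so Spark's lag() requires an ordered window: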

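import pyspark.sql.functions as func
from pyspark.sql.window import Window

# Spark's lag() needs an explicit order within each partition. "Date" is a
# hypothetical column: use whatever defines row order in your data.
window = Window.partitionBy("Model").orderBy("Date")

# lag(col, 3) takes the value 3 rows back within the same Model (null, like
# R's NA, for the first 3 rows). Every window function needs .over(window).
car = (car
       .withColumn("Target.1", func.lag("Target", 3).over(window))
       .withColumn("Sales.1", func.lag("Sales", 3).over(window)))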