pr338 pr338 - 17 days ago 8
Python Question

Using Spark DataFrame and Window Functions to calculate the rolling average return

It's my first time working with spark data frames and I am trying to figure out how to use the window functions to compute the average daily return of every stock for every date.

I am trying to group by the ticker and then try to apply a rolling difference window function, but I can't find a lot of documentation on the window functions or how they work.

The data I have is date, open price, high price, low price, close price, volume traded, and ticker.

You find rolling average return by subtracting the close price yesterday from the close price today and then dividing it all by the close price yesterday.

What I tried so far:

w = Window()
df.groupBy("ticker")

I am trying to learn how to use window and groupby together to solve my problem.

Answer

Do you mean:

w = Window().partitionBy("ticker").orderBy("date")

df.withColumn("percentDiff", (col("close") - lag("close", 1).over(w)) / lag("close", 1).over(w)) \
  .groupBy("date").agg(mean("percentDiff"))
Comments