PySpark Cumulative Sum over a Date Window with a Sequence Number


I'm new to Python and PySpark. I'm trying to get the count and the cumulative sum of the transactions from the last 30 days for each client ID. I've been trying to build the 30-day window, but the date alone is not enough: since I don't have a timestamp, I have to use the sequence number column to get the correct order of the transactions. Here is my data:

[Image: sample of the data, not available]

More transaction types can be included, but I have already filtered for that. I would appreciate any suggestions! This is my window so far:

days = lambda n: n * 86400  # helper: n days expressed in seconds, to match the cast to long
window30 = (Window.partitionBy("ID", "Type").orderBy(F.col("Date").cast("timestamp").cast("long")).rangeBetween(-days(30), 0))

Thank you so much.
