I have an Excel sheet with formulas that I need to convert into PySpark code,
considering columns A, B, C, D, E, F, G, H and I, where columns F, G, H and I hold fixed random numeric values.
Column A has NULL in the first row, and subsequent rows follow the formula "=F3+G3+((C2+D2+E2)/2)".
Column B has 1000 in the first row, and subsequent rows follow the formula "=A3+(B2/2)".
Column C follows the formula "=$B2*5+(100/2)".
Column D follows the formula "=$B2*5+(10/2)".
Column E follows the formula "=$B2*5+(1/2)".
Could you write PySpark code for the same?
You will have to use the pandas API on Spark to achieve this. However, the calculations depend in complex ways on previously computed rows, so they cannot be fully parallelized and may be slow. Here's an example solution.
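Below is a minimal sketch of that idea. The column names and formulas come from your question, but the F/G/H/I values are placeholders, and I'm assuming the sheet has a single header row (so Excel row 2 is the first data row). Because every row needs results from the previous row, the sketch pulls the data to the driver, runs the recurrence with plain pandas, and converts back to a pandas-on-Spark DataFrame; adapt the loading step to however your data actually arrives.

```python
import pyspark.pandas as ps

# Sketch only: the F/G/H/I values below are placeholders. In practice you
# would load the sheet, e.g. with ps.read_excel(...) or from a Spark table.
psdf = ps.DataFrame({
    "F": [10.0, 20.0, 30.0, 40.0],
    "G": [5.0, 15.0, 25.0, 35.0],
    "H": [1.0, 2.0, 3.0, 4.0],
    "I": [7.0, 8.0, 9.0, 6.0],
})

# The recurrence is strictly row-by-row (each row needs the previous row's
# results), so collect to the driver and loop with plain pandas.
pdf = psdf.to_pandas()
n = len(pdf)

A = [None] * n                   # first row of A is NULL
B = [1000.0] + [0.0] * (n - 1)   # first row of B is 1000
C, D, E = [0.0] * n, [0.0] * n, [0.0] * n

# First data row (Excel row 2): C, D, E depend only on B of the same row.
C[0] = B[0] * 5 + 100 / 2
D[0] = B[0] * 5 + 10 / 2
E[0] = B[0] * 5 + 1 / 2

# Subsequent rows: A uses the current row's F and G plus half the sum of the
# previous row's C, D, E; B uses the current A plus half the previous B;
# C, D, E use the current row's B.
for i in range(1, n):
    A[i] = pdf["F"].iloc[i] + pdf["G"].iloc[i] + (C[i - 1] + D[i - 1] + E[i - 1]) / 2
    B[i] = A[i] + B[i - 1] / 2
    C[i] = B[i] * 5 + 100 / 2
    D[i] = B[i] * 5 + 10 / 2
    E[i] = B[i] * 5 + 1 / 2

pdf["A"], pdf["B"], pdf["C"], pdf["D"], pdf["E"] = A, B, C, D, E

# Back to a distributed pandas-on-Spark DataFrame, with columns in A..I order.
result = ps.from_pandas(pdf[["A", "B", "C", "D", "E", "F", "G", "H", "I"]])
print(result)
```

Since your output depends on the actual F and G values in your sheet, the numbers printed by this sketch will differ from your real data.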
Output: