PySpark - Transpose distinct row values into column headers and fill them with values from another column of the same row

Question


Asked by José Bastos on 22 February 2023 at 17:33 · 104 views

This is the table I want to transpose

I created a list of the distinct values in DESC_INFO using this:

columnsToPivot = list(dict.fromkeys(df.filter(F.col("DESC_INFO") != '').rdd.map(lambda x: (x.DESC_INFO, x.RAW_INFO)).collect()))

And then I tried to map the RAW_INFO values into the matching columns with this:

for key in columnsToPivot:
  if key[1] != '':
    df = df.withColumn(key[0], F.lit(key[1]))

The result is that every row gets the same value in each new column, whereas I want to fill in RAW_INFO only where the row matches the mapped values on the same 'PROCESS', 'SUBPROCESS' and 'LAYER'.

This is the map I expect at the end.

The blue lines mark the transpose I have already achieved. The red lines mark the data I still need to fill in, matching the condition highlighted in yellow.

Original Q&A

There are 2 answers

**Ankit Tyagi** · Answer 1 · 2023-02-28T05:38:34+00:00


Try this:

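from pyspark.sql import functions as F
# Drop rows with an empty DESC_INFO, then pivot its labels into columns,
# keeping the first RAW_INFO value per (LAYER, PROCESS, SUBPROCESS) group.
df_new = df.filter(F.col('DESC_INFO') != '').groupBy('LAYER', 'PROCESS', 'SUBPROCESS').pivot('DESC_INFO').agg(F.first('RAW_INFO'))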

**José Bastos** · Answer 2 · 2023-02-28T12:35:54+00:00


That's because of these columns: I want to repeat the data from DESC_INFO and RAW_INFO.
