pandas dataset transformation to normalize the data

Question


Asked by python_enthusiast on 20 March 2020 at 16:20

I have a CSV file like this:

I want to transform it into a pandas dataframe like this:

Basically, I'm trying to normalize the dataset to populate an SQL table.

I have used json_normalize to create a separate dataset from the genres column, but I'm at a loss as to how to transform both columns as shown above.

Some suggestions would be highly appreciated.
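The CSV and the target layout were shown only as images, so the sketch below assumes (hypothetically) that each `genres` cell holds JSON text such as `[{"id": 28, "name": "Action"}]`. It shows one way to combine `json.loads`, `DataFrame.explode` and `pd.json_normalize` to build a genre lookup table plus an `id`/`genre_id` link table, then load both into SQL. The table names `genres` and `movie_genres` and the in-memory SQLite database are illustrative assumptions, not part of the question.

```python
import json
import sqlite3

import pandas as pd

# Hypothetical input: the real CSV was only shown as an image. Assumes each
# 'genres' cell is a JSON list of {"id": ..., "name": ...} objects.
df = pd.DataFrame({
    "id": [1, 2, 3],
    "genres": [
        '[{"id": 28, "name": "Action"}, {"id": 12, "name": "Adventure"}]',
        '[{"id": 18, "name": "Drama"}, {"id": 28, "name": "Action"}]',
        "[]",
    ],
})

# Parse the JSON text, then give each genre object its own row.
# An empty list explodes to NaN, so those rows are dropped.
exploded = (
    df.assign(genres=df["genres"].map(json.loads))
      .explode("genres")
      .dropna(subset=["genres"])
      .reset_index(drop=True)
)

# Flatten the dicts into columns and rename them for the SQL schema.
flat = pd.json_normalize(exploded["genres"].tolist()).rename(
    columns={"id": "genre_id", "name": "genre_name"}
)

# Link table: one row per (movie id, genre id) pair; both frames share a 0..n-1 index.
movie_genres = pd.concat([exploded[["id"]], flat[["genre_id"]]], axis=1)

# Lookup table: each genre once, first occurrence kept.
genres = flat.drop_duplicates("genre_id").reset_index(drop=True)

# Write both tables to an in-memory SQLite database (illustration only).
conn = sqlite3.connect(":memory:")
genres.to_sql("genres", conn, index=False)
movie_genres.to_sql("movie_genres", conn, index=False)
```

Parsing the JSON instead of matching digits keeps the genre names, which the lookup table needs, but it assumes every cell is valid JSON (a missing value would make `json.loads` raise). `pd.json_normalize` is the top-level name from pandas 1.0 onward; older versions expose it as `pandas.io.json.json_normalize`.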


**ManojK** · Accepted Answer · 2020-03-20T17:04:22+00:00

If the genre ids are the only numeric values in the `genres` column (as shown in the picture), you can use the following:

```python
# Assumes df already holds the CSV data with 'id' and 'genres' columns.

# Find every run of digits in 'genres' and join the matches into a comma-separated string.
df['genre_id'] = df['genres'].str.findall(r'(\d+)').apply(', '.join)

# Split that string back into a list and use pandas.DataFrame.explode
# (pandas >= 0.25) to give each genre id its own row.
df = df.assign(genre_id=df.genre_id.str.split(',')).explode('genre_id')

# Finally, remove the leading space left over from the ', ' separator.
df['genre_id'] = df['genre_id'].str.lstrip()

# If required, create a new dataframe with these 2 columns only.
df = df[['id', 'genre_id']]
```
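Here is a self-contained run of the same regex idea on a made-up two-row frame in the shape the answer assumes (the only digits in `genres` are genre ids). As a small variant, not the answer's exact code, it explodes the `findall` lists directly instead of joining and re-splitting them, and casts the ids to integers so they suit an integer SQL column; the name `out` is illustrative.

```python
import pandas as pd

# Hypothetical sample: the only digits in 'genres' are the genre ids.
df = pd.DataFrame({
    "id": [1, 2],
    "genres": [
        '[{"id": 28, "name": "Action"}, {"id": 12, "name": "Adventure"}]',
        '[{"id": 18, "name": "Drama"}]',
    ],
})

out = (
    df.assign(genre_id=df["genres"].str.findall(r"\d+"))  # list of digit strings per row
      .explode("genre_id")                   # one row per genre id
      .dropna(subset=["genre_id"])           # rows whose genres text had no digits
      .astype({"genre_id": int})             # '28' -> 28
      .loc[:, ["id", "genre_id"]]
      .reset_index(drop=True)
)
print(out)
```

Because it only looks for digits, this would also pick up any digits that appear inside a genre name, so it fits data where that cannot happen.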
