Pandas drop_duplicates() gives odd results: Has anybody seen this already?

Question


Asked by Marc on 07 September 2023 at 15:51

When I create a dataframe with a 3-level multiindex and run drop_duplicates() on it, the function seems to focus only on the first two levels of the index and ignores the third.

import pandas as pd

index = [('2020-09-30', '2020-12-31', '2021-01-15'),
         ('2020-09-30', '2020-12-31', '2021-01-30'),
         ('2020-09-30', '2020-12-31', '2021-02-04'),
         ('2020-09-30', '2020-12-31', '2021-02-04')]

cols = ['values']

data = [10, 10, 10, 10]

df = pd.DataFrame(index=pd.MultiIndex.from_tuples(index), data=data, columns=cols)

The dataframe looks like this:


                                  values
2020-09-30 2020-12-31 2021-01-15      10
                      2021-01-30      10
                      2021-02-04      10
                      2021-02-04      10

There is only one duplicated row: the 4th row repeats the 3rd (same index, same value).
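A quick way to see which rows count as duplicates is to compare DataFrame.duplicated(), which (like drop_duplicates()) looks only at the column values, with Index.duplicated(), which looks only at the index labels. A minimal sketch that rebuilds the same frame; the variable names by_values and by_index are just for illustration:

```python
import pandas as pd

index = [('2020-09-30', '2020-12-31', '2021-01-15'),
         ('2020-09-30', '2020-12-31', '2021-01-30'),
         ('2020-09-30', '2020-12-31', '2021-02-04'),
         ('2020-09-30', '2020-12-31', '2021-02-04')]
df = pd.DataFrame(index=pd.MultiIndex.from_tuples(index),
                  data=[10, 10, 10, 10], columns=['values'])

# Column values only: every row after the first repeats the value 10
by_values = df.duplicated().tolist()
# Index labels only: all three levels are compared, so only the 4th row repeats a key
by_index = df.index.duplicated().tolist()

print(by_values)  # [False, True, True, True]
print(by_index)   # [False, False, False, True]
```

The index check shows that all three levels are taken into account; the surprising result comes from the values, not the index.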

When I run drop_duplicates(), I get this:

In:
df.drop_duplicates()

Out:
                                 values
2020-09-30 2020-12-31 2021-01-15      10

I expected 3 rows back and got only 1. Has anybody come across this problem? Have I done something wrong, or is this a known issue with MultiIndexes?

Original Q&A

There is 1 answer

**mozway** · Answer 1 · 2023-09-07T16:08:12+00:00


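drop_duplicates ignores the index and compares only the column values. Since every row has the value 10, all rows after the first are treated as duplicates, which is why only one row is returned. If you want to consider the index as well as all columns, you can use: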

out = df[~df.reset_index().duplicated().to_numpy()]
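A self-contained sketch of that approach applied to the frame from the question. The second variant, which round-trips through reset_index() and set_index(), is an alternative added here for comparison rather than part of the original answer; it assumes the index levels are unnamed, so reset_index() turns them into columns called level_0, level_1 and level_2.

```python
import pandas as pd

index = [('2020-09-30', '2020-12-31', '2021-01-15'),
         ('2020-09-30', '2020-12-31', '2021-01-30'),
         ('2020-09-30', '2020-12-31', '2021-02-04'),
         ('2020-09-30', '2020-12-31', '2021-02-04')]
df = pd.DataFrame(index=pd.MultiIndex.from_tuples(index),
                  data=[10, 10, 10, 10], columns=['values'])

# Answer's approach: flag duplicates over index levels + columns, then filter df.
# reset_index() gives the mask a RangeIndex, so .to_numpy() is used to apply it
# by position instead of trying to align it with df's MultiIndex.
out = df[~df.reset_index().duplicated().to_numpy()]

# Alternative (not from the answer): move the index into columns, deduplicate,
# then rebuild the MultiIndex from the former level columns.
flat = df.reset_index()
level_cols = flat.columns[:df.index.nlevels].tolist()
out2 = flat.drop_duplicates().set_index(level_cols)
out2.index.names = df.index.names  # restore the original (unnamed) level names

print(len(out))  # 3
```

The mask-based version filters the original frame, so its index and dtypes are untouched; the round-trip version rebuilds the index, which is why the level names have to be restored afterwards.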

TechQA.
