How to get rows that are last in order within pandas without using a loop

I have data that can be grouped by the column type and then ordered by another column, order. I would like to know whether I can use sklearn's train_test_split to split this data so that, within each type, the rows sharing the numerically largest order value are split out as the test set. In the example below, the last two rows, with order=3, should go into the test set.

type order
A 1
A 1
A 2
A 2
A 3
A 3

The only approach I can think of is to loop over each type in the larger dataframe (which has multiple types), select these rows, and append them to a list, dataframe or array. I am wondering whether there is an alternative, using train_test_split or something within pandas, that avoids a loop.

EDIT:

I would also like to keep the rows at the top, with orders 1 and 2, as I need them for training.

There are 2 answers

Answer by Muhammed Yunus

Is the solution below suitable? It filters the rows based on whether they are "order == maximum order value" or not.

Data:

import pandas as pd

data = {'type': ['A', 'A', 'A', 'A', 'A', 'A'],
        'order': [1, 1, 2, 2, 3, 3]}

df = pd.DataFrame(data)

Filter rows


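# Boolean mask: True for rows whose order equals the largest order value
is_last = df['order'].eq(df['order'].max())

top_rows, bottom_rows = df[~is_last], df[is_last]

# display() is available in Jupyter/IPython; use print() in a plain script
display(top_rows, bottom_rows)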
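
The filter above compares every row against the maximum of the whole order column, which is fine for a single type. The question mentions a larger dataframe with multiple types, so here is a minimal sketch of a per-type variant. The frame df_multi and its values are made up for illustration, as are the names test_rows and train_rows; the sketch assumes "last" means the largest order value within each type.

```python
import pandas as pd

# Hypothetical two-type frame; type B tops out at order 2, not 3
df_multi = pd.DataFrame({
    'type':  ['A', 'A', 'A', 'A', 'B', 'B', 'B'],
    'order': [1, 2, 3, 3, 1, 2, 2],
})

# Broadcast each type's own maximum order back onto its rows
group_max = df_multi.groupby('type')['order'].transform('max')
is_last = df_multi['order'].eq(group_max)

test_rows = df_multi[is_last]    # last-order rows of every type
train_rows = df_multi[~is_last]  # everything else
```

With a global maximum, none of the type B rows would land in the test set here, because no B row reaches order 3.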

Answer by Panda Kim

Code

cond = df.groupby('type')['order'].transform('last').eq(df['order'])

df[cond]

    type    order
4   A       3
5   A       3

df[~cond]

    type    order
0   A       1
1   A       1
2   A       2
3   A       2
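
One thing that may be worth checking: transform('last') picks the value from the last row of each group as the rows are currently arranged, so it matches the numerically largest order only when the data is already sorted by order within each type. The sketch below, on a made-up unsorted frame (df_unsorted is a hypothetical name), compares it with transform('max'), which does not depend on row position.

```python
import pandas as pd

# Hypothetical unsorted frame: the final row of type A has order 1
df_unsorted = pd.DataFrame({
    'type':  ['A', 'A', 'A', 'A'],
    'order': [3, 2, 3, 1],
})

grouped = df_unsorted.groupby('type')['order']

# 'last' follows row position, so it flags the order == 1 row here
by_last = grouped.transform('last').eq(df_unsorted['order'])

# 'max' follows the values, so it flags both order == 3 rows
by_max = grouped.transform('max').eq(df_unsorted['order'])
```

If the data is guaranteed to be sorted, both masks agree; otherwise sorting with df.sort_values(['type', 'order']) first, or using 'max', seems the safer choice.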