A for loop inside a for loop not executing after the first run

89 views Asked by At

I've a dataset of firms with year from 1960-1989 and month (1-12) that looks like this.

PERMNO. Year Month
10057 1960 1
10057 1960 2
10057 1960 3
10057 1960 4
10057 1960 5
10057 1960 6
10057 1960 7
10057 1960 ...
10057 1961 1
10057 1961 2
10057 1961 3
10057 1961 4
10057 1961 5
10057 1961 6
10057 1961 7
10057 1961 ...
10057 1989 1
10057 1989 2
10057 1989 3
10057 1989 4
10057 1989 5
10057 1989 6
10057 1989 7
10057 1989 ...
11857 1960 1
11857 1960 2
11857 1960 3
11857 1960 4
11857 1960 5
11857 1960 6
11857 1960 7
11857 1960 ...

What I'm trying to do is for each Month of July for each firm (PERMNO), I want all the rows before it. For example, For firm 10057, for July 1960, I want all rows beginning from Jan 1960-July 1960 and for next for the same firm 10057, for July 1961, I want all rows beginning from Jan 1960-July 1961 etc etc. Something like this:

PERMNO. Year Month
10057 1960 1
10057 1960 2
10057 1960 3
10057 1960 4
10057 1960 5
10057 1960 6
10057 1960 7
10057 1960 1
10057 1960 2
10057 1960 3
10057 1960 ...
10057 1961 4
10057 1961 5
10057 1961 6
10057 1961 7
10057 1960 1
10057 1960 ...
10057 1961 1
10057 1961 ...
10057 1962 6
10057 1962 7

and so on for each firm and within each firm, all Julys.

I've written a loop inside a loop. The first firm worked but starting from the second, the loop started to return all rows from the beginning 1960 to the end 1980. Not sure where I did wrong for the loop.

This is my code:

permnolist = CRSP2.drop_duplicates('PERMNO').PERMNO.to_list()
LL = []
for indexer in permnolist:
    df = CRSP2[CRSP2['PERMNO']==indexer]
    july_list = df[df['Month']==7].Month.index.to_list()
    for indexer2 in july_list:
        df2 = df.iloc[:indexer2]
        LL.append(df2)

I made a list of all unique firms. The first loop attempts to get a dataframe of the first firm and the second loop uses the index for the row where Month=July (Yes, I've reset the index). I use iloc to take all rows before July. The first firm worked beautifully but once it looped to the second firm, it started returning all rows for that firm from beginning year 1960 all the way to the end 1980 for many times.

2

There are 2 answers

0
Sebastian Liebscher On

Given you input example data you could do something like this to avoid loops at all:

# Filter rows where id=1 and month <= 7
filtered_df = CRSP2[(CRSP2["PERMNO"] == 10057) & (CRSP2["Month"] <= 7)]
print(filtered_df)

to output exactly what you describe as output:

PERMNO. Year    Month
10057   1960    1
10057   1960    2
10057   1960    3
10057   1960    4
10057   1960    5
10057   1960    6
10057   1960    7
10057   1960    1
10057   1960    2
10057   1960    3
10057   1960    ...
10057   1961    4
10057   1961    5
10057   1961    6
10057   1961    7
10057   1960    1
10057   1960    ...
10057   1961    1
10057   1961    ...
10057   1962    6
10057   1962    7
0
ziying35 On

try this:

result = CRSP2.query('Month<=7').sort_values('PERMNO.')