RStudio: I have a very large dataset that I need to aggregate by a unique ID. The values are in two columns: one is year/month and the other is an integer

So basically the data looks like this:

The unique ID repeats, the integer is always different, and the year_month runs from 2001_1 (Jan 2001) to 2001_12 (Dec 2001), then repeats for later years: 2002_1 to 2002_12, 2003_1 to 2003_12.

The unique ID identifies an individual; the integer is the likelihood of finding that individual during that particular year_month.
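
To make the layout concrete, here is a minimal sketch of a toy data frame with the same shape. The column names `id`, `year_month`, and `likelihood` are assumptions (the real names are not given in the question text), and the values are made up. It also shows one possible way to split a string such as `2001_12` into its year and month parts.

```r
# Toy data with the layout described above.
# Column names and values are hypothetical.
df <- data.frame(
  id         = c(1, 1, 1, 1, 2, 2, 2, 2),
  year_month = c("2001_1", "2001_2", "2002_1", "2002_2",
                 "2001_1", "2001_2", "2002_1", "2002_2"),
  likelihood = c(10L, 20L, 30L, 40L, 5L, 15L, 25L, 35L),
  stringsAsFactors = FALSE
)

# Split "2001_12" on the underscore into c("2001", "12").
parts <- strsplit(df$year_month, "_", fixed = TRUE)

# First piece is the year, second piece is the month number.
df$year  <- as.integer(vapply(parts, `[`, character(1), 1))
df$month <- as.integer(vapply(parts, `[`, character(1), 2))

print(df)
```

Having `month` as its own integer column is what makes it possible to group January 2001, January 2002, and January 2003 together later on.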

I need to calculate the mean likelihood of finding each individual in each calendar month, averaged across all years.

So I can say, for individual 1, that the probability of finding them in January is X, in February is Y, and so on.

So my first thought was to aggregate by unique ID and then average the likelihood for each month.
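
The idea described here (group by individual and calendar month, then average over years) could be sketched in base R roughly as follows. This is only an illustration under assumptions: the column names `id`, `year_month`, and `likelihood` and all values are hypothetical, and `aggregate()` is one of several ways to do the grouping.

```r
# Hypothetical toy data: two individuals, two months, two years.
df <- data.frame(
  id         = c(1, 1, 1, 1, 2, 2, 2, 2),
  year_month = c("2001_1", "2001_2", "2002_1", "2002_2",
                 "2001_1", "2001_2", "2002_1", "2002_2"),
  likelihood = c(10L, 20L, 30L, 40L, 5L, 15L, 25L, 35L),
  stringsAsFactors = FALSE
)

# Keep only the part after the underscore: "2001_12" -> 12.
df$month <- as.integer(sub("^.*_", "", df$year_month))

# Mean likelihood per individual per calendar month, pooled over all years.
monthly_mean <- aggregate(likelihood ~ id + month, data = df, FUN = mean)
monthly_mean <- monthly_mean[order(monthly_mean$id, monthly_mean$month), ]
print(monthly_mean)

# Same numbers as a matrix: one row per individual, one column per month.
wide <- tapply(df$likelihood, list(id = df$id, month = df$month), mean)
print(wide)
```

The long result has one row per (individual, month) pair; the matrix form may be easier to read when the goal is a statement like "individual 1, January".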

There are ~3.5 thousand unique IDs in each Excel sheet. Each row has an integer and a year_month. I merged all the Excel sheets and now have ~1.6 million rows.

I don't know if it's because the data is so big, but I can't seem to figure this out.
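
A table of this size with three columns should normally fit in memory, so the row count by itself need not be the obstacle. If speed is a concern, one option is the `data.table` package. The sketch below assumes that package is installed, uses simulated data as a stand-in for the merged sheets, and again uses the hypothetical column names `id`, `year_month`, and `likelihood`.

```r
library(data.table)  # assumption: the data.table package is installed

set.seed(1)

# Simulated stand-in for the merged sheets (hypothetical names and values):
# 3,500 individuals x 3 years x 12 months.
grid <- CJ(id = seq_len(3500), year = 2001:2003, month = 1:12)
grid[, year_month := paste(year, month, sep = "_")]
grid[, likelihood := sample.int(100L, .N, replace = TRUE)]
dt <- grid[, .(id, year_month, likelihood)]

# Recover the month number from strings such as "2001_12".
dt[, month := as.integer(sub("^.*_", "", year_month))]

# Mean likelihood per individual per calendar month, pooled over all years.
monthly_mean <- dt[, .(mean_likelihood = mean(likelihood)), keyby = .(id, month)]

print(head(monthly_mean))
```

Because the grouping is by `id` and `month` only, the year drops out and each individual ends up with at most 12 rows.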
