Incremental counter based on several conditions previously being met

62 views Asked by At

I am attempting to create an incremental counter in an ID column based on several conditions being met in 2 other columns and then "resetting" those conditions being met to ascertain the next increment of ID. This is time series data so order does matter (I have not included the time stamp column).

I will provide a toy dataset. I have 3 columns: Location, Activity and ID. At the moment my ID column is empty but I have populated it here with values to illustrate my conditioning. I want to initialize ID from 1 and then I want to check whether D has occurred. This is my first condition. I then need to check whether A occurs after the first condition being met and at that instance, A should ALSO be at Location 2. Once this along with the first condition is met, I want to increment ID by 1 in the following row. Then in the next row, I want to "reset" the conditions which have occurred and then again check by row whether D has occurred and then at the first instance of A at location 2 occurring after D, I want to increment the next line by 1. This repeats itself to the very end of the dataset.

df <- data.frame(
  Location = c(2, 3, 3, 2, 1, 2, 2, 2, 1, 3, 3, 1, 2, 3, 2, 2, 1, 2, 3, 2, 1),
  Activity = c("A", "B", "C", "D", "D", "B", "A", "A", "B", "A", "C", "D", "A", "B", "B", "D", "A", "D", "D", "A", "C"),
  ID = c(1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 4)
)

# Print the dataframe to view its structure
print(df)

   Location Activity ID
1         2        A  1
2         3        B  1
3         3        C  1
4         2        D  1
5         1        D  1
6         2        B  1
7         2        A  1
8         2        A  2
9         1        B  2
10        3        A  2
11        3        C  2
12        1        D  2
13        2        A  2
14        3        B  3
15        2        B  3
16        2        D  3
17        1        A  3
18        2        D  3
19        3        D  3
20        2        A  3
21        1        C  4
...

Toy data

I have tried many iterations of some sort of conditional logic, but it appears to fail. My best attempt follows, but it does not match my expectations on ID column.

# Function to increment ID based on conditions
increment_id_based_on_conditions <- function(df) {
  df$ID[1] <- 1  # Initialize the first ID
  
  # Initialize control variables
  waiting_for_a <- FALSE
  last_id <- 1
  
  for (i in 1:nrow(df)) {
    if (waiting_for_a && df$Activity[i] == "A" && df$Location[i] == 2) {
      last_id <- last_id + 1  # Increment ID after conditions are met
      waiting_for_a <- FALSE  # Reset condition
    } else if (df$Activity[i] == "D") {
      waiting_for_a <- TRUE  # Set condition to start waiting for "A" at Location 2
    }
    
    df$ID[i] <- last_id  # Update ID column
  }
  
  df$ID <- c(df$ID[-1], NA)  # Shift ID down by one row and make last ID NA
  return(df)
}

# Apply the function to  dataset
df_with_ids <- increment_id_based_on_conditions(df)

# View the updated dataset
print(df_with_ids)

 Location Activity ID
1         2        A  1
2         3        B  1
3         3        C  1
4         2        D  1
5         1        D  1
6         2        B  2
7         2        A  2
8         2        A  2
9         1        B  2
10        3        A  2
11        3        C  2
12        1        D  3
13        2        A  3
14        3        B  3
15        2        B  3
16        2        D  3
17        1        A  3
18        2        D  3
19        3        D  4
20        2        A  4
21        1        C NA

1

There are 1 answers

2
Adriano Mello On

This solution create groups for the "D"s and, for each one, identify the first "2A" position. With this information, an unique id is created. Look:

library(tidyverse) # (Edited)

df <- mutate(df, id = row_number())

aux <- df %>% 
  mutate(d_group = cumsum(if_else(activity == "D", 1, 0))) %>% 
  distinct(d_group, location, activity, .keep_all = TRUE) %>% 
  filter(location == 2, activity == "A", d_group > 0) %>% 
  pull(id)

df  <- mutate(df, id = cumsum(if_else(dplyr::lag(id) %in% aux, 1, 0)) + 1)

rm(aux)

# ---------
> df
   location activity id
1         2        A  1
2         3        B  1
3         3        C  1
4         2        D  1
5         1        D  1
6         2        B  1
7         2        A  1
8         2        A  2
9         1        B  2
10        3        A  2
11        3        C  2
12        1        D  2
13        2        A  2
14        3        B  3
15        2        B  3
16        2        D  3
17        1        A  3
18        2        D  3
19        3        D  3
20        2        A  3
21        1        C  4