get max of value - lead and keep corresponding rows that generated the max using dplyr

30 views Asked by At

How would one keep the corresponding two rows (all columns preserved) that produced the max difference after grouping, arranging, and subtracting the 4th lead from the current value? I can currently use top_n to pluck out one row (see sample code below). Have easily used reframe to get the max difference after grouping, but keep the rows that produced that max is taking longer than expected. Thanks.

library(dplyr)

df %>%
  group_by(group) %>%
  arrange(DateTime)%>%
  mutate(new_column = value- lead(value,n=4L)) %>%
  top_n(1,new_column)
1

There are 1 answers

1
Nir Graham On

You are leading, which works by position, so you need to track positions, row_number() is good for that.

Here is an approach

library(dplyr)

set.seed(42)
(df <- data.frame(
  group=rep(letters[1:2],each=10),
  DateTime =rep(1:10,2),
  value  = sample.int(100,size=20,replace=TRUE)
))

(df_arr <-  df %>%
  group_by(group) %>%
  arrange(DateTime)%>%
  mutate(rn=row_number()))

(original_sol_df <-df_arr |> mutate(
         new_column = value- lead(value,n=4L)) %>%
  top_n(1,new_column))

(stub_df <- original_sol_df |> select(group,rn))

(prep_1 <- bind_rows(stub_df,
                     mutate(stub_df,
                            rn=rn+4L)))

fin_df <- left_join(prep_1,
                    df_arr,
                    by = join_by(group, rn))