Modified data:
df1 <- structure(list(hour = c(0L, 1L, 1L, 1L, 0L, 1L, 1L, 1L, 0L, 0L,
1L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 1L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 1L, 1L, 1L, 0L, 1L, 1L, 0L, 0L), cs = c(0L, 0L, 0L, 0L,
0L, 1L, 1L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L,
1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 0L, 0L
), cs_acum = c(0L, 0L, 0L, 0L, 0L, 1L, 2L, 3L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 1L, 2L, 3L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 1L, 2L, 0L, 0L), cs_wanted = c(0L, 0L, 0L, 0L,
0L, 1L, 2L, 3L, 0L, 0L, 4L, 5L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 2L,
3L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 2L, 0L, 0L
), cs_acum2 = c(0L, 0L, 0L, 0L, 0L, 1L, 2L, 3L, 0L, 0L, 4L, 5L,
0L, 0L, 0L, 0L, 0L, 0L, 1L, 2L, 3L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
1L, 2L, 3L, 0L, 4L, 5L, 0L, 0L)), .Names = c("hour", "cs", "cs_acum",
"cs_wanted", "cs_acum2"), class = c("data.table", "data.frame"
), row.names = c(NA, -36L))
cs_acum is the cumulative sum of cs, restarting at every 0 in cs.
df1$cs_acum <- with(df1, ave(cs, cumsum(cs == 0), FUN = cumsum))
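To see the restart behaviour on a small vector (the values below are made up for illustration and are not taken from the data above): cumsum(cs == 0) opens a new group at every 0, and ave() then applies cumsum within each group.

```r
# Toy vector, assumed for illustration only
cs <- c(0, 1, 1, 1, 0, 1, 1)

# cumsum(cs == 0) increments at each 0, so every 0 opens a new group
grp <- cumsum(cs == 0)

# cumulative sum of cs inside each group; the 0 that opens a group resets the count
cs_acum <- ave(cs, grp, FUN = cumsum)
cs_acum
# 0 1 2 3 0 1 2
```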
I need this accumulation to continue if hour contains a value of 1 within the 5 rows after the accumulation of 1's from cs has stopped.
The desired output is in column cs_wanted.
Further explanation: cs_acum is the accumulation of hours (rows of cs) that meet certain criteria. Once that accumulation stops, the result no longer depends on cs; it depends on the column hour. The accumulation should continue if there is a value of 1 within the 5-hour window after it has stopped.
A new function is probably in order: one that checks the five rows of hour following the position where cs_acum drops back to 0 and continues the accumulation from the value where cs_acum stopped.
Possible steps:
find position where accumulation stops
look at next five rows in hour
if there are values of 1, continue the accumulation for those rows,
then look again at the next five rows,
if there are no values of 1, do nothing.
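The steps above could be sketched as a plain loop in base R. This is only one possible reading of the requirement, not a definitive implementation: the function name extend_accumulation and the window argument are hypothetical, and the sketch assumes that a row with cs == 1 arriving while a window is still open continues the count instead of restarting it.

```r
# Hypothetical helper (not from the original post).
# Counts rows of a cs run, then keeps counting rows with hour == 1 as long as
# each one falls within `window` rows of the last counted row.
extend_accumulation <- function(hour, cs, window = 5L) {
  out    <- integer(length(cs))
  active <- FALSE   # TRUE while an accumulation is open
  count  <- 0L      # current accumulated value
  gap    <- 0L      # rows since the last counted row
  for (i in seq_along(cs)) {
    if (cs[i] == 1) {
      # a cs row starts a new accumulation, or continues an open one (assumption)
      if (!active) {
        active <- TRUE
        count  <- 0L
      }
      count  <- count + 1L
      out[i] <- count
      gap    <- 0L
    } else if (active && hour[i] == 1) {
      # hour == 1 inside the window: continue counting and reopen the window
      count  <- count + 1L
      out[i] <- count
      gap    <- 0L
    } else if (active) {
      # nothing to count on this row; close the accumulation once the window is used up
      gap <- gap + 1L
      if (gap >= window) active <- FALSE
    }
  }
  out
}

# Toy input, assumed for illustration only
hour <- c(0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 1)
cs   <- c(0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0)
extend_accumulation(hour, cs)
# 0 1 2 0 0 3 0 0 0 0 0 0
```

Applied to the hour and cs columns of the modified data above, this sketch returns the values in cs_wanted. A loop was chosen over a vectorised form because the window reopens after every counted row, which makes the state easier to follow step by step.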
New data:
df3 <- structure(list(hour = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1),
cs = c(0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1),
cs_acum = c(0, 0, 0, 0, 0, 0, 1, 2, 3, 4, 5, 6, 7, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13),
cs_acum2 = c(0, 0, 0, 0, 0, 0, 1, 2, 3, 4, 5, 6, 7, 0, 0, 0, 8, 9, 10, 11, 12, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28)),
.Names = c("hour", "cs", "cs_acum", "cs_acum2"), class = "data.frame", row.names = c(NA, -68L))
Using:
gives:
Explanation:
setDT(df1) converts the data.frame to a data.table by reference.
With rl <- rle(df1$hour) and grp := rleid(rep(rl$lengths > 5 & rl$values == 0, rl$lengths)) you create a grouping variable that only changes when there is a run of more than 5 zeros.
With hour == 1 you select the rows where hour is 1 and get the cumulative sum with cumsum(hour). If the values in hour are only 1's and 0's, you could also create a counter with seq_along or 1:.N, which will give the same result.
With is.na(cs_acum2), cs_acum2 := 0 you change the NA's to zeros.
Update 1: For the new example data (df2):
which gives:
The way I understood it, the cumsum of hour is only allowed to start at the first appearance of cs == 1.
Additional explanation:
With rn = .I you create a row-number index.
df2[, .I[cs == 1]][1] gives you the row number where cs == 1 for the first time.
With rn >= df2[, .I[cs == 1]][1] you select only the rows from that point onward.
Update 2: With regard to the latest (fourth) dataset, you could do:
which gives:
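For reference, the grouping idea explained in this answer can be sketched in base R. This is a reconstruction from the explanation only, not the answer's own data.table code: the function name cumsum_hours_by_gap and the max_gap argument are hypothetical, rleid is replaced by a cumsum over change points, and the data.table := steps are replaced by ave().

```r
# Hypothetical base R helper (the answer itself uses data.table).
cumsum_hours_by_gap <- function(hour, cs, max_gap = 5L) {
  n  <- length(hour)
  rl <- rle(hour)

  # TRUE for every row inside a run of more than `max_gap` zeros
  long_gap <- rep(rl$lengths > max_gap & rl$values == 0, rl$lengths)

  # group id that changes only where long_gap flips (base R stand-in for rleid)
  grp <- cumsum(c(TRUE, long_gap[-1] != long_gap[-n]))

  out <- numeric(n)

  # start counting at the first row where cs == 1
  first_cs <- which(cs == 1)[1]
  if (is.na(first_cs)) return(out)

  # cumulative sum of hour within each group, for rows with hour == 1 only
  keep <- seq_len(n) >= first_cs & hour == 1
  if (any(keep)) out[keep] <- ave(hour[keep], grp[keep], FUN = cumsum)
  out
}

# Shaped like the fourth dataset: a short gap of 3 zeros, then a long gap of 19
hour <- c(rep(1, 13), rep(0, 3), rep(1, 5), rep(0, 19), rep(1, 28))
cs   <- c(rep(0, 6), rep(1, 7), rep(0, 42), rep(1, 13))
res  <- cumsum_hours_by_gap(hour, cs)
res[c(7, 13, 17, 21, 41, 68)]
# 1 7 8 12 1 28
```

On the hour and cs columns of the first dataset this sketch gives the values stored in cs_acum2. Those differ from cs_wanted in rows 29 to 34, because the count restarts on any hour == 1 after a long gap, not only on cs == 1.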
Used data
First example dataset:
Second dataset:
Fourth dataset: