Ignoring count reset at noon

Question

This is a follow-up to an earlier question.

To recap: I have a data frame, data, that looks like this:

> data
   ID measurement_type    measurement_time amount          entry_time
1   1           type_1 2014-06-17 04:00:00      1 2014-06-17 01:53:00
2   1           type_1 2014-06-17 11:52:00      2 2014-06-17 01:53:00
3   1           type_1 2014-06-17 18:58:00      1 2014-06-17 01:53:00
4   1           type_1 2014-06-18 02:05:00      2 2014-06-17 01:53:00
5   1           type_1 2014-06-18 08:00:00      3 2014-06-17 01:53:00
6   1           type_2 2014-06-17 05:27:00     11 2014-06-17 01:53:00
7   1           type_2 2014-06-17 11:10:00     22 2014-06-17 01:53:00
8   1           type_2 2014-06-17 17:02:00     11 2014-06-17 01:53:00
9   1           type_2 2014-06-17 23:56:00     22 2014-06-17 01:53:00
10  1           type_2 2014-06-18 07:01:00     33 2014-06-17 01:53:00
11  2           type_1 2014-07-03 16:01:00    111 2014-07-03 14:35:00
12  2           type_1 2014-07-03 19:19:00    222 2014-07-03 14:35:00
13  2           type_1 2014-07-03 23:55:00    333 2014-07-03 14:35:00
14  2           type_1 2014-07-04 08:08:00    444 2014-07-03 14:35:00
15  2           type_1 2014-07-04 13:55:00    111 2014-07-03 14:35:00
16  2           type_2 2014-07-03 22:12:00   1111 2014-07-03 14:35:00
17  2           type_2 2014-07-04 08:59:00   2222 2014-07-03 14:35:00
18  2           type_2 2014-07-04 14:10:00   1111 2014-07-03 14:35:00
19  2           type_2 2014-07-04 17:00:00   2222 2014-07-03 14:35:00
20  2           type_2 2014-07-04 23:00:00   3333 2014-07-03 14:35:00

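Subjects with ID 1 and ID 2 enter at the given entry_time, and cumulative amounts are then measured at specific measurement_times. However, at noon every day the amount is reset to zero and counting starts again from zero. What I want is that, once the noon break (and therefore the reset to zero) has happened, the newly accumulated amounts keep being added to the amount already accumulated before noon (grouped by the grouping variable measurement_type).

For the noon break, the answer provided in the link above works perfectly: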

library(dplyr)

data %>% as_tibble() %>%
  # Check 12 hours passed --> `pm` column
  mutate(pm = format(measurement_time, "%H") >= 12) %>%
  mutate(date_fct = format(measurement_time, "%Y_%d")) %>%
  # Group by ID and `pm`
  group_by(ID, measurement_type, date_fct, pm) %>%
  # Turn cumsum into actual values
  mutate(amount_act = amount - lag(amount, default = 0)) %>%
  # Cumsum over ID
  ungroup() %>%
  group_by(ID, measurement_type) %>%
  mutate(amount_cums = cumsum(amount_act)) %>%
  ungroup() %>%
  select(-c(pm, date_fct, amount_act))

# A tibble: 20 x 6
   ID    measurement_type measurement_time    amount entry_time          amount_cums
   <fct> <fct>            <dttm>               <dbl> <dttm>                    <dbl>
 1 1     type_1           2014-06-17 04:00:00      1 2014-06-17 01:53:00           1
 2 1     type_1           2014-06-17 11:52:00      2 2014-06-17 01:53:00           2
 3 1     type_1           2014-06-17 18:58:00      1 2014-06-17 01:53:00           3
 4 1     type_1           2014-06-18 02:05:00      2 2014-06-17 01:53:00           5
 5 1     type_1           2014-06-18 08:00:00      3 2014-06-17 01:53:00           6
 6 1     type_2           2014-06-17 05:27:00     11 2014-06-17 01:53:00          11
 7 1     type_2           2014-06-17 11:10:00     22 2014-06-17 01:53:00          22
 8 1     type_2           2014-06-17 17:02:00     11 2014-06-17 01:53:00          33
 9 1     type_2           2014-06-17 23:56:00     22 2014-06-17 01:53:00          44
10 1     type_2           2014-06-18 07:01:00     33 2014-06-17 01:53:00          77
11 2     type_1           2014-07-03 16:01:00    111 2014-07-03 14:35:00         111
12 2     type_1           2014-07-03 19:19:00    222 2014-07-03 14:35:00         222
13 2     type_1           2014-07-03 23:55:00    333 2014-07-03 14:35:00         333
14 2     type_1           2014-07-04 08:08:00    444 2014-07-03 14:35:00         777
15 2     type_1           2014-07-04 13:55:00    111 2014-07-03 14:35:00         888
16 2     type_2           2014-07-03 22:12:00   1111 2014-07-03 14:35:00        1111
17 2     type_2           2014-07-04 08:59:00   2222 2014-07-03 14:35:00        3333
18 2     type_2           2014-07-04 14:10:00   1111 2014-07-03 14:35:00        4444
19 2     type_2           2014-07-04 17:00:00   2222 2014-07-03 14:35:00        5555
20 2     type_2           2014-07-04 23:00:00   3333 2014-07-03 14:35:00        6666

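As you can see, the afternoon counts are correctly added to the counts from before noon. However, because of the grouping by day (date_fct in the code above), the midnight boundary is treated as a second reset: the next day's values, which have been accumulating since noon, are wrongly added on top of the previous day's cumulative amount (amount_cums).

Any help getting the desired output for amount_cums, shown below, is much appreciated: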
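
To make the mismatch concrete, here is a minimal sketch comparing a calendar-day key with a noon-to-noon key for three of the ID 1, type_1 measurements. The variable names and the use of UTC are assumptions made for illustration only; they are not part of the original code.

```r
# Three type_1 measurements for ID 1 that all fall between two noon resets.
# UTC is an assumption so the example behaves the same on every machine.
times <- as.POSIXct(
  c("2014-06-17 18:58:00", "2014-06-18 02:05:00", "2014-06-18 08:00:00"),
  tz = "UTC"
)

# Calendar-day key: changes at midnight, so the window is split in two.
calendar_day <- format(times, "%Y-%m-%d")

# Noon-to-noon key: shift each time back 12 hours, then take the date.
count_day <- format(times - 12 * 3600, "%Y-%m-%d")

print(data.frame(times, calendar_day, count_day))
```

With the shifted key, all three rows share one group, so the lag-based differencing would no longer restart at midnight.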

# A tibble: 20 x 6
ID    measurement_type measurement_time    amount entry_time          amount_cums
<fct> <fct>            <dttm>               <dbl> <dttm>                    <dbl>
1     type_1           2014-06-17 04:00:00      1 2014-06-17 01:53:00           1
1     type_1           2014-06-17 11:52:00      2 2014-06-17 01:53:00           2
1     type_1           2014-06-17 18:58:00      1 2014-06-17 01:53:00           3
1     type_1           2014-06-18 02:05:00      2 2014-06-17 01:53:00           4
1     type_1           2014-06-18 08:00:00      3 2014-06-17 01:53:00           5
1     type_2           2014-06-17 05:27:00     11 2014-06-17 01:53:00          11
1     type_2           2014-06-17 11:10:00     22 2014-06-17 01:53:00          22
1     type_2           2014-06-17 17:02:00     11 2014-06-17 01:53:00          33
1     type_2           2014-06-17 23:56:00     22 2014-06-17 01:53:00          44
1     type_2           2014-06-18 07:01:00     33 2014-06-17 01:53:00          55
2     type_1           2014-07-03 16:01:00    111 2014-07-03 14:35:00         111
2     type_1           2014-07-03 19:19:00    222 2014-07-03 14:35:00         222
2     type_1           2014-07-03 23:55:00    333 2014-07-03 14:35:00         333
2     type_1           2014-07-04 08:08:00    444 2014-07-03 14:35:00         444
2     type_1           2014-07-04 13:55:00    111 2014-07-03 14:35:00         555
2     type_2           2014-07-03 22:12:00   1111 2014-07-03 14:35:00        1111
2     type_2           2014-07-04 08:59:00   2222 2014-07-03 14:35:00        2222
2     type_2           2014-07-04 14:10:00   1111 2014-07-03 14:35:00        3333
2     type_2           2014-07-04 17:00:00   2222 2014-07-03 14:35:00        4444
2     type_2           2014-07-04 23:00:00   3333 2014-07-03 14:35:00        5555

Data

data <- structure(list(ID = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("1", "2"), class = "factor"), 
measurement_type = structure(c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L), .Label = c("type_1", "type_2"), class = "factor"), 
measurement_time = structure(c(1402970400, 1402998720, 1403024280, 1403049900, 1403071200,  1402975620, 1402996200, 1403017320, 1403042160, 1403067660, 
1404396060, 1404407940, 1404424500, 1404454080, 1404474900,  1404418320, 1404457140, 1404475800, 1404486000, 1404507600), class = c("POSIXct", "POSIXt"), tzone = ""), 
amount = c(1, 2, 1, 2, 3, 11, 22, 11, 22, 33, 111, 222, 333, 444, 111, 1111, 2222, 1111, 2222, 3333), 
entry_time = structure(c(1402962780, 1402962780, 1402962780, 1402962780, 1402962780,1402962780, 1402962780, 1402962780, 1402962780, 1402962780, 
1404390900, 1404390900, 1404390900, 1404390900, 1404390900, 1404390900, 1404390900, 1404390900, 1404390900, 1404390900), 
class = c("POSIXct", "POSIXt"), tzone = "CET")), class = "data.frame", row.names = c(NA, -20L))
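
Note that measurement_time in the dput above carries tzone = "", so it is displayed in the reader's local time zone, whereas entry_time is tagged "CET". A minimal sketch of pinning the display zone, assuming the measurement times were also recorded in CET (this is an assumption, not something stated above):

```r
# First measurement_time value from the dput above, with no explicit zone.
x <- structure(1402970400, class = c("POSIXct", "POSIXt"), tzone = "")

# Assumption: the column was recorded in CET, like entry_time.
# Setting tzone changes only how the instant is displayed, not the instant.
attr(x, "tzone") <- "CET"
shown <- format(x, "%Y-%m-%d %H:%M:%S")
print(shown)
```

Applied to the whole column, the same assignment would be attr(data$measurement_time, "tzone") <- "CET".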

Answer 1

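Here is my approach: determine the most recent noon for each measurement, add a helper column that captures the last measurement of each noon-to-noon measurement day, and finally add the last measurements of the preceding days to each value.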

library(dplyr); library(lubridate)
data %>%
  # arrange(ID, measurement_type, measurement_time) %>%

  # I needed to adjust the times since they loaded in my local time
  mutate(measurement_time = measurement_time + dhours(9)) %>% 
  
  # identify the most recent noon
  mutate(start_of_count_day = floor_date(measurement_time - dhours(12), "day") + dhours(12)) %>%
  group_by(ID, measurement_type, start_of_count_day) %>%
  mutate(day_ttl = if_else(row_number() == max(row_number()), amount, 0)) %>%
  group_by(ID, measurement_type) %>%
  mutate(cuml = amount + cumsum(lag(day_ttl, default = 0))) %>%
  ungroup()

Result

# A tibble: 20 × 8
   ID    measurement_type measurement_time    amount entry_time          start_of_count_day  day_ttl  cuml
   <fct> <fct>            <dttm>               <dbl> <dttm>              <dttm>                <dbl> <dbl>
 1 1     type_1           2014-06-17 04:00:00      1 2014-06-16 16:53:00 2014-06-16 12:00:00       0     1
 2 1     type_1           2014-06-17 11:52:00      2 2014-06-16 16:53:00 2014-06-16 12:00:00       2     2
 3 1     type_1           2014-06-17 18:58:00      1 2014-06-16 16:53:00 2014-06-17 12:00:00       0     3
 4 1     type_1           2014-06-18 02:05:00      2 2014-06-16 16:53:00 2014-06-17 12:00:00       0     4
 5 1     type_1           2014-06-18 08:00:00      3 2014-06-16 16:53:00 2014-06-17 12:00:00       3     5
 6 1     type_2           2014-06-17 05:27:00     11 2014-06-16 16:53:00 2014-06-16 12:00:00       0    11
 7 1     type_2           2014-06-17 11:10:00     22 2014-06-16 16:53:00 2014-06-16 12:00:00      22    22
 8 1     type_2           2014-06-17 17:02:00     11 2014-06-16 16:53:00 2014-06-17 12:00:00       0    33
 9 1     type_2           2014-06-17 23:56:00     22 2014-06-16 16:53:00 2014-06-17 12:00:00       0    44
10 1     type_2           2014-06-18 07:01:00     33 2014-06-16 16:53:00 2014-06-17 12:00:00      33    55
11 2     type_1           2014-07-03 16:01:00    111 2014-07-03 05:35:00 2014-07-03 12:00:00       0   111
12 2     type_1           2014-07-03 19:19:00    222 2014-07-03 05:35:00 2014-07-03 12:00:00       0   222
13 2     type_1           2014-07-03 23:55:00    333 2014-07-03 05:35:00 2014-07-03 12:00:00       0   333
14 2     type_1           2014-07-04 08:08:00    444 2014-07-03 05:35:00 2014-07-03 12:00:00     444   444
15 2     type_1           2014-07-04 13:55:00    111 2014-07-03 05:35:00 2014-07-04 12:00:00     111   555
16 2     type_2           2014-07-03 22:12:00   1111 2014-07-03 05:35:00 2014-07-03 12:00:00       0  1111
17 2     type_2           2014-07-04 08:59:00   2222 2014-07-03 05:35:00 2014-07-03 12:00:00    2222  2222
18 2     type_2           2014-07-04 14:10:00   1111 2014-07-03 05:35:00 2014-07-04 12:00:00       0  3333
19 2     type_2           2014-07-04 17:00:00   2222 2014-07-03 05:35:00 2014-07-04 12:00:00       0  4444
20 2     type_2           2014-07-04 23:00:00   3333 2014-07-03 05:35:00 2014-07-04 12:00:00    3333  5555
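
The same idea can also be sketched in base R for a single group. The helper name cumulate_across_noon is hypothetical, and the sketch assumes one ID / measurement_type group, already sorted by time, with timestamps in UTC; it is an illustration of the approach above, not a drop-in replacement.

```r
# Hypothetical helper: add the final total of every earlier noon-to-noon
# window to the running amount within the current window.
# Assumes `time` is sorted ascending and belongs to one group.
cumulate_across_noon <- function(time, amount) {
  # Label each row by the date on which its noon-to-noon window started.
  window <- format(time - 12 * 3600, "%Y-%m-%d")
  # The last row of a window holds that window's total.
  is_last <- !duplicated(window, fromLast = TRUE)
  window_total <- ifelse(is_last, amount, 0)
  # Offset for each row: sum of the totals of all completed earlier windows.
  offset <- cumsum(c(0, head(window_total, -1)))
  amount + offset
}

# ID 1, type_1 rows from the example data (UTC assumed).
time <- as.POSIXct(
  c("2014-06-17 04:00:00", "2014-06-17 11:52:00", "2014-06-17 18:58:00",
    "2014-06-18 02:05:00", "2014-06-18 08:00:00"),
  tz = "UTC"
)
amount <- c(1, 2, 1, 2, 3)

result <- cumulate_across_noon(time, amount)
print(result)
```

For the full data frame, the helper could be called inside mutate() after group_by(ID, measurement_type), provided the rows are arranged by measurement_time first.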

r

time-series

dplyr