中午忽略计数重置
Ignoring count reset at noon
这是 的后续问题。
概括一下:我有一个这样的数据框data
:
> data
ID measurement_type measurement_time amount entry_time
1 1 type_1 2014-06-17 04:00:00 1 2014-06-17 01:53:00
2 1 type_1 2014-06-17 11:52:00 2 2014-06-17 01:53:00
3 1 type_1 2014-06-17 18:58:00 1 2014-06-17 01:53:00
4 1 type_1 2014-06-18 02:05:00 2 2014-06-17 01:53:00
5 1 type_1 2014-06-18 08:00:00 3 2014-06-17 01:53:00
6 1 type_2 2014-06-17 05:27:00 11 2014-06-17 01:53:00
7 1 type_2 2014-06-17 11:10:00 22 2014-06-17 01:53:00
8 1 type_2 2014-06-17 17:02:00 11 2014-06-17 01:53:00
9 1 type_2 2014-06-17 23:56:00 22 2014-06-17 01:53:00
10 1 type_2 2014-06-18 07:01:00 33 2014-06-17 01:53:00
11 2 type_1 2014-07-03 16:01:00 111 2014-07-03 14:35:00
12 2 type_1 2014-07-03 19:19:00 222 2014-07-03 14:35:00
13 2 type_1 2014-07-03 23:55:00 333 2014-07-03 14:35:00
14 2 type_1 2014-07-04 08:08:00 444 2014-07-03 14:35:00
15 2 type_1 2014-07-04 13:55:00 111 2014-07-03 14:35:00
16 2 type_2 2014-07-03 22:12:00 1111 2014-07-03 14:35:00
17 2 type_2 2014-07-04 08:59:00 2222 2014-07-03 14:35:00
18 2 type_2 2014-07-04 14:10:00 1111 2014-07-03 14:35:00
19 2 type_2 2014-07-04 17:00:00 2222 2014-07-03 14:35:00
20 2 type_2 2014-07-04 23:00:00 3333 2014-07-03 14:35:00
具有 ID 1
和 ID 2
的受试者在指定的 entry_time
进入,然后在特定的 measurement_times
测量累积的 amount
s。但是,每天中午,金额将再次设置回零并重新开始计数(从零开始)。我想要实现的是,一旦中午休息(因此重置为零),它会不断将新的新开始累积量添加到中午之前已经累积的量(按分组变量 measurement_type
分组) .
中午休息,上述link中提供的答案完美运行:
library(dplyr)
data %>% as_tibble() %>%
# Check 12 hours passed --> `pm` column
mutate(pm = format(measurement_time, "%H") >= 12) %>%
mutate(date_fct = format(measurement_time, "%Y_%d")) %>%
# Group by ID and `pm`
group_by(ID, measurement_type, date_fct, pm) %>%
# Turn cumsum into actual values
mutate(amount_act = amount - lag(amount, default = 0)) %>%
# Cumsum over ID
ungroup() %>%
group_by(ID, measurement_type) %>%
mutate(amount_cums = cumsum(amount_act)) %>%
ungroup() %>%
select(-c(pm, date_fct, amount_act))
# A tibble: 20 x 6
ID measurement_type measurement_time amount entry_time amount_cums
<fct> <fct> <dttm> <dbl> <dttm> <dbl>
1 1 type_1 2014-06-17 04:00:00 1 2014-06-17 01:53:00 1
2 1 type_1 2014-06-17 11:52:00 2 2014-06-17 01:53:00 2
3 1 type_1 2014-06-17 18:58:00 1 2014-06-17 01:53:00 3
4 1 type_1 2014-06-18 02:05:00 2 2014-06-17 01:53:00 5
5 1 type_1 2014-06-18 08:00:00 3 2014-06-17 01:53:00 6
6 1 type_2 2014-06-17 05:27:00 11 2014-06-17 01:53:00 11
7 1 type_2 2014-06-17 11:10:00 22 2014-06-17 01:53:00 22
8 1 type_2 2014-06-17 17:02:00 11 2014-06-17 01:53:00 33
9 1 type_2 2014-06-17 23:56:00 22 2014-06-17 01:53:00 44
10 1 type_2 2014-06-18 07:01:00 33 2014-06-17 01:53:00 77
11 2 type_1 2014-07-03 16:01:00 111 2014-07-03 14:35:00 111
12 2 type_1 2014-07-03 19:19:00 222 2014-07-03 14:35:00 222
13 2 type_1 2014-07-03 23:55:00 333 2014-07-03 14:35:00 333
14 2 type_1 2014-07-04 08:08:00 444 2014-07-03 14:35:00 777
15 2 type_1 2014-07-04 13:55:00 111 2014-07-03 14:35:00 888
16 2 type_2 2014-07-03 22:12:00 1111 2014-07-03 14:35:00 1111
17 2 type_2 2014-07-04 08:59:00 2222 2014-07-03 14:35:00 3333
18 2 type_2 2014-07-04 14:10:00 1111 2014-07-03 14:35:00 4444
19 2 type_2 2014-07-04 17:00:00 2222 2014-07-03 14:35:00 5555
20 2 type_2 2014-07-04 23:00:00 3333 2014-07-03 14:35:00 6666
如您所见,下午计数已正确添加到中午之前计数。但是,由于按天分组(date_fct
在提供的代码中),午夜休息时间错误地将第二天的值(从中午开始累积)添加到前一天的累积量(amount_cums
) .
非常感谢任何帮助获得 amount_cums
所需的输出,如下所示:
# A tibble: 20 x 6
ID measurement_type measurement_time amount entry_time amount_cums
<fct> <fct> <dttm> <dbl> <dttm> <dbl>
1 type_1 2014-06-17 04:00:00 1 2014-06-17 01:53:00 1
1 type_1 2014-06-17 11:52:00 2 2014-06-17 01:53:00 2
1 type_1 2014-06-17 18:58:00 1 2014-06-17 01:53:00 3
1 type_1 2014-06-18 02:05:00 2 2014-06-17 01:53:00 4
1 type_1 2014-06-18 08:00:00 3 2014-06-17 01:53:00 5
1 type_2 2014-06-17 05:27:00 11 2014-06-17 01:53:00 11
1 type_2 2014-06-17 11:10:00 22 2014-06-17 01:53:00 22
1 type_2 2014-06-17 17:02:00 11 2014-06-17 01:53:00 33
1 type_2 2014-06-17 23:56:00 22 2014-06-17 01:53:00 44
1 type_2 2014-06-18 07:01:00 33 2014-06-17 01:53:00 55
2 type_1 2014-07-03 16:01:00 111 2014-07-03 14:35:00 111
2 type_1 2014-07-03 19:19:00 222 2014-07-03 14:35:00 222
2 type_1 2014-07-03 23:55:00 333 2014-07-03 14:35:00 333
2 type_1 2014-07-04 08:08:00 444 2014-07-03 14:35:00 444
2 type_1 2014-07-04 13:55:00 111 2014-07-03 14:35:00 555
2 type_2 2014-07-03 22:12:00 1111 2014-07-03 14:35:00 1111
2 type_2 2014-07-04 08:59:00 2222 2014-07-03 14:35:00 2222
2 type_2 2014-07-04 14:10:00 1111 2014-07-03 14:35:00 3333
2 type_2 2014-07-04 17:00:00 2222 2014-07-03 14:35:00 4444
2 type_2 2014-07-04 23:00:00 3333 2014-07-03 14:35:00 5555
数据
data <- structure(list(ID = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("1", "2"), class = "factor"),
measurement_type = structure(c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L), .Label = c("type_1", "type_2"), class = "factor"),
measurement_time = structure(c(1402970400, 1402998720, 1403024280, 1403049900, 1403071200, 1402975620, 1402996200, 1403017320, 1403042160, 1403067660,
1404396060, 1404407940, 1404424500, 1404454080, 1404474900, 1404418320, 1404457140, 1404475800, 1404486000, 1404507600), class = c("POSIXct", "POSIXt"), tzone = ""),
amount = c(1, 2, 1, 2, 3, 11, 22, 11, 22, 33, 111, 222, 333, 444, 111, 1111, 2222, 1111, 2222, 3333),
entry_time = structure(c(1402962780, 1402962780, 1402962780, 1402962780, 1402962780,1402962780, 1402962780, 1402962780, 1402962780, 1402962780,
1404390900, 1404390900, 1404390900, 1404390900, 1404390900, 1404390900, 1404390900, 1404390900, 1404390900, 1404390900),
class = c("POSIXct", "POSIXt"), tzone = "CET")), class = "data.frame", row.names = c(NA, -20L))
这是我确定最近中午的方法,添加一个帮助列来捕获每个测量日的最后一次测量,最后将前几天的最后一次测量添加到每个值。
library(dplyr); library(lubridate)
data %>%
# arrange(ID, measurement_type, measurement_time) %>%
# I needed to adjust the times since they loaded in my local time
mutate(measurement_time = measurement_time + dhours(9)) %>%
# identify the most recent noon
mutate(start_of_count_day = floor_date(measurement_time - dhours(12), "day") + dhours(12)) %>%
group_by(ID, measurement_type, start_of_count_day) %>%
mutate(day_ttl = if_else(row_number() == max(row_number()), amount, 0)) %>%
group_by(ID, measurement_type) %>%
mutate(cuml = amount + cumsum(lag(day_ttl, default = 0))) %>%
ungroup()
结果
# A tibble: 20 × 8
ID measurement_type measurement_time amount entry_time start_of_count_day day_ttl cuml
<fct> <fct> <dttm> <dbl> <dttm> <dttm> <dbl> <dbl>
1 1 type_1 2014-06-17 04:00:00 1 2014-06-16 16:53:00 2014-06-16 12:00:00 0 1
2 1 type_1 2014-06-17 11:52:00 2 2014-06-16 16:53:00 2014-06-16 12:00:00 2 2
3 1 type_1 2014-06-17 18:58:00 1 2014-06-16 16:53:00 2014-06-17 12:00:00 0 3
4 1 type_1 2014-06-18 02:05:00 2 2014-06-16 16:53:00 2014-06-17 12:00:00 0 4
5 1 type_1 2014-06-18 08:00:00 3 2014-06-16 16:53:00 2014-06-17 12:00:00 3 5
6 1 type_2 2014-06-17 05:27:00 11 2014-06-16 16:53:00 2014-06-16 12:00:00 0 11
7 1 type_2 2014-06-17 11:10:00 22 2014-06-16 16:53:00 2014-06-16 12:00:00 22 22
8 1 type_2 2014-06-17 17:02:00 11 2014-06-16 16:53:00 2014-06-17 12:00:00 0 33
9 1 type_2 2014-06-17 23:56:00 22 2014-06-16 16:53:00 2014-06-17 12:00:00 0 44
10 1 type_2 2014-06-18 07:01:00 33 2014-06-16 16:53:00 2014-06-17 12:00:00 33 55
11 2 type_1 2014-07-03 16:01:00 111 2014-07-03 05:35:00 2014-07-03 12:00:00 0 111
12 2 type_1 2014-07-03 19:19:00 222 2014-07-03 05:35:00 2014-07-03 12:00:00 0 222
13 2 type_1 2014-07-03 23:55:00 333 2014-07-03 05:35:00 2014-07-03 12:00:00 0 333
14 2 type_1 2014-07-04 08:08:00 444 2014-07-03 05:35:00 2014-07-03 12:00:00 444 444
15 2 type_1 2014-07-04 13:55:00 111 2014-07-03 05:35:00 2014-07-04 12:00:00 111 555
16 2 type_2 2014-07-03 22:12:00 1111 2014-07-03 05:35:00 2014-07-03 12:00:00 0 1111
17 2 type_2 2014-07-04 08:59:00 2222 2014-07-03 05:35:00 2014-07-03 12:00:00 2222 2222
18 2 type_2 2014-07-04 14:10:00 1111 2014-07-03 05:35:00 2014-07-04 12:00:00 0 3333
19 2 type_2 2014-07-04 17:00:00 2222 2014-07-03 05:35:00 2014-07-04 12:00:00 0 4444
20 2 type_2 2014-07-04 23:00:00 3333 2014-07-03 05:35:00 2014-07-04 12:00:00 3333 5555
这是
概括一下:我有一个这样的数据框data
:
> data
ID measurement_type measurement_time amount entry_time
1 1 type_1 2014-06-17 04:00:00 1 2014-06-17 01:53:00
2 1 type_1 2014-06-17 11:52:00 2 2014-06-17 01:53:00
3 1 type_1 2014-06-17 18:58:00 1 2014-06-17 01:53:00
4 1 type_1 2014-06-18 02:05:00 2 2014-06-17 01:53:00
5 1 type_1 2014-06-18 08:00:00 3 2014-06-17 01:53:00
6 1 type_2 2014-06-17 05:27:00 11 2014-06-17 01:53:00
7 1 type_2 2014-06-17 11:10:00 22 2014-06-17 01:53:00
8 1 type_2 2014-06-17 17:02:00 11 2014-06-17 01:53:00
9 1 type_2 2014-06-17 23:56:00 22 2014-06-17 01:53:00
10 1 type_2 2014-06-18 07:01:00 33 2014-06-17 01:53:00
11 2 type_1 2014-07-03 16:01:00 111 2014-07-03 14:35:00
12 2 type_1 2014-07-03 19:19:00 222 2014-07-03 14:35:00
13 2 type_1 2014-07-03 23:55:00 333 2014-07-03 14:35:00
14 2 type_1 2014-07-04 08:08:00 444 2014-07-03 14:35:00
15 2 type_1 2014-07-04 13:55:00 111 2014-07-03 14:35:00
16 2 type_2 2014-07-03 22:12:00 1111 2014-07-03 14:35:00
17 2 type_2 2014-07-04 08:59:00 2222 2014-07-03 14:35:00
18 2 type_2 2014-07-04 14:10:00 1111 2014-07-03 14:35:00
19 2 type_2 2014-07-04 17:00:00 2222 2014-07-03 14:35:00
20 2 type_2 2014-07-04 23:00:00 3333 2014-07-03 14:35:00
具有 ID 1
和 ID 2
的受试者在指定的 entry_time
进入,然后在特定的 measurement_times
测量累积的 amount
s。但是,每天中午,金额将再次设置回零并重新开始计数(从零开始)。我想要实现的是,一旦中午休息(因此重置为零),它会不断将新的新开始累积量添加到中午之前已经累积的量(按分组变量 measurement_type
分组) .
中午休息,上述link中提供的答案完美运行:
library(dplyr)
data %>% as_tibble() %>%
# Check 12 hours passed --> `pm` column
mutate(pm = format(measurement_time, "%H") >= 12) %>%
mutate(date_fct = format(measurement_time, "%Y_%d")) %>%
# Group by ID and `pm`
group_by(ID, measurement_type, date_fct, pm) %>%
# Turn cumsum into actual values
mutate(amount_act = amount - lag(amount, default = 0)) %>%
# Cumsum over ID
ungroup() %>%
group_by(ID, measurement_type) %>%
mutate(amount_cums = cumsum(amount_act)) %>%
ungroup() %>%
select(-c(pm, date_fct, amount_act))
# A tibble: 20 x 6
ID measurement_type measurement_time amount entry_time amount_cums
<fct> <fct> <dttm> <dbl> <dttm> <dbl>
1 1 type_1 2014-06-17 04:00:00 1 2014-06-17 01:53:00 1
2 1 type_1 2014-06-17 11:52:00 2 2014-06-17 01:53:00 2
3 1 type_1 2014-06-17 18:58:00 1 2014-06-17 01:53:00 3
4 1 type_1 2014-06-18 02:05:00 2 2014-06-17 01:53:00 5
5 1 type_1 2014-06-18 08:00:00 3 2014-06-17 01:53:00 6
6 1 type_2 2014-06-17 05:27:00 11 2014-06-17 01:53:00 11
7 1 type_2 2014-06-17 11:10:00 22 2014-06-17 01:53:00 22
8 1 type_2 2014-06-17 17:02:00 11 2014-06-17 01:53:00 33
9 1 type_2 2014-06-17 23:56:00 22 2014-06-17 01:53:00 44
10 1 type_2 2014-06-18 07:01:00 33 2014-06-17 01:53:00 77
11 2 type_1 2014-07-03 16:01:00 111 2014-07-03 14:35:00 111
12 2 type_1 2014-07-03 19:19:00 222 2014-07-03 14:35:00 222
13 2 type_1 2014-07-03 23:55:00 333 2014-07-03 14:35:00 333
14 2 type_1 2014-07-04 08:08:00 444 2014-07-03 14:35:00 777
15 2 type_1 2014-07-04 13:55:00 111 2014-07-03 14:35:00 888
16 2 type_2 2014-07-03 22:12:00 1111 2014-07-03 14:35:00 1111
17 2 type_2 2014-07-04 08:59:00 2222 2014-07-03 14:35:00 3333
18 2 type_2 2014-07-04 14:10:00 1111 2014-07-03 14:35:00 4444
19 2 type_2 2014-07-04 17:00:00 2222 2014-07-03 14:35:00 5555
20 2 type_2 2014-07-04 23:00:00 3333 2014-07-03 14:35:00 6666
如您所见,下午计数已正确添加到中午之前计数。但是,由于按天分组(date_fct
在提供的代码中),午夜休息时间错误地将第二天的值(从中午开始累积)添加到前一天的累积量(amount_cums
) .
非常感谢任何帮助获得 amount_cums
所需的输出,如下所示:
# A tibble: 20 x 6
ID measurement_type measurement_time amount entry_time amount_cums
<fct> <fct> <dttm> <dbl> <dttm> <dbl>
1 type_1 2014-06-17 04:00:00 1 2014-06-17 01:53:00 1
1 type_1 2014-06-17 11:52:00 2 2014-06-17 01:53:00 2
1 type_1 2014-06-17 18:58:00 1 2014-06-17 01:53:00 3
1 type_1 2014-06-18 02:05:00 2 2014-06-17 01:53:00 4
1 type_1 2014-06-18 08:00:00 3 2014-06-17 01:53:00 5
1 type_2 2014-06-17 05:27:00 11 2014-06-17 01:53:00 11
1 type_2 2014-06-17 11:10:00 22 2014-06-17 01:53:00 22
1 type_2 2014-06-17 17:02:00 11 2014-06-17 01:53:00 33
1 type_2 2014-06-17 23:56:00 22 2014-06-17 01:53:00 44
1 type_2 2014-06-18 07:01:00 33 2014-06-17 01:53:00 55
2 type_1 2014-07-03 16:01:00 111 2014-07-03 14:35:00 111
2 type_1 2014-07-03 19:19:00 222 2014-07-03 14:35:00 222
2 type_1 2014-07-03 23:55:00 333 2014-07-03 14:35:00 333
2 type_1 2014-07-04 08:08:00 444 2014-07-03 14:35:00 444
2 type_1 2014-07-04 13:55:00 111 2014-07-03 14:35:00 555
2 type_2 2014-07-03 22:12:00 1111 2014-07-03 14:35:00 1111
2 type_2 2014-07-04 08:59:00 2222 2014-07-03 14:35:00 2222
2 type_2 2014-07-04 14:10:00 1111 2014-07-03 14:35:00 3333
2 type_2 2014-07-04 17:00:00 2222 2014-07-03 14:35:00 4444
2 type_2 2014-07-04 23:00:00 3333 2014-07-03 14:35:00 5555
数据
data <- structure(list(ID = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("1", "2"), class = "factor"),
measurement_type = structure(c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L), .Label = c("type_1", "type_2"), class = "factor"),
measurement_time = structure(c(1402970400, 1402998720, 1403024280, 1403049900, 1403071200, 1402975620, 1402996200, 1403017320, 1403042160, 1403067660,
1404396060, 1404407940, 1404424500, 1404454080, 1404474900, 1404418320, 1404457140, 1404475800, 1404486000, 1404507600), class = c("POSIXct", "POSIXt"), tzone = ""),
amount = c(1, 2, 1, 2, 3, 11, 22, 11, 22, 33, 111, 222, 333, 444, 111, 1111, 2222, 1111, 2222, 3333),
entry_time = structure(c(1402962780, 1402962780, 1402962780, 1402962780, 1402962780,1402962780, 1402962780, 1402962780, 1402962780, 1402962780,
1404390900, 1404390900, 1404390900, 1404390900, 1404390900, 1404390900, 1404390900, 1404390900, 1404390900, 1404390900),
class = c("POSIXct", "POSIXt"), tzone = "CET")), class = "data.frame", row.names = c(NA, -20L))
这是我确定最近中午的方法,添加一个帮助列来捕获每个测量日的最后一次测量,最后将前几天的最后一次测量添加到每个值。
library(dplyr); library(lubridate)
data %>%
# arrange(ID, measurement_type, measurement_time) %>%
# I needed to adjust the times since they loaded in my local time
mutate(measurement_time = measurement_time + dhours(9)) %>%
# identify the most recent noon
mutate(start_of_count_day = floor_date(measurement_time - dhours(12), "day") + dhours(12)) %>%
group_by(ID, measurement_type, start_of_count_day) %>%
mutate(day_ttl = if_else(row_number() == max(row_number()), amount, 0)) %>%
group_by(ID, measurement_type) %>%
mutate(cuml = amount + cumsum(lag(day_ttl, default = 0))) %>%
ungroup()
结果
# A tibble: 20 × 8
ID measurement_type measurement_time amount entry_time start_of_count_day day_ttl cuml
<fct> <fct> <dttm> <dbl> <dttm> <dttm> <dbl> <dbl>
1 1 type_1 2014-06-17 04:00:00 1 2014-06-16 16:53:00 2014-06-16 12:00:00 0 1
2 1 type_1 2014-06-17 11:52:00 2 2014-06-16 16:53:00 2014-06-16 12:00:00 2 2
3 1 type_1 2014-06-17 18:58:00 1 2014-06-16 16:53:00 2014-06-17 12:00:00 0 3
4 1 type_1 2014-06-18 02:05:00 2 2014-06-16 16:53:00 2014-06-17 12:00:00 0 4
5 1 type_1 2014-06-18 08:00:00 3 2014-06-16 16:53:00 2014-06-17 12:00:00 3 5
6 1 type_2 2014-06-17 05:27:00 11 2014-06-16 16:53:00 2014-06-16 12:00:00 0 11
7 1 type_2 2014-06-17 11:10:00 22 2014-06-16 16:53:00 2014-06-16 12:00:00 22 22
8 1 type_2 2014-06-17 17:02:00 11 2014-06-16 16:53:00 2014-06-17 12:00:00 0 33
9 1 type_2 2014-06-17 23:56:00 22 2014-06-16 16:53:00 2014-06-17 12:00:00 0 44
10 1 type_2 2014-06-18 07:01:00 33 2014-06-16 16:53:00 2014-06-17 12:00:00 33 55
11 2 type_1 2014-07-03 16:01:00 111 2014-07-03 05:35:00 2014-07-03 12:00:00 0 111
12 2 type_1 2014-07-03 19:19:00 222 2014-07-03 05:35:00 2014-07-03 12:00:00 0 222
13 2 type_1 2014-07-03 23:55:00 333 2014-07-03 05:35:00 2014-07-03 12:00:00 0 333
14 2 type_1 2014-07-04 08:08:00 444 2014-07-03 05:35:00 2014-07-03 12:00:00 444 444
15 2 type_1 2014-07-04 13:55:00 111 2014-07-03 05:35:00 2014-07-04 12:00:00 111 555
16 2 type_2 2014-07-03 22:12:00 1111 2014-07-03 05:35:00 2014-07-03 12:00:00 0 1111
17 2 type_2 2014-07-04 08:59:00 2222 2014-07-03 05:35:00 2014-07-03 12:00:00 2222 2222
18 2 type_2 2014-07-04 14:10:00 1111 2014-07-03 05:35:00 2014-07-04 12:00:00 0 3333
19 2 type_2 2014-07-04 17:00:00 2222 2014-07-03 05:35:00 2014-07-04 12:00:00 0 4444
20 2 type_2 2014-07-04 23:00:00 3333 2014-07-03 05:35:00 2014-07-04 12:00:00 3333 5555