How to get the time difference in days since the last date (lag) in R using data.table?
   person_id  diag_date concept_id  event diff_prev_event
1:         1 2012-01-15    4265600 comorb         NA secs
2:         1 2012-01-15     201820 comorb          0 secs
3:         1 2012-03-15    4265600 comorb    5184000 secs
4:         2 2012-03-15     201820 comorb         NA secs
5:         2 2012-06-22     201820 comorb    8553600 secs
6:         2 2012-06-22    4265600 comorb          0 secs
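I am trying to calculate the number of days since the previous event for each person. I have run into two issues.
- The time difference is shown in seconds. I need it in days. (5184000 secs = 60 days)
- If two rows have the same date, the second one shows 0, when it should instead be compared with the previous distinct date. Rows 5 and 6 have the same date, so they should have the same date difference.
Here is the code I tried: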
dt[order(diag_date),diff_prev_event := difftime(diag_date, lag( diag_date)), by = c("person_id") ]
Specify the units in difftime
library(data.table)
# shift() is data.table's own lag; a bare lag() only shifts values if dplyr is loaded
dt[order(diag_date), diff_prev_event := difftime(diag_date,
      shift(diag_date), units = 'days'), by = c("person_id")]
Then, we group by 'person_id' and 'diag_date' and, if a group has more than one row, change the values to the max
dt[, diff_prev_event := if(.N > 1) max(diff_prev_event,
na.rm = TRUE) else diff_prev_event, .(person_id, diag_date)]
-output
> dt
   person_id  diag_date concept_id  event diff_prev_event
       <int>     <Date>      <int> <char>      <difftime>
1:         1 2012-01-15    4265600 comorb          0 days
2:         1 2012-01-15     201820 comorb          0 days
3:         1 2012-03-15    4265600 comorb         60 days
4:         2 2012-03-15     201820 comorb         NA days
5:         2 2012-06-22     201820 comorb         99 days
6:         2 2012-06-22    4265600 comorb         99 days
data
dt <- structure(list(person_id = c(1L, 1L, 1L, 2L, 2L, 2L), diag_date = structure(c(15354,
15354, 15414, 15414, 15513, 15513), class = "Date"), concept_id = c(4265600L,
201820L, 4265600L, 201820L, 201820L, 4265600L), event = c("comorb",
"comorb", "comorb", "comorb", "comorb", "comorb")), row.names = c(NA,
-6L), class = c("data.table", "data.frame"))
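As a possible alternative (not part of the answer above), here is a minimal sketch that computes the gap once per distinct date and then joins it back, so rows sharing a date automatically share the same value. The table name gaps is a hypothetical helper, and the result is stored as plain numeric days rather than as difftime. Unlike the output shown above, a person's first date stays NA here even when it appears on several rows.

```r
library(data.table)

# Same example data as above, rebuilt with data.table() for a self-contained run
dt <- data.table(
  person_id  = c(1L, 1L, 1L, 2L, 2L, 2L),
  diag_date  = as.Date(c("2012-01-15", "2012-01-15", "2012-03-15",
                         "2012-03-15", "2012-06-22", "2012-06-22")),
  concept_id = c(4265600L, 201820L, 4265600L, 201820L, 201820L, 4265600L),
  event      = "comorb"
)

# One row per person and distinct date ('gaps' is a hypothetical helper name)
gaps <- unique(dt[, .(person_id, diag_date)])
setorder(gaps, person_id, diag_date)

# Subtracting two Date vectors yields a difftime in days; keep it as numeric
gaps[, diff_prev_event := as.numeric(diag_date - shift(diag_date)),
     by = person_id]

# Update join: copy each gap back onto every row with that person and date
dt[gaps, on = .(person_id, diag_date), diff_prev_event := i.diff_prev_event]
```

If the difftime class is preferred, dropping as.numeric() should keep the column in days, since the subtraction of two Date vectors already uses that unit.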