Frequency per Week - Dialysis Dataset
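I have a dialysis dataset running from 1995 to 2014. It has the variables "id", "name", "date" and "modality".
I am interested in the "HD" modality.
The data frame is structured as follows:
- It starts in April 1995 (and is then listed month by month up to December 2014).
- Individuals can appear in several months (e.g. Name1 might have been on dialysis from April 1995 to March 1997, which is why they are listed multiple times).
- Each row with a date is one session (I need to work out the frequency of sessions per week for each patient).
Hopefully the above makes sense of what I am trying to do.
Here is an example of the dataset: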
id name date modality
10101650 name1 03-Apr-95 HD
10101650 name1 05-Apr-95 HD
10101650 name1 07-Apr-95 HD
10101650 name1 10-Apr-95 HD
10101650 name1 12-Apr-95 HD
10101650 name1 14-Apr-95 HD
10101650 name1 17-Apr-95 HD
10101650 name1 19-Apr-95 HD
10101650 name1 21-Apr-95 HD
10101650 name1 22-Apr-95 HD
10101650 name1 24-Apr-95 HD
10101650 name1 26-Apr-95 HD
10101650 name1 28-Apr-95 HD
10206042 name2 03-Apr-95 HD
10206042 name2 05-Apr-95 HD
10206042 name2 07-Apr-95 HD
10206042 name2 10-Apr-95 HD
10206042 name2 12-Apr-95 HD
10206042 name2 14-Apr-95 HD
10206042 name2 17-Apr-95 HD
10206042 name2 19-Apr-95 HD
10206042 name2 21-Apr-95 HD
10206042 name2 24-Apr-95 HD
10206042 name2 26-Apr-95 HD
10206042 name2 28-Apr-95 HD
10101650 name1 01-May-95 HD
10101650 name1 03-May-95 HD
10101650 name1 05-May-95 HD
10101650 name1 08-May-95 HD
10101650 name1 10-May-95 HD
10101650 name1 12-May-95 HD
10101650 name1 15-May-95 HD
10101650 name1 17-May-95 HD
10101650 name1 19-May-95 HD
10101650 name1 22-May-95 HD
10101650 name1 24-May-95 HD
10101650 name1 26-May-95 HD
10101650 name1 29-May-95 HD
10101650 name1 31-May-95 HD
10205987 name3 01-May-95 HD
10205987 name3 03-May-95 HD
10205987 name3 05-May-95 HD
10205987 name3 08-May-95 HD
10205987 name3 10-May-95 HD
10205987 name3 12-May-95 HD
10205987 name3 15-May-95 HD
10205987 name3 17-May-95 HD
10205987 name3 19-May-95 HD
10205987 name3 22-May-95 HD
10205987 name3 24-May-95 HD
10205987 name3 26-May-95 HD
10205987 name3 29-May-95 HD
10205987 name3 31-May-95 HD
10206042 name2 01-May-95 HD
10206042 name2 03-May-95 HD
10206042 name2 05-May-95 HD
10206042 name2 08-May-95 HD
10206042 name2 10-May-95 HD
10206042 name2 12-May-95 HD
10206042 name2 15-May-95 HD
10206042 name2 17-May-95 HD
10206042 name2 19-May-95 HD
10206042 name2 22-May-95 HD
10206042 name2 24-May-95 HD
10206042 name2 26-May-95 HD
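The table above is plain text with dates such as 03-Apr-95. A minimal sketch for reading it into R with a real Date column, using base R only, follows. It assumes the columns are whitespace-separated exactly as shown. Because `%b` (month abbreviation) is locale-dependent, it switches LC_TIME to the "C" locale so that "Apr" and "May" are recognised. Only a few rows are included here.

```r
# Use the C locale so that %b matches English month abbreviations such as "Apr"
# (note: this changes the LC_TIME setting for the rest of the R session)
invisible(Sys.setlocale("LC_TIME", "C"))

# A few rows from the table above; the remaining rows can be pasted in the same way
txt <- "id name date modality
10101650 name1 03-Apr-95 HD
10101650 name1 05-Apr-95 HD
10206042 name2 03-Apr-95 HD
10205987 name3 01-May-95 HD"

df <- read.table(text = txt, header = TRUE, stringsAsFactors = FALSE)
# "%d-%b-%y" reads 03-Apr-95; two-digit years 69-99 are interpreted as 19xx
df$date <- as.Date(df$date, format = "%d-%b-%y")
str(df)
```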
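As mentioned, I need the number of sessions per week for each patient. This will be an average, since patients can be on dialysis for several years.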
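Over several years, "average per week" can mean two different things. One is the mean count over the weeks that contain at least one session. The other divides by every calendar week between a patient's first and last session, so weeks with no sessions (for example a hospital stay) also count. A minimal sketch of the second reading follows. It uses Monday-based weeks via lubridate's `floor_date()` and a small hypothetical patient, not the real data.

```r
library(dplyr)
library(lubridate)

# Hypothetical patient: Mon/Wed/Fri sessions in the weeks of 3 and 17 April 1995,
# with no sessions at all in the week of 10 April
sessions <- data.frame(
  id = 1L,
  date = as.Date(c("1995-04-03", "1995-04-05", "1995-04-07",
                   "1995-04-17", "1995-04-19", "1995-04-21"))
)

result <- sessions %>%
  # start date (Monday) of the week each session falls in
  mutate(week_start = floor_date(date, unit = "week", week_start = 1)) %>%
  group_by(id) %>%
  summarise(
    n_sessions = n(),
    # every week from the first to the last session, including empty weeks
    n_weeks = as.numeric(difftime(max(week_start), min(week_start), units = "weeks")) + 1,
    avg_sessions_per_week = n_sessions / n_weeks
  )
result  # 6 sessions over 3 calendar weeks = 2 per week
```

Averaging only the non-empty weeks would give 3 for this patient, so it is worth deciding which definition fits the analysis.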
Here is a way to do this with the dplyr and lubridate packages -
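library(dplyr)
library(lubridate)
# Label each session with its week and year, e.g. "14-1995" (df$date must be a Date).
# week() numbers 7-day blocks counted from 1 January, so 1-7 Jan is week 1.
df$week_year <- paste(week(df$date), year(df$date), sep = "-")
filter(df, modality == "HD") %>%
  # number of sessions per patient in each week ...
  group_by(id, name, week_year) %>%
  summarise(sessions = n()) %>%
  # ... then the mean of those weekly counts per patient
  group_by(id, name) %>%
  summarize(avg_sessions_per_week = mean(sessions))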
# A tibble: 3 x 3
# Groups: id [?]
# id name avg_sessions_per_week
# <int> <chr> <dbl>
# 1 10101650 name1 3.00
# 2 10205987 name3 2.80
# 3 10206042 name2 3.00
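To see where name3's 2.80 comes from, the weekly counts can be checked directly. week() puts 29 and 31 May 1995 into week 22 (28 May to 3 June). Because the sample stops on 31 May, that week holds only two sessions, so the mean is 14 sessions / 5 weeks = 2.8. Partial weeks at the start or end of a patient's record pull the average below the usual three sessions in the same way. The dates below are name3's rows from the example data.

```r
library(lubridate)

# name3's 14 sessions in the example data (1-31 May 1995)
name3_dates <- as.Date(c("1995-05-01", "1995-05-03", "1995-05-05",
                         "1995-05-08", "1995-05-10", "1995-05-12",
                         "1995-05-15", "1995-05-17", "1995-05-19",
                         "1995-05-22", "1995-05-24", "1995-05-26",
                         "1995-05-29", "1995-05-31"))
weekly_counts <- table(week(name3_dates))
weekly_counts                   # weeks 18-21: 3 sessions each, week 22: 2
mean(as.vector(weekly_counts))  # 14 / 5 = 2.8
```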
Data -
df <- structure(list(id = c(10101650L, 10101650L, 10101650L, 10101650L,
10101650L, 10101650L, 10101650L, 10101650L, 10101650L, 10101650L,
10101650L, 10101650L, 10101650L, 10206042L, 10206042L, 10206042L,
10206042L, 10206042L, 10206042L, 10206042L, 10206042L, 10206042L,
10206042L, 10206042L, 10206042L, 10101650L, 10101650L, 10101650L,
10101650L, 10101650L, 10101650L, 10101650L, 10101650L, 10101650L,
10101650L, 10101650L, 10101650L, 10101650L, 10101650L, 10205987L,
10205987L, 10205987L, 10205987L, 10205987L, 10205987L, 10205987L,
10205987L, 10205987L, 10205987L, 10205987L, 10205987L, 10205987L,
10205987L, 10206042L, 10206042L, 10206042L, 10206042L, 10206042L,
10206042L, 10206042L, 10206042L, 10206042L, 10206042L, 10206042L,
10206042L), name = c("name1", "name1", "name1", "name1", "name1",
"name1", "name1", "name1", "name1", "name1", "name1", "name1",
"name1", "name2", "name2", "name2", "name2", "name2", "name2",
"name2", "name2", "name2", "name2", "name2", "name2", "name1",
"name1", "name1", "name1", "name1", "name1", "name1", "name1",
"name1", "name1", "name1", "name1", "name1", "name1", "name3",
"name3", "name3", "name3", "name3", "name3", "name3", "name3",
"name3", "name3", "name3", "name3", "name3", "name3", "name2",
"name2", "name2", "name2", "name2", "name2", "name2", "name2",
"name2", "name2", "name2", "name2"), date = structure(c(9223,
9225, 9227, 9230, 9232, 9234, 9237, 9239, 9241, 9242, 9244, 9246,
9248, 9223, 9225, 9227, 9230, 9232, 9234, 9237, 9239, 9241, 9244,
9246, 9248, 9251, 9253, 9255, 9258, 9260, 9262, 9265, 9267, 9269,
9272, 9274, 9276, 9279, 9281, 9251, 9253, 9255, 9258, 9260, 9262,
9265, 9267, 9269, 9272, 9274, 9276, 9279, 9281, 9251, 9253, 9255,
9258, 9260, 9262, 9265, 9267, 9269, 9272, 9274, 9276), class = "Date"),
modality = c("HD", "HD", "HD", "HD", "HD", "HD", "HD", "HD",
"HD", "HD", "HD", "HD", "HD", "HD", "HD", "HD", "HD", "HD",
"HD", "HD", "HD", "HD", "HD", "HD", "HD", "HD", "HD", "HD",
"HD", "HD", "HD", "HD", "HD", "HD", "HD", "HD", "HD", "HD",
"HD", "HD", "HD", "HD", "HD", "HD", "HD", "HD", "HD", "HD",
"HD", "HD", "HD", "HD", "HD", "HD", "HD", "HD", "HD", "HD",
"HD", "HD", "HD", "HD", "HD", "HD", "HD")), .Names = c("id",
"name", "date", "modality"), row.names = c(NA, -65L), class = "data.frame")
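A caveat about week() over a multi-year dataset: it counts 7-day blocks from 1 January, so the weekday on which a "week" starts shifts from year to year. The last block of each year (week 53) is also only one day long, or two in a leap year. With a Monday/Wednesday/Friday schedule this can create one-session "weeks" at year end. lubridate's isoweek() and isoyear() use ISO 8601 weeks (Monday to Sunday) instead. The sketch below compares the two. It uses a hypothetical patient and a hypothetical helper, `weekly_mean()`, which repeats the grouping used above.

```r
library(dplyr)
library(lubridate)

# Hypothetical patient dialysing every Monday, Wednesday and Friday
# from Mon 16 Dec 1996 to Fri 10 Jan 1997 (12 sessions, 4 full Mon-Sun weeks)
days <- seq(as.Date("1996-12-16"), as.Date("1997-01-10"), by = "day")
sessions <- data.frame(
  id = 1L,
  date = days[as.POSIXlt(days)$wday %in% c(1, 3, 5)]  # 1 = Mon, 3 = Wed, 5 = Fri
)

# Hypothetical helper: mean number of sessions per (week, year) group
weekly_mean <- function(data, week_fun, year_fun) {
  data %>%
    mutate(wk = paste(week_fun(date), year_fun(date), sep = "-")) %>%
    count(id, wk, name = "sessions") %>%
    group_by(id) %>%
    summarise(avg_sessions_per_week = mean(sessions))
}

by_week <- weekly_mean(sessions, week, year)      # 2.4: Mon 30 Dec alone forms "53-1996"
by_iso  <- weekly_mean(sessions, isoweek, isoyear) # 3: 30 Dec, 1 Jan, 3 Jan share ISO week 1 of 1997
```

For the dataset above, replacing `week(df$date)` and `year(df$date)` with `isoweek(df$date)` and `isoyear(df$date)` gives the ISO-week version.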