Create a rule in R to count the number of consultations per patient per day

Question

I created the following dataset, which covers the key scenarios from my actual dataset:

df <- data.frame (organisation_id  = c("1","1","2","2","2","2","2","2","3","3","3","3","3","4","4","4","4","4","4","4","4","4","4","4","4","4","4","4","4","4","4","4","4","4"),
                  patient_id = c("1230","1230","1222","1222","1244","1244","987","987","2223","2223","2247","2247","2247","1234","1234","1234","1234","1234","1234","1234","1234","1239","1239","1239","3322","3322","3322","5434","5434","4488","4488","4488","1250","1250"),
                  date = c("08-02-2018","08-02-2018","12-01-2018","12-01-2018","12-01-2018","22-02-2018","12-01-2018","22-02-2018","01-03-2019","01-03-2019","01-03-2019","01-03-2019","01-03-2019","12-07-2020","12-07-2020","12-07-2020","12-07-2020","12-07-2020","12-07-2020","12-07-2020","12-07-2020","13-07-2020","13-07-2020","13-07-2020","16-06-2021","16-06-2021","16-06-2021","14-05-2019","14-05-2019","17-03-2020","17-03-2020","17-03-2020","03-02-2019","03-02-2019"),
                  consultation_mode = c("Telephone","Face-to-Face","Telephone","Telephone","Face-to-Face","Face-to-Face","Telephone","Telephone","Home visit","Home visit","Face-to-Face","Face-to-Face","Face-to-Face","Telephone","Telephone","Telephone","Telephone","Face-to-Face","Face-to-Face","Face-to-Face","Face-to-Face","Home visit","Home visit","Home visit","Face-to-Face","Telephone","Face-to-Face","Telephone","Face-to-Face","Face-to-Face","Telephone","Telephone","Face-to-Face","Face-to-Face"),
                  professional_id = c("24","11","123","110","123","110","123","333","444","444","444","444","444","1133","12","25","26","12","34","35","38","44","44","5556","443","443","445","29","29","555","5556","12","1133","113663"),
                  professional_role = c("Doctor","Support","Doctor","Support","Doctor","Support","Doctor","Nurse","Doctor","Doctor","Doctor","Doctor","Doctor","Support","Support","Nurse","Nurse","Support","Doctor","Doctor","Nurse","Nurse","Nurse","Doctor","Doctor","Doctor","Doctor","Doctor","Doctor","Doctor","Doctor","Support","Support","Support"),
                  professional_name = c("Dr John Taylor","Mary Wright","Dr Patricia Jones","James Davies","Dr Patricia Jones","James Davies","Dr Patricia Jones","Peter Hall","Dr Mary Wilson","Dr Mary Wilson","Dr Mary Wilson","Dr Mary Wilson","Dr Mary Wilson","Mary Wright","Anthony Patel","Jennifer Walker","Jennifer Walker","Anthony Patel","Dr Carol Bell","Dr Carol Bell","Deborah Dixon","Kevin R Collins","Kevin Collins","Dr Robert Brown","Dr Mary Wilson","Dr Mary Wilson","Dr John Snow","Dr John Taylor","Dr John Taylor","Dr James Smith","Dr Robert Brown","Anthony Patel","Mary Wright","Mary TEST Wright")
)

df$organisation_id <- as.factor(df$organisation_id)
df$patient_id <- as.factor(df$patient_id)
df$date <- as.Date(df$date, "%d-%m-%Y")
df$consultation_mode <- as.factor(df$consultation_mode)
df$professional_id <- as.factor(df$professional_id)
df$professional_role <- as.factor(df$professional_role)

I would like to create two additional columns (include? and Nr_consultations_per_Pt_day) as follows:

For each organisation_id, patient_id, date and consultation_mode, check:

1- If there is only 1 row, include? = 1 and Nr_consultations_per_Pt_day = 1 for that professional_role.

2- If there is more than 1 row, include? = 1 for each distinct professional_id and professional_name with professional_role = 'Doctor' or 'Nurse'.

Note: if a 'Doctor' or 'Nurse' has 2+ entries with different professional_id but the same professional_name, the first row gets include? = 1 and the following rows get include? = 0, e.g. IDs 25 / 26 for Jennifer Walker. Likewise, if a 'Doctor' or 'Nurse' has 2+ entries with the same professional_id but different professional_name, the first row gets include? = 1 and the following rows get include? = 0, e.g. Kevin R Collins / Kevin Collins for ID 44.

2.1- If there are 0 'Doctor' or 'Nurse' rows (all 'Support'), the first row gets include? = 1 and the following rows get include? = 0, with Nr_consultations_per_Pt_day = 1 for that professional_role.

Intermediate dataset:

organisation_id	patient_id	date	consultation_mode	professional_id	professional_role	professional_name	include?
1	1230	08-02-2018	Telephone	24	Doctor	Dr John Taylor	1
1	1230	08-02-2018	Face-to-Face	11	Support	Mary Wright	1
2	1222	12-01-2018	Telephone	123	Doctor	Dr Patricia Jones	1
2	1222	12-01-2018	Telephone	110	Support	James Davies	0
2	1244	12-01-2018	Face-to-Face	123	Doctor	Dr Patricia Jones	1
2	1244	22-02-2018	Face-to-Face	110	Support	James Davies	1
2	987	12-01-2018	Telephone	123	Doctor	Dr Patricia Jones	1
2	987	22-02-2018	Telephone	333	Nurse	Peter Hall	1
3	2223	01-03-2019	Home visit	444	Doctor	Dr Mary Wilson	1
3	2223	01-03-2019	Home visit	444	Doctor	Dr Mary Wilson	0
3	2247	01-03-2019	Face-to-Face	444	Doctor	Dr Mary Wilson	1
3	2247	01-03-2019	Face-to-Face	444	Doctor	Dr Mary Wilson	0
3	2247	01-03-2019	Face-to-Face	444	Doctor	Dr Mary Wilson	0
4	1234	12-07-2020	Telephone	1133	Support	Mary Wright	0
4	1234	12-07-2020	Telephone	12	Support	Anthony Patel	0
4	1234	12-07-2020	Telephone	25	Nurse	Jennifer Walker	1
4	1234	12-07-2020	Telephone	26	Nurse	Jennifer Walker	0
4	1234	12-07-2020	Face-to-Face	12	Support	Anthony Patel	0
4	1234	12-07-2020	Face-to-Face	34	Doctor	Dr Carol Bell	1
4	1234	12-07-2020	Face-to-Face	35	Doctor	Dr Carol Bell	0
4	1234	12-07-2020	Face-to-Face	38	Nurse	Deborah Dixon	1
4	1239	13-07-2020	Home visit	44	Nurse	Kevin R Collins	1
4	1239	13-07-2020	Home visit	44	Nurse	Kevin Collins	0
4	1239	13-07-2020	Home visit	5556	Doctor	Dr Robert Brown	1
4	3322	16-06-2021	Face-to-Face	443	Doctor	Dr Mary Wilson	1
4	3322	16-06-2021	Telephone	443	Doctor	Dr Mary Wilson	1
4	3322	16-06-2021	Face-to-Face	445	Doctor	Dr John Snow	1
4	5434	14-05-2019	Telephone	29	Doctor	Dr John Taylor	1
4	5434	14-05-2019	Face-to-Face	29	Doctor	Dr John Taylor	1
4	4488	17-03-2020	Face-to-Face	555	Doctor	Dr James Smith	1
4	4488	17-03-2020	Telephone	5556	Doctor	Dr Robert Brown	1
4	4488	17-03-2020	Telephone	12	Support	Anthony Patel	0
4	1250	03-02-2019	Face-to-Face	1133	Support	Mary Wright	1
4	1250	03-02-2019	Face-to-Face	113663	Support	Mary TEST Wright	0

Final dataset: an example for one organisation_id, patient_id and date, with every category of consultation_mode and professional_role.

organisation_id	patient_id	date	consultation_mode	professional_role	Nr_consultations_per_Pt_day
1	1230	08-02-2018	Face-to-Face	Doctor	0
1	1230	08-02-2018	Face-to-Face	Nurse	0
1	1230	08-02-2018	Face-to-Face	Support	1
1	1230	08-02-2018	Telephone	Doctor	1
1	1230	08-02-2018	Telephone	Nurse	0
1	1230	08-02-2018	Telephone	Support	0
1	1230	08-02-2018	Home visit	Doctor	0
1	1230	08-02-2018	Home visit	Nurse	0
1	1230	08-02-2018	Home visit	Support	0

and so on.

Any ideas on how to do this efficiently in R?

Answer 1

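If I understand your description correctly, for each row we evaluate the following conditions to decide whether include? = 1:

The organisation_id-patient_id-date-consultation_mode row group has size 1
The organisation_id-patient_id-date-consultation_mode row group has size greater than 1, and the row is:
1. a Doctor AND the first Doctor with that id/name
2. a Nurse AND the first Nurse with that id/name
3. a Support AND the first Support AND part of an organisation_id-patient_id-date-consultation_mode group with no Doctor or Nurse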

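This logic creates the "intermediate" table. To create the "final" table, we go through every category of consultation_mode and professional_role and set Nr_consultations_per_Pt_day = 1 if there is a corresponding entry with include? = 1, and 0 otherwise. Note that the code below sums include?, so a category with several included entries gets a count above 1.

Based on the expectations above, I would do it like this: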

library(tidyverse)

# For each row, add the size of its 
# organisation_id-patient_id-date-consultation_mode group
df2 <- df %>% group_by(organisation_id, patient_id, date, consultation_mode) %>% 
    mutate(group_size = n()) %>% ungroup()

# For each row, indicate whether it's the first entry of 
# organisation_id-patient_id-date-consultation_mode-professional_role group 
# of people with the SAME NAME but possibly different ID
df3 <- df2 %>% group_by(organisation_id, patient_id, date, consultation_mode, 
        professional_role, professional_name) %>% 
    mutate(first_by_name = row_number()==1) %>% 
    ungroup()

# For each row, indicate whether it's the first entry of 
# organisation_id-patient_id-date-consultation_mode-professional_role group 
# of people with the SAME ID but possibly different name
df4 <- df3 %>% group_by(organisation_id, patient_id, date, consultation_mode, 
        professional_role, professional_id) %>% 
    mutate(first_by_id = row_number()==1) %>% 
    ungroup()

# For each row, indicate whether there's no doctor/nurse in its 
# organisation_id-patient_id-date-consultation_mode
# and indicate the first entry in such support-only group
df5 <- df4 %>% group_by(organisation_id, patient_id, date, consultation_mode) %>% 
    mutate(support_only_group = length(intersect(professional_role, c("Doctor", "Nurse"))) == 0) %>% 
    mutate(first_in_support_only = row_number()==1 & support_only_group) %>% 
    ungroup()

# Apply rules to determine the inclusion status of each row
df6 <- df5 %>% mutate(`include?` = if_else(
        group_size == 1 | 
        (professional_role %in% c("Doctor","Nurse") & (first_by_name & first_by_id)) |
        first_in_support_only, 1, 0)) 
df6
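
As a cross-check on the rules above, here is a minimal sketch that collapses the same conditions into one grouped mutate(). It is not the code from this answer, just a condensed variant. The toy data frame and the names toy, toy_flagged, clinical, first_name and first_id are made up for illustration, and it uses duplicated() instead of separate group_by() passes.

```r
library(dplyr)

# Toy data (hypothetical): one patient/mode with a support worker,
# a nurse listed under two IDs, and one doctor.
toy <- tibble(
  patient_id        = "1234",
  consultation_mode = "Telephone",
  professional_id   = c("12", "25", "26", "34"),
  professional_role = c("Support", "Nurse", "Nurse", "Doctor"),
  professional_name = c("Anthony Patel", "Jennifer Walker",
                        "Jennifer Walker", "Dr Carol Bell")
)

toy_flagged <- toy %>%
  group_by(patient_id, consultation_mode) %>%
  mutate(
    clinical   = professional_role %in% c("Doctor", "Nurse"),
    # duplicated() is FALSE for the first occurrence of each role/name
    # (or role/id) pair within the group and TRUE for later repeats
    first_name = !duplicated(paste(professional_role, professional_name, sep = "|")),
    first_id   = !duplicated(paste(professional_role, professional_id, sep = "|")),
    `include?` = as.integer(
      n() == 1 |                                # single-row group
      (clinical & first_name & first_id) |      # first Doctor/Nurse by name and id
      (!any(clinical) & row_number() == 1)      # first row of a Support-only group
    )
  ) %>%
  ungroup() %>%
  select(-clinical, -first_name, -first_id)

toy_flagged
```

On the real data the grouping would also need organisation_id and date, as in the code above.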

Converting to the final table:

# Convert into the final table
df7 <- df6 %>% 
    # expand() keeps only the grouping and expanded columns; for factor columns
    # it uses every level, so each patient-day gets all mode x role combinations
    group_by(organisation_id, patient_id, date) %>% 
    expand(consultation_mode, professional_role) %>%
    # bring the include? flags back; combinations with no consultation get NA
    left_join(df6, by = c("organisation_id", "patient_id", "date",
                          "consultation_mode", "professional_role")) %>%
    mutate(Nr_consultations_per_Pt_day = replace_na(`include?`, 0)) %>%
    group_by(organisation_id, patient_id, date, consultation_mode, professional_role) %>%
    summarise(Nr_consultations_per_Pt_day = sum(Nr_consultations_per_Pt_day),
              .groups = "drop")

# patient_id is a factor, so match against character values
df7 %>% filter(patient_id %in% c("2223", "1250", "1230"))
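
One detail worth spelling out: the expansion step only produces every consultation_mode x professional_role combination because those columns were converted to factors in the question. As far as I know from the tidyr documentation, expand() uses the full set of factor levels but only the observed values for character columns. A small sketch with made-up toy data (toy_chr, toy_fct, grid_chr and grid_fct are hypothetical names):

```r
library(dplyr)
library(tidyr)

modes <- c("Face-to-Face", "Home visit", "Telephone")
roles <- c("Doctor", "Nurse", "Support")

# Toy data (hypothetical): one patient with two consultations, plain character columns
toy_chr <- tibble(
  patient_id        = "1230",
  consultation_mode = c("Telephone", "Face-to-Face"),
  professional_role = c("Doctor", "Support")
)

# Same data, but with all possible levels declared on both columns
toy_fct <- toy_chr %>%
  mutate(
    consultation_mode = factor(consultation_mode, levels = modes),
    professional_role = factor(professional_role, levels = roles)
  )

# Character columns: only values seen in the data are combined
grid_chr <- toy_chr %>%
  group_by(patient_id) %>%
  expand(consultation_mode, professional_role) %>%
  ungroup()

# Factor columns: every declared level is combined
grid_fct <- toy_fct %>%
  group_by(patient_id) %>%
  expand(consultation_mode, professional_role) %>%
  ungroup()

nrow(grid_chr)  # 2 observed modes x 2 observed roles
nrow(grid_fct)  # 3 mode levels x 3 role levels
```

If the columns in your real data are character rather than factor, convert them first (or pass the full vectors of categories to expand() explicitly), otherwise the zero-count rows will be missing from the final table.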
