Create a rule in R to count the number of consultations per patient per day
I created the following dataset using key scenarios from my actual dataset:
df <- data.frame (organisation_id = c("1","1","2","2","2","2","2","2","3","3","3","3","3","4","4","4","4","4","4","4","4","4","4","4","4","4","4","4","4","4","4","4","4","4"),
patient_id = c("1230","1230","1222","1222","1244","1244","987","987","2223","2223","2247","2247","2247","1234","1234","1234","1234","1234","1234","1234","1234","1239","1239","1239","3322","3322","3322","5434","5434","4488","4488","4488","1250","1250"),
date = c("08-02-2018","08-02-2018","12-01-2018","12-01-2018","12-01-2018","22-02-2018","12-01-2018","22-02-2018","01-03-2019","01-03-2019","01-03-2019","01-03-2019","01-03-2019","12-07-2020","12-07-2020","12-07-2020","12-07-2020","12-07-2020","12-07-2020","12-07-2020","12-07-2020","13-07-2020","13-07-2020","13-07-2020","16-06-2021","16-06-2021","16-06-2021","14-05-2019","14-05-2019","17-03-2020","17-03-2020","17-03-2020","03-02-2019","03-02-2019"),
consultation_mode = c("Telephone","Face-to-Face","Telephone","Telephone","Face-to-Face","Face-to-Face","Telephone","Telephone","Home visit","Home visit","Face-to-Face","Face-to-Face","Face-to-Face","Telephone","Telephone","Telephone","Telephone","Face-to-Face","Face-to-Face","Face-to-Face","Face-to-Face","Home visit","Home visit","Home visit","Face-to-Face","Telephone","Face-to-Face","Telephone","Face-to-Face","Face-to-Face","Telephone","Telephone","Face-to-Face","Face-to-Face"),
professional_id = c("24","11","123","110","123","110","123","333","444","444","444","444","444","1133","12","25","26","12","34","35","38","44","44","5556","443","443","445","29","29","555","5556","12","1133","113663"),
professional_role = c("Doctor","Support","Doctor","Support","Doctor","Support","Doctor","Nurse","Doctor","Doctor","Doctor","Doctor","Doctor","Support","Support","Nurse","Nurse","Support","Doctor","Doctor","Nurse","Nurse","Nurse","Doctor","Doctor","Doctor","Doctor","Doctor","Doctor","Doctor","Doctor","Support","Support","Support"),
professional_name = c("Dr John Taylor","Mary Wright","Dr Patricia Jones","James Davies","Dr Patricia Jones","James Davies","Dr Patricia Jones","Peter Hall","Dr Mary Wilson","Dr Mary Wilson","Dr Mary Wilson","Dr Mary Wilson","Dr Mary Wilson","Mary Wright","Anthony Patel","Jennifer Walker","Jennifer Walker","Anthony Patel","Dr Carol Bell","Dr Carol Bell","Deborah Dixon","Kevin R Collins","Kevin Collins","Dr Robert Brown","Dr Mary Wilson","Dr Mary Wilson","Dr John Snow","Dr John Taylor","Dr John Taylor","Dr James Smith","Dr Robert Brown","Anthony Patel","Mary Wright","Mary TEST Wright")
)
df$organisation_id <- as.factor(df$organisation_id)
df$patient_id <- as.factor(df$patient_id)
df$date <- as.Date(df$date, "%d-%m-%Y")
df$consultation_mode <- as.factor(df$consultation_mode)
df$professional_id <- as.factor(df$professional_id)
df$professional_role <- as.factor(df$professional_role)
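I want to create two additional columns (include? and Nr_consultations_per_Pt_day) as follows:
For each combination of organisation_id, patient_id, date and consultation_mode, check:
1- If there is only 1 row, then include? = 1 and Nr_consultations_per_Pt_day = 1 for that row's professional_role.
2- If there is more than 1 row, then include? = 1 for each distinct professional_id and professional_name with professional_role = 'Doctor' or 'Nurse'.
Note: if a 'Doctor' or 'Nurse' has 2+ entries with different professional_id but the same professional_name, the first row gets include? = 1 and the following rows get include? = 0, e.g. Jennifer Walker with IDs 25 / 26. Likewise, if a 'Doctor' or 'Nurse' has 2+ entries with the same professional_id but different professional_name, the first row gets include? = 1 and the following rows get include? = 0, e.g. Kevin R Collins / Kevin Collins with ID 44.
2.1- If there are 0 'Doctor' or 'Nurse' rows (all 'Support'), the first row gets include? = 1 and the following rows get include? = 0, with Nr_consultations_per_Pt_day = 1 for that professional_role.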
Intermediate dataset:
organisation_id | patient_id | date | consultation_mode | professional_id | professional_role | professional_name | include? |
---|---|---|---|---|---|---|---|
1 | 1230 | 08-02-2018 | Telephone | 24 | Doctor | Dr John Taylor | 1 |
1 | 1230 | 08-02-2018 | Face-to-Face | 11 | Support | Mary Wright | 1 |
2 | 1222 | 12-01-2018 | Telephone | 123 | Doctor | Dr Patricia Jones | 1 |
2 | 1222 | 12-01-2018 | Telephone | 110 | Support | James Davies | 0 |
2 | 1244 | 12-01-2018 | Face-to-Face | 123 | Doctor | Dr Patricia Jones | 1 |
2 | 1244 | 22-02-2018 | Face-to-Face | 110 | Support | James Davies | 1 |
2 | 987 | 12-01-2018 | Telephone | 123 | Doctor | Dr Patricia Jones | 1 |
2 | 987 | 22-02-2018 | Telephone | 333 | Nurse | Peter Hall | 1 |
3 | 2223 | 01-03-2019 | Home visit | 444 | Doctor | Dr Mary Wilson | 1 |
3 | 2223 | 01-03-2019 | Home visit | 444 | Doctor | Dr Mary Wilson | 0 |
3 | 2247 | 01-03-2019 | Face-to-Face | 444 | Doctor | Dr Mary Wilson | 1 |
3 | 2247 | 01-03-2019 | Face-to-Face | 444 | Doctor | Dr Mary Wilson | 0 |
3 | 2247 | 01-03-2019 | Face-to-Face | 444 | Doctor | Dr Mary Wilson | 0 |
4 | 1234 | 12-07-2020 | Telephone | 1133 | Support | Mary Wright | 0 |
4 | 1234 | 12-07-2020 | Telephone | 12 | Support | Anthony Patel | 0 |
4 | 1234 | 12-07-2020 | Telephone | 25 | Nurse | Jennifer Walker | 1 |
4 | 1234 | 12-07-2020 | Telephone | 26 | Nurse | Jennifer Walker | 0 |
4 | 1234 | 12-07-2020 | Face-to-Face | 12 | Support | Anthony Patel | 0 |
4 | 1234 | 12-07-2020 | Face-to-Face | 34 | Doctor | Dr Carol Bell | 1 |
4 | 1234 | 12-07-2020 | Face-to-Face | 35 | Doctor | Dr Carol Bell | 0 |
4 | 1234 | 12-07-2020 | Face-to-Face | 38 | Nurse | Deborah Dixon | 1 |
4 | 1239 | 13-07-2020 | Home visit | 44 | Nurse | Kevin R Collins | 1 |
4 | 1239 | 13-07-2020 | Home visit | 44 | Nurse | Kevin Collins | 0 |
4 | 1239 | 13-07-2020 | Home visit | 5556 | Doctor | Dr Robert Brown | 1 |
4 | 3322 | 16-06-2021 | Face-to-Face | 443 | Doctor | Dr Mary Wilson | 1 |
4 | 3322 | 16-06-2021 | Telephone | 443 | Doctor | Dr Mary Wilson | 1 |
4 | 3322 | 16-06-2021 | Face-to-Face | 445 | Doctor | Dr John Snow | 1 |
4 | 5434 | 14-05-2019 | Telephone | 29 | Doctor | Dr John Taylor | 1 |
4 | 5434 | 14-05-2019 | Face-to-Face | 29 | Doctor | Dr John Taylor | 1 |
4 | 4488 | 17-03-2020 | Face-to-Face | 555 | Doctor | Dr James Smith | 1 |
4 | 4488 | 17-03-2020 | Telephone | 5556 | Doctor | Dr Robert Brown | 1 |
4 | 4488 | 17-03-2020 | Telephone | 12 | Support | Anthony Patel | 0 |
4 | 1250 | 03-02-2019 | Face-to-Face | 1133 | Support | Mary Wright | 1 |
4 | 1250 | 03-02-2019 | Face-to-Face | 113663 | Support | Mary TEST Wright | 0 |
Final dataset:
An example for one organisation_id, patient_id and date, covering each category of consultation_mode and professional_role:
organisation_id | patient_id | date | consultation_mode | professional_role | Nr_consultations_per_Pt_day |
---|---|---|---|---|---|
1 | 1230 | 08-02-2018 | Face-to-Face | Doctor | 0 |
1 | 1230 | 08-02-2018 | Face-to-Face | Nurse | 0 |
1 | 1230 | 08-02-2018 | Face-to-Face | Support | 1 |
1 | 1230 | 08-02-2018 | Telephone | Doctor | 1 |
1 | 1230 | 08-02-2018 | Telephone | Nurse | 0 |
1 | 1230 | 08-02-2018 | Telephone | Support | 0 |
1 | 1230 | 08-02-2018 | Home visit | Doctor | 0 |
1 | 1230 | 08-02-2018 | Home visit | Nurse | 0 |
1 | 1230 | 08-02-2018 | Home visit | Support | 0 |
etc.
Any ideas on how to do this efficiently in R?
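If I understand your description correctly, for each row we want to evaluate the following conditions to decide whether include? = 1:
- the row's organisation_id-patient_id-date-consultation_mode group has size 1
- the row's organisation_id-patient_id-date-consultation_mode group has size greater than 1, and the row is:
  - a Doctor AND the first Doctor with that id/name
  - a Nurse AND the first Nurse with that id/name
  - a Support AND the first Support AND part of an organisation_id-patient_id-date-consultation_mode group that has no Doctor or Nurse
This logic creates the "intermediate" table. To create the "final" table, we go through each category of consultation_mode and professional_role and set Nr_consultations_per_Pt_day = 1 if a corresponding entry with include? = 1 exists. More precisely, the code sums include? within each category, so a category with two included professionals gets 2.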
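The "first with that id/name" condition can be sketched in isolation. The snippet below is only a minimal base-R illustration, not the full solution: grp is a hypothetical toy data frame holding a single consultation group (the Home visit rows of patient 1239 from the question), and the column name include stands in for include?.

```r
# Toy group (hypothetical subset): patient 1239, Home visit, 13-07-2020
grp <- data.frame(
  professional_id   = c("44", "44", "5556"),
  professional_role = c("Nurse", "Nurse", "Doctor"),
  professional_name = c("Kevin R Collins", "Kevin Collins", "Dr Robert Brown"),
  stringsAsFactors  = FALSE
)

# duplicated() is FALSE for the first occurrence of a value and TRUE for repeats.
# Pasting the role in front keeps a Doctor and a Nurse sharing an ID apart.
first_by_id   <- !duplicated(paste(grp$professional_role, grp$professional_id))
first_by_name <- !duplicated(paste(grp$professional_role, grp$professional_name))

# A Doctor/Nurse row counts only if it is the first for both its ID and its name.
grp$include <- as.integer(
  grp$professional_role %in% c("Doctor", "Nurse") & first_by_id & first_by_name
)
grp
```

Here the second Kevin Collins row is dropped because ID 44 has already been seen, even though the spelling of the name differs.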
Based on that understanding, I would do this:
library(tidyverse)

# For each row, add the size of its
# organisation_id-patient_id-date-consultation_mode group
df2 <- df %>%
  group_by(organisation_id, patient_id, date, consultation_mode) %>%
  mutate(group_size = n()) %>%
  ungroup()

# For each row, indicate whether it's the first entry of its
# organisation_id-patient_id-date-consultation_mode-professional_role group
# among people with the SAME NAME but possibly different IDs
df3 <- df2 %>%
  group_by(organisation_id, patient_id, date, consultation_mode,
           professional_role, professional_name) %>%
  mutate(first_by_name = row_number() == 1) %>%
  ungroup()

# For each row, indicate whether it's the first entry of its
# organisation_id-patient_id-date-consultation_mode-professional_role group
# among people with the SAME ID but possibly different names
df4 <- df3 %>%
  group_by(organisation_id, patient_id, date, consultation_mode,
           professional_role, professional_id) %>%
  mutate(first_by_id = row_number() == 1) %>%
  ungroup()

# For each row, indicate whether there's no doctor/nurse in its
# organisation_id-patient_id-date-consultation_mode group,
# and flag the first entry in such a support-only group
df5 <- df4 %>%
  group_by(organisation_id, patient_id, date, consultation_mode) %>%
  mutate(support_only_group = !any(professional_role %in% c("Doctor", "Nurse")),
         first_in_support_only = row_number() == 1 & support_only_group) %>%
  ungroup()

# Apply the rules to determine the inclusion status of each row
df6 <- df5 %>%
  mutate(`include?` = if_else(
    group_size == 1 |
      (professional_role %in% c("Doctor", "Nurse") & first_by_name & first_by_id) |
      first_in_support_only,
    1, 0))
df6
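As a sanity check, the same rules can be condensed into one grouped mutate() and compared with the include? values expected in the question. This is a sketch under assumptions: flag_included() is a hypothetical helper name, toy is a hand-copied subset of the question's data (the Telephone rows of patient 1234 plus patients 1239 and 1250), and the organisation and date columns are left out because each patient in the subset has a single date.

```r
library(dplyr)

# Hypothetical helper: applies the inclusion rules within each
# patient_id-consultation_mode group of a data frame.
flag_included <- function(data) {
  data %>%
    group_by(patient_id, consultation_mode) %>%
    mutate(
      clinician     = professional_role %in% c("Doctor", "Nurse"),
      first_by_id   = !duplicated(paste(professional_role, professional_id)),
      first_by_name = !duplicated(paste(professional_role, professional_name)),
      include = as.integer(
        n() == 1 |                                     # single-row group
          (clinician & first_by_id & first_by_name) |  # first Doctor/Nurse entry
          (!any(clinician) & row_number() == 1)        # first row, support-only group
      )
    ) %>%
    ungroup() %>%
    select(-clinician, -first_by_id, -first_by_name)
}

# Hand-copied subset of the question's data with the expected include? values
toy <- data.frame(
  patient_id        = c("1234", "1234", "1234", "1234",
                        "1239", "1239", "1239", "1250", "1250"),
  consultation_mode = c(rep("Telephone", 4), rep("Home visit", 3),
                        rep("Face-to-Face", 2)),
  professional_id   = c("1133", "12", "25", "26",
                        "44", "44", "5556", "1133", "113663"),
  professional_role = c("Support", "Support", "Nurse", "Nurse",
                        "Nurse", "Nurse", "Doctor", "Support", "Support"),
  professional_name = c("Mary Wright", "Anthony Patel", "Jennifer Walker",
                        "Jennifer Walker", "Kevin R Collins", "Kevin Collins",
                        "Dr Robert Brown", "Mary Wright", "Mary TEST Wright"),
  expected          = c(0, 0, 1, 0, 1, 0, 1, 1, 0)
)

checked <- flag_included(toy)
all(checked$include == checked$expected)
```

Comparing against hand-copied expected values like this is a cheap way to catch a rule that was misread before running the logic on the full dataset.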
To convert this into the final table:
# Convert into the final table
df7 <- df6 %>%
  group_by(organisation_id, patient_id, date) %>%
  # expand() uses all factor levels, so unobserved mode/role pairs appear too
  expand(consultation_mode, professional_role) %>%
  left_join(
    df6 %>% select(organisation_id, patient_id, date, consultation_mode,
                   professional_role, `include?`),
    by = c("organisation_id", "patient_id", "date",
           "consultation_mode", "professional_role")
  ) %>%
  mutate(Nr_consultations_per_Pt_day = replace_na(`include?`, 0)) %>%
  group_by(organisation_id, patient_id, date, consultation_mode, professional_role) %>%
  summarise(Nr_consultations_per_Pt_day = sum(Nr_consultations_per_Pt_day),
            .groups = "drop")

# Inspect a few patients (patient_id is a factor, so compare against strings)
df7 %>% filter(patient_id %in% c("2223", "1250", "1230"))
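The final step relies on how tidyr's expand() treats factor columns: it uses the full set of levels rather than only the values observed in a group, which is why combinations that never occur, such as Home visit with Nurse, still appear with a count of 0. A minimal sketch on a hypothetical two-row data frame small (the two rows of patient 1230, with include? renamed to include for brevity):

```r
library(dplyr)
library(tidyr)

modes <- c("Face-to-Face", "Home visit", "Telephone")
roles <- c("Doctor", "Nurse", "Support")

# Hypothetical subset: the two consultations of patient 1230
small <- data.frame(
  patient_id        = c("1230", "1230"),
  consultation_mode = factor(c("Telephone", "Face-to-Face"), levels = modes),
  professional_role = factor(c("Doctor", "Support"), levels = roles),
  include           = c(1, 1)
)

final <- small %>%
  group_by(patient_id) %>%
  expand(consultation_mode, professional_role) %>%  # all 3 x 3 level pairs
  left_join(small,
            by = c("patient_id", "consultation_mode", "professional_role")) %>%
  mutate(include = replace_na(include, 0)) %>%      # unobserved pairs count as 0
  group_by(patient_id, consultation_mode, professional_role) %>%
  summarise(Nr_consultations_per_Pt_day = sum(include), .groups = "drop")

final
```

If consultation_mode or professional_role were character columns instead of factors, expand() would only see the values present in each group, so converting them with as.factor() beforehand matters for this approach.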