Group by and count with condition
I have a dataset that looks like this:
id | DateTx | TypeTx | Major_complaint | Grade_major |
---|---|---|---|---|
1 | 01/02/2021 | Reflexology | Fatigue / exhaustion | 3 |
1 | 01/03/2021 | Reflexology | Fatigue / exhaustion | 3 |
1 | 08/02/2021 | Reflexology | Pain | 4 |
1 | 15/02/2021 | Reflexology | Pain | 3 |
1 | 17/12/2020 | Nutrition counseling | Depression | 4 |
1 | 24/02/2021 | Reflexology | Pain | 3 |
2 | 07/10/2020 | Acupuncture | Neuropathy legs | 5 |
2 | 21/10/2020 | Acupuncture | Neuoropathic pain | 4 |
3 | 18/01/2021 | Reflexology | Fatigue / exhaustion | 4 |
3 | 23/02/2021 | Reflexology | Neuropathy legs | 4 |
3 | 31/01/2021 | Reflexology | Fatigue / exhaustion | 4 |
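I want to group by id, TypeTx, and Major_complaint, and get the first and last treatment dates and the first and last grades for each group, so this is the result I expect: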
id | FirstDateTx | LastDateTx | TypeTx | Major_complaint | First_Grade | Last_Grade | CountTx |
---|---|---|---|---|---|---|---|
1 | 01/02/2021 | 01/03/2021 | Reflexology | Fatigue / exhaustion | 3 | 3 | 2 |
1 | 08/02/2021 | 15/02/2021 | Reflexology | Pain | 3 | 4 | 3 |
1 | 17/12/2020 | 17/12/2020 | Nutrition counseling | Depression | 4 | 4 | 1 |
2 | 07/10/2020 | 07/10/2020 | Acupuncture | Neuropathy legs | 5 | 5 | 1 |
2 | 21/10/2020 | 21/10/2020 | Acupuncture | Neuoropathic pain | 4 | 4 | 1 |
3 | 18/01/2021 | 31/01/2021 | Reflexology | Fatigue / exhaustion | 4 | 4 | 2 |
3 | 23/02/2021 | 23/02/2021 | Reflexology | Neuropathy legs | 4 | 4 | 1 |
I tried this with dplyr:
Tal_data %>%
group_by(id) %>% mutate(DateTxStart=min(DateTx),
DateTxEnd=max(DateTx),
First_grade= first(Grade_major),
Last_grade=last(Grade_major)) %>%
count(TypeTx, DateTxStart, DateTxEnd, First_grade,Last_grade, Major_complaint)
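This is what I got:
As you can see, I can't link the grade to the date and the TypeTx. For example, for id = 1, Nutrition counseling, Depression, the grade must be 4, not the 3 that my solution returns.
Any ideas?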
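A likely cause of the mismatch can be sketched on a three-row subset of the data. The `toy` data frame below is a hypothetical illustration, not the full dataset. With `group_by(id)` alone, `first()` looks at every row of that id, so all treatment types inherit the same grade. Adding `TypeTx` and `Major_complaint` to the grouping keeps each grade with its own treatment.

```r
library(dplyr)

# Hypothetical three-row subset of the question's data, for illustration only
toy <- data.frame(
  id = c(1L, 1L, 1L),
  TypeTx = c("Reflexology", "Reflexology", "Nutrition counseling"),
  Major_complaint = c("Fatigue / exhaustion", "Fatigue / exhaustion", "Depression"),
  Grade_major = c(3L, 3L, 4L)
)

# Grouping by id only: first() sees all rows of id 1, so every row gets grade 3
by_id <- toy %>%
  group_by(id) %>%
  mutate(First_grade = first(Grade_major)) %>%
  ungroup()

# Grouping by id, treatment, and complaint: each combination keeps its own grade
by_all <- toy %>%
  group_by(id, TypeTx, Major_complaint) %>%
  mutate(First_grade = first(Grade_major)) %>%
  ungroup()
```

In `by_id` the Nutrition counseling row carries the Reflexology grade, which matches the symptom described above; in `by_all` it keeps its own grade.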
Input data
df <- structure(list(id = c(1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 3L, 3L,
3L), DateTx = c("01/02/2021", "01/03/2021", "08/02/2021", "15/02/2021",
"17/12/2020", "24/02/2021", "07/10/2020", "21/10/2020", "18/01/2021",
"23/02/2021", "31/01/2021"), TypeTx = c("Reflexology", "Reflexology",
"Reflexology", "Reflexology", "Nutrition counseling", "Reflexology",
"Acupuncture", "Acupuncture", "Reflexology", "Reflexology", "Reflexology"
), Major_complaint = c("Fatigue / exhaustion", "Fatigue / exhaustion",
"Pain", "Pain", "Depression", "Pain", "Neuropathy legs", "Neuoropathic pain",
"Fatigue / exhaustion", "Neuropathy legs", "Fatigue / exhaustion"
), Grade_major = c(3L, 3L, 4L, 3L, 4L, 3L, 5L, 4L, 4L, 4L, 4L
)), row.names = c(NA, -11L), class = "data.frame")
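library(dplyr)
df %>%
  # Parse the day/month/year strings so the dates sort chronologically
  mutate(DateTx = lubridate::dmy(DateTx)) %>%
  # Order rows so first()/last() mean earliest/latest within each group
  arrange(id, DateTx) %>%
  group_by(id, TypeTx, Major_complaint) %>%
  summarise(FirstDateTx = first(DateTx),
            LastDateTx = last(DateTx),
            First_Grade = first(Grade_major),
            Last_Grade = last(Grade_major),
            CountTx = n())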
#> `summarise()` has grouped output by 'id', 'TypeTx'. You can override using the `.groups` argument.
#> # A tibble: 7 x 8
#> # Groups: id, TypeTx [4]
#> id TypeTx Major_complaint FirstDateTx LastDateTx First_Grade Last_Grade
#> <int> <chr> <chr> <date> <date> <int> <int>
#> 1 1 Nutritio… Depression 2020-12-17 2020-12-17 4 4
#> 2 1 Reflexol… Fatigue / exhau… 2021-02-01 2021-03-01 3 3
#> 3 1 Reflexol… Pain 2021-02-08 2021-02-24 4 3
#> 4 2 Acupunct… Neuoropathic pa… 2020-10-21 2020-10-21 4 4
#> 5 2 Acupunct… Neuropathy legs 2020-10-07 2020-10-07 5 5
#> 6 3 Reflexol… Fatigue / exhau… 2021-01-18 2021-01-31 4 4
#> 7 3 Reflexol… Neuropathy legs 2021-02-23 2021-02-23 4 4
#> # … with 1 more variable: CountTx <int>
Created on 2021-04-22 by the reprex package (v2.0.0)
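As a variation on the approach above, here is a minimal sketch that does not depend on sorting the rows first. It picks each grade by the position of the earliest and latest date inside the group, parses dates with base R's `as.Date()` instead of lubridate, and passes `.groups = "drop"` to get an ungrouped result without the `summarise()` message. The `df_small` data frame is an assumption for illustration: it holds only the id 1 rows from the question.

```r
library(dplyr)

# Illustration data: the id 1 rows from the question
df_small <- data.frame(
  id = rep(1L, 6),
  DateTx = c("01/02/2021", "01/03/2021", "08/02/2021",
             "15/02/2021", "17/12/2020", "24/02/2021"),
  TypeTx = c("Reflexology", "Reflexology", "Reflexology",
             "Reflexology", "Nutrition counseling", "Reflexology"),
  Major_complaint = c("Fatigue / exhaustion", "Fatigue / exhaustion",
                      "Pain", "Pain", "Depression", "Pain"),
  Grade_major = c(3L, 3L, 4L, 3L, 4L, 3L)
)

res <- df_small %>%
  # Parse day/month/year strings with base R
  mutate(DateTx = as.Date(DateTx, format = "%d/%m/%Y")) %>%
  group_by(id, TypeTx, Major_complaint) %>%
  summarise(
    FirstDateTx = min(DateTx),
    LastDateTx  = max(DateTx),
    # Grade on the earliest / latest date, regardless of row order
    First_Grade = Grade_major[which.min(DateTx)],
    Last_Grade  = Grade_major[which.max(DateTx)],
    CountTx     = n(),
    .groups = "drop"  # return an ungrouped tibble
  )
```

If two treatments in a group share the same date, `which.min()` and `which.max()` return the first matching row, so ties would need an explicit rule.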