Group by and count with condition

I have a dataset that looks like this:

id  DateTx      TypeTx                Major_complaint       Grade_major
1   01/02/2021  Reflexology           Fatigue / exhaustion  3
1   01/03/2021  Reflexology           Fatigue / exhaustion  3
1   08/02/2021  Reflexology           Pain                  4
1   15/02/2021  Reflexology           Pain                  3
1   17/12/2020  Nutrition counseling  Depression            4
1   24/02/2021  Reflexology           Pain                  3
2   07/10/2020  Acupuncture           Neuropathy legs       5
2   21/10/2020  Acupuncture           Neuoropathic pain     4
3   18/01/2021  Reflexology           Fatigue / exhaustion  4
3   23/02/2021  Reflexology           Neuropathy legs       4
3   31/01/2021  Reflexology           Fatigue / exhaustion  4

I want to group by id and TypeTx and get the first and last Tx dates together with the first and last grades, so this is the result I would like to get:

id  FirstDateTx  LastDateTx  TypeTx                Major_complaint       First_Grade  Last_Grade  CountTx
1   01/02/2021   01/03/2021  Reflexology           Fatigue / exhaustion  3            3           2
1   08/02/2021   15/02/2021  Reflexology           Pain                  3            4           3
1   17/12/2020   17/12/2020  Nutrition counseling  Depression            4            4           1
2   07/10/2020   07/10/2020  Acupuncture           Neuropathy legs       5            5           1
2   21/10/2020   21/10/2020  Acupuncture           Neuoropathic pain     4            4           1
3   18/01/2021   31/01/2021  Reflexology           Fatigue / exhaustion  4            4           2
3   23/02/2021   23/02/2021  Reflexology           Neuropathy legs       4            4           1

I tried this with dplyr:

Tal_data %>% 
group_by(id) %>% mutate(DateTxStart=min(DateTx), 
                        DateTxEnd=max(DateTx),
                        First_grade= first(Grade_major),
                        Last_grade=last(Grade_major)) %>% 
count(TypeTx, DateTxStart, DateTxEnd, First_grade,Last_grade, Major_complaint)
    

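So this is what I get:

As you can see, I cannot link the grade to the date and type of Tx. For example, for id=1, Nutrition counseling, Depression, the grade must be 4, not 3 as in my solution. Any ideas?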

Input data

df <- structure(list(id = c(1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 3L, 3L, 
  3L), DateTx = c("01/02/2021", "01/03/2021", "08/02/2021", "15/02/2021", 
    "17/12/2020", "24/02/2021", "07/10/2020", "21/10/2020", "18/01/2021", 
    "23/02/2021", "31/01/2021"), TypeTx = c("Reflexology", "Reflexology", 
      "Reflexology", "Reflexology", "Nutrition counseling", "Reflexology", 
      "Acupuncture", "Acupuncture", "Reflexology", "Reflexology", "Reflexology"
    ), Major_complaint = c("Fatigue / exhaustion", "Fatigue / exhaustion", 
      "Pain", "Pain", "Depression", "Pain", "Neuropathy legs", "Neuoropathic pain", 
      "Fatigue / exhaustion", "Neuropathy legs", "Fatigue / exhaustion"
    ), Grade_major = c(3L, 3L, 4L, 3L, 4L, 3L, 5L, 4L, 4L, 4L, 4L
    )), row.names = c(NA, -11L), class = "data.frame")
library(dplyr)

df %>%
  mutate(DateTx = lubridate::dmy(DateTx)) %>%
  arrange(id, DateTx) %>%
  group_by(id, TypeTx, Major_complaint) %>%
  summarise(FirstDateTx = first(DateTx), 
    LastDateTx = last(DateTx), 
    First_Grade = first(Grade_major), 
    Last_Grade = last(Grade_major), 
    CountTx = n())
#> `summarise()` has grouped output by 'id', 'TypeTx'. You can override using the `.groups` argument.
#> # A tibble: 7 x 8
#> # Groups:   id, TypeTx [4]
#>      id TypeTx    Major_complaint  FirstDateTx LastDateTx First_Grade Last_Grade
#>   <int> <chr>     <chr>            <date>      <date>           <int>      <int>
#> 1     1 Nutritio… Depression       2020-12-17  2020-12-17           4          4
#> 2     1 Reflexol… Fatigue / exhau… 2021-02-01  2021-03-01           3          3
#> 3     1 Reflexol… Pain             2021-02-08  2021-02-24           4          3
#> 4     2 Acupunct… Neuoropathic pa… 2020-10-21  2020-10-21           4          4
#> 5     2 Acupunct… Neuropathy legs  2020-10-07  2020-10-07           5          5
#> 6     3 Reflexol… Fatigue / exhau… 2021-01-18  2021-01-31           4          4
#> 7     3 Reflexol… Neuropathy legs  2021-02-23  2021-02-23           4          4
#> # … with 1 more variable: CountTx <int>
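
Two things differ from the attempt in the question: the grouping includes TypeTx and Major_complaint, so first() and last() are evaluated within each complaint rather than across the whole id, and DateTx is parsed to a date before sorting. The second point matters because DateTx is stored as character in the input data. A minimal base-R sketch, using three of the dates from the data, of how text and parsed dates compare differently (the variable names `dates_chr` and `dates` are only illustrative):

```r
# Character dates in dd/mm/yyyy format are compared as text, day first
dates_chr <- c("01/02/2021", "17/12/2020", "24/02/2021")
earliest_chr <- min(dates_chr)   # picks "01/02/2021", not the December 2020 date

# Parsing to Date first makes the comparison chronological
dates <- as.Date(dates_chr, format = "%d/%m/%Y")
earliest <- min(dates)           # the Date 2020-12-17

print(earliest_chr)
print(earliest)
```

The same applies to arrange(): sorting the unparsed column would order rows by day of month, so first() and last() would not reliably pick the earliest and latest treatment.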

Created on 2021-04-22 by the reprex package (v2.0.0)
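
If the result should match the layout in the question more closely, one possible variant is sketched below. It is an illustration rather than part of the reprex above: it assumes dplyr 1.0.0 or later (for the `.groups` argument), swaps `lubridate::dmy()` for base R's `as.Date()`, formats the dates back to dd/mm/yyyy, and uses only the id 1 rows of the data to stay short. The name `res` is arbitrary.

```r
library(dplyr)

# Subset of the question's data (id 1 only), for illustration
df <- data.frame(
  id = c(1L, 1L, 1L, 1L, 1L, 1L),
  DateTx = c("01/02/2021", "01/03/2021", "08/02/2021",
             "15/02/2021", "17/12/2020", "24/02/2021"),
  TypeTx = c("Reflexology", "Reflexology", "Reflexology",
             "Reflexology", "Nutrition counseling", "Reflexology"),
  Major_complaint = c("Fatigue / exhaustion", "Fatigue / exhaustion",
                      "Pain", "Pain", "Depression", "Pain"),
  Grade_major = c(3L, 3L, 4L, 3L, 4L, 3L)
)

res <- df %>%
  # Parse the text dates so that sorting is chronological
  mutate(DateTx = as.Date(DateTx, format = "%d/%m/%Y")) %>%
  arrange(id, DateTx) %>%
  group_by(id, TypeTx, Major_complaint) %>%
  summarise(
    # Format back to dd/mm/yyyy text after picking first/last
    FirstDateTx = format(first(DateTx), format = "%d/%m/%Y"),
    LastDateTx  = format(last(DateTx), format = "%d/%m/%Y"),
    First_Grade = first(Grade_major),
    Last_Grade  = last(Grade_major),
    CountTx     = n(),
    .groups = "drop"  # ungrouped result, and no grouping message
  )

print(res)
```

Formatting the dates as text is done only at the end, after first() and last() have been taken on real dates. If the result will be sorted or filtered by date later, keeping the Date columns is probably the safer choice.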