Group by and count with condition

Question

I have a dataset that looks like this:

id	DateTx	TypeTx	Major_complaint	Grade_major
1	01/02/2021	Reflexology	Fatigue / exhaustion	3
1	01/03/2021	Reflexology	Fatigue / exhaustion	3
1	08/02/2021	Reflexology	Pain	4
1	15/02/2021	Reflexology	Pain	3
1	17/12/2020	Nutrition counseling	Depression	4
1	24/02/2021	Reflexology	Pain	3
2	07/10/2020	Acupuncture	Neuropathy legs	5
2	21/10/2020	Acupuncture	Neuoropathic pain	4
3	18/01/2021	Reflexology	Fatigue / exhaustion	4
3	23/02/2021	Reflexology	Neuropathy legs	4
3	31/01/2021	Reflexology	Fatigue / exhaustion	4

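I want to group by id, TypeTx and Major_complaint, and get the first and last DateTx and the first and last grade for each group, so the result I would like is: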

id	FirstDateTx	LastDateTx	TypeTx	Major_complaint	First_Grade	Last_Grade	CountTx
1	01/02/2021	01/03/2021	Reflexology	Fatigue / exhaustion	3	3	2
1	08/02/2021	15/02/2021	Reflexology	Pain	3	4	3
1	17/12/2020	17/12/2020	Nutrition counseling	Depression	4	4	1
2	07/10/2020	07/10/2020	Acupuncture	Neuropathy legs	5	5	1
2	21/10/2020	21/10/2020	Acupuncture	Neuoropathic pain	4	4	1
3	18/01/2021	31/01/2021	Reflexology	Fatigue / exhaustion	4	4	2
3	23/02/2021	23/02/2021	Reflexology	Neuropathy legs	4	4	1

I tried this with dplyr:

Tal_data %>% 
group_by(id) %>% mutate(DateTxStart=min(DateTx), 
                        DateTxEnd=max(DateTx),
                        First_grade= first(Grade_major),
                        Last_grade=last(Grade_major)) %>% 
count(TypeTx, DateTxStart, DateTxEnd, First_grade,Last_grade, Major_complaint)

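This is what I got:

As you can see, I cannot link the grades to the dates and TypeTx. For example, for id=1, Nutrition counseling, Depression, the grade must be 4, not 3 as in my solution. Any ideas?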

Answer 1

Data input

df <- structure(list(id = c(1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 3L, 3L, 
  3L), DateTx = c("01/02/2021", "01/03/2021", "08/02/2021", "15/02/2021", 
    "17/12/2020", "24/02/2021", "07/10/2020", "21/10/2020", "18/01/2021", 
    "23/02/2021", "31/01/2021"), TypeTx = c("Reflexology", "Reflexology", 
      "Reflexology", "Reflexology", "Nutrition counseling", "Reflexology", 
      "Acupuncture", "Acupuncture", "Reflexology", "Reflexology", "Reflexology"
    ), Major_complaint = c("Fatigue / exhaustion", "Fatigue / exhaustion", 
      "Pain", "Pain", "Depression", "Pain", "Neuropathy legs", "Neuoropathic pain", 
      "Fatigue / exhaustion", "Neuropathy legs", "Fatigue / exhaustion"
    ), Grade_major = c(3L, 3L, 4L, 3L, 4L, 3L, 5L, 4L, 4L, 4L, 4L
    )), row.names = c(NA, -11L), class = "data.frame")

library(dplyr)

df %>%
  mutate(DateTx = lubridate::dmy(DateTx)) %>%
  arrange(id, DateTx) %>%
  group_by(id, TypeTx, Major_complaint) %>%
  summarise(FirstDateTx = first(DateTx), 
    LastDateTx = last(DateTx), 
    First_Grade = first(Grade_major), 
    Last_Grade = last(Grade_major), 
    CountTx = n())
#> `summarise()` has grouped output by 'id', 'TypeTx'. You can override using the `.groups` argument.
#> # A tibble: 7 x 8
#> # Groups:   id, TypeTx [4]
#>      id TypeTx    Major_complaint  FirstDateTx LastDateTx First_Grade Last_Grade
#>   <int> <chr>     <chr>            <date>      <date>           <int>      <int>
#> 1     1 Nutritio… Depression       2020-12-17  2020-12-17           4          4
#> 2     1 Reflexol… Fatigue / exhau… 2021-02-01  2021-03-01           3          3
#> 3     1 Reflexol… Pain             2021-02-08  2021-02-24           4          3
#> 4     2 Acupunct… Neuoropathic pa… 2020-10-21  2020-10-21           4          4
#> 5     2 Acupunct… Neuropathy legs  2020-10-07  2020-10-07           5          5
#> 6     3 Reflexol… Fatigue / exhau… 2021-01-18  2021-01-31           4          4
#> 7     3 Reflexol… Neuropathy legs  2021-02-23  2021-02-23           4          4
#> # … with 1 more variable: CountTx <int>

Created on 2021-04-22 by the reprex package (v2.0.0)
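
The two things the code above relies on are parsing DateTx into a real date before sorting, and grouping by Major_complaint as well as id and TypeTx so that `first()` and `last()` are taken within each complaint. The following is a minimal sketch of the same idea, not the answerer's exact code: it assumes dplyr 1.0.0 or later (for `.groups` and `across()`), swaps in base `as.Date()` for `lubridate::dmy()`, and formats the dates back to dd/mm/yyyy to match the layout asked for in the question. The names `toy` and `res` are illustrative only; the rows are the id = 1 rows copied from the post.

```r
library(dplyr)

# Rows for id = 1, copied from the question (toy is an illustrative name)
toy <- data.frame(
  id = c(1L, 1L, 1L, 1L, 1L, 1L),
  DateTx = c("01/02/2021", "01/03/2021", "08/02/2021",
             "15/02/2021", "17/12/2020", "24/02/2021"),
  TypeTx = c("Reflexology", "Reflexology", "Reflexology",
             "Reflexology", "Nutrition counseling", "Reflexology"),
  Major_complaint = c("Fatigue / exhaustion", "Fatigue / exhaustion",
                      "Pain", "Pain", "Depression", "Pain"),
  Grade_major = c(3L, 3L, 4L, 3L, 4L, 3L)
)

# As character strings, dd/mm/yyyy values compare by day first,
# so this is "01/02/2021" even though 17/12/2020 is the earliest date
min(toy$DateTx)

res <- toy %>%
  # Parse to Date so that sorting is chronological
  mutate(DateTx = as.Date(DateTx, format = "%d/%m/%Y")) %>%
  arrange(id, DateTx) %>%
  # Group by complaint too, so grades stay tied to their own treatment series
  group_by(id, TypeTx, Major_complaint) %>%
  summarise(
    FirstDateTx = first(DateTx),
    LastDateTx  = last(DateTx),
    First_Grade = first(Grade_major),
    Last_Grade  = last(Grade_major),
    CountTx     = n(),
    .groups = "drop"  # return an ungrouped tibble and silence the message
  ) %>%
  # Optional: convert the dates back to dd/mm/yyyy text for display
  mutate(across(c(FirstDateTx, LastDateTx), ~ format(.x, "%d/%m/%Y")))

res
```

With `.groups = "drop"` the result is ungrouped, which avoids surprises if more `mutate()` or `summarise()` steps follow. Converting the dates back to text is only for presentation; keep them as Date if any further date arithmetic is needed.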
