R convert long to wide but only for select rows

I have two data frames similar to these:

df<-structure(list(study_id = c(1, 2, 3, 5, 8, 10), mrn = c(123456, 
654321, 121212, 212121, 232323, 323232
), gender = c(1, 0, 0, 1, 1, 0), surg_date = structure(c(17003, 
17519, 17610, 16800, 18083, 18003), class = "Date"), tobacco_use_at_time_of_sur = c(1, 
0, 0, 1, 0, 1)), row.names = c(NA, -6L), class = c("tbl_df", 
"tbl", "data.frame"))

df2<-structure(list(mrn = c("123456", "654321", "654321", 
"654321", "654321"), Procedures = c("Right Cranioplasty With Custom Made Implant", 
"Right Mesh Cranioplasty Insertion", "Removal, Right Sided Cranioplasty, Washout, Complex Wound Closure", 
"Revision Right-Sided Peek Cranioplasty\r\nRight Temporalis Muscle Resuspension; Reconstruction Of The Scalp With Scalp Flap 23 X 14 Cm", 
"Removal Of Right Sided Cranioplasty Implant, Wound Washout\r\nScalp Flap Closure Of Right Cranial Wound 19 X 14.5"
), surg_date = structure(c(1529452800, 1569196800, 1571875200, 
1610496000, 1613779200), tzone = "UTC", class = c("POSIXct", 
"POSIXt")), `Patient Age` = c("68 yrs", "63 yrs", "63 yrs", "63 yrs", 
"63 yrs")), row.names = c(NA, -5L), class = c("tbl_df", "tbl", 
"data.frame"))

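Ultimately I want to merge them. `df` is explicitly one row per patient. `df2` is "mostly" one row per patient, but some patients are repeated and appear in multiple rows. I know the merge will take several steps, such as dropping columns I don't care about and making sure column names and classes line up.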

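My specific question today is: how do I make df2 explicitly one row per patient, taking each patient's additional rows and converting them into additional columns?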

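I know pivot_wider handles a lot of this, but here is where I'm stuck: ideally, the first row for each patient, based on the earliest surg_date, would be left in its original position.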

Ideally my result would look like this:

One last small caveat: there may be a few columns (besides MRN) that I want to "leave alone".

Use tidyr::pivot_wider: first create an id within each group, then pivot on that id.

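library(dplyr)
library(tidyr)

df2 %>% 
  # df stores mrn as numeric, so convert before joining
  mutate(mrn = as.numeric(mrn)) %>% 
  # keep every df2 row and attach the per-patient columns from df
  right_join(select(df, -surg_date), ., by = "mrn") %>% 
  group_by(mrn) %>% 
  # number each patient's rows 0, 1, 2, ... in their current row order
  mutate(id_count = seq(n()) - 1) %>%
  pivot_wider(names_from = id_count, values_from = c(Procedures, surg_date, `Patient Age`))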

# A tibble: 2 x 16
# Groups:   mrn [2]
  study_id    mrn gender tobacco_use_at_time_of_sur Procedures_0  Procedures_1  Procedures_2   Procedures_3  surg_date_0         surg_date_1        
     <dbl>  <dbl>  <dbl>                      <dbl> <chr>         <chr>         <chr>          <chr>         <dttm>              <dttm>             
1        1 123456      1                          1 Right Cranio~ NA             NA             NA           2018-06-20 00:00:00 NA                 
2        2 654321      0                          0 Right Mesh C~ Removal, Rig~ "Revision Rig~ "Removal Of ~ 2019-09-23 00:00:00 2019-10-24 00:00:00
# ... with 6 more variables: surg_date_2 <dttm>, surg_date_3 <dttm>, Patient Age_0 <chr>, Patient Age_1 <chr>, Patient Age_2 <chr>,
#   Patient Age_3 <chr>
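The id above is assigned in whatever order the rows happen to arrive, which matches the earliest surg_date only because df2 is already sorted. A minimal sketch of making that ordering explicit, using a small made-up `toy` tibble (hypothetical values, not the real data) that is deliberately out of date order:

```r
library(dplyr)
library(tidyr)

# Hypothetical stand-in for df2, with rows out of date order
toy <- tibble(
  mrn = c("654321", "123456", "654321"),
  Procedures = c("Revision", "Cranioplasty", "Insertion"),
  surg_date = as.POSIXct(c("2021-01-13", "2018-06-20", "2019-09-23"), tz = "UTC")
)

wide <- toy %>%
  arrange(mrn, surg_date) %>%              # earliest surgery first within each patient
  group_by(mrn) %>%
  mutate(id_count = row_number() - 1) %>%  # 0 = earliest surgery
  ungroup() %>%
  pivot_wider(names_from = id_count, values_from = c(Procedures, surg_date))

wide
```

With the sort in place, `Procedures_0` and `surg_date_0` always hold the earliest procedure for each patient, regardless of the input row order.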

I'm not sure how this output is more informative, but you can try:

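library(tidyverse)

df2 %>% 
  group_by(mrn) %>% 
  # number each patient's rows 1, 2, 3, ... in their current row order
  mutate(n = row_number()) %>%
  pivot_wider(
    names_from = n,
    names_glue = "{n}_{.value}",   # column names like 1_Procedures, 2_surg_date
    values_from = c(Procedures, surg_date, `Patient Age`)
  ) %>% 
  # df stores mrn as numeric, so convert before joining
  mutate(mrn = as.numeric(mrn)) %>% 
  # keep every patient in df, including those with no rows in df2
  right_join(df, by = "mrn")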
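Neither approach above keeps the earliest row under its original column names or leaves some columns untouched, which the question asks for. One possible sketch, not taken from either answer: split off each patient's first row, widen only the remaining rows, and join them back. The `toy` tibble, the `visit` helper column, and the choice to leave `Patient Age` alone are all assumptions for illustration.

```r
library(dplyr)
library(tidyr)

# Hypothetical stand-in for df2
toy <- tibble(
  mrn = c("123456", "654321", "654321", "654321"),
  Procedures = c("A", "B", "C", "D"),
  surg_date = as.Date(c("2018-06-20", "2019-09-23", "2019-10-24", "2021-01-13")),
  `Patient Age` = c("68 yrs", "63 yrs", "63 yrs", "63 yrs")
)

# Rank each patient's rows by surgery date (1 = earliest)
ranked <- toy %>%
  group_by(mrn) %>%
  arrange(surg_date, .by_group = TRUE) %>%
  mutate(visit = row_number()) %>%
  ungroup()

# The earliest row per patient keeps its original column names
first_rows <- ranked %>%
  filter(visit == 1) %>%
  select(-visit)

# Only the later rows are widened; `Patient Age` is dropped here so it is left alone
extra_wide <- ranked %>%
  filter(visit > 1) %>%
  select(mrn, visit, Procedures, surg_date) %>%
  pivot_wider(names_from = visit, values_from = c(Procedures, surg_date))

result <- left_join(first_rows, extra_wide, by = "mrn")
result
```

Patients with a single row get NA in the added `Procedures_2`, `surg_date_2`, ... columns, and the result can then be joined to df on mrn after converting mrn to a common class.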