R: convert long to wide, but only for select rows
I have two data frames similar to these:
df<-structure(list(study_id = c(1, 2, 3, 5, 8, 10), mrn = c(123456,
654321, 121212, 212121, 232323, 323232
), gender = c(1, 0, 0, 1, 1, 0), surg_date = structure(c(17003,
17519, 17610, 16800, 18083, 18003), class = "Date"), tobacco_use_at_time_of_sur = c(1,
0, 0, 1, 0, 1)), row.names = c(NA, -6L), class = c("tbl_df",
"tbl", "data.frame"))
df2<-structure(list(mrn = c("123456", "654321", "654321",
"654321", "654321"), Procedures = c("Right Cranioplasty With Custom Made Implant",
"Right Mesh Cranioplasty Insertion", "Removal, Right Sided Cranioplasty, Washout, Complex Wound Closure",
"Revision Right-Sided Peek Cranioplasty\r\nRight Temporalis Muscle Resuspension; Reconstruction Of The Scalp With Scalp Flap 23 X 14 Cm",
"Removal Of Right Sided Cranioplasty Implant, Wound Washout\r\nScalp Flap Closure Of Right Cranial Wound 19 X 14.5"
), surg_date = structure(c(1529452800, 1569196800, 1571875200,
1610496000, 1613779200), tzone = "UTC", class = c("POSIXct",
"POSIXt")), `Patient Age` = c("68 yrs", "63 yrs", "63 yrs", "63 yrs",
"63 yrs")), row.names = c(NA, -5L), class = c("tbl_df", "tbl",
"data.frame"))
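I will eventually be trying to merge them. "df" is definitively one row per patient. "df2" is 'mostly' one row per patient, but some patients are repeated and appear in multiple rows. I know the merge will take many steps, such as dropping columns I don't care about, making sure the column names and classes line up, and so on.
But my specific question today is this: if I want df2 to be definitively one row per patient, taking the extra rows for each patient and converting them into extra columns, how do I do that?
I know pivot_wider handles a lot of this, but here is where I'm stuck: ideally, the first row found for each patient, based on the earliest surg_date, would stay in its original position.
Ideally my result would look like this:
One last small caveat: there may be a few columns (besides MRN) that I want to leave alone.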
Use tidyr::pivot_wider: first create an id within each group, then pivot.
library(dplyr)
library(tidyr)
df2 %>%
mutate(mrn = as.numeric(mrn)) %>%
right_join(select(df, -surg_date), ., by = "mrn") %>%
group_by(mrn) %>%
mutate(id_count = seq(n()) - 1) %>%
pivot_wider(names_from = id_count, values_from = c(Procedures, surg_date, `Patient Age`))
# A tibble: 2 x 16
# Groups: mrn [2]
study_id mrn gender tobacco_use_at_time_of_sur Procedures_0 Procedures_1 Procedures_2 Procedures_3 surg_date_0 surg_date_1
<dbl> <dbl> <dbl> <dbl> <chr> <chr> <chr> <chr> <dttm> <dttm>
1 1 123456 1 1 Right Cranio~ NA NA NA 2018-06-20 00:00:00 NA
2 2 654321 0 0 Right Mesh C~ Removal, Rig~ "Revision Rig~ "Removal Of ~ 2019-09-23 00:00:00 2019-10-24 00:00:00
# ... with 6 more variables: surg_date_2 <dttm>, surg_date_3 <dttm>, Patient Age_0 <chr>, Patient Age_1 <chr>, Patient Age_2 <chr>,
# Patient Age_3 <chr>
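The question asks that the earliest surg_date end up in the first slot. The numbering above follows the row order of df2, so that only holds if df2 is already sorted by date. A minimal sketch of sorting before numbering follows; the `visits` tibble and its values are hypothetical toy data, deliberately out of date order, and are not from the original post.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data: patient 654321 has two surgeries, listed latest first
visits <- tibble(
  mrn = c("654321", "123456", "654321"),
  Procedures = c("Removal", "Custom implant", "Mesh insertion"),
  surg_date = as.Date(c("2019-10-24", "2018-06-20", "2019-09-23"))
)

wide <- visits %>%
  arrange(mrn, surg_date) %>%            # earliest surgery first within each patient
  group_by(mrn) %>%
  mutate(id_count = row_number() - 1) %>% # 0 = earliest, 1 = next, ...
  ungroup() %>%
  pivot_wider(names_from = id_count, values_from = c(Procedures, surg_date))

print(wide)
```

With the arrange() step in place, the `_0` columns should always hold the earliest surgery for each patient, regardless of how the input rows were ordered.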
I'm not sure how this output is any more informative, but you can try:
library(tidyverse)
df2 %>%
group_by(mrn) %>%
mutate(n=1:n()) %>%
pivot_wider(
names_from = n,
names_glue = "{n}_{.value}",
values_from = c(Procedures, surg_date,`Patient Age`)
) %>%
mutate(mrn=as.numeric(mrn)) %>%
right_join(df,by = "mrn")
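On the question's last point, the columns to "leave alone": pivot_wider treats every column not named in names_from or values_from as an identifier, so such a column can simply be left out of values_from. The catch is that it must then be constant within each patient, otherwise the patient is split across several rows. The sketch below assumes that keeping the value from the earliest surgery is acceptable; the `visits` tibble is hypothetical toy data, and treating `Patient Age` as the column to leave alone is only an illustration.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data: the age changes between the two surgeries of 654321
visits <- tibble(
  mrn = c("123456", "654321", "654321"),
  Procedures = c("Custom implant", "Mesh insertion", "Removal"),
  surg_date = as.Date(c("2018-06-20", "2019-09-23", "2019-10-24")),
  `Patient Age` = c("68 yrs", "63 yrs", "64 yrs")
)

wide_keep_age <- visits %>%
  arrange(mrn, surg_date) %>%
  group_by(mrn) %>%
  mutate(
    `Patient Age` = first(`Patient Age`), # assumption: keep the earliest value
    n = row_number()
  ) %>%
  ungroup() %>%
  pivot_wider(
    names_from = n,
    names_glue = "{n}_{.value}",
    values_from = c(Procedures, surg_date) # `Patient Age` is not pivoted
  )

print(wide_keep_age)
```

Here `Patient Age` stays a single column next to mrn, while Procedures and surg_date are spread into numbered columns.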