How to reformat data so each ID corresponds to two rows: one row with the specimen types, the second with the result for each specimen
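I am working with a clinical data set that has about 2500 unique IDs. Some IDs have 20 or more occurrences. I want to see the specimen type (NP, Throat, etc.) together with the test result ("Not Detected" or "Detected"), spread across multiple columns, so that each ID takes up essentially two rows. The first row holds the specimen type for each occurrence, and the second row holds the result for each occurrence. I can get the first row without any problem, but I have not been able to figure out how to add a second row for the same ID with each result sitting below its corresponding specimen type. Any help would be greatly appreciated!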
ID <- c(1, 1, 2, 2, 3, 3, 3, 4)
Type <- c("EM", "EM", "PA", "PA", "PA", "PA", "PA", "EM")
Specimen_Type <- c("NP", "NP", "Throat", "Throat", "NP", "Throat", "Throat", "NP")
RESULT_VAL <- c("Not Detected", "Detected", "Not Detected", "Detected", "Not Detected", "Not Detected", "Detected", "Not Detected")
RESULT_DATE <- c("6-1-2020", "6-10-2020", "6-1-2020", "6-10-2020", "6-1-2020", "6-10-2020", "6-20-2020", "6-1-2020")
Data_sum <- data.frame(ID, Type, Specimen_Type, RESULT_VAL, RESULT_DATE)
I would like it to look like this:
ID Type Occurrence_1 Occurrence_2 Occurrence_3
1  EM   NP           NP
1  EM   Not Detected Detected
2  PA   Throat       Throat
2  PA   Not Detected Detected
3  PA   NP           Throat       Throat
3  PA   Not Detected Not Detected Detected
4  EM   NP
4  EM   Not Detected
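We can reshape the data to 'long' format and then back to 'wide'. data.table is loaded only for rowid(), which numbers the occurrences within each ID/Type/name group.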
library(dplyr)
library(stringr)
library(tidyr)
library(data.table)  # for rowid()
Data_sum %>%
  # stack the two value columns into name/value pairs
  pivot_longer(cols = c(Specimen_Type, RESULT_VAL)) %>%
  # within each ID, put the specimen types before the results
  arrange(ID, Type,
          factor(name, levels = c('Specimen_Type', 'RESULT_VAL'))) %>%
  # number the occurrences within each ID/Type/name group
  mutate(rn = str_c('Occurrence_', rowid(ID, Type, name))) %>%
  select(-RESULT_DATE) %>%
  # one column per occurrence
  pivot_wider(names_from = rn, values_from = value) %>%
  select(-name)
# A tibble: 8 x 5
#     ID Type  Occurrence_1 Occurrence_2 Occurrence_3
#  <dbl> <chr> <chr>        <chr>        <chr>
#1     1 EM    NP           NP           <NA>
#2     1 EM    Not Detected Detected     <NA>
#3     2 PA    Throat       Throat       <NA>
#4     2 PA    Not Detected Detected     <NA>
#5     3 PA    NP           Throat       Throat
#6     3 PA    Not Detected Not Detected Detected
#7     4 EM    NP           <NA>         <NA>
#8     4 EM    Not Detected <NA>         <NA>
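As a variation on the same long-then-wide idea, here is a minimal sketch that numbers the occurrences before reshaping, using dplyr's row_number() in place of data.table's rowid(). It rests on two assumptions that are not stated in the question: that RESULT_DATE is month-day-year text, and that occurrences should be numbered in chronological order within each ID/Type pair. The name `wide` is only illustrative.

```r
library(dplyr)
library(tidyr)

# Sample data from the question, rebuilt so the block is self-contained
Data_sum <- data.frame(
  ID            = c(1, 1, 2, 2, 3, 3, 3, 4),
  Type          = c("EM", "EM", "PA", "PA", "PA", "PA", "PA", "EM"),
  Specimen_Type = c("NP", "NP", "Throat", "Throat", "NP", "Throat", "Throat", "NP"),
  RESULT_VAL    = c("Not Detected", "Detected", "Not Detected", "Detected",
                    "Not Detected", "Not Detected", "Detected", "Not Detected"),
  RESULT_DATE   = c("6-1-2020", "6-10-2020", "6-1-2020", "6-10-2020",
                    "6-1-2020", "6-10-2020", "6-20-2020", "6-1-2020")
)

wide <- Data_sum %>%
  # Assumption: dates are month-day-year; parse them so sorting is chronological
  mutate(RESULT_DATE = as.Date(RESULT_DATE, format = "%m-%d-%Y")) %>%
  arrange(ID, Type, RESULT_DATE) %>%
  # Number each test within an ID/Type pair before reshaping
  group_by(ID, Type) %>%
  mutate(rn = paste0("Occurrence_", row_number())) %>%
  ungroup() %>%
  select(-RESULT_DATE) %>%
  # Long: two rows per test, Specimen_Type first and RESULT_VAL second
  pivot_longer(cols = c(Specimen_Type, RESULT_VAL)) %>%
  # Wide: one row per ID/Type/name, one column per occurrence
  pivot_wider(names_from = rn, values_from = value) %>%
  select(-name)

print(wide)
```

Because each test carries its occurrence label into the long format, the specimen type and its result land in the same Occurrence_ column, and the specimen row comes first simply because Specimen_Type is listed first in `cols`.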