Flattening/widening a dataset to show multiple trials of a single analyte in one row
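I'm trying to flatten/widen my data frame by sample name. Each sample went through several trials, and I want all of a sample's trials arranged in a single row.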
Example data:
Sample_Name <- c("M1","M1","M1","M1","M2","M2","M2","M2")
test_ID <- c("Gen1 Spec1", "Gen2 Spec2", "Gen2 Spec2", "Gen2 Spec2", "Gen3 Spec3", "Gen3 Spec3", "Gen4 Spec4", "Gen4 Spec4")
MScore <- c(2.2, 1.9, 2.1, 2.0, 1.0, 2.0, 1.4, 1.5)
Test_Data <-data.frame(Sample_Name, test_ID, MScore)
The output shape I want:
Target_Sample_Name <-c("M1","M2")
Trial_1_ID <-c("Gen1 Spec1", "Gen3 Spec3")
Trial_1_Score <-c(2.2, 1.0)
Trial_2_ID<-c("Gen2 Spec2", "Gen3 Spec3")
Trial_2_Score<-c(1.9, 2.0)
Trial_3_ID<-c("Gen2 Spec2", "Gen4 Spec4" )
Trial_3_Score<-c(2.1 , 1.4)
Trial_4_ID<-c("Gen2 Spec2","Gen4 Spec4" )
Trial_4_Score<-c(2.0, 1.5 )
Desired_Output <- data.frame(Target_Sample_Name, Trial_1_ID, Trial_1_Score, Trial_2_ID, Trial_2_Score, Trial_3_ID, Trial_3_Score, Trial_4_ID, Trial_4_Score)
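I'm sure there's a better way to actually show what I'm trying to do, but I'm super new and haven't found it yet.
I've tried using aggregate, but I don't know which FUN to use. I've also tried the tidyr pivot_wider function, but I couldn't get it to work. I know this is a weird way to organize data, but I promise it makes sense in the context of my project!
Thanks!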
You could use:
library(dplyr)
library(tidyr)

Test_Data %>%
  group_by(Sample_Name) %>%
  mutate(rn = row_number()) %>%   # trial number within each sample
  ungroup() %>%
  pivot_wider(id_cols = Sample_Name,
              names_from = rn,
              names_glue = "{.value}_{rn}",
              values_from = c("test_ID", "MScore")) %>%
  # test_ID_1 -> Trial_1_ID, MScore_1 -> Trial_1_Score, and so on
  rename_with(~gsub("test_ID_(\\d+)", "Trial_\\1_ID", .x), starts_with("test_ID")) %>%
  rename_with(~gsub("MScore_(\\d+)", "Trial_\\1_Score", .x), starts_with("MScore")) %>%
  # sort the columns alphabetically so each trial's ID and Score sit side by side
  select(colnames(.)[order(colnames(.))])
This returns
# A tibble: 2 x 9
  Sample_Name Trial_1_ID Trial_1_Score Trial_2_ID Trial_2_Score Trial_3_ID Trial_3_Score
  <chr>       <chr>              <dbl> <chr>              <dbl> <chr>              <dbl>
1 M1          Gen1 Spec1           2.2 Gen2 Spec2           1.9 Gen2 Spec2           2.1
2 M2          Gen3 Spec3           1   Gen3 Spec3           2   Gen4 Spec4           1.4
# ... with 2 more variables: Trial_4_ID <chr>, Trial_4_Score <dbl>
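A variant of the same idea, assuming tidyr 1.2.0 or newer (the version that added the names_vary argument to pivot_wider()). Renaming test_ID and MScore to ID and Score up front lets names_glue build the final Trial_<n>_ID / Trial_<n>_Score names directly, and names_vary = "slowest" keeps each trial's two columns together, so neither the regex renaming nor the alphabetical select() is needed. That also matters once a sample has 10 or more trials: sorted alphabetically, "Trial_10_ID" lands before "Trial_2_ID", while this version keeps the trials in numeric order. The names ID, Score and trial are my own choices, not anything from the question.

```r
library(dplyr)
library(tidyr)

Test_Data <- data.frame(
  Sample_Name = c("M1", "M1", "M1", "M1", "M2", "M2", "M2", "M2"),
  test_ID = c("Gen1 Spec1", "Gen2 Spec2", "Gen2 Spec2", "Gen2 Spec2",
              "Gen3 Spec3", "Gen3 Spec3", "Gen4 Spec4", "Gen4 Spec4"),
  MScore = c(2.2, 1.9, 2.1, 2.0, 1.0, 2.0, 1.4, 1.5),
  stringsAsFactors = FALSE
)

wide <- Test_Data %>%
  rename(ID = test_ID, Score = MScore) %>%   # short names reused in the output names
  group_by(Sample_Name) %>%
  mutate(trial = row_number()) %>%           # 1, 2, 3, ... within each sample
  ungroup() %>%
  pivot_wider(id_cols = Sample_Name,
              names_from = trial,
              values_from = c(ID, Score),
              names_glue = "Trial_{trial}_{.value}",
              names_vary = "slowest")        # Trial_1_ID, Trial_1_Score, Trial_2_ID, ...

wide
```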
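If you'd rather not depend on dplyr/tidyr (you mentioned trying aggregate()), here is a base R sketch that uses stats::reshape() instead. It is a different route, not a translation of the answer above. ave() numbers the trials within each sample, reshape(direction = "wide") spreads them into test_ID_<n> / MScore_<n> column pairs in trial order, and sub() renames those to the Trial_<n>_ID / Trial_<n>_Score pattern. The helper column name trial is my own choice.

```r
Test_Data <- data.frame(
  Sample_Name = c("M1", "M1", "M1", "M1", "M2", "M2", "M2", "M2"),
  test_ID = c("Gen1 Spec1", "Gen2 Spec2", "Gen2 Spec2", "Gen2 Spec2",
              "Gen3 Spec3", "Gen3 Spec3", "Gen4 Spec4", "Gen4 Spec4"),
  MScore = c(2.2, 1.9, 2.1, 2.0, 1.0, 2.0, 1.4, 1.5),
  stringsAsFactors = FALSE
)

# Number the trials 1, 2, 3, ... within each sample, keeping the original row order
Test_Data$trial <- ave(seq_along(Test_Data$Sample_Name), Test_Data$Sample_Name,
                       FUN = seq_along)

# One row per sample; for each trial, reshape() adds test_ID_<n> followed by MScore_<n>
wide <- reshape(Test_Data,
                idvar = "Sample_Name",
                timevar = "trial",
                v.names = c("test_ID", "MScore"),
                direction = "wide",
                sep = "_")

# Rename test_ID_<n> -> Trial_<n>_ID and MScore_<n> -> Trial_<n>_Score
names(wide) <- sub("^test_ID_(\\d+)$", "Trial_\\1_ID", names(wide))
names(wide) <- sub("^MScore_(\\d+)$", "Trial_\\1_Score", names(wide))
rownames(wide) <- NULL

wide
```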