Combining rows that share the same ID and creating new columns in R
Suppose we have the following data frame:
df <- read.table(header = TRUE, text =
'Patient_ID Gene Type
1 ATM 3
1 MEN1 1
2 BRCA1 3
2 RAD51C 2
2 BRCA2 2
3 CHEK2 1
4 MUTYH 1
4 BRCA2 3', stringsAsFactors = FALSE)
How can I rearrange this data frame so that it looks like this:
ID  ATM  MEN1  BRCA1  RAD51C  CHEK2  MUTYH  BRCA2
1   3    1
2              3      2                     2
3                             1
4                                    1      3
Note that each row is now a unique case, and the column Type is used to supply the values for the newly created columns.
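Your data is in long (tidy) format, and you want to make it wide. There are many functions in R that can do this. A commonly used one is tidyr::pivot_wider(), which I demonstrate below: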
library(tidyverse)
df <- read.table(header = TRUE, text =
'Patient_ID Gene Type
1 ATM 3
1 MEN1 1
2 BRCA1 3
2 RAD51C 2
2 BRCA2 2
3 CHEK2 1
4 MUTYH 1
4 BRCA2 3', stringsAsFactors = FALSE)
# Blank cells will be NA
df |>
rename(ID = Patient_ID) |>
pivot_wider(names_from = Gene,
values_from = Type)
#> # A tibble: 4 × 8
#> ID ATM MEN1 BRCA1 RAD51C BRCA2 CHEK2 MUTYH
#> <int> <int> <int> <int> <int> <int> <int> <int>
#> 1 1 3 1 NA NA NA NA NA
#> 2 2 NA NA 3 2 2 NA NA
#> 3 3 NA NA NA NA NA 1 NA
#> 4 4 NA NA NA NA 3 NA 1
# Blank cells as empty strings ("")
df |>
rename(ID = Patient_ID) |>
pivot_wider(names_from = Gene,
values_from = Type,
values_fn = as.character,
values_fill = "")
#> # A tibble: 4 × 8
#> ID ATM MEN1 BRCA1 RAD51C BRCA2 CHEK2 MUTYH
#> <int> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 1 "3" "1" "" "" "" "" ""
#> 2 2 "" "" "3" "2" "2" "" ""
#> 3 3 "" "" "" "" "" "1" ""
#> 4 4 "" "" "" "" "3" "" "1"
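The desired output in the question lists BRCA2 as the last column, whereas pivot_wider() orders the new columns by first appearance in Gene. If the column order matters, one possible sketch is to move BRCA2 to the end with dplyr::relocate(). The object name wide is only an illustrative choice, and this assumes dplyr 1.0.0 or later and R 4.1 or later for the native pipe:

```r
library(dplyr)
library(tidyr)

# Same example data as in the question
df <- read.table(header = TRUE, text =
'Patient_ID Gene Type
1 ATM 3
1 MEN1 1
2 BRCA1 3
2 RAD51C 2
2 BRCA2 2
3 CHEK2 1
4 MUTYH 1
4 BRCA2 3', stringsAsFactors = FALSE)

# Widen as before, then move BRCA2 after the last column
wide <- df |>
  rename(ID = Patient_ID) |>
  pivot_wider(names_from = Gene,
              values_from = Type) |>
  relocate(BRCA2, .after = last_col())

names(wide)
```

With many genes, passing an explicit vector of names to select() or relocate() would probably be easier to maintain than moving columns one at a time.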
Created on 2022-05-23 by the reprex package (v2.0.1)
Edit: simplified the second solution following @DarrenTsai's comment
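Since the answer mentions that many R functions can do this reshaping, here is a rough base R alternative using stats::reshape(), for cases where loading the tidyverse is not an option. This is only a sketch, not part of the original answer. The object name wide is illustrative, and the name clean-up assumes reshape() prefixes the new columns with "Type.", which is its default naming for direction = "wide":

```r
# Same example data as in the question
df <- read.table(header = TRUE, text =
'Patient_ID Gene Type
1 ATM 3
1 MEN1 1
2 BRCA1 3
2 RAD51C 2
2 BRCA2 2
3 CHEK2 1
4 MUTYH 1
4 BRCA2 3', stringsAsFactors = FALSE)

# One row per Patient_ID, one column per Gene, filled with Type
wide <- reshape(df,
                idvar = "Patient_ID",
                timevar = "Gene",
                direction = "wide")

# Drop the "Type." prefix that reshape() adds, and rename the id column
names(wide) <- sub("^Type\\.", "", names(wide))
names(wide)[names(wide) == "Patient_ID"] <- "ID"
rownames(wide) <- NULL

wide
```

As with the first pivot_wider() call, gene and patient combinations that do not occur in the data end up as NA.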