Attaching rows together if they share the same ID and creating new columns in R

Suppose we have the following data frame:

df <- read.table(header=T, text=
'Patient_ID    Gene         Type
1           ATM             3
1           MEN1            1
2           BRCA1           3
2           RAD51C          2
2           BRCA2           2
3           CHEK2           1
4           MUTYH           1
4           BRCA2           3', stringsAsFactors=F)

How can I rearrange this data frame so that it looks like this:

ID  ATM  MEN1  BRCA1  RAD51C  CHEK2  MUTYH  BRCA2
1   3    1
2              3      2                     2
3                             1
4                                    1      3

Note that each row is now a unique case (one row per patient), and the Type column supplies the values for the newly created columns.

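Your data is in long (tidy) format, and you want to make it wide. Many functions in R can do this. A commonly used one is tidyr::pivot_wider(), which I demonstrate below: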

library(tidyverse)


df <- read.table(header=T, text=
                   'Patient_ID    Gene         Type
1           ATM             3
1           MEN1            1
2           BRCA1           3
2           RAD51C          2
2           BRCA2           2
3           CHEK2           1
4           MUTYH           1
4           BRCA2           3', stringsAsFactors=F)

# Blank cells will be NA
df |> 
  rename(ID = Patient_ID) |> 
  pivot_wider(names_from = Gene,
              values_from = Type) 
#> # A tibble: 4 × 8
#>      ID   ATM  MEN1 BRCA1 RAD51C BRCA2 CHEK2 MUTYH
#>   <int> <int> <int> <int>  <int> <int> <int> <int>
#> 1     1     3     1    NA     NA    NA    NA    NA
#> 2     2    NA    NA     3      2     2    NA    NA
#> 3     3    NA    NA    NA     NA    NA     1    NA
#> 4     4    NA    NA    NA     NA     3    NA     1

# Blank cells as empty strings ("")
df |> 
  rename(ID = Patient_ID) |> 
  pivot_wider(names_from = Gene, 
              values_from = Type, 
              values_fn = as.character, 
              values_fill = "")
#> # A tibble: 4 × 8
#>      ID ATM   MEN1  BRCA1 RAD51C BRCA2 CHEK2 MUTYH
#>   <int> <chr> <chr> <chr> <chr>  <chr> <chr> <chr>
#> 1     1 "3"   "1"   ""    ""     ""    ""    ""   
#> 2     2 ""    ""    "3"   "2"    "2"   ""    ""   
#> 3     3 ""    ""    ""    ""     ""    "1"   ""   
#> 4     4 ""    ""    ""    ""     "3"   ""    "1"

Created on 2022-05-23 by the reprex package (v2.0.1)

Edit: simplified the second solution following @DarrenTsai's comment.
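
Since the answer notes that many functions in R can do this long-to-wide reshaping, here is a minimal base R sketch for comparison, using stats::reshape(). It is only an illustration and not part of the original answer. It assumes each Patient_ID/Gene pair appears at most once, as in the example data; the object name `wide` is chosen here for illustration.

```r
# Same example data as in the question
df <- read.table(header = TRUE, text =
'Patient_ID    Gene         Type
1           ATM             3
1           MEN1            1
2           BRCA1           3
2           RAD51C          2
2           BRCA2           2
3           CHEK2           1
4           MUTYH           1
4           BRCA2           3', stringsAsFactors = FALSE)

# Long -> wide: one row per Patient_ID, one column per Gene.
# Assumption: no duplicated Patient_ID/Gene pairs.
wide <- reshape(df,
                idvar     = "Patient_ID",
                timevar   = "Gene",
                direction = "wide")

# reshape() names the new columns "Type.<Gene>"; drop that prefix
names(wide) <- sub("^Type\\.", "", names(wide))

# Rename the id column to match the desired output
names(wide)[names(wide) == "Patient_ID"] <- "ID"

# Missing patient/gene combinations are NA, as in the first tidyr example
wide
```

This needs no extra packages, but the column renaming step is manual, which is one reason pivot_wider() is often the more convenient choice.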