Replace the values of a character column using another column with pattern detection

I have a data frame of samples, pedigrees, organized by family:

pedigrees %>% 
  filter(Family_ID %in% sample(pedigrees$Family_ID, 5))


   Family_ID   Sample_ID                      fatherID       motherID         sex status
   <chr>       <chr>                          <chr>          <chr>          <int>  <int>
 1 MtS.MIPS.61 UCSF_AGG0092_8005439845        0              0                  2      0
 2 MtS.MIPS.61 UCSF_AGG0093_8005439857        0              0                  1      0
 3 MtS.MIPS.61 UCSF_AGG0094_8005439869        AGG0093        AGG0092            2      0
 4 MtS.MIPS.61 UCSF_AGG0095_8005439881        AGG0093        AGG0092            2      2
 5 MtS.MIPS.61 UCSF_AGG0091_8005439928        AGG0093        AGG0092            1      2
 6 FAM048      UCSF_G01-GEA-259-HI_8005440194 G01-GEA-259-PA G01-GEA-259-MA     1      2
 7 FAM048      UCSF_G01-GEA-259-MA_8005440206 0              0                  2      0
 8 FAM048      UCSF_G01-GEA-259-PA_8005440218 0              0                  1      0
 9 F1543       UCSF_F1543-1_8005116638        F1543-3        F1543-2            2      2
10 F1543       UCSF_F1543-2_8005116649        0              0                  2      0
11 F1543       UCSF_F1543-3_8005116661        0              0                  1      0
12 AU0045      UCSF_AU0045201_04C32032A       0              0                  1      0
13 AU0045      UCSF_AU0045202_04C32033A       0              0                  2      0
14 AU0045      UCSF_AU0045301_04C32034A       AU0045201      AU0045202          2      2
15 AU0045      UCSF_AU0045302_04C32035A       AU0045201      AU0045202          1      2
16 1232        UCSF_1232002_8004805191        1232011        1232012            2      2
17 1232        UCSF_1232011_8004805203        0              0                  1      1
18 1232        UCSF_1232012_8004805215        0              0                  2      1

The fatherID and motherID columns should use the same format as Sample_ID. For example, the last family, 1232, should actually look like this:

16 1232        UCSF_1232002_8004805191        UCSF_1232011_8004805203        UCSF_1232012_8004805215            2      2
17 1232        UCSF_1232011_8004805203        0                              0                                  1      1
18 1232        UCSF_1232012_8004805215        0                              0                                  2      1

I know I should use str_match or grep, but how do I apply it to all the samples in pedigrees?

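If I understand correctly, you can group_by family with dplyr, then inside mutate replace fatherID and motherID depending on whether they equal "0". I use grepl to find which Sample_ID in the family matches the current mother/father ID.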

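library(dplyr)

pedigrees %>% 
  group_by(Family_ID) %>% 
  # Per family: take the first non-"0" parent ID and find the Sample_ID that contains it;
  # rows whose parent ID is "0" (founders) are left as "0"
  mutate(motherID = ifelse(motherID != "0", 
                           Sample_ID[grepl(motherID[motherID != "0"][1], Sample_ID)], 
                           "0"), 
         fatherID = ifelse(fatherID != "0", 
                           Sample_ID[grepl(fatherID[fatherID != "0"][1], Sample_ID)], 
                           "0")
  ) 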

# A tibble: 18 x 7
# Groups: Family_ID [5]
#       r Family_ID   Sample_ID                      fatherID                       motherID                    sex status
#   <int> <fct>       <chr>                          <chr>                          <chr>                     <int>  <int>
# 1     1 MtS.MIPS.61 UCSF_AGG0092_8005439845        0                              0                             2      0
# 2     2 MtS.MIPS.61 UCSF_AGG0093_8005439857        0                              0                             1      0
# 3     3 MtS.MIPS.61 UCSF_AGG0094_8005439869        UCSF_AGG0093_8005439857        UCSF_AGG0092_8005439845       2      0
# 4     4 MtS.MIPS.61 UCSF_AGG0095_8005439881        UCSF_AGG0093_8005439857        UCSF_AGG0092_8005439845       2      2
# 5     5 MtS.MIPS.61 UCSF_AGG0091_8005439928        UCSF_AGG0093_8005439857        UCSF_AGG0092_8005439845       1      2
# 6     6 FAM048      UCSF_G01-GEA-259-HI_8005440194 UCSF_G01-GEA-259-PA_8005440218 UCSF_G01-GEA-259-MA_8005~     1      2
# 7     7 FAM048      UCSF_G01-GEA-259-MA_8005440206 0                              0                             2      0
# 8     8 FAM048      UCSF_G01-GEA-259-PA_8005440218 0                              0                             1      0
# 9     9 F1543       UCSF_F1543-1_8005116638        UCSF_F1543-3_8005116661        UCSF_F1543-2_8005116649       2      2
#10    10 F1543       UCSF_F1543-2_8005116649        0                              0                             2      0
#11    11 F1543       UCSF_F1543-3_8005116661        0                              0                             1      0
#12    12 AU0045      UCSF_AU0045201_04C32032A       0                              0                             1      0
#13    13 AU0045      UCSF_AU0045202_04C32033A       0                              0                             2      0
#14    14 AU0045      UCSF_AU0045301_04C32034A       UCSF_AU0045201_04C32032A       UCSF_AU0045202_04C32033A      2      2
#15    15 AU0045      UCSF_AU0045302_04C32035A       UCSF_AU0045201_04C32032A       UCSF_AU0045202_04C32033A      1      2
#16    16 1232        UCSF_1232002_8004805191        UCSF_1232011_8004805203        UCSF_1232012_8004805215       2      2
#17    17 1232        UCSF_1232011_8004805203        0                              0                             1      1
#18    18 1232        UCSF_1232012_8004805215        0                              0                             2      1
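
One caveat with the approach above: it looks up only the first non-"0" parent ID in each family, so it relies on every child in a family sharing the same father and mother, and grepl treats the ID as a regular expression. If that may not hold for your data, a row-by-row exact lookup could be sketched as follows. Note that expand_parent_id is a hypothetical helper name, not part of dplyr, and the sketch assumes every Sample_ID has the form prefix_ID_barcode, as in the sample data.

```r
library(dplyr)

# Hypothetical helper (not from the answer above): expand short parent IDs
# to full Sample_IDs. Assumes Sample_ID looks like "<prefix>_<ID>_<barcode>".
expand_parent_id <- function(parent_id, sample_id) {
  # Drop the leading prefix and the trailing barcode to recover each short ID
  short_id <- sub("^[^_]*_(.*)_[^_]*$", "\\1", sample_id)
  # Exact lookup per row; NA when no sample in the group has that short ID
  full_id <- sample_id[match(parent_id, short_id)]
  # Founders ("0") and unmatched parents keep their original value
  ifelse(is.na(full_id), parent_id, full_id)
}

# Two families taken from the sample data in the question
pedigrees <- data.frame(
  Family_ID = c(rep("FAM048", 3), rep("1232", 3)),
  Sample_ID = c("UCSF_G01-GEA-259-HI_8005440194",
                "UCSF_G01-GEA-259-MA_8005440206",
                "UCSF_G01-GEA-259-PA_8005440218",
                "UCSF_1232002_8004805191",
                "UCSF_1232011_8004805203",
                "UCSF_1232012_8004805215"),
  fatherID  = c("G01-GEA-259-PA", "0", "0", "1232011", "0", "0"),
  motherID  = c("G01-GEA-259-MA", "0", "0", "1232012", "0", "0"),
  stringsAsFactors = FALSE
)

# Look up parents within each family only, so that identical short IDs
# in different families cannot be confused
result <- pedigrees %>%
  group_by(Family_ID) %>%
  mutate(fatherID = expand_parent_id(fatherID, Sample_ID),
         motherID = expand_parent_id(motherID, Sample_ID)) %>%
  ungroup()

result
```

Because match compares whole strings, a short ID such as 1232011 cannot accidentally hit a longer ID that merely contains it, and a parent who was never sampled simply keeps the short ID instead of causing an error.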