dplyr: How to rearrange this dataframe and create new columns by extracting parts of other columns

Suppose I have this data frame:

> a
  T..Gene.names Intensity.Mut_125 Intensity.Mut_250 Intensity.Mut.1000 Intensity.Mut.500
1          NCAN               NaN           25.6628            23.8427               NaN
2          AMBP           22.8276           27.0801            25.4740           23.5596
3          CHGB           25.4463           30.0065            27.8181           27.3170
4           APP           25.0346           29.7784            27.0848           24.7314

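I need to rearrange my data frame so that each value of a$T..Gene.names becomes a new column. Then I need a new column called a$sample that extracts the word between Intensity and the number (125, 250, 500, 1000 or 2000). One complication is that this word and the number that follows are separated by either . or _. Finally, I need a column called a$volume holding that number. NA should be converted to 0.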

I tried several times with pivot_longer and pivot_wider, but this is beyond my current skill level.

Expected output

sample     volume       NCAN         AMBP        CHGB        APP
   Mut        125          0      22.8276     25.4463    25.0346
   Mut        250    25.6628      27.0801     30.0065    29.7784
   Mut        500          0      23.5596     27.3170    24.7314
   Mut       1000    23.8427      25.4740     27.8181    27.0848 

I would prefer a dplyr solution.

a <- structure(list(T..Gene.names = c("NCAN", "AMBP", "CHGB", "APP"
), Intensity.Mut_125 = c(NaN, 22.8276, 25.4463, 25.0346), Intensity.Mut_250 = c(25.6628, 
27.0801, 30.0065, 29.7784), Intensity.Mut.1000 = c(23.8427, 25.474, 
27.8181, 27.0848), Intensity.Mut.500 = c(NaN, 23.5596, 27.317, 
24.7314)), row.names = c(NA, 4L), class = "data.frame")
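library(tidyr)  # provides separate() and re-exports %>%
# Melt, cast genes into columns (NA/NaN filled with 0), then split the old column names
reshape2::recast(a, variable ~ T..Gene.names, fill = 0) %>%
  separate(variable, c('type', 'sample', 'volume'))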

       type sample volume    AMBP     APP    CHGB    NCAN
1 Intensity    Mut    125 22.8276 25.0346 25.4463  0.0000
2 Intensity    Mut    250 27.0801 29.7784 30.0065 25.6628
3 Intensity    Mut   1000 25.4740 27.0848 27.8181 23.8427
4 Intensity    Mut    500 23.5596 24.7314 27.3170  0.0000
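
The separate() call above copes with the mixed "." and "_" separators because its default sep is the regular expression "[^[:alnum:]]+", which splits on any run of non-alphanumeric characters. A minimal sketch of just that step, on two of the column names from the question, is shown here. The helper data frame nm is hypothetical, and convert = TRUE is an addition that the answer does not use; it is included only to show one way to get volume as a number instead of a string.

```r
library(tidyr)

# Two of the column names from the question, one using "_" and one using "."
nm <- data.frame(variable = c("Intensity.Mut_125", "Intensity.Mut.1000"))

# Default sep = "[^[:alnum:]]+" treats both "." and "_" as separators.
# convert = TRUE (an assumption, not in the answer above) makes volume an integer.
parts <- separate(nm, variable, c("type", "sample", "volume"), convert = TRUE)
parts
```

If the type column is not wanted, it can be dropped afterwards, for example with dplyr::select(-type).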

Another possible solution, using tidyr:

library(tidyr)

pivot_longer(a, cols = !"T..Gene.names",
             names_to = c('sample', 'volume'),
             names_prefix = "Intensity.",
             names_sep = '_|\\.',          # split on "_" or a literal "."
             values_drop_na = TRUE) %>%    # drop NaN rows; they come back as 0 below
  pivot_wider(names_from = "T..Gene.names",
              values_fill = list(value = 0))


# A tibble: 4 × 6
  sample volume  NCAN  AMBP  CHGB   APP
  <chr>  <chr>  <dbl> <dbl> <dbl> <dbl>
1 Mut    250     25.7  27.1  30.0  29.8
2 Mut    1000    23.8  25.5  27.8  27.1
3 Mut    125      0    22.8  25.4  25.0
4 Mut    500      0    23.6  27.3  24.7

names_prefix = "Intensity." removes the leading "Intensity." from the column names.

names_sep = '_|\\.' splits the column names at either _ or a literal . (inside an R string the backslash has to be doubled).
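
Since the question asks for a dplyr-style result with a numeric volume and rows ordered as in the expected output, the following is one possible sketch combining dplyr with tidyr. It is a variation on the approach above, not the answer's own code. It swaps names_prefix and names_sep for a single names_pattern regex, and it assumes a tidyr version that supports names_transform (1.1.0 or later). The name result is only illustrative.

```r
library(dplyr)
library(tidyr)

# Same data as in the question, rebuilt so the example is self-contained
a <- data.frame(
  T..Gene.names      = c("NCAN", "AMBP", "CHGB", "APP"),
  Intensity.Mut_125  = c(NaN, 22.8276, 25.4463, 25.0346),
  Intensity.Mut_250  = c(25.6628, 27.0801, 30.0065, 29.7784),
  Intensity.Mut.1000 = c(23.8427, 25.4740, 27.8181, 27.0848),
  Intensity.Mut.500  = c(NaN, 23.5596, 27.3170, 24.7314)
)

result <- a %>%
  pivot_longer(cols = -T..Gene.names,
               names_to = c("sample", "volume"),
               # group 1 = word after "Intensity.", group 2 = trailing digits
               names_pattern = "^Intensity\\.(.+)[._]([0-9]+)$",
               names_transform = list(volume = as.integer)) %>%
  # is.na() is TRUE for both NA and NaN, so both become 0
  mutate(value = if_else(is.na(value), 0, value)) %>%
  pivot_wider(names_from = T..Gene.names, values_from = value) %>%
  arrange(volume)

result
```

Replacing the missing values before pivot_wider means values_drop_na and values_fill are not needed, and arrange(volume) sorts numerically because volume is already an integer at that point.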