Pivoting 3x2 columns with two pieces of information in the column name into three columns

Question

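Apologies for the post title; I couldn't think of a more concise way to describe my problem. Say I have a dataset containing three sets of outcome variables for each participant. Each outcome variable has two columns: one indicating the group each observation belongs to, and the other giving the score or value for that observation. Like this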

library(tidyverse)  # provides tibble() and pivot_longer()

set.seed(1)
d <- tibble(id = factor(rep(c("tb_10",
                              "ah_04",
                              "ck_17"), each = 3)),
            out1Fact = factor(sample(x = letters[1:5],
                                     size = 9,
                                     replace = TRUE)),
            out1Num = rnorm(9),
            out2Fact = factor(sample(x = letters[1:5],
                                     size = 9,
                                     replace = TRUE)),
            out2Num = rnorm(9),
            out3Fact = factor(sample(x = letters[1:5],
                                     size = 9,
                                     replace = TRUE)),
            out3Num = rnorm(9))

d

# output
# # A tibble: 9 x 7
#   id    out1Fact out1Num out2Fact out2Num out3Fact out3Num
#   <fct> <fct>      <dbl> <fct>      <dbl> <fct>      <dbl>
# 1 tb_10 a          0.487 b         0.0746 b         -0.832
# 2 tb_10 d          0.738 a        -1.99   e         -1.17 
# 3 tb_10 a          0.576 d         0.620  b         -1.07 
# 4 ah_04 b         -0.305 a        -0.0561 a         -1.56 
# 5 ah_04 e          1.51  d        -0.156  c          1.16 
# 6 ah_04 c          0.390 c        -1.47   c          0.832
# 7 ck_17 b         -0.621 b        -0.478  d         -0.227
# 8 ck_17 c         -2.21  b         0.418  c          0.266
# 9 ck_17 c          1.12  d         1.36   a         -0.377

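Now what I need to do is break this down into three variables: the first indicating the outcome contained in the first part of the column name (i.e. out1, out2, or out3), the second holding the factor value of the observation (i.e. the values in all columns ending in 'Fact'), and the third holding the numeric value of the observation (i.e. the values in all columns ending in 'Num').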

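It should look something like this (note that the values in factVal and numVal do not match the corresponding values in the original data frame; this is only to show the shape I need)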

# # A tibble: 27 x 4
#    id    outType factVal numVal
#    <fct>   <int> <chr>    <dbl>
# 1  tb_10       1 a        1.10 
# 2  tb_10       1 e        0.144
# 3  tb_10       1 e       -0.118
# 4  tb_10       2 a       -0.912
# 5  tb_10       2 a       -1.44 
# 6  tb_10       2 c       -0.797
# 7  tb_10       3 b        1.25 
# 8  tb_10       3 b        0.772
# 9  tb_10       3 c       -0.220
# 10 ah_04       1 b       -0.425
# # ... with 17 more rows

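Now, pivot_longer() and pivot_wider() can do some magical things, but I can't get this to work. I tried using the names_pattern argument, but this particular problem is beyond me. Any help much appreciated.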

Answer 1

pivot_longer(d, -id, names_pattern = "out([0-9]+)(.*)", names_to = c("outType", ".value"))
# # A tibble: 27 x 4
#    id    outType Fact      Num
#    <fct> <chr>   <fct>   <dbl>
#  1 tb_10 1       a      0.487 
#  2 tb_10 2       b      0.0746
#  3 tb_10 3       b     -0.832 
#  4 tb_10 1       d      0.738 
#  5 tb_10 2       a     -1.99  
#  6 tb_10 3       e     -1.17  
#  7 tb_10 1       a      0.576 
#  8 tb_10 2       d      0.620 
#  9 tb_10 3       b     -1.07  
# 10 ah_04 1       b     -0.305 
# # ... with 17 more rows

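In names_pattern, we identify the two parts of each column name that we want to act on: the numeric part ("([0-9]+)") and everything after that number ("(.*)"). These two capture groups correspond to the two components of names_to, where the special value ".value" maps to one or more distinct output columns (two in this case, Fact and Num).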
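To make the two capture groups concrete, here is a minimal sketch in base R that applies the same regular expression to the column names. It only illustrates what each group captures; it is not how tidyr is implemented internally, and the labels assigned with colnames() are chosen here purely for readability.

```r
# Column names from the example data, excluding id
nms <- c("out1Fact", "out1Num", "out2Fact", "out2Num", "out3Fact", "out3Num")

# regexec() returns the full match followed by one entry per capture group
m <- regmatches(nms, regexec("out([0-9]+)(.*)", nms))

# Stack the matches into a character matrix, one row per column name
parts <- do.call(rbind, m)
colnames(parts) <- c("name", "outType", ".value")  # labels are illustrative only
parts
```

The second column holds the digits that end up in outType, and the third holds the suffixes (Fact, Num) that become the names of the output columns.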

Said differently, the non-special "outType" maps the digits in the column names (1, 2, 3) to a single column named "outType" (obviously).

If we could imagine a mapping like this

names_pattern = "out([0-9]+)(Fact|Num)"
names_to      = c("outType", "Fact", "Num")
#                            \_ ".value" _/

then it might help in understanding the dynamic nature of how the output columns are determined and mapped.
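The question asks for an integer outType and columns named factVal and numVal, whereas the pivot_longer() call above returns a character outType and columns named Fact and Num. The following is one possible sketch of closing that gap, assuming tidyr 1.1.0 or later (for names_transform) and dplyr. The small data frame d_small is a hypothetical stand-in built from the first rows for tb_10 and ah_04 in the printed data, so that the block runs on its own.

```r
library(tidyr)
library(dplyr)

# Hypothetical two-row stand-in for `d`, values copied from the printed output
d_small <- tibble::tibble(
  id       = factor(c("tb_10", "ah_04")),
  out1Fact = factor(c("a", "b")), out1Num = c(0.487, -0.305),
  out2Fact = factor(c("b", "a")), out2Num = c(0.0746, -0.0561),
  out3Fact = factor(c("b", "a")), out3Num = c(-0.832, -1.56)
)

res <- d_small %>%
  pivot_longer(
    -id,
    names_pattern   = "out([0-9]+)(.*)",
    names_to        = c("outType", ".value"),
    names_transform = list(outType = as.integer)  # "1" -> 1L, etc.
  ) %>%
  rename(factVal = Fact, numVal = Num) %>%        # match the requested names
  mutate(factVal = as.character(factVal))         # requested shape shows <chr>

res
```

The same pipeline should apply unchanged to the full d. The rows come out interleaved by original row (outType 1, 2, 3 for the first row, then the next row, and so on), so an additional sort would be needed if the grouped ordering in the requested shape matters.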

r

tidyr

tidyverse