"number of items to replace is not a multiple of replacement length" 从半长整形到长整形时

"number of items to replace is not a multiple of replacement length" when reshaping from semi-long to long

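I want to convert a semi-long data frame to long format. However, after the reshape command I get several warnings saying "number of items to replace is not a multiple of replacement length". When I open the new data frame, the format is basically correct, but it says the data frame is corrupt.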

What is going on here?

This is the command I used. It explicitly asked me to supply a value for idvar:

df2 = reshape(df,
              direction="long",
              varying=3:ncol(df),
              ids="id",
              idvar="newid",
              timevar="category")

This is the structure of my original data frame (in reality there are many more categories than just car and tree):

id  trial  resp.car rt.car color.car resp.tree rt.tree color.tree
 1      1         1    500    "blue"         3     765    "green"
 1      1         3    534   "green"         1     455   "yellow"
 1      2         2    553  "yellow"         2     794      "red"
 1      2         3    577   "black"         3     834     "blue"
 2      1         1    598   "green"         1     756      "red"
 2      1         3    355  "yellow"         3     457    "black"
 2      2         3    876    "blue"         1     767   "yellow"
 2      2         2    466   "black"         1     439    "green"

Desired result:

id  trial  category   resp        rt     color
 1      1     "car"      1       500    "blue"    
 1      1     "car"      3       534   "green"  
 1      2     "car"      2       553  "yellow"     
 1      2     "car"      3       577   "black"    
 1      1    "tree"      3       765   "green"     
 1      1    "tree"      1       455  "yellow"    
 1      2    "tree"      2       794     "red"     
 1      2    "tree"      3       834    "blue"     
 2      1     "car"      1       598   "green"
 ...

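It may be easier with pivot_longer: specify the columns to reshape to long in cols, give the new column names in names_to, and capture the substrings of the original column names with names_pattern. .value returns the values of the columns under the name taken from the first captured substring (resp, rt, color), whereas category is the name of the column that holds the suffix extracted from the column names (car, tree). The regex pattern matches zero or more characters (.*) from the start (^) of the column name as the first capture group ((.*)), followed by a literal dot (escaped, because an unescaped . is a metacharacter that matches any character), followed by a second capture group ((.*)) that matches all the remaining characters.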

library(tidyr)
pivot_longer(df, cols = -c(id, trial), 
  names_to = c(".value", "category"), names_pattern = "^(.*)\\.(.*)")

-output

# A tibble: 16 × 6
      id trial category  resp    rt color 
   <int> <int> <chr>    <int> <int> <chr> 
 1     1     1 car          1   500 blue  
 2     1     1 tree         3   765 green 
 3     1     1 car          3   534 green 
 4     1     1 tree         1   455 yellow
 5     1     2 car          2   553 yellow
 6     1     2 tree         2   794 red   
 7     1     2 car          3   577 black 
 8     1     2 tree         3   834 blue  
 9     2     1 car          1   598 green 
10     2     1 tree         1   756 red   
11     2     1 car          3   355 yellow
12     2     1 tree         3   457 black 
13     2     2 car          3   876 blue  
14     2     2 tree         1   767 yellow
15     2     2 car          2   466 black 
16     2     2 tree         1   439 green 
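To see how that pattern splits each column name, here is a minimal base R sketch. It only illustrates the regex with sub() and is not what pivot_longer runs internally; the variable names cols, value_part and category_part are made up for this illustration.

```r
# Column names to be reshaped (taken from the example data)
cols <- c("resp.car", "rt.car", "color.car",
          "resp.tree", "rt.tree", "color.tree")

# Same pattern as in names_pattern: two capture groups separated by a literal dot
pattern <- "^(.*)\\.(.*)"

# First capture group: becomes the output column name (the ".value" part)
value_part <- sub(pattern, "\\1", cols)

# Second capture group: becomes the entry in the "category" column
category_part <- sub(pattern, "\\2", cols)

data.frame(column = cols, value = value_part, category = category_part)
```

Because the first group is greedy, a column name containing more than one dot would be split at the last dot.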

With reshape, we may have to pass varying as a list of column groups (one vector per output variable), together with a unique row index as idvar:

out <- reshape(transform(df, idnew = seq_along(id)), 
 idvar = "idnew", varying = list(c(3, 6), c(4,7), c(5,8)), direction="long",
         v.names = c('resp','rt', "color"), timevar = "category")

row.names(out) <- NULL
out
   id trial idnew category resp  rt  color
1   1     1     1        1    1 500   blue
2   1     1     2        1    3 534  green
3   1     2     3        1    2 553 yellow
4   1     2     4        1    3 577  black
5   2     1     5        1    1 598  green
6   2     1     6        1    3 355 yellow
7   2     2     7        1    3 876   blue
8   2     2     8        1    2 466  black
9   1     1     1        2    3 765  green
10  1     1     2        2    1 455 yellow
11  1     2     3        2    2 794    red
12  1     2     4        2    3 834   blue
13  2     1     5        2    1 756    red
14  2     1     6        2    3 457  black
15  2     2     7        2    1 767 yellow
16  2     2     8        2    1 439  green
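In the output above, category holds 1 and 2 rather than "car" and "tree". A possible variation, sketched here under the assumption that the times argument of reshape is used to supply the labels, also drops the helper index and sorts the rows as in the desired result. The names of the columns in varying are written out by hand, and idnew is only a helper column.

```r
df <- data.frame(
  id = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L),
  trial = c(1L, 1L, 2L, 2L, 1L, 1L, 2L, 2L),
  resp.car = c(1L, 3L, 2L, 3L, 1L, 3L, 3L, 2L),
  rt.car = c(500L, 534L, 553L, 577L, 598L, 355L, 876L, 466L),
  color.car = c("blue", "green", "yellow", "black", "green", "yellow", "blue", "black"),
  resp.tree = c(3L, 1L, 2L, 3L, 1L, 3L, 1L, 1L),
  rt.tree = c(765L, 455L, 794L, 834L, 756L, 457L, 767L, 439L),
  color.tree = c("green", "yellow", "red", "blue", "red", "black", "yellow", "green")
)

# Helper column: one unique value per row, used as idvar
df$idnew <- seq_len(nrow(df))

out <- reshape(df,
  direction = "long",
  idvar = "idnew",
  # One vector per output variable, in the same category order as 'times'
  varying = list(c("resp.car", "resp.tree"),
                 c("rt.car", "rt.tree"),
                 c("color.car", "color.tree")),
  v.names = c("resp", "rt", "color"),
  timevar = "category",
  times = c("car", "tree"))  # labels instead of 1, 2

out$idnew <- NULL                          # drop the helper index
out <- out[order(out$id, out$category), ]  # sort by id, then category
row.names(out) <- NULL
out
```

With many categories, the vectors in varying and times could be built from names(df) instead of being typed out.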

data

df <- structure(list(id = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L), trial = c(1L, 
1L, 2L, 2L, 1L, 1L, 2L, 2L), resp.car = c(1L, 3L, 2L, 3L, 1L, 
3L, 3L, 2L), rt.car = c(500L, 534L, 553L, 577L, 598L, 355L, 876L, 
466L), color.car = c("blue", "green", "yellow", "black", "green", 
"yellow", "blue", "black"), resp.tree = c(3L, 1L, 2L, 3L, 1L, 
3L, 1L, 1L), rt.tree = c(765L, 455L, 794L, 834L, 756L, 457L, 
767L, 439L), color.tree = c("green", "yellow", "red", "blue", 
"red", "black", "yellow", "green")), class = "data.frame", row.names = c(NA, 
-8L))