"number of items to replace is not a multiple of replacement length" when reshaping from semi-long to long
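I want to convert a semi-long data frame to long format. However, after the reshape command I get several warnings saying "number of items to replace is not a multiple of replacement length". When I open the new data frame, the format is basically correct, but it says the data frame is corrupt.
What is going on here?
This is the command I used. It explicitly required me to supply a value for idvar: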
df2 = reshape(df,
              direction="long",
              varying=3:ncol(df),
              ids="id",
              idvar="newid",
              timevar="category")
This is the structure of my original data frame (in reality there are many more categories than just car and tree):
id trial resp.car rt.car color.car resp.tree rt.tree color.tree
1 1 1 500 "blue" 3 765 "green"
1 1 3 534 "green" 1 455 "yellow"
1 2 2 553 "yellow" 2 794 "red"
1 2 3 577 "black" 3 834 "blue"
2 1 1 598 "green" 1 756 "red"
2 1 3 355 "yellow" 3 457 "black"
2 2 3 876 "blue" 1 767 "yellow"
2 2 2 466 "black" 1 439 "green"
Desired result:
id trial category resp rt color
1 1 "car" 1 500 "blue"
1 1 "car" 3 534 "green"
1 2 "car" 2 553 "yellow"
1 2 "car" 3 577 "black"
1 1 "tree" 3 765 "green"
1 1 "tree" 1 455 "yellow"
1 2 "tree" 2 794 "red"
1 2 "tree" 3 834 "blue"
2 1 "car" 1 598 "green"
...
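It may be easier with pivot_longer: specify the columns to reshape into long format in cols, capture substrings of the column names with names_pattern, and name the resulting columns in names_to. The .value entry keeps the captured prefix (resp, rt, color) as the output column names and fills them with the corresponding values, whereas category is the name of the new column that holds the suffix extracted from the column names. The regex pattern matches zero or more characters (.*) from the start (^) of the column name as the first capture group ((.*)), followed by a literal dot (written \\. in an R string, escaped because an unescaped dot is a metacharacter that matches any character), followed by a second capture group ((.*)) that matches the rest of the characters.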
library(tidyr)
# The backslash must be doubled inside an R string, otherwise "\." is a parse error
pivot_longer(df, cols = -c(id, trial),
    names_to = c(".value", "category"), names_pattern = "^(.*)\\.(.*)")
-output
# A tibble: 16 × 6
id trial category resp rt color
<int> <int> <chr> <int> <int> <chr>
1 1 1 car 1 500 blue
2 1 1 tree 3 765 green
3 1 1 car 3 534 green
4 1 1 tree 1 455 yellow
5 1 2 car 2 553 yellow
6 1 2 tree 2 794 red
7 1 2 car 3 577 black
8 1 2 tree 3 834 blue
9 2 1 car 1 598 green
10 2 1 tree 1 756 red
11 2 1 car 3 355 yellow
12 2 1 tree 3 457 black
13 2 2 car 3 876 blue
14 2 2 tree 1 767 yellow
15 2 2 car 2 466 black
16 2 2 tree 1 439 green
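To see what names_pattern does with each column name, the same regular expression can be applied with base R's sub(). This is only a minimal sketch for illustration; the vector cols and the variable names value_part and category_part are made up for this example and are not part of the code above.

```r
# Illustrative column names, copied from the question's data frame
cols <- c("resp.car", "rt.car", "color.car",
          "resp.tree", "rt.tree", "color.tree")

# Same pattern as passed to names_pattern; the dot is escaped as \\. in an R string
pattern <- "^(.*)\\.(.*)"

# First capture group: the part that .value turns into output column names
value_part <- sub(pattern, "\\1", cols)

# Second capture group: the part that ends up in the category column
category_part <- sub(pattern, "\\2", cols)

print(data.frame(column = cols, value = value_part, category = category_part))
```

Because the first (.*) is greedy, a column name containing several dots would be split at the last dot.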
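With reshape, we may have to pass varying as a list of the column groups, along with a unique index as idvar. The category column in the result then holds the position within each group (1 for car, 2 for tree):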
out <- reshape(transform(df, idnew = seq_along(id)),
    idvar = "idnew", varying = list(c(3, 6), c(4, 7), c(5, 8)), direction = "long",
    v.names = c('resp', 'rt', "color"), timevar = "category")
row.names(out) <- NULL
out
id trial idnew category resp rt color
1 1 1 1 1 1 500 blue
2 1 1 2 1 3 534 green
3 1 2 3 1 2 553 yellow
4 1 2 4 1 3 577 black
5 2 1 5 1 1 598 green
6 2 1 6 1 3 355 yellow
7 2 2 7 1 3 876 blue
8 2 2 8 1 2 466 black
9 1 1 1 2 3 765 green
10 1 1 2 2 1 455 yellow
11 1 2 3 2 2 794 red
12 1 2 4 2 3 834 blue
13 2 1 5 2 1 756 red
14 2 1 6 2 3 457 black
15 2 2 7 2 1 767 yellow
16 2 2 8 2 1 439 green
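If the category column should contain the labels "car" and "tree" rather than 1 and 2, one possible variation is to pass them through reshape's times argument and refer to the varying columns by name. The following is a sketch under that assumption, not the answer's original code; the name out2 and the final sorting step are additions made here to mirror the layout of the desired result.

```r
# Data from the question, rebuilt so the example is self-contained
df <- data.frame(
  id         = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L),
  trial      = c(1L, 1L, 2L, 2L, 1L, 1L, 2L, 2L),
  resp.car   = c(1L, 3L, 2L, 3L, 1L, 3L, 3L, 2L),
  rt.car     = c(500L, 534L, 553L, 577L, 598L, 355L, 876L, 466L),
  color.car  = c("blue", "green", "yellow", "black",
                 "green", "yellow", "blue", "black"),
  resp.tree  = c(3L, 1L, 2L, 3L, 1L, 3L, 1L, 1L),
  rt.tree    = c(765L, 455L, 794L, 834L, 756L, 457L, 767L, 439L),
  color.tree = c("green", "yellow", "red", "blue",
                 "red", "black", "yellow", "green"),
  stringsAsFactors = FALSE
)

# id and trial do not identify a row uniquely, so add a unique row index
df$idnew <- seq_len(nrow(df))

out2 <- reshape(df,
  direction = "long",
  idvar     = "idnew",
  # One list element per output column, each naming its wide columns in the same order
  varying   = list(c("resp.car", "resp.tree"),
                   c("rt.car", "rt.tree"),
                   c("color.car", "color.tree")),
  v.names   = c("resp", "rt", "color"),
  timevar   = "category",
  # Labels for the category column, in the same order as the columns in varying
  times     = c("car", "tree"))

# Sort by id and category, and drop the helper index column
out2 <- out2[order(out2$id, out2$category),
             c("id", "trial", "category", "resp", "rt", "color")]
row.names(out2) <- NULL
print(out2)
```

With many categories, the vectors passed to varying and times could be built programmatically from the column names instead of being typed out.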
data
df <- structure(list(id = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L), trial = c(1L,
1L, 2L, 2L, 1L, 1L, 2L, 2L), resp.car = c(1L, 3L, 2L, 3L, 1L,
3L, 3L, 2L), rt.car = c(500L, 534L, 553L, 577L, 598L, 355L, 876L,
466L), color.car = c("blue", "green", "yellow", "black", "green",
"yellow", "blue", "black"), resp.tree = c(3L, 1L, 2L, 3L, 1L,
3L, 1L, 1L), rt.tree = c(765L, 455L, 794L, 834L, 756L, 457L,
767L, 439L), color.tree = c("green", "yellow", "red", "blue",
"red", "black", "yellow", "green")), class = "data.frame", row.names = c(NA,
-8L))