Bug in Reshape (splitstackshape)?
I'm fairly sure this is a bug, but I wanted to post it to the community first. From the examples page of the Reshape
function in the splitstackshape package:
set.seed(1)
mydf <- data.frame(id_1 = 1:6, id_2 = c("A", "B"), varA.1 = sample(letters, 6),
varA.2 = sample(letters, 6), varA.3 = sample(letters, 6),
varB.2 = sample(10, 6), varB.3 = sample(10, 6),
varC.3 = rnorm(6))
mydf
id_1 id_2 varA.1 varA.2 varA.3 varB.2 varB.3 varC.3
1 1 A g y r 4 3 -0.04493361
2 2 B j q j 7 4 -0.01619026
3 3 A n p s 8 1 0.94383621
4 4 B u b l 2 10 0.82122120
5 5 A e e p 10 6 0.59390132
6 6 B s d u 1 2 0.91897737
Then:
## Note that these data are unbalanced
## reshape() will not work
## Not run:
reshape(mydf, direction = "long", idvar=1:2, varying=3:ncol(mydf))
## End(Not run)
## The Reshape() function can handle such scenarios
Reshape(mydf, id.vars = c("id_1", "id_2"),
var.stubs = c("varA", "varB", "varC"))
id_1 id_2 time varA varB varC
1: 1 A 1 g 4 -0.04493361
2: 2 B 1 j 7 -0.01619026
3: 3 A 1 n 8 0.94383621
4: 4 B 1 u 2 0.82122120
5: 5 A 1 e 10 0.59390132
6: 6 B 1 s 1 0.91897737
7: 1 A 2 y 3 NA
8: 2 B 2 q 4 NA
9: 3 A 2 p 1 NA
10: 4 B 2 b 10 NA
11: 5 A 2 e 6 NA
12: 6 B 2 d 2 NA
13: 1 A 3 r NA NA
14: 2 B 3 j NA NA
15: 3 A 3 s NA NA
16: 4 B 3 l NA NA
17: 5 A 3 p NA NA
18: 6 B 3 u NA NA
But going by the variable names in the wide format (the numeric suffixes, to be precise), shouldn't the output be:
id_1 id_2 time varA varB varC
1: 1 A 1 g NA NA
2: 2 B 1 j NA NA
3: 3 A 1 n NA NA
4: 4 B 1 u NA NA
5: 5 A 1 e NA NA
6: 6 B 1 s NA NA
7: 1 A 2 y 4 NA
8: 2 B 2 q 7 NA
9: 3 A 2 p 8 NA
10: 4 B 2 b 2 NA
11: 5 A 2 e 10 NA
12: 6 B 2 d 1 NA
13: 1 A 3 r 3 -0.04493361
14: 2 B 3 j 4 -0.01619026
15: 3 A 3 s 1 0.94383621
16: 4 B 3 l 10 0.82122120
17: 5 A 3 p 6 0.59390132
18: 6 B 3 u 2 0.91897737
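After all, varA was measured at all three time points (1, 2 and 3), varB only at time points 2 and 3, and varC only at time point 3. Or am I missing something obvious?
The tidyr version seems correct: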
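For reference, here is a minimal base-R sketch of the suffix-aligned result, assuming the time point is always the numeric suffix after the "." in each column name. It pads the never-measured combinations (varB.1, varC.1, varC.2) with NA so the data become balanced, then tells stats::reshape() explicitly which column belongs to which stub and time. The names stubs, times, padded and long are illustrative only. Note that R 3.6.0 changed the default sample() algorithm, so the letters and numbers drawn after set.seed(1) may differ from the ones printed above; the NA pattern does not depend on that.

```r
# Rebuild the example data from the Reshape() help page
set.seed(1)
mydf <- data.frame(id_1 = 1:6, id_2 = c("A", "B"), varA.1 = sample(letters, 6),
                   varA.2 = sample(letters, 6), varA.3 = sample(letters, 6),
                   varB.2 = sample(10, 6), varB.3 = sample(10, 6),
                   varC.3 = rnorm(6))

stubs <- c("varA", "varB", "varC")
times <- 1:3

# Add an all-NA column for every stub/time combination that is absent,
# so that every stub has exactly one column per time point
padded <- mydf
for (s in stubs) {
  for (t in times) {
    col <- paste(s, t, sep = ".")
    if (!col %in% names(padded)) padded[[col]] <- NA
  }
}

# varying is a list: element i holds the columns of stubs[i], in time order,
# so each value lands on the time given by its own suffix
long <- reshape(padded, direction = "long", idvar = c("id_1", "id_2"),
                varying = lapply(stubs, function(s) paste(s, times, sep = ".")),
                v.names = stubs, timevar = "time", times = times)
long
```

With this layout varB is NA at time 1 and varC is NA at times 1 and 2, which matches the output the question expects.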
> library(tidyr)
> mydf %>% gather(key="variable", value="value", varA.1:varC.3) %>%
+ separate(variable, into=c("variable","time")) %>%
+ spread("variable", "value")
id_1 id_2 time varA varB varC
1 1 A 1 g <NA> <NA>
2 1 A 2 y 4 <NA>
3 1 A 3 r 3 -0.0449336090152309
4 2 B 1 j <NA> <NA>
5 2 B 2 q 7 <NA>
6 2 B 3 j 4 -0.0161902630989461 ...
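Because gather() stacks every measured value into one column, varB and varC are coerced to character, which is why varC prints as -0.0449336090152309 and missing values show as <NA>. A sketch of the same reshape with tidyr's pivot_longer() (tidyr 1.0.0 or later, assumed installed) keeps each stub's own type. The object name long_tidy is illustrative only.

```r
library(tidyr)

# Rebuild the example data from the Reshape() help page
set.seed(1)
mydf <- data.frame(id_1 = 1:6, id_2 = c("A", "B"), varA.1 = sample(letters, 6),
                   varA.2 = sample(letters, 6), varA.3 = sample(letters, 6),
                   varB.2 = sample(10, 6), varB.3 = sample(10, 6),
                   varC.3 = rnorm(6))

# ".value": the part of the name before "." becomes an output column
# (varA, varB, varC); the part after it goes into "time".
# Stub/time combinations missing from the wide data are filled with NA.
long_tidy <- pivot_longer(mydf, cols = -c(id_1, id_2),
                          names_to = c(".value", "time"), names_sep = "\\.")
long_tidy$time <- as.integer(long_tidy$time)
long_tidy
```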
This has been fixed in version 1.4.4, which is now available on CRAN. Thanks for reporting the bug.
update.packages()
After that, you should get the following:
packageVersion("splitstackshape")
## [1] ‘1.4.4’
Reshape(mydf, id.vars = c("id_1", "id_2"), var.stubs = c("varA", "varB", "varC"))
## id_1 id_2 time varA varB varC
## 1: 1 A 1 g NA NA
## 2: 2 B 1 j NA NA
## 3: 3 A 1 n NA NA
## 4: 4 B 1 u NA NA
## 5: 5 A 1 e NA NA
## 6: 6 B 1 s NA NA
## 7: 1 A 2 y 4 NA
## 8: 2 B 2 q 7 NA
## 9: 3 A 2 p 8 NA
## 10: 4 B 2 b 2 NA
## 11: 5 A 2 e 10 NA
## 12: 6 B 2 d 1 NA
## 13: 1 A 3 r 3 -0.04493361
## 14: 2 B 3 j 4 -0.01619026
## 15: 3 A 3 s 1 0.94383621
## 16: 4 B 3 l 10 0.82122120
## 17: 5 A 3 p 6 0.59390132
## 18: 6 B 3 u 2 0.91897737