Key ordering vs. ordering of original columns with gather()

Question

Does the key order depend on whether I list the gathered columns or the non-gathered columns first?

This is my data.frame:

library(tidyr)
wide_df <- data.frame(c("a", "b"), c("oh", "ah"), c("bla", "ble"), stringsAsFactors = FALSE)
colnames(wide_df) <- c("first", "second", "third")
wide_df

 first second third
1     a     oh   bla
2     b     ah   ble

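First, I gather all columns in a specific order, and that order is respected in the key column as second, first, even though the columns are actually ordered first, second: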

long_01_df <- gather(wide_df, my_key, my_value, second, first, third)
long_01_df

  my_key my_value
1 second       oh
2 second       ah
3  first        a
4  first        b
5  third      bla
6  third      ble

Then I decide to exclude one column from the gather:

long_02_df <- gather(wide_df, my_key, my_value, second, first, -third)
long_02_df

 third my_key my_value
1   bla second       oh
2   ble second       ah
3   bla  first        a
4   ble  first        b

The keys are again ordered second, first. Then I wrote it like this, believing it would do the same thing:

long_03_df <- gather(wide_df, my_key, my_value, -third, second, first)
long_03_df

But now the keys are ordered according to the original, actual column order of the data.frame:

 third my_key my_value
1   bla  first        a
2   ble  first        b
3   bla second       oh
4   ble second       ah

This behavior doesn't change even when I call the function with factor_key = TRUE. What am I missing?

Answer 1

Summary

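This happens because you cannot mix negative and positive indices. (Nor should you: it simply doesn't make sense.) If you do, gather() will effectively ignore some of the indices.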

Detailed answer

Standard indexing doesn't let you mix positive and negative indices either:

x <- 1:10
x[c(4, -2)]
## Error in x[c(4, -2)] : only 0's may be mixed with negative subscripts

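This restriction makes sense: indexing with 4 tells R to keep only the fourth element, so there is no need to also tell it explicitly to throw away the second element.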
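The base R rules can be checked directly. This short sketch uses only base R; tryCatch() with conditionMessage() captures the error text instead of stopping the script:

```r
x <- 1:10

# Positive subscripts: keep only the listed elements
x[4]
# Negative subscripts: drop the listed elements, keep everything else
x[-2]
# Zeros are the only values allowed alongside negatives; a 0 selects nothing
x[c(0, -2)]

# Mixing positive and negative subscripts is an error;
# capture the message instead of halting
msg <- tryCatch(x[c(4, -2)], error = conditionMessage)
msg
```

The last value is the same "only 0's may be mixed with negative subscripts" message shown above.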

According to the documentation of gather(), columns are selected the same way as with dplyr's select(). So let's experiment with that. I'll use a subset of mtcars:

library(dplyr)  # provides select()
mtcars <- mtcars[1:2, 1:5]
mtcars
##                mpg cyl disp  hp drat
## Mazda RX4     21.0   6  160 110 3.90
## Mazda RX4 Wag 21.0   6  160 110 3.90

In select(), you can use either positive or negative indices:

select(mtcars, mpg, cyl)
##              mpg cyl
## Mazda RX4      21   6
## Mazda RX4 Wag  21   6

select(mtcars, -mpg, -cyl)
##               disp  hp drat
## Mazda RX4      160 110  3.9
## Mazda RX4 Wag  160 110  3.9

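With select() as well, mixing positive and negative indices makes no sense. But instead of throwing an error, select() seems to ignore all indices whose sign differs from that of the first one: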

select(mtcars, mpg, -hp, cyl)
##               mpg cyl
## Mazda RX4      21   6
## Mazda RX4 Wag  21   6

select(mtcars, -mpg, hp, -cyl)
##               disp  hp drat
## Mazda RX4      160 110  3.9
## Mazda RX4 Wag  160 110  3.9

As you can see, the results are exactly the same as before.
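One way to picture this is to treat the selection as a sequence of set operations. The sketch below is a model, not dplyr's or tidyr's actual code: combine_selection() is a hypothetical helper, and it assumes each term is applied left to right, starting from all columns when the first term is negative and from no columns otherwise. Under that assumption it reproduces the select() results above and the key order seen in the question:

```r
# Hypothetical helper (not part of dplyr/tidyr): models a column selection
# as signed column positions applied left to right as set operations.
combine_selection <- function(vars, terms) {
  if (length(terms) == 0) return(character())
  # Assumption: a leading negative term starts from all columns, in order;
  # a leading positive term starts from an empty selection.
  sel <- if (sign(terms[[1]][1]) < 0) seq_along(vars) else integer()
  for (term in terms) {
    if (sign(term[1]) > 0) {
      sel <- union(sel, term)    # keeps existing order, appends new columns
    } else {
      sel <- setdiff(sel, -term) # drops columns, order otherwise unchanged
    }
  }
  vars[sel]
}

mt_vars <- c("mpg", "cyl", "disp", "hp", "drat")
combine_selection(mt_vars, list(1L, -4L, 2L))   # like select(mtcars, mpg, -hp, cyl)
combine_selection(mt_vars, list(-1L, 4L, -2L))  # like select(mtcars, -mpg, hp, -cyl)

vars <- c("first", "second", "third")
combine_selection(vars, list(2L, 1L, -3L))  # second, first, -third
combine_selection(vars, list(-3L, 2L, 1L))  # -third, second, first
```

In the last call, union() never moves columns that are already selected, so listing second and first after -third cannot change their order in this model.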

In your gather() examples, you use these two lines:

long_02_df <- gather(wide_df, my_key, my_value, second, first, -third)
long_03_df <- gather(wide_df, my_key, my_value, -third, second, first)

Based on what I showed above, these lines are equivalent to:

long_02_df <- gather(wide_df, my_key, my_value, second, first)
long_03_df <- gather(wide_df, my_key, my_value, -third)

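Note that nothing in the second line states your preferred key order. It only says that third should be left out, so gather() falls back to the original column order of wide_df (first, second).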
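If the goal is the key order second, first while keeping third out of the gather, a simple option is to list only the columns to gather, in the order you want; gather() keeps unselected columns such as third as id columns. A minimal sketch, assuming tidyr is installed (long_df and long_fct are illustrative names):

```r
library(tidyr)

wide_df <- data.frame(first = c("a", "b"), second = c("oh", "ah"),
                      third = c("bla", "ble"), stringsAsFactors = FALSE)

# Select only the columns to gather, in the desired key order.
# The unselected column (third) is kept as an id column.
long_df <- gather(wide_df, my_key, my_value, second, first)
long_df

# With factor_key = TRUE, my_key becomes a factor whose levels
# follow the same selection order.
long_fct <- gather(wide_df, my_key, my_value, second, first, factor_key = TRUE)
levels(long_fct$my_key)
```

Storing the key as a factor keeps the second, first order even if the long data is later sorted or plotted.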
