Why isn't base scale() working with a tibble?

I have a dataset called GSMA that I imported from Excel using readxl. Checking the object's class returns:

class(GSMA)
[1] "tbl_df"     "tbl"        "data.frame"

I want to standardize columns 2 to 4 using base scale(). I tried running:

GSMA[2:4] <- scale(GSMA[2:4])

This scales the data frame incorrectly: within each row, all the columns have the same values.

A possible clue: when I try to sort the incorrectly scaled data frame, I get this error:

Error in xj[i, , drop = FALSE] : subscript out of bounds

When I re-import the same dataset and then run:

GSMA <- as.data.frame(GSMA)
GSMA[2:4] <- scale(GSMA[2:4])

the data frame columns are scaled correctly.
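A minimal base-R sketch of the working path, assuming the six rows from dput(head(GSMA)) rebuilt as a plain data.frame (the name gsma_df is illustrative). With a plain data frame, assigning the matrix returned by scale() into several columns splits it into ordinary numeric columns:

```r
# Six rows from dput(head(GSMA)), as a plain data.frame (not a tibble)
gsma_df <- data.frame(
  Country = c("GBR", "CHE", "DEU", "ROU", "LUX", "KAZ"),
  entry  = c(98.4974384307861, 95.6549962361654, 91.4044539133708,
             90.8518393834432, 90.4088099797567, 88.0471547444662),
  medium = c(86.0081672668457, 93.0372142791748, 91.2993144989014,
             100, 96.7348480224609, 100),
  high   = c(74.6774760159579, 84.1793060302734, 79.542350769043,
             99.6931856328791, 97.031680020419, 92.5396745855158),
  stringsAsFactors = FALSE
)

# [<-.data.frame splits the 6 x 3 matrix from scale() into three columns
gsma_df[2:4] <- scale(gsma_df[2:4])

# Each column is now an ordinary numeric vector, not a matrix
str(gsma_df)
```

The first row should match the scale(head(GSMA[2:4])) output shown further down (entry about 1.564, high about -1.323).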

What is going on here? Why doesn't base scale() work in the first case?

dput(head(GSMA))

structure(list(Country = c("GBR", "CHE", "DEU", "ROU", "LUX", 
"KAZ"), entry = c(98.4974384307861, 95.6549962361654, 91.4044539133708, 
90.8518393834432, 90.4088099797567, 88.0471547444662), medium = c(86.0081672668457, 
93.0372142791748, 91.2993144989014, 100, 96.7348480224609, 100
), high = c(74.6774760159579, 84.1793060302734, 79.542350769043, 
99.6931856328791, 97.031680020419, 92.5396745855158)), row.names = c(NA, 
-6L), class = c("tbl_df", "tbl", "data.frame"))

Strangely, this works correctly:

> scale(head(GSMA[2:4]))
          entry     medium       high
[1,]  1.5644225 -1.5528676 -1.3233285
[2,]  0.8257534 -0.2694974 -0.3755223
[3,] -0.2788406 -0.5868048 -0.8380579
[4,] -0.4224492  1.0017748  1.1719851
[5,] -0.5375798  0.4056202  0.9065003
[6,] -1.1513063  1.0017748  0.4584233
attr(,"scaled:center")
   entry   medium     high 
92.47745 94.51326 87.94395 
attr(,"scaled:scale")
    entry    medium      high 
 3.848059  5.477022 10.025077 
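As the scaled:center and scaled:scale attributes above suggest, with its defaults scale() subtracts each column's mean and divides by its standard deviation, and it always returns a matrix. A small base-R check of that on the entry values (the variable names are illustrative):

```r
entry <- c(98.4974384307861, 95.6549962361654, 91.4044539133708,
           90.8518393834432, 90.4088099797567, 88.0471547444662)

s <- scale(entry)                           # defaults: center = TRUE, scale = TRUE
manual <- (entry - mean(entry)) / sd(entry) # z-scores computed by hand

is.matrix(s)                     # TRUE: even a plain vector comes back as a 6 x 1 matrix
all.equal(as.vector(s), manual)  # same values as the hand-computed z-scores
attr(s, "scaled:center")         # equals mean(entry)
attr(s, "scaled:scale")          # equals sd(entry)
```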

But this does not:

> GSMA[2:4] <- scale(GSMA[2:4])
> head(GSMA)
# A tibble: 6 x 4
  Country entry[,"entry"] [,"medium"] [,"high"] medium[,"entry"] [,"medium"] [,"high"]
  <chr>             <dbl>       <dbl>     <dbl>            <dbl>       <dbl>     <dbl>
1 GBR                2.13        1.25     0.870             2.13        1.25     0.870
2 CHE                2.00        1.52     1.27              2.00        1.52     1.27 
3 DEU                1.80        1.46     1.07              1.80        1.46     1.07 
4 ROU                1.78        1.80     1.92              1.78        1.80     1.92 
5 LUX                1.76        1.67     1.81              1.76        1.67     1.81 
6 KAZ                1.65        1.80     1.62              1.65        1.80     1.62 
# ... with 3 more variables: high[,"entry"] <dbl>, [,"medium"] <dbl>, [,"high"] <dbl>
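The printed header (entry[,"entry"], [,"medium"], [,"high"], and so on) indicates that each of entry, medium and high ended up holding a whole three-column matrix rather than a single vector. One way to spot such matrix columns in a data frame or tibble is to test every column with is.matrix(). The sketch below builds a small plain data.frame with a deliberate matrix column to show the check; the names df and m are illustrative:

```r
# A data frame whose column m is itself a 2 x 2 matrix
df <- data.frame(a = 1:2)
df$m <- matrix(1:4, nrow = 2)

# TRUE for every column that is a matrix rather than a plain vector
matrix_cols <- vapply(df, is.matrix, logical(1))
matrix_cols
names(matrix_cols)[matrix_cols]  # names of the offending columns
```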

This is a known issue in tibble 3.0.0. Reverting to tibble 2.1.3 restores the old behavior.

Alternatively:

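library(tibble)
iris <- as_tibble(iris)
# scale() returns a matrix, not a data frame
scaled <- scale(iris[1:3])
class(scaled)
#> [1] "matrix" "array"
# (R < 4.0.0 prints just "matrix")
# Convert the matrix to a data frame so each column is assigned separately
iris[1:3] <- as.data.frame(scaled)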
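Another way to avoid assigning a matrix at all is to scale each column on its own and assign back a list of plain vectors; as.vector() drops the dim and scaled:* attributes that scale() attaches. This is a base-R sketch on a plain data.frame copy of iris (the name df is illustrative); list assignment like this should behave the same on a tibble, but treat that as an assumption to verify against your tibble version:

```r
df <- iris  # plain data.frame, base R only

# Scale each column separately; as.vector() strips the matrix dim and
# the "scaled:center"/"scaled:scale" attributes returned by scale()
df[1:3] <- lapply(df[1:3], function(x) as.vector(scale(x)))

str(df)
```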