scale() not compatible with grouped tbl_df

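I am trying to scale the values in a grouped tbl_df, and I was surprised to find that it does not work. It does, however, work as expected with data frames and regular (i.e. ungrouped) tbl_dfs.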

Example:

> library(dplyr)
> df <- data.frame(year = c(1999, 2002, 2005, 2008),
                   LA = c(3931.120, 4273.710, 4601.415, 4101.321), 
                   NY = c(346.82000, 134.30882, 130.43038, 88.27546))
> df
  year       LA        NY
1 1999 3931.120 346.82000
2 2002 4273.710 134.30882
3 2005 4601.415 130.43038
4 2008 4101.321  88.27546

These all work fine:

> df %>% mutate_each_(funs(scale), vars = c('LA', 'NY'))
  year         LA         NY
1 1999 -1.0334913  1.4757715
2 2002  0.1635942 -0.3490598
3 2005  1.3086682 -0.3823639
4 2008 -0.4387712 -0.7443478

> df_tbl <- tbl_df(df)
> df_tbl %>% mutate_each_(funs(scale), vars = c('LA', 'NY'))
Source: local data frame [4 x 3]

   year         LA         NY
  (dbl)      (dbl)      (dbl)
1  1999 -1.0334913  1.4757715
2  2002  0.1635942 -0.3490598
3  2005  1.3086682 -0.3823639
4  2008 -0.4387712 -0.7443478

But once the data is grouped, the function stops working:

> df.grouped <- df %>% group_by(year)
> df.grouped %>% mutate_each_(funs(scale), vars = c('LA', 'NY'))
Source: local data frame [4 x 3]
Groups: year [4]

   year    LA    NY
  (dbl) (dbl) (dbl)
1  1999    NA    NA
2  2002    NA    NA
3  2005   NaN   NaN
4  2008    NA    NA

> df.grouped %>% mutate_each(funs(scale)) # Gives the same result

I did some research, and it is clear that tbl_df

never simplifies (drops), so always returns data.frame.

But this does not explain why it works with an ungrouped tbl_df and not with a grouped one, especially since ?mutate_each mentions

... vars: Variables to include/exclude in mutate/summarise. You can use same specifications as in select. If missing, defaults to all non-grouping variables.

The problem

As Alistaire pointed out above, the reason this does not work is:

You can't scale a single value usefully, which is what you're trying to do when it's grouped.
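
To make that concrete, here is a minimal base R sketch using the same data as in the question. The variable names group_sizes, single and scaled_la are only for illustration. It checks that every year occurs exactly once, so grouping by year leaves one row per group, and then compares scaling a single value with scaling the whole column:

```r
# Same data as in the question.
df <- data.frame(year = c(1999, 2002, 2005, 2008),
                 LA = c(3931.120, 4273.710, 4601.415, 4101.321),
                 NY = c(346.82000, 134.30882, 130.43038, 88.27546))

# Each year occurs exactly once, so group_by(year) yields four
# groups containing a single row each.
group_sizes <- table(df$year)
all(group_sizes == 1)

# Scaling a single value: after centering it is 0, and one observation
# has no spread to divide by, so the result is a missing value.
single <- scale(df$LA[1])
is.na(single[1, 1])

# Scaling the whole (ungrouped) column works as expected.
scaled_la <- as.numeric(scale(df$LA))
round(scaled_la, 7)
```

The last call should reproduce the LA column of the ungrouped output shown earlier. Grouping by a variable that has several rows per level, or not grouping at all, avoids the problem.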

Thank you, Alistaire!