scale() not compatible with grouped tbl_df
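I am trying to standardize the values in a grouped tbl_df, and I was surprised to find that it does not work. It does, however, work as expected with data frames and regular (i.e. ungrouped) tbl_dfs.
Example: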
> df <- data.frame(year = c(1999, 2002, 2005, 2008),
                   LA = c(3931.120, 4273.710, 4601.415, 4101.321),
                   NY = c(346.82000, 134.30882, 130.43038, 88.27546))
> df
  year       LA        NY
1 1999 3931.120 346.82000
2 2002 4273.710 134.30882
3 2005 4601.415 130.43038
4 2008 4101.321  88.27546
These all work fine:
> df %>% mutate_each_(funs(scale), vars = c('LA', 'NY'))
  year         LA         NY
1 1999 -1.0334913  1.4757715
2 2002  0.1635942 -0.3490598
3 2005  1.3086682 -0.3823639
4 2008 -0.4387712 -0.7443478
> df_tbl <- tbl_df(df)
> df_tbl %>% mutate_each_(funs(scale), vars = c('LA', 'NY'))
Source: local data frame [4 x 3]
   year         LA         NY
  (dbl)      (dbl)      (dbl)
1  1999 -1.0334913  1.4757715
2  2002  0.1635942 -0.3490598
3  2005  1.3086682 -0.3823639
4  2008 -0.4387712 -0.7443478
But once the data is grouped, the function breaks down:
> df.grouped <- df %>% group_by(year)
> df.grouped %>% mutate_each_(funs(scale), vars = c('LA', 'NY'))
Source: local data frame [4 x 3]
Groups: year [4]
   year    LA    NY
  (dbl) (dbl) (dbl)
1  1999    NA    NA
2  2002    NA    NA
3  2005   NaN   NaN
4  2008    NA    NA
> df.grouped %>% mutate_each(funs(scale)) # Gives the same result
I did some research, and it is clear that tbl_df
"never simplifies (drops), so always returns data.frame."
But this does not explain why an ungrouped tbl_df works while a grouped one does not, especially since ?mutate_each mentions:
... vars: Variables to include/exclude in mutate/summarise. You can use same specifications as in select. If missing, defaults to all non-grouping variables.
Question
Is the only way to solve this to add ungroup to the pipeline, as shown here?
df.grouped %>% ungroup %>% mutate_each_(funs(scale), vars = c('LA', 'NY'))
# OR
df.grouped %>% ungroup %>% mutate_each(funs(scale))
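For readers on a recent dplyr release, where mutate_each() and funs() are deprecated, the same ungroup workaround might look roughly like the sketch below. It assumes dplyr 1.0 or later (for across()), and the name `scaled` is only illustrative. Wrapping scale() in as.numeric() is a choice made here so that the columns stay plain numeric vectors rather than one-column matrices.

```r
library(dplyr)

df <- data.frame(year = c(1999, 2002, 2005, 2008),
                 LA = c(3931.120, 4273.710, 4601.415, 4101.321),
                 NY = c(346.82000, 134.30882, 130.43038, 88.27546))

scaled <- df %>%
  group_by(year) %>%
  ungroup() %>%  # drop the one-row-per-year grouping before scaling
  mutate(across(c(LA, NY), ~ as.numeric(scale(.x))))  # scale each column as a whole
```

After ungrouping, each column is standardized over all four rows, so it should end up with mean 0 and standard deviation 1, as in the ungrouped examples earlier in the post.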
As Alistaire pointed out above, the reason this does not work is:
You can't scale a single value usefully, which is what you're trying to do when it's grouped.
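A small base R sketch of that point, using the single LA value that ends up in the year == 1999 group. The variable names are only illustrative. With one observation there is no spread to divide by, so standardizing cannot produce a usable number.

```r
x <- 3931.120              # the only LA value in the year == 1999 group

spread   <- sd(x)          # sample standard deviation is undefined for one value
centered <- x - mean(x)    # a lone value always equals its own mean
scaled   <- scale(x)       # centers x, then divides by its spread

is.na(spread)              # TRUE
centered                   # 0
all(is.finite(scaled))     # FALSE: no usable standardized value comes back
```

Grouping by year puts every row in its own group, so each call to scale() sees exactly this situation.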
Thank you, Alistaire!