Unable to subset within mutate() following a summarize() with a tibble
I don't know whether this behavior is unique to tibbles and I need to subset them differently.
library(dplyr)
library(gapminder)
df <- gapminder %>%
group_by(year, continent) %>%
summarize(avg_life = mean(lifeExp))
This produces the tibble df:
# A tibble: 60 x 3
# Groups: year [?]
year continent avg_life
<int> <fct> <dbl>
1 1952 Africa 39.1
2 1952 Americas 53.3
3 1952 Asia 46.3
4 1952 Europe 64.4
5 1952 Oceania 69.3
6 1957 Africa 41.3
7 1957 Americas 56.0
8 1957 Asia 49.3
9 1957 Europe 66.7
10 1957 Oceania 70.3
# ... with 50 more rows
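The `# Groups: year [?]` line above shows that the summarized tibble still carries a grouping. One way to inspect that grouping directly is `dplyr::group_vars()`. The sketch below is an illustration, not part of the original question: it uses a small hypothetical data frame `toy` in place of gapminder, and the `.groups` argument assumes dplyr 1.0.0 or later.

```r
library(dplyr)

# Hypothetical toy data standing in for gapminder:
# 2 years x 2 continents x 2 countries
toy <- data.frame(
  year      = rep(c(1952L, 1957L), each = 4),
  continent = rep(c("Africa", "Africa", "Asia", "Asia"), times = 2),
  lifeExp   = c(38, 40, 45, 47, 40, 42, 48, 50)
)

# By default summarize() peels off only the last grouping variable (continent)
by_year <- toy %>%
  group_by(year, continent) %>%
  summarize(avg_life = mean(lifeExp))

group_vars(by_year)   # "year": the result is still grouped by year

# Asking summarize() to drop all grouping (dplyr >= 1.0.0)
flat <- toy %>%
  group_by(year, continent) %>%
  summarize(avg_life = mean(lifeExp), .groups = "drop")

group_vars(flat)      # character(0): no grouping left
```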
I expected the next step to work, since this is how it has been suggested to do it. Subsetting it the standard way produces the expected output:
df$avg_life[df$year == 1952]
[1] 39.13550 53.27984 46.31439 64.40850 69.25500
If I try to do the same thing inside mutate(), I get an error instead:
df <- gapminder %>%
group_by(year, continent) %>%
summarize(avg_life = mean(lifeExp)) %>%
mutate(life_chg = avg_life - avg_life[year == 1952])
Error in mutate_impl(.data, dots) :
  Column `life_chg` must be length 5 (the group size) or one, not 0
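To see where the "not 0" comes from, the subset can be evaluated group by group in base R. Because df is still grouped by year (see the `# Groups: year` line), mutate() evaluates the expression once per year. This sketch imitates that with `split()` on a hypothetical vector `years` that mirrors df's 12 years with 5 continents each; it is an illustration, not dplyr's actual implementation.

```r
# Hypothetical stand-in for df$year: 12 years, 5 continents each (60 rows)
years <- rep(seq(1952L, 2007L, by = 5L), each = 5)

# Count how many rows satisfy `year == 1952` inside each year group,
# the way a mutate() grouped by year would see them
hits_per_group <- sapply(split(years, years), function(y) sum(y == 1952))
hits_per_group
# Only the "1952" group has matching rows (5); every other group has 0,
# so avg_life[year == 1952] is a length-0 vector in those groups.
```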
Changing == to > produces all 0s, but at least it runs, which tells me everything is defined.
Manually passing a logical index that should give me the desired output also produces all 0s:
df <- gapminder %>%
group_by(year, continent) %>%
summarize(avg_life = mean(lifeExp)) %>%
mutate(life_chg = avg_life - avg_life[c(T, T, T, T, T, rep(F, 55))])
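The all-zero result follows from base R indexing rules: inside each year group, avg_life has only 5 elements, and indexing a length-5 vector with that 60-element logical vector selects just the group's own five values, so each row is subtracted from itself. The sketch below uses a hypothetical vector `x` in place of one group's avg_life.

```r
# Hypothetical stand-in for avg_life within a single year group (5 continents)
x <- c(39.1, 53.3, 46.3, 64.4, 69.3)

# The 60-element index from the question: TRUE for the first 5 positions only
idx <- c(TRUE, TRUE, TRUE, TRUE, TRUE, rep(FALSE, 55))

# Positions 6..60 are FALSE, so nothing beyond the end of x is selected
selected <- x[idx]
identical(selected, x)   # TRUE: the group's own values come back
x - selected             # all 0, matching the life_chg column in str(df)
```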
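Why doesn't this work inside mutate(), and how do you do it correctly? I suspect it has something to do with the grouping and with creating the new variable, but I can't figure out why.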
Structure of df:
str(df)
Classes ‘grouped_df’, ‘tbl_df’, ‘tbl’ and 'data.frame': 60 obs. of 4 variables:
$ year : int 1952 1952 1952 1952 1952 1957 1957 1957 1957 1957 ...
$ continent: Factor w/ 5 levels "Africa","Americas",..: 1 2 3 4 5 1 2 3 4 5 ...
$ avg_life : num 39.1 53.3 46.3 64.4 69.3 ...
$ life_chg : num 0 0 0 0 0 0 0 0 0 0 ...
- attr(*, "vars")= chr "year"
- attr(*, "labels")='data.frame': 12 obs. of 1 variable:
..$ year: int 1952 1957 1962 1967 1972 1977 1982 1987 1992 1997 ...
..- attr(*, "vars")= chr "year"
..- attr(*, "drop")= logi TRUE
- attr(*, "indices")=List of 12
..$ : int 0 1 2 3 4
..$ : int 5 6 7 8 9
..$ : int 10 11 12 13 14
..$ : int 15 16 17 18 19
..$ : int 20 21 22 23 24
..$ : int 25 26 27 28 29
..$ : int 30 31 32 33 34
..$ : int 35 36 37 38 39
..$ : int 40 41 42 43 44
..$ : int 45 46 47 48 49
..$ : int 50 51 52 53 54
..$ : int 55 56 57 58 59
- attr(*, "drop")= logi TRUE
- attr(*, "group_sizes")= int 5 5 5 5 5 5 5 5 5 5 ...
- attr(*, "biggest_group_size")= int 5
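As joran pointed out, you have to ungroup first. summarize() removes only the last grouping variable (continent), so df is still grouped by year, as the "# Groups: year [?]" header shows. mutate() therefore evaluates avg_life[year == 1952] separately within each year, and for every year other than 1952 that subset is empty, which is what triggers the length-0 error.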
library(dplyr)
library(gapminder)
gapminder %>%
group_by(year, continent) %>%
summarize(avg_life = mean(lifeExp)) %>%
ungroup(.) %>%
mutate(life_chg = avg_life - avg_life[year == 1952])
# A tibble: 60 x 4
year continent avg_life life_chg
<int> <fct> <dbl> <dbl>
1 1952 Africa 39.1 0
2 1952 Americas 53.3 0
3 1952 Asia 46.3 0
4 1952 Europe 64.4 0
5 1952 Oceania 69.3 0
6 1957 Africa 41.3 2.13
7 1957 Americas 56.0 2.68
8 1957 Asia 49.3 3.00
9 1957 Europe 66.7 2.29
10 1957 Oceania 70.3 1.04
# ... with 50 more rows
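The ungrouped version works because avg_life[year == 1952] then has 5 values that base R recycles across all 60 rows; those values line up with the right continent only because the summarized rows are sorted by year and then continent. An alternative that does not depend on row order, offered here as a sketch rather than as part of the original answer, is to regroup by continent so that each group contains exactly one 1952 value. The data frame `avg` below is hypothetical toy data in place of the summarized gapminder result.

```r
library(dplyr)

# Hypothetical toy data: 3 years x 2 continents, already summarized
avg <- data.frame(
  year      = rep(c(1952L, 1957L, 1962L), each = 2),
  continent = rep(c("Africa", "Asia"), times = 3),
  avg_life  = c(39, 46, 41, 49, 43, 52)
)

# Regroup by continent: each group holds exactly one 1952 row,
# so avg_life[year == 1952] is a single value recycled within the group
changes <- avg %>%
  group_by(continent) %>%
  mutate(life_chg = avg_life - avg_life[year == 1952]) %>%
  ungroup()

changes$life_chg   # 0 0 2 3 4 6
```

If some continent had no 1952 row, the subset would be empty in that group and mutate() would fail with the same kind of length-0 error.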