Combine: rowwise(), mutate(), across(), for multiple functions
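This is somewhat related to this question:
In principle, I am trying to understand how rowwise works with mutate across multiple columns when applying more than one function, such as mean(), sum(), min(), etc.
I have learned that across does this job, not c_across.
I have learned that mean() differs from min() in that mean() does not work on data frames; the data first has to be converted to a vector, which can be done with unlist or as.matrix (learned from Ronak Shah).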
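A minimal base-R sketch of the mean() versus min() difference described above. The toy data frame `num_df` is made up for illustration and is not part of the question's data.

```r
# Hypothetical all-numeric data frame, used only for illustration.
num_df <- data.frame(a = 1:2, b = 3:4)

# min() has a data frame method, so it reduces over every cell.
min_direct <- min(num_df)

# mean() has no data frame method: it warns and returns NA.
mean_direct <- suppressWarnings(mean(num_df))

# Flattening to a vector (unlist) or a matrix (as.matrix) first makes mean() work.
mean_unlist <- mean(unlist(num_df))
mean_matrix <- mean(as.matrix(num_df))
```

Here `min_direct` is 1 and `mean_direct` is NA, while `mean_unlist` and `mean_matrix` are both 2.5.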
Now for my actual case: I was able to do the task, but I lose column d. How can I avoid losing column d in this setup?
My df:
library(dplyr)  # provides %>%, rowwise(), mutate(), across(), c_across()
df <- structure(list(a = 1:5, b = 6:10, c = 11:15, d = c("a", "b",
  "c", "d", "e"), e = 1:5), row.names = c(NA, -5L), class = c("tbl_df",
  "tbl", "data.frame"))
Does not work:
df %>%
  rowwise() %>%
  mutate(across(a:e),
         avg = mean(unlist(cur_data()), na.rm = TRUE),
         min = min(unlist(cur_data()), na.rm = TRUE),
         max = max(unlist(cur_data()), na.rm = TRUE)
  )
# Output:
a b c d e avg min max
<int> <int> <int> <chr> <int> <dbl> <chr> <chr>
1 1 6 11 a 1 NA 1 a
2 2 7 12 b 2 NA 12 b
3 3 8 13 c 3 NA 13 c
4 4 9 14 d 4 NA 14 d
5 5 10 15 e 5 NA 10 e
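The NA averages and the character min and max in the output above are consistent with unlist() coercing the whole row to character once column d is included. A small base-R sketch of that effect, where `row2` is a hypothetical stand-in for the second row of df rather than what cur_data() literally returns:

```r
# Hypothetical stand-in for the second row of df, including the character column d.
row2 <- list(a = 2L, b = 7L, c = 12L, d = "b", e = 2L)

# unlist() coerces all elements to the most general type, here character.
flat <- unlist(row2)

# mean() of a character vector warns and returns NA.
avg <- suppressWarnings(mean(flat, na.rm = TRUE))

# min() and max() then compare strings: "12" sorts before "2", and "b" sorts last.
lo <- min(flat, na.rm = TRUE)
hi <- max(flat, na.rm = TRUE)
```

In typical locales this gives NA for `avg`, "12" for `lo`, and "b" for `hi`, matching the second row of the output.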
Works, but I lose column d:
df %>%
  select(-d) %>%
  rowwise() %>%
  mutate(across(a:e),
         avg = mean(unlist(cur_data()), na.rm = TRUE),
         min = min(unlist(cur_data()), na.rm = TRUE),
         max = max(unlist(cur_data()), na.rm = TRUE)
  )
a b c e avg min max
<int> <int> <int> <int> <dbl> <dbl> <dbl>
1 1 6 11 1 4.75 1 11
2 2 7 12 2 5.75 2 12
3 3 8 13 3 6.75 3 13
4 4 9 14 4 7.75 4 14
5 5 10 15 5 8.75 5 15
Edit:
Here is the best way out:
df %>%
  rowwise() %>%
  mutate(min = min(c_across(a:e & where(is.numeric)), na.rm = TRUE),
         max = max(c_across(a:e & where(is.numeric)), na.rm = TRUE),
         avg = mean(c_across(a:e & where(is.numeric)), na.rm = TRUE)
  )
# A tibble: 5 x 8
# Rowwise:
a b c d e min max avg
<int> <int> <int> <chr> <int> <int> <int> <dbl>
1 1 6 11 a 1 1 11 4.75
2 2 7 12 b 2 2 12 5.75
3 3 8 13 c 3 3 13 6.75
4 4 9 14 d 4 4 14 7.75
5 5 10 15 e 5 5 15 8.75
Earlier answer
Your "this will work" version does not even work properly if you change the order of the output columns; see:
df %>%
  select(-d) %>%
  rowwise() %>%
  mutate(across(a:e),
         min = min(unlist(cur_data()), na.rm = TRUE),
         max = max(unlist(cur_data()), na.rm = TRUE),
         avg = mean(unlist(cur_data()), na.rm = TRUE)
  )
# A tibble: 5 x 7
# Rowwise:
a b c e min max avg
<int> <int> <int> <int> <int> <int> <dbl>
1 1 6 11 1 1 11 5.17
2 2 7 12 2 2 12 6.17
3 3 8 13 3 3 13 7.17
4 4 9 14 4 4 14 8.17
5 5 10 15 5 5 15 9.17
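The 5.17 in the first row is what you would get if the min and max columns, created earlier in the same mutate() call, were already part of the data seen by cur_data() when avg is computed. A rough base-R check of that arithmetic (the vector names here are made up for illustration):

```r
# First row of df without column d.
original <- c(a = 1, b = 6, c = 11, e = 1)

# The values that min and max would hold for this row.
row_min <- min(original)
row_max <- max(original)

# The row as it would look once min and max have been appended.
polluted <- c(original, min = row_min, max = row_max)

avg_expected <- mean(original)  # 19 / 4
avg_polluted <- mean(polluted)  # 31 / 6
```

`avg_expected` is 4.75, while `avg_polluted` is 31/6, about 5.17, which matches the first row of the output above.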
So I suggest doing this instead:
df %>%
  select(-d) %>%
  rowwise() %>%
  mutate(min = min(c_across(a:e), na.rm = TRUE),
         max = max(c_across(a:e), na.rm = TRUE),
         avg = mean(c_across(a:e), na.rm = TRUE)
  )
# A tibble: 5 x 7
# Rowwise:
a b c e min max avg
<int> <int> <int> <int> <int> <int> <dbl>
1 1 6 11 1 1 11 4.75
2 2 7 12 2 2 12 5.75
3 3 8 13 3 3 13 6.75
4 4 9 14 4 4 14 7.75
5 5 10 15 5 5 15 8.75
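Another option is:
cols <- c('a', 'b', 'c', 'e')
df %>%
  rowwise() %>%
  # all_of() makes it explicit that cols is an external character vector
  mutate(min = min(c_across(all_of(cols)), na.rm = TRUE),
         max = max(c_across(all_of(cols)), na.rm = TRUE),
         avg = mean(c_across(all_of(cols)), na.rm = TRUE)
  )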
# A tibble: 5 x 8
# Rowwise:
a b c d e min max avg
<int> <int> <int> <chr> <int> <int> <int> <dbl>
1 1 6 11 a 1 1 11 4.75
2 2 7 12 b 2 2 12 5.75
3 3 8 13 c 3 3 13 6.75
4 4 9 14 d 4 4 14 7.75
5 5 10 15 e 5 5 15 8.75
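In these cases, even the group_by approach suggested by @Sinh does not work properly.
Using pmap() from purrr may be preferable, because you only need to select the data once and you can use the select helpers:
library(purrr)
df %>%
  mutate(pmap_dfr(across(where(is.numeric)),
                  ~ data.frame(max = max(c(...)),
                               min = min(c(...)),
                               avg = mean(c(...)))))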
a b c d e max min avg
<int> <int> <int> <chr> <int> <int> <int> <dbl>
1 1 6 11 a 1 11 1 4.75
2 2 7 12 b 2 12 2 5.75
3 3 8 13 c 3 13 3 6.75
4 4 9 14 d 4 14 4 7.75
5 5 10 15 e 5 15 5 8.75
Or with tidyr:
library(tidyr)
df %>%
  mutate(res = pmap(across(where(is.numeric)),
                    ~ list(max = max(c(...)),
                           min = min(c(...)),
                           avg = mean(c(...))))) %>%
  unnest_wider(res)
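Here is one way to preserve the data.frame attributes in mutate: set a particular column as the row names attribute (column_to_rownames), then turn it back into a column after the transformation.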
library(dplyr)
library(tibble)
library(purrr)
df %>%
  column_to_rownames('d') %>%
  mutate(max = reduce(., pmax), min = reduce(., pmin),
         avg = rowMeans(.)) %>%
  rownames_to_column('d')
# d a b c e max min avg
#1 a 1 6 11 1 11 1 4.75
#2 b 2 7 12 2 12 2 5.75
#3 c 3 8 13 3 13 3 6.75
#4 d 4 9 14 4 14 4 7.75
#5 e 5 10 15 5 15 5 8.75