Using group_by and summarise_all to create dummy indicators for a categorical variable
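I want to generate dummy indicators for each id of a given categorical variable, fruit. When using summarise_all with a custom function, I get the warning shown below. I also tried summarise_all(any), which gave me a warning about coercing double to logical. Is there an efficient or more up-to-date way to do this? Many thanks!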
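library(dplyr)  # provides %>%, bind_cols, select, group_by, summarise_all
# Pool of fruit names to sample from
fruit = c("apple", "banana", "orange", "pear",
          "strawberry", "blueberry", "durian",
          "grape", "pineapple")
# Three ids (a, b, c) with 3, 5 and 6 randomly sampled fruits respectively
df_sample = data.frame(id = c(rep("a", 3), rep("b", 5), rep("c", 6)),
                       fruit = c(sample(fruit, replace = TRUE, size = 3),
                                 sample(fruit, replace = TRUE, size = 5),
                                 sample(fruit, replace = TRUE, size = 6)))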
fruit_indicator =
model.matrix(~ -1 + fruit, df_sample) %>%
as.data.frame() %>%
bind_cols(df_sample) %>%
select(-fruit) %>%
group_by(id) %>%
summarise_all(funs(ifelse(any(. > 0), 1, 0)))
# Warning message:
# `funs()` is deprecated as of dplyr 0.8.0.
# Please use a list of either functions or lambdas:
#
# # Simple named list:
# list(mean = mean, median = median)
#
# # Auto named with `tibble::lst()`:
# tibble::lst(mean, median)
#
# # Using lambdas
# list(~ mean(., trim = .2), ~ median(., na.rm = TRUE))
You can use across, which is available in dplyr 1.0.0 or later.
library(dplyr)
model.matrix(~ -1 + fruit, df_sample) %>%
as.data.frame() %>%
bind_cols(df_sample) %>%
select(-fruit) %>%
group_by(id) %>%
summarise(across(everything(), ~ as.integer(any(. > 0))))
# id fruitapple fruitbanana fruitdurian fruitgrape fruitpear
#* <chr> <int> <int> <int> <int> <int>
#1 a 0 1 1 0 1
#2 b 1 0 0 1 0
#3 c 0 1 0 1 1
# … with 1 more variable: fruitpineapple <int>
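If you would rather skip model.matrix and dplyr entirely, the same per-id indicators can be sketched in base R by cross-tabulating id against fruit. This is only a rough sketch, not the approach from the answer above: the small fixed data frame df_small and the names counts and indicator are made up for illustration, so that the result does not depend on random sampling.

```r
# Hypothetical fixed data (no sampling), used only for illustration
df_small <- data.frame(
  id    = c("a", "a", "a", "b", "b"),
  fruit = c("apple", "pear", "apple", "grape", "pear"),
  stringsAsFactors = FALSE
)

# Cross-tabulate: one row per id, one column per fruit, cells hold counts
counts <- unclass(table(df_small$id, df_small$fruit))

# Compare first, then convert: (counts > 0) is logical, * 1L makes it 0/1 integer.
# Comparing explicitly avoids handing numeric values straight to a logical function.
indicator <- (counts > 0) * 1L

# Move the ids from the row names into a regular column
fruit_indicator <- data.frame(id = rownames(indicator), indicator,
                              row.names = NULL)
print(fruit_indicator)
```

Note that the columns here are named after the fruits themselves (apple, grape, pear) rather than carrying the fruit prefix that model.matrix adds, and only fruits that actually occur in the data get a column.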