How to mutate a new column with row means for selected columns in a grouped_tbl using dplyr in R?
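I have a grouped data frame from a large dataset, with about 800 columns and about 2.5 million records. I am trying to create row-mean columns, each computed over only 5-10 columns, but for some reason I keep getting NA as the mean for every row.
Here is my attempt: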
clean_bmk <- clean_bmk %>%
  rowwise() %>%
  mutate(
    BMK_Mean_Strategic = mean(!!strategic, na.rm = T),
    BMK_Mean_DiffChange = mean(!!diffchange, na.rm = T),
    BMK_Mean_Failure = mean(!!failure, na.rm = T),
    BMK_Mean_Narrow = mean(!!narrow, na.rm = T),
    BMK_R1_Performance = mean(!!performance_vars, na.rm = T),
    BMK_R2_Promotion = mean(!!promote_vars, na.rm = T),
    BMK_R3_Derail = mean(!!derail_vars, na.rm = T))
class(clean_bmk)
[1] "grouped_df" "tbl_df" "tbl" "data.frame"
When I do this, all of the mutated columns are NA. However, the following works:
clean_bmk$Strategic_Mean <- rowMeans(clean_bmk[,strategic], na.rm=T)
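I don't know why. How can I write a function so that I only need to pass in the variables holding the column names, and have it mutate the columns in the data frame?
For example: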
strategic <- c("column1", "column15", "column27")
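and similarly for the other variables, such as diffchange, failure, etc.
I tried dput(clean_bmk) to share the data with you, but I could not get it out because the dataset is so large. I guess that because it is a grouped_df, I cannot use [[ or sample() on the dataset.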
Using rowwise is inefficient. A better option is rowMeans after select-ing the columns of interest:
library(dplyr)

clean_bmk %>%
  # drop the grouping first; select() on a grouped_df keeps the grouping columns
  ungroup() %>%
  mutate(
    # all_of() tells select() to treat each character vector as column names
    BMK_Mean_Strategic = rowMeans(select(., all_of(strategic)), na.rm = TRUE),
    BMK_Mean_DiffChange = rowMeans(select(., all_of(diffchange)), na.rm = TRUE),
    BMK_Mean_Failure = rowMeans(select(., all_of(failure)), na.rm = TRUE),
    BMK_Mean_Narrow = rowMeans(select(., all_of(narrow)), na.rm = TRUE),
    BMK_R1_Performance = rowMeans(select(., all_of(performance_vars)), na.rm = TRUE),
    BMK_R2_Promotion = rowMeans(select(., all_of(promote_vars)), na.rm = TRUE),
    BMK_R3_Derail = rowMeans(select(., all_of(derail_vars)), na.rm = TRUE))
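As for why the original attempt returned NA: `!!strategic` injects the value of `strategic`, which is a character vector of column names, so `mean()` is most likely being asked to average the names themselves rather than the columns they refer to. A minimal sketch of that effect outside of `mutate()`, using the example names from the question (the variable `res` is only for illustration):

```r
strategic <- c("column1", "column15", "column27")

# mean() on a character vector cannot compute anything numeric:
# it warns "argument is not numeric or logical: returning NA" and returns NA.
res <- suppressWarnings(mean(strategic, na.rm = TRUE))
res
#> [1] NA
```

Inside `rowwise() %>% mutate(...)` the same call is repeated for every row, which would explain an all-NA column.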
Using a reproducible example:
data(mtcars)
v1 <- c('mpg', 'disp')  # columns to average
mtcars %>%
  transmute(newMean = rowMeans(select(., all_of(v1)), na.rm = TRUE)) %>%
  head()
#                  newMean
#Mazda RX4           90.50
#Mazda RX4 Wag       90.50
#Datsun 710          65.40
#Hornet 4 Drive     139.70
#Hornet Sportabout  189.35
#Valiant            121.55
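For the part of the question that asks for a function taking the column-name vectors, one possible approach is sketched below. The names `add_row_means` and `specs` are hypothetical (they are not part of dplyr), and the sketch assumes every listed column exists and is numeric. It is demonstrated on mtcars rather than on the original data.

```r
library(dplyr)

# Hypothetical helper: `specs` is a named list that maps each new column name
# to a character vector of existing column names to average row by row.
add_row_means <- function(df, specs) {
  # Drop the grouping so select() returns only the requested columns.
  df <- ungroup(df)
  for (new_col in names(specs)) {
    cols <- specs[[new_col]]
    df[[new_col]] <- rowMeans(select(df, all_of(cols)), na.rm = TRUE)
  }
  df
}

# Illustrative specification using mtcars columns.
specs <- list(
  mean_mpg_disp = c("mpg", "disp"),
  mean_hp_wt    = c("hp", "wt")
)

result <- mtcars %>%
  group_by(cyl) %>%
  add_row_means(specs)

head(result[, c("mean_mpg_disp", "mean_hp_wt")])
```

Note that the helper returns an ungrouped data frame, so any grouping would need to be reapplied with `group_by()` afterwards. With the vectors from the question, the list would presumably look like `list(BMK_Mean_Strategic = strategic, BMK_Mean_DiffChange = diffchange)` and so on.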