Is there a way to return the names of rows with specific average column values?

Question

I am working with the dataset Income_Democracy.dta and I am trying to find the names of the countries whose average dem_ind value is greater than 0.95.

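I think I need to subset by country, find the mean, and then return that as a new dataset, but I don't know how to do that without naming each country explicitly. I have fiddled with the which and subset functions, but I am new to R and need help. For one specific country, I know you can do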

mean(subset(incdem$dem_ind, incdem$country =="Australia"))

but I am not sure how to generalize this to all countries.

Answer 1

Group by 'country', get the mean of 'dem_ind', filter the rows where that mean is greater than 0.95, and pull the 'country' column as a vector

library(dplyr)
incdem %>%
    group_by(country) %>%
    summarise(Avg = mean(dem_ind, na.rm = TRUE), .groups = 'drop') %>%
    filter(Avg > 0.95) %>%
    pull(country)
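To try the pipeline without the .dta file, here is a minimal sketch on a made-up data frame. The name `toy` and its values are hypothetical stand-ins for `incdem`. The last step is one possible way to get the "new dataset" the question mentions, by keeping every original row for the matching countries.

```r
library(dplyr)

# Hypothetical toy data standing in for incdem (values are made up)
toy <- data.frame(
  country = c("A", "A", "B", "B", "C", "C"),
  dem_ind = c(1.00, 0.98, 0.50, NA, 0.96, 0.90),
  stringsAsFactors = FALSE
)

# Names of the countries whose mean dem_ind exceeds 0.95
high <- toy %>%
  group_by(country) %>%
  summarise(Avg = mean(dem_ind, na.rm = TRUE), .groups = "drop") %>%
  filter(Avg > 0.95) %>%
  pull(country)

# All original rows for those countries, as a new data frame
high_rows <- toy %>%
  filter(country %in% high)
```

Only country "A" has a mean above the threshold here, so `high_rows` holds its two rows.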

Alternatively, a base R option is

names(which(sapply(split(incdem$dem_ind, incdem$country), mean, 
        na.rm = TRUE) > 0.95))
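The same grouped mean can also be computed with `tapply`, which some readers may find easier to follow than `split` plus `sapply`. This is a sketch on hypothetical toy data (`toy` is a made-up stand-in for `incdem`). Wrapping the comparison in `which` drops any group whose mean is `NaN` because all of its values are missing.

```r
# Hypothetical toy data standing in for incdem (values are made up)
toy <- data.frame(
  country = c("A", "A", "B", "B", "C", "C"),
  dem_ind = c(1.00, 0.98, 0.50, NA, 0.96, 0.90),
  stringsAsFactors = FALSE
)

# Named vector of per-country means, ignoring missing values
avg <- tapply(toy$dem_ind, toy$country, mean, na.rm = TRUE)

# Names of the countries whose mean exceeds the threshold
high <- names(which(avg > 0.95))
```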

If the condition is a range of values

names(which(sapply(split(incdem$dem_ind, incdem$country), function(x) {
          avg <- mean(x, na.rm = TRUE)
          avg > 0.2 & avg < 0.8})))
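If the bounds change often, the range check can be wrapped in a small helper. The function name `countries_with_mean_between` and its `lower` and `upper` arguments are hypothetical, and the sketch assumes the data frame has `country` and `dem_ind` columns as in the question.

```r
# Hypothetical helper: countries whose mean dem_ind lies strictly
# between lower and upper (missing values are ignored)
countries_with_mean_between <- function(data, lower, upper) {
  avg <- sapply(split(data$dem_ind, data$country), mean, na.rm = TRUE)
  names(which(avg > lower & avg < upper))
}

# Hypothetical toy data standing in for incdem (values are made up)
toy <- data.frame(
  country = c("A", "A", "B", "B", "C", "C"),
  dem_ind = c(1.00, 0.98, 0.50, NA, 0.96, 0.90),
  stringsAsFactors = FALSE
)

mid <- countries_with_mean_between(toy, 0.2, 0.8)
```

With the real data this would be called as `countries_with_mean_between(incdem, 0.2, 0.8)`.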


r

subset