Select/Get names of all columns which have a negative value between 0-10

Question

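For a data frame, I would like to get the names of (or select) all columns that have a negative value within a certain range. This post comes very close, but it iterates over the rows, which is not feasible for my data. Also, if I store that solution it becomes a list, whereas I would prefer a vector. For example, for the following dataset: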

library(data.table)
df <- fread(
     "A   B   D   E  iso   year   
      0   1   1   NA ECU   2009   
      1   0   2   0  ECU   2009   
      0   0   -3  0  BRA   2011   
      1   0   4   0  BRA   2011   
      0   1   7   NA ECU   2008   
     -1   0   1   0  ECU   2008   
      0   0   3   2  BRA   2012   
      1   0   4   NA BRA   2012",
  header = TRUE
)

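I want the names of all columns that have a negative value between 0 and -10 (A and D in the example). What is the simplest way to achieve this? All else being equal, a data.table solution would be preferred.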
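A minimal sketch of one way to get a plain character vector without touching rows, assuming the question's sample data (rebuilt here with data.table() instead of fread()). It relies on base vapply() rather than data.table-specific syntax: a data.table is a list of columns, so vapply() visits each column once. The helper name has_small_negative is hypothetical.

```r
library(data.table)

# Sample data from the question, built with data.table() instead of fread()
df <- data.table(
  A    = c(0, 1, 0, 1, 0, -1, 0, 1),
  B    = c(1, 0, 0, 0, 1, 0, 0, 0),
  D    = c(1, 2, -3, 4, 7, 1, 3, 4),
  E    = c(NA, 0, 0, 0, NA, 0, 2, NA),
  iso  = c("ECU", "ECU", "BRA", "BRA", "ECU", "ECU", "BRA", "BRA"),
  year = c(2009, 2009, 2011, 2011, 2008, 2008, 2012, 2012)
)

# Hypothetical helper: TRUE for a numeric column with at least one value
# strictly between -10 and 0; non-numeric columns (iso) are skipped by is.numeric()
has_small_negative <- function(x) {
  is.numeric(x) && any(x > -10 & x < 0, na.rm = TRUE)
}

# vapply() works column by column and returns one logical per column,
# so indexing names(df) with it yields a character vector, not a list
neg_cols <- names(df)[vapply(df, has_small_negative, logical(1))]
neg_cols
# [1] "A" "D"
```

Because the predicate is applied once per column, the cost grows with the number of columns, and each column is scanned with vectorized comparisons.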

Answer 1

One tidyverse possibility:

 df %>%
 gather(var, val, -c(5:6)) %>%
 group_by(var) %>%
 summarise(res = any(val[!is.na(val)] > -10 & val[!is.na(val)] < 0))

  var   res  
  <chr> <lgl>
1 A     TRUE 
2 B     FALSE
3 D     TRUE 
4 E     FALSE

To select only the numeric columns:

df %>%
 select_if(is.numeric) %>%
 gather(var, val) %>%
 group_by(var) %>%
 summarise(res = any(val[!is.na(val)] > -10 & val[!is.na(val)] < 0))

Note that this also includes the "year" column, because it is numeric.
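Both pipelines above return a tibble of TRUE/FALSE flags rather than the vector of names the question asks for. A sketch that ends in a character vector, assuming tidyr 1.0 or later (pivot_longer(), which supersedes gather()) and dplyr 1.0 or later (where()); the data is the question's sample rebuilt as a plain data.frame.

```r
library(dplyr)
library(tidyr)

# Sample data from the question, as a plain data.frame
df <- data.frame(
  A    = c(0, 1, 0, 1, 0, -1, 0, 1),
  B    = c(1, 0, 0, 0, 1, 0, 0, 0),
  D    = c(1, 2, -3, 4, 7, 1, 3, 4),
  E    = c(NA, 0, 0, 0, NA, 0, 2, NA),
  iso  = c("ECU", "ECU", "BRA", "BRA", "ECU", "ECU", "BRA", "BRA"),
  year = c(2009, 2009, 2011, 2011, 2008, 2008, 2012, 2012),
  stringsAsFactors = FALSE
)

neg_cols <- df %>%
  select(where(is.numeric)) %>%   # drops iso; year stays but never matches
  pivot_longer(everything(), names_to = "var", values_to = "val") %>%
  group_by(var) %>%
  summarise(res = any(val > -10 & val < 0, na.rm = TRUE)) %>%
  filter(res) %>%                 # keep only the flagged columns
  pull(var)                       # character vector instead of a tibble
neg_cols
# [1] "A" "D"
```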

You can also use base R:

# df is the data.frame from the sample data below
df <- Filter(is.numeric, df)
# both bounds must hold for the same value, so combine them before counting
cond <- colSums(df > -10 & df < 0, na.rm = TRUE) > 0
colnames(df)[cond]

[1] "A" "D"

Or as a "one-liner":

df <- Filter(is.numeric, df)
colnames(df)[colSums(df > -10 & df < 0, na.rm = TRUE) > 0]
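Both bounds have to be checked on the same value. Counting `df > -10` and `df < 0` separately and multiplying the counts only shows that a column has some value above -10 and some (possibly different) value below 0. A small base R illustration with a hypothetical two-column data frame `x`:

```r
# Hypothetical data: column `bad` has -20 and 5 but nothing inside (-10, 0);
# column `ok` contains -3
x <- data.frame(bad = c(-20, 5), ok = c(-3, 1))

# Conditions counted separately: both columns are flagged
separate <- as.logical(colSums(x > -10) * colSums(x < 0))
names(x)[separate]
# [1] "bad" "ok"

# Conditions combined element-wise first: only `ok` is flagged
joint <- colSums(x > -10 & x < 0) > 0
names(x)[joint]
# [1] "ok"
```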

Sample data:

df <- read.table(text = 
 "A   B   D   E  iso   year   
      0   1   1   NA ECU   2009   
      1   0   2   0  ECU   2009   
      0   0   -3  0  BRA   2011   
      1   0   4   0  BRA   2011   
      0   1   7   NA ECU   2008   
     -1   0   1   0  ECU   2008   
      0   0   3   2  BRA   2012   
      1   0   4   NA BRA   2012", 
 header = TRUE,
 stringsAsFactors = FALSE)

Answer 2

Another tidyverse variant:

df %>% 
   group_by(iso,year) %>% 
   keep(~any(.x>-10 & .x<0 & !is.na(.x))) %>% 
   names()
[1] "A" "D"

Edit: to handle factors, convert them with mutate_if() first. We can also do it this way (although I think grouping is better):

  df %>% 
   mutate_if(is.factor,as.character) %>% 
   purrr::keep(~any(.x>-10 & .x<0 & !is.na(.x))) %>% 
   names()
[1] "A" "D"
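In dplyr 1.0 and later the *_if() verbs are superseded, and the same predicate can be passed to select() through where(). A sketch, assuming dplyr 1.0 or later, with iso deliberately stored as a factor: is.numeric() is FALSE for the factor, so `&&` never evaluates the comparison on it and no mutate_if() conversion is needed.

```r
library(dplyr)

# Sample data from the question; iso is stored as a factor on purpose
df <- data.frame(
  A    = c(0, 1, 0, 1, 0, -1, 0, 1),
  B    = c(1, 0, 0, 0, 1, 0, 0, 0),
  D    = c(1, 2, -3, 4, 7, 1, 3, 4),
  E    = c(NA, 0, 0, 0, NA, 0, 2, NA),
  iso  = c("ECU", "ECU", "BRA", "BRA", "ECU", "ECU", "BRA", "BRA"),
  year = c(2009, 2009, 2011, 2011, 2008, 2008, 2012, 2012),
  stringsAsFactors = TRUE
)

# where() applies the lambda to each column and keeps those returning TRUE
neg_cols <- df %>%
  select(where(~ is.numeric(.x) && any(.x > -10 & .x < 0, na.rm = TRUE))) %>%
  names()
neg_cols
# [1] "A" "D"
```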

The selected columns and their values:

df %>% 
  group_by(iso,year) %>% 
   keep(~any(.x>-10 & .x<0 & !is.na(.x)))
# A tibble: 8 x 2
      A     D
  <int> <int>
1     0     1
2     1     2
3     0    -3
4     1     4
5     0     7
6    -1     1
7     0     3
8     1     4

Tags: r, range, negative-number, lapply