Subset in R with specific values for specific columns identified by their index number

If I have a data frame like this:

df <- data.frame(A = sample(1:5, 10, replace = TRUE), B = sample(1:5, 10, replace = TRUE), C = sample(1:5, 10, replace = TRUE), D = sample(1:5, 10, replace = TRUE), E = sample(1:5, 10, replace = TRUE))

which gives me this:

   A B C D E
1  1 5 1 4 3
2  2 3 5 4 3
3  4 2 2 4 4
4  2 1 2 5 2
5  3 3 4 4 5
6  3 2 3 1 5
7  1 5 4 2 3
8  1 3 5 5 1
9  3 1 1 3 5
10 5 3 1 2 4

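How do I get the subset of all rows where at least one of certain columns (e.g. B and D) has a value equal to 1, with those columns identified by their index numbers (2 and 4) rather than by their names? In this case: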

   A B C D E
4  2 1 2 5 2
6  3 2 3 1 5
9  3 1 1 3 5
df[rowSums(df[c(2,4)] == 1) > 0,]
#   A B C D E
# 4 2 1 2 5 2
# 6 3 2 3 1 5
# 9 3 1 1 3 5
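  • You said you want to compare values by column index, so select the columns with df[c(2,4)] (or df[,c(2,4)]).
  • df[c(2,4)] == 1 returns a logical matrix indicating whether each cell equals 1.
  • rowSums(.) > 0 finds the rows that contain at least one 1.
  • df[rowSums(.) > 0,] selects only those rows.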

Data

df <- structure(list(A = c(1L, 2L, 4L, 2L, 3L, 3L, 1L, 1L, 3L, 5L), B = c(5L, 3L, 2L, 1L, 3L, 2L, 5L, 3L, 1L, 3L), C = c(1L, 5L, 2L, 2L, 4L, 3L, 4L, 5L, 1L, 1L), D = c(4L, 4L, 4L, 5L, 4L, 1L, 2L, 5L, 3L, 2L), E = c(3L, 3L, 4L, 2L, 5L, 5L, 3L, 1L, 5L, 4L)), class = "data.frame", row.names = c("1", "2", "3", "4", "5", "6", "7", "8", "9", "10"))
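The same base R approach can be wrapped in a small reusable function. The name `subset_any_equal` is hypothetical (it is not part of the original answer), and passing `na.rm = TRUE` to `rowSums()` is an assumption about how you want missing values handled: without it, any row with an `NA` among the selected cells gets an `NA` row index and appears as an all-`NA` row in the result, even if the other selected cell equals 1.

```r
# Hypothetical helper (not from the original answer): keep the rows where
# at least one of the columns at positions `cols` equals `value`.
subset_any_equal <- function(data, cols, value) {
  # Logical matrix: TRUE where a selected cell equals `value`
  hits <- data[cols] == value
  # Count matches per row; na.rm = TRUE treats NA comparisons as "no match"
  data[rowSums(hits, na.rm = TRUE) > 0, , drop = FALSE]
}

df <- structure(
  list(
    A = c(1L, 2L, 4L, 2L, 3L, 3L, 1L, 1L, 3L, 5L),
    B = c(5L, 3L, 2L, 1L, 3L, 2L, 5L, 3L, 1L, 3L),
    C = c(1L, 5L, 2L, 2L, 4L, 3L, 4L, 5L, 1L, 1L),
    D = c(4L, 4L, 4L, 5L, 4L, 1L, 2L, 5L, 3L, 2L),
    E = c(3L, 3L, 4L, 2L, 5L, 5L, 3L, 1L, 5L, 4L)
  ),
  class = "data.frame",
  row.names = c("1", "2", "3", "4", "5", "6", "7", "8", "9", "10")
)

# Columns 2 and 4 are B and D
res <- subset_any_equal(df, c(2, 4), 1)
res
```

Because the columns are passed as positions, the same call works for any other set of indices, e.g. `subset_any_equal(df, c(1, 3, 5), 1)`.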

tidyverse

df <-
  structure(
    list(
      A = c(1L, 2L, 4L, 2L, 3L, 3L, 1L, 1L, 3L, 5L),
      B = c(5L, 3L, 2L, 1L, 3L, 2L, 5L, 3L, 1L, 3L),
      C = c(1L, 5L, 2L, 2L, 4L, 3L, 4L, 5L, 1L, 1L),
      D = c(4L, 4L, 4L, 5L, 4L, 1L, 2L, 5L, 3L, 2L),
      E = c(3L, 3L, 4L, 2L, 5L, 5L, 3L, 1L, 5L, 4L)
    ),
    class = "data.frame",
    row.names = c("1", "2", "3", "4", "5", "6", "7", "8", "9", "10")
  )

library(tidyverse)
df %>% 
  filter(B == 1 | D == 1)
#>   A B C D E
#> 4 2 1 2 5 2
#> 6 3 2 3 1 5
#> 9 3 1 1 3 5
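The `filter()` call above refers to `B` and `D` by name, whereas the question asks for column positions. A minimal sketch of a position-based variant, assuming dplyr 1.0.4 or later (where `if_any()` was introduced) and tidyselect's support for selecting columns by number:

```r
library(dplyr)

df <- structure(
  list(
    A = c(1L, 2L, 4L, 2L, 3L, 3L, 1L, 1L, 3L, 5L),
    B = c(5L, 3L, 2L, 1L, 3L, 2L, 5L, 3L, 1L, 3L),
    C = c(1L, 5L, 2L, 2L, 4L, 3L, 4L, 5L, 1L, 1L),
    D = c(4L, 4L, 4L, 5L, 4L, 1L, 2L, 5L, 3L, 2L),
    E = c(3L, 3L, 4L, 2L, 5L, 5L, 3L, 1L, 5L, 4L)
  ),
  class = "data.frame",
  row.names = c("1", "2", "3", "4", "5", "6", "7", "8", "9", "10")
)

# if_any() keeps a row when the predicate is TRUE for at least one of
# the selected columns; c(2, 4) selects B and D by position
res <- df %>%
  filter(if_any(c(2, 4), ~ .x == 1))
res
```

Swapping `if_any()` for `if_all()` would instead keep only the rows where every selected column equals 1.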

Created on 2022-01-23 by the reprex package (v2.0.1)

data.table

library(data.table)

setDT(df)[B == 1 | D == 1, ]
#>    A B C D E
#> 1: 2 1 2 5 2
#> 2: 3 2 3 1 5
#> 3: 3 1 1 3 5
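As with the dplyr version, this call uses column names. A sketch of a position-based alternative, assuming you pass the column indices through `.SDcols` (which accepts numeric positions): the inner call builds a logical row filter from the selected columns, and the outer call uses it as `i`. It uses `as.data.table()` rather than `setDT()` so that `df` itself stays a plain data.frame; `setDT()` converts its argument by reference.

```r
library(data.table)

df <- structure(
  list(
    A = c(1L, 2L, 4L, 2L, 3L, 3L, 1L, 1L, 3L, 5L),
    B = c(5L, 3L, 2L, 1L, 3L, 2L, 5L, 3L, 1L, 3L),
    C = c(1L, 5L, 2L, 2L, 4L, 3L, 4L, 5L, 1L, 1L),
    D = c(4L, 4L, 4L, 5L, 4L, 1L, 2L, 5L, 3L, 2L),
    E = c(3L, 3L, 4L, 2L, 5L, 5L, 3L, 1L, 5L, 4L)
  ),
  class = "data.frame",
  row.names = c("1", "2", "3", "4", "5", "6", "7", "8", "9", "10")
)

# Copy into a data.table, leaving df unchanged
dt <- as.data.table(df)

# Inner call: OR together the per-column tests `col == 1` for columns 2 and 4,
# giving one logical value per row; outer call: use it to select rows
res <- dt[dt[, Reduce(`|`, lapply(.SD, `==`, 1)), .SDcols = c(2, 4)]]
res
```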

Created on 2022-01-23 by the reprex package (v2.0.1)