R: subset by multiple columns

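I'm a bit confused about the logic of subsetting a dataset based on specific conditions across multiple columns.

For example, if this is my dataset: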

ID   Sex Age  Score
1    M   4.2  19
1    M   4.8  21
2    F   6.1  23
2    F   6.7  45
3    F   9.4  39
4    M   8    33
5    M   10   56

For rows where Sex is M and Age is between 6 and 11, the acceptable score range is Score between 34 and 100.

The final dataset would then be the following, without ID 4:

ID   Sex Age  Score
1    M   4.2  19
1    M   4.8  21
2    F   6.1  23
2    F   6.7  45
3    F   9.4  39
5    M   10   56

I tried this:

Df0 <- subset( Df0, (between(Age, 6,11)&
                     Sex == "M"&
                     between(Score, 34, 100))

That didn't work. Any suggestions are much appreciated. Thanks in advance.

library(dplyr)
Df0 %>% 
    filter(Sex == 'M', between(Age, 6,11), between(Score, 34,100))

Classic (base R)

# inclusive bounds, to match between() in the other versions
subset(dat, Age >= 6 & Age <= 11 & Sex == 'M' & Score >= 34 & Score <= 100)
#   ID Sex Age Score
# 7  5   M  10    56

Using data.table

library(data.table)
subset(dat, between(Age, 6, 11)  & Sex == 'M' & between(Score, 34, 100))
#   ID Sex Age Score
# 7  5   M  10    56

subset(dat, Age %between% c(6, 11) & Sex == 'M' & Score %between% c(34, 100))
#   ID Sex Age Score
# 7  5   M  10    56

Or pure data.table

setDT(dat)[Sex == "M" & between(Age, 6, 11) & between(Score, 34, 100)]
#    ID Sex Age Score
# 1:  5   M  10    56

If I've understood your explanation and the expected output correctly, you're looking for something like this:

library(dplyr)

df %>%
  group_by(ID) %>%
  filter(ifelse(Sex == 'M' & between(Age, 6,11), 
          between(Score, 34, 100), TRUE)) %>%
  ungroup

#     ID Sex     Age Score
#  <int> <chr> <dbl> <int>
#1     1 M       4.2    19
#2     1 M       4.8    21
#3     2 F       6.1    23
#4     2 F       6.7    45
#5     3 F       9.4    39
#6     5 M      10      56

between(Score, 34, 100) is checked only when Sex is 'M' and Age is between 6 and 11. All other rows are kept unconditionally.
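
The same conditional rule can probably be written in base R without dplyr. This is only a rough sketch: it assumes the sample data from the question is rebuilt as a data frame called df, and the helper vectors in_scope and score_ok are names made up here for illustration. The idea is "keep a row if the rule does not apply to it, or if it passes the rule".

```r
# Sample data copied from the question
df <- data.frame(
  ID    = c(1, 1, 2, 2, 3, 4, 5),
  Sex   = c("M", "M", "F", "F", "F", "M", "M"),
  Age   = c(4.2, 4.8, 6.1, 6.7, 9.4, 8, 10),
  Score = c(19, 21, 23, 45, 39, 33, 56)
)

# Rows the rule applies to: males aged 6 to 11, inclusive (hypothetical helper)
in_scope <- df$Sex == "M" & df$Age >= 6 & df$Age <= 11

# Rows whose score lies in the acceptable range, inclusive (hypothetical helper)
score_ok <- df$Score >= 34 & df$Score <= 100

# Keep rows that are out of scope, plus in-scope rows with an acceptable score
result <- df[!in_scope | score_ok, ]
result
```

On the sample data this should drop only the ID 4 row, since that is the one male aged 6 to 11 whose score falls outside 34 to 100.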