R: subset by multiple columns
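I am a bit confused about the logic of subsetting a dataset based on specific conditions across multiple columns.
For example, if this is my dataset: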
ID Sex Age Score
1 M 4.2 19
1 M 4.8 21
2 F 6.1 23
2 F 6.7 45
3 F 9.4 39
4 M 8 33
5 M 10 56
For males (Sex == "M") with Age between 6 and 11, the acceptable Score range is 34 to 100.
The final dataset would be the following, without ID 4:
ID Sex Age Score
1 M 4.2 19
1 M 4.8 21
2 F 6.1 23
2 F 6.7 45
3 F 9.4 39
5 M 10 56
I tried this approach:
Df0 <- subset(Df0, between(Age, 6, 11) &
                   Sex == "M" &
                   between(Score, 34, 100))
This did not work. Any suggestions are much appreciated. Thanks in advance.
library(dplyr)
Df0 %>%
  filter(Sex == 'M', between(Age, 6, 11), between(Score, 34, 100))
Classic (base R):
subset(dat, Age > 6 & Age < 11 & Sex == 'M' & Score > 34 & Score < 100)
# ID Sex Age Score
# 7 5 M 10 56
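One caveat on the classic version above: strict `>` and `<` exclude the endpoints, while `between()` in both dplyr and data.table is inclusive by default. A minimal base R sketch of the difference, using made-up ages (the vector `age` is purely illustrative and not part of the question's data):

```r
# Hypothetical ages sitting on and inside the 6 to 11 boundaries
age <- c(6, 8, 11)

# Strict comparison, as in the classic subset() call: endpoints are dropped
strict <- age > 6 & age < 11

# Inclusive comparison, which is what between(age, 6, 11) computes
inclusive <- age >= 6 & age <= 11

print(strict)
print(inclusive)
```

With the sample data both forms return the same row, because no Age or Score value lies exactly on a boundary.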
Using data.table:
library(data.table)
subset(dat, between(Age, 6, 11) & Sex == 'M' & between(Score, 34, 100))
# ID Sex Age Score
# 7 5 M 10 56
Or:
subset(dat, Age %between% c(6, 11) & Sex == 'M' & Score %between% c(34, 100))
# ID Sex Age Score
# 7 5 M 10 56
Or pure data.table:
setDT(df)[Sex == "M" & between(Age, 6, 11) & between(Score, 34, 100)]
# ID Sex Age Score
# 1: 5 M 10 56
If I have understood your explanation and the expected output correctly, you are looking for something like this:
library(dplyr)
df %>%
  group_by(ID) %>%
  filter(ifelse(Sex == 'M' & between(Age, 6, 11),
                between(Score, 34, 100), TRUE)) %>%
  ungroup()
# ID Sex Age Score
# <int> <chr> <dbl> <int>
#1 1 M 4.2 19
#2 1 M 4.8 21
#3 2 F 6.1 23
#4 2 F 6.7 45
#5 3 F 9.4 39
#6 5 M 10 56
The result of between(Score, 34, 100) is applied only to rows where Sex is 'M' and Age is between 6 and 11; every other row is kept (TRUE).
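The same rule can also be written without `ifelse()` as a plain logical condition: keep a row if it falls outside the restricted group, or if its score is in range. A minimal base R sketch, assuming the sample data from the question is rebuilt by hand as `dat` (the helper names `restricted` and `score_ok` are made up for illustration):

```r
# Sample data from the question, rebuilt by hand
dat <- data.frame(
  ID    = c(1, 1, 2, 2, 3, 4, 5),
  Sex   = c("M", "M", "F", "F", "F", "M", "M"),
  Age   = c(4.2, 4.8, 6.1, 6.7, 9.4, 8, 10),
  Score = c(19, 21, 23, 45, 39, 33, 56)
)

# Rows the score rule applies to: males aged 6 to 11 (inclusive)
restricted <- dat$Sex == "M" & dat$Age >= 6 & dat$Age <= 11

# Rows whose score is inside the acceptable range (inclusive)
score_ok <- dat$Score >= 34 & dat$Score <= 100

# Keep a row unless it is restricted AND has an out-of-range score
result <- dat[!restricted | score_ok, ]
print(result)
```

On this data only the ID 4 row is dropped, which matches the expected output in the question.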