Remove group from data.frame if at least one group member meets condition
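I have a data.frame from which I want to remove an entire group if any of its members meets a condition.
In the first example, where the values are numeric and the condition is NA, the code below works.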
df <- structure(list(world = c(1, 2, 3, 3, 2, NA, 1, 2, 3, 2), place = c(1,
1, 2, 2, 3, 3, 1, 2, 3, 1), group = c(1, 1, 1, 2, 2, 2, 3,
3, 3, 3)), .Names = c("world", "place", "group"), row.names = c(NA,
-10L), class = "data.frame")
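library(plyr)
# The group mean is NA whenever any member of the group is NA
ans <- ddply(df, .(group), summarize, code = mean(world))
ans$code[is.na(ans$code)] <- 0
ans2 <- merge(df, ans)
# Keep only the groups that were not flagged
final.ans <- ans2[ans2$code != 0, ]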
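However, if the condition is something other than NA, or if the values are not numeric, the ddply operation that relies on NA values will not work.
For example, if I want to remove groups that have one or more rows with a world value of AF (as in the data frame below), this ddply trick does not work.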
df2 <-structure(list(world = structure(c(1L, 2L, 3L, 3L, 3L, 5L, 1L,
4L, 2L, 4L), .Label = c("AB", "AC", "AD", "AE", "AF"), class = "factor"),
place = c(1, 1, 2, 2, 3, 3, 1, 2, 3, 1), group = c(1,
1, 1, 2, 2, 2, 3, 3, 3, 3)), .Names = c("world", "place",
"group"), row.names = c(NA, -10L), class = "data.frame")
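I can envision a for loop in which each member's value is checked for every group; if the condition is met, a code column could be populated, and I could then subset based on that code.
But perhaps there is a vectorized R way to do this?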
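As a rough illustration of why the mean()-based flag cannot carry over to df2, the sketch below rebuilds df2 (same values as above) and computes a per-group mean with base R's tapply instead of ddply, which is a substitution made only to keep the snippet dependency-free. The names group_means and has_af are illustrative.

```r
# df2 rebuilt with the same values as in the question
df2 <- data.frame(
  world = factor(c("AB", "AC", "AD", "AD", "AD", "AF", "AB", "AE", "AC", "AE")),
  place = c(1, 1, 2, 2, 3, 3, 1, 2, 3, 1),
  group = c(1, 1, 1, 2, 2, 2, 3, 3, 3, 3)
)

# mean() is not defined for a factor: it warns and returns NA for every group,
# so NA no longer singles out the group that contains "AF"
group_means <- suppressWarnings(tapply(df2$world, df2$group, mean))

# A logical test per group gives a usable flag instead
has_af <- tapply(df2$world == "AF", df2$group, any)
```

Here every entry of group_means is NA, while has_af is TRUE only for group 2.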
Try
library(dplyr)
df2 %>%
group_by(group) %>%
filter(!any(world == "AF"))
Or, as per @akrun, with data.table:
library(data.table)
setDT(df2)[, if (!any(world == "AF")) .SD, by = group]
or
setDT(df2)[, if (all(world != "AF")) .SD, by = group]
which gives:
#Source: local data frame [7 x 3]
#Groups: group
#
# world place group
#1 AB 1 1
#2 AC 1 1
#3 AD 2 1
#4 AB 1 3
#5 AE 2 3
#6 AC 3 3
#7 AE 1 3
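The grouped filter can be read as split, test, and recombine. A minimal base R sketch of that reading follows, assuming df2 as defined in the question (rebuilt here so the snippet runs on its own); pieces, keep, and result are illustrative names, and this is not how dplyr or data.table implement it internally.

```r
# df2 rebuilt with the same values as in the question
df2 <- data.frame(
  world = factor(c("AB", "AC", "AD", "AD", "AD", "AF", "AB", "AE", "AC", "AE")),
  place = c(1, 1, 2, 2, 3, 3, 1, 2, 3, 1),
  group = c(1, 1, 1, 2, 2, 2, 3, 3, 3, 3)
)

# 1. Split the rows by group
pieces <- split(df2, df2$group)

# 2. One TRUE/FALSE per group: TRUE when no member equals "AF"
keep <- vapply(pieces, function(g) !any(g$world == "AF"), logical(1))

# 3. Recombine only the groups that passed the test
result <- do.call(rbind, pieces[keep])
```

The key point is that the condition is evaluated once per group, and the single result decides the fate of every row in that group.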
An alternative data.table solution:
setDT(df2)
df2[!(group %in% df2[world == "AF",group])]
which gives:
world place group
1: AB 1 1
2: AC 1 1
3: AD 2 1
4: AB 1 3
5: AE 2 3
6: AC 3 3
7: AE 1 3
Using a key we can be a bit faster:
setkey(df2,group)
df2[!J((df2[world == "AF",group]))]
Base package:
# %in% also handles the case where several groups contain "AF"
df2[!df2$group %in% df2[df2$world == "AF", 3], ]
Output:
world place group
1 AB 1 1
2 AC 1 1
3 AD 2 1
7 AB 1 3
8 AE 2 3
9 AC 3 3
10 AE 1 3
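One caveat for the base approach: if more than one group contains the flagged value, comparing with != recycles the vector of flagged groups element by element, whereas %in% tests membership. The sketch below uses a hypothetical data frame df3 (not from the question) in which groups 2 and 3 both contain "AF".

```r
# Hypothetical data: groups 2 and 3 both contain "AF"
df3 <- data.frame(
  world = c("AB", "AC", "AF", "AD", "AB", "AF", "AE", "AC"),
  group = c(1, 1, 2, 2, 3, 3, 4, 4),
  stringsAsFactors = FALSE
)

# Groups that contain at least one "AF"
bad_groups <- unique(df3$group[df3$world == "AF"])

# Membership test: drops every row belonging to a flagged group
kept <- df3[!df3$group %in% bad_groups, ]

# Element-wise comparison: bad_groups is recycled along the rows,
# so some rows from flagged groups slip through
recycled <- df3[df3$group != bad_groups, ]
```

With this data, kept contains only groups 1 and 4, while recycled still contains rows from groups 2 and 3.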
Using sqldf:
library(sqldf)
sqldf("SELECT df2.world, df2.place, [group] FROM df2
LEFT JOIN
(SELECT * FROM df2 WHERE world LIKE 'AF') AS t
USING([group])
WHERE t.world IS NULL")
Output:
world place group
1 AB 1 1
2 AC 1 1
3 AD 2 1
4 AB 1 3
5 AE 2 3
6 AC 3 3
7 AE 1 3
A base R option using ave:
df2[with(df2, ave(world != "AF", group, FUN = all)),]
# world place group
#1 AB 1 1
#2 AC 1 1
#3 AD 2 1
#7 AB 1 3
#8 AE 2 3
#9 AC 3 3
#10 AE 1 3
Or we can also use subset:
subset(df2, ave(world != "AF", group, FUN = all))
The above can also be written as
df2[with(df2, !ave(world == "AF", group, FUN = any)),]
and
subset(df2, !ave(world == "AF", group, FUN = any))
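To make the ave step easier to follow, this sketch pulls out the intermediate vector it produces: one logical value per row, obtained by applying all() within each group and repeating the group's result across its rows. df2 is rebuilt with the question's values; flag and result are illustrative names.

```r
# df2 rebuilt with the same values as in the question
df2 <- data.frame(
  world = factor(c("AB", "AC", "AD", "AD", "AD", "AF", "AB", "AE", "AC", "AE")),
  place = c(1, 1, 2, 2, 3, 3, 1, 2, 3, 1),
  group = c(1, 1, 1, 2, 2, 2, 3, 3, 3, 3)
)

# Row-aligned flag: TRUE for every row of a group with no "AF" member
flag <- with(df2, ave(world != "AF", group, FUN = all))

# The flag has one entry per row, so it can index the data frame directly
result <- df2[flag, ]
```

Because the flag is already aligned with the rows, no merge or join back to the original data is needed.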