How to compare two factors in a single column of a data frame based on the values in another column, and delete them if they don't match
I am trying to compare two factors based on the values in another column (in this case, the date). If they don't match, I want to delete the row.
Example:
> head(data)
  light     date
1     0 20190314
2     0 20190317
3     1 20190314
4     0 20190318
5     1 20190316
6     1 20190318
7     1 20190314
So I would like the result to be:
> head(data)
  light     date
1     0 20190314
2     1 20190314
3     0 20190318
4     1 20190318
5     1 20190314
Thanks in advance.
Here is one solution.
Input
library(dplyr)  # also provides tribble(), which the input below needs
d <- tribble(
  ~light, ~date,
  "0", "20190314",
  "0", "20190317",
  "1", "20190314",
  "0", "20190318",
  "1", "20190316",
  "1", "20190318",
  "1", "20190314"
)
Code
library(dplyr)
d %>%
  group_by(date) %>%                                                   # group by date
  mutate(is_keep = if_else("0" %in% light & "1" %in% light, 1, 0)) %>% # temporary column: 1 if the date has both "0" and "1"
  filter(is_keep == 1) %>%                                             # keep only the rows of those dates
  select(-is_keep) %>%                                                 # remove the temporary column
  ungroup()                                                            # ungroup the data frame
Output
  light date
  <chr> <chr>
1 0     20190314
2 1     20190314
3 0     20190318
4 1     20190318
5 1     20190314
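The temporary column is not strictly needed, because the grouped check can go straight into `filter()`. The following is a minimal sketch of that variant, assuming the same character columns as in the input above; the name `result` is only used for illustration.

```r
library(dplyr)

# Same input as in the answer above
d <- tribble(
  ~light, ~date,
  "0", "20190314",
  "0", "20190317",
  "1", "20190314",
  "0", "20190318",
  "1", "20190316",
  "1", "20190318",
  "1", "20190314"
)

result <- d %>%
  group_by(date) %>%
  # within each date, keep the rows only if both "0" and "1" occur in light
  filter(all(c("0", "1") %in% light)) %>%
  ungroup()

print(result)
```

Inside a grouped `filter()`, the condition is evaluated once per date, so a single `TRUE` or `FALSE` keeps or drops the whole group. The original row order is preserved.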
You can filter your data frame by checking whether a value exists in a specific column of another data frame:
data <- data %>%
  filter(date %in% unique(other_df$reference_column))
Another option is subset:
subset(data, date %in% unique(other_df$reference_column))
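The answer above does not say where `other_df` comes from. As a minimal base R sketch, assuming the reference set is simply the dates that occur with both `light` values in the same data frame, it could be built with `intersect()`. The data frame below is re-created from the question, and the names `both` and `result` are made up for illustration.

```r
# Data from the question, with light stored as a factor (an assumption
# based on the wording of the question)
data <- data.frame(
  light = factor(c(0, 0, 1, 0, 1, 1, 1)),
  date  = c(20190314, 20190317, 20190314, 20190318,
            20190316, 20190318, 20190314)
)

# Dates that appear with light == "0" and also with light == "1"
both <- intersect(data$date[data$light == "0"],
                  data$date[data$light == "1"])

# Keep only the rows whose date is in that set
result <- subset(data, date %in% both)
print(result)
```

With this input, `both` holds 20190314 and 20190318, and `subset()` keeps the original row names 1, 3, 4, 6 and 7 rather than renumbering them.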