Filter dataframe when column name-value pairs are stored in a list?
I have a data frame like this:
df <- tibble::rownames_to_column(USArrests, "State") %>%
tidyr::pivot_longer(cols = -State)
head(df)
# A tibble: 6 x 3
State name value
<chr> <chr> <dbl>
1 Alabama Murder 13.2
2 Alabama Assault 236
3 Alabama UrbanPop 58
4 Alabama Rape 21.2
5 Alaska Murder 10
6 Alaska Assault 263
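In a separate list object l, I have some values that I need to remove from the data frame. The element names are the column names, and the values correspond to the rows I want to remove: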
l <- list(State = c("Alabama", "Pennsylvania", "Texas"),
name = c("Murder", "Assault"))
Hard-coding it would look like this:
dplyr::filter(df, !State %in% c("Alabama", "Pennsylvania", "Texas"), !name %in% c("Murder", "Assault"))
State name value
<chr> <chr> <dbl>
1 Alaska UrbanPop 48
2 Alaska Rape 44.5
3 Arizona UrbanPop 80
4 Arizona Rape 31
5 Arkansas UrbanPop 50
6 Arkansas Rape 19.5
7 California UrbanPop 91
8 California Rape 40.6
9 Colorado UrbanPop 78
10 Colorado Rape 38.7
# ... with 84 more rows
However, l changes frequently, so I cannot (and do not want to) hard-code it. I tried the following, but only the last expression is evaluated:
library(purrr)
filter_expr <- imap_chr(l, ~ paste0("! ",
.y,
" %in% c(\"",
paste(.x, collapse = "\",\""),
"\")")) %>% parse(text = .)
filter(df, eval(filter_expr))
State name value
<chr> <chr> <dbl>
1 Alabama UrbanPop 58
2 Alabama Rape 21.2
3 Alaska UrbanPop 48
4 Alaska Rape 44.5
5 Arizona UrbanPop 80
6 Arizona Rape 31
7 Arkansas UrbanPop 50
8 Arkansas Rape 19.5
9 California UrbanPop 91
10 California Rape 40.6
# ... with 90 more rows
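Is there a way, preferably a tidyverse-friendly one, to filter df when the filter conditions are stored in a structure like l?
I looked at this, but the expressions there are not dynamic.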
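We can use across inside filter to loop over the names of l, building the logical expression by subsetting l with the column name (cur_column()) as the key and negating the result (!). Note that cur_column() currently works only with across, not with if_all/if_any (dplyr 1.0.6 on R 4.1.0).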
library(dplyr)
df %>%
filter(across(all_of(names(l)), ~ !. %in% l[[cur_column()]]))
Output
# A tibble: 94 x 3
# State name value
# <chr> <chr> <dbl>
# 1 Alaska UrbanPop 48
# 2 Alaska Rape 44.5
# 3 Arizona UrbanPop 80
# 4 Arizona Rape 31
# 5 Arkansas UrbanPop 50
# 6 Arkansas Rape 19.5
# 7 California UrbanPop 91
# 8 California Rape 40.6
# 9 Colorado UrbanPop 78
#10 Colorado Rape 38.7
# … with 84 more rows
If we set the column name as an attribute first, we can use if_all:
library(magrittr)
df %>%
mutate(across(all_of(names(l)), ~ set_attr(., 'cn', cur_column()))) %>%
filter(if_all(all_of(names(l)), ~ ! . %in% l[[attr(., 'cn')]]))
Or with imap/reduce:
library(purrr)
df %>%
filter(imap(l, ~ !cur_data()[[.y]] %in% .x) %>%
reduce(`&`))
Another option is anti_join (note that this loop overwrites df):
for (nm in names(l)) df <- anti_join(df, tibble(!!nm := l[[nm]]), by = nm)
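The attempt in the question can also be repaired by building real expressions instead of parsing strings. eval() on an expression vector evaluates each element in turn and returns only the last value, which is why only the name condition took effect. The following is a minimal sketch, not part of the original answers, that assumes rlang is available. It builds one call per list element with call2() and splices them all into filter() with !!!:

```r
library(dplyr)
library(purrr)
library(rlang)

df <- tibble::rownames_to_column(USArrests, "State") %>%
  tidyr::pivot_longer(cols = -State)

l <- list(State = c("Alabama", "Pennsylvania", "Texas"),
          name  = c("Murder", "Assault"))

# One quoted condition per list element, equivalent to
# !(State %in% <values>) and !(name %in% <values>)
filter_exprs <- imap(l, ~ call2("!", call2("%in%", sym(.y), .x)))

# filter() rejects named arguments, so drop the names before splicing;
# each spliced condition is combined with AND, as in the hard-coded call
result <- filter(df, !!!unname(filter_exprs))
nrow(result)  # 94
```

Because every element of l becomes its own argument to filter(), the result should match the hard-coded version however many entries l has.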
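Another potential option here is to use purrr to create the logical vector. This allows both & and | conditions and gives access to the current column name (.y) without cur_column, which can only be used inside across: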
df %>%
filter(imap(l, ~ !df[[.y]] %in% .x) %>% reduce(`&`)) # can use magrittr::and
Output
State name value
<chr> <chr> <dbl>
1 Alaska UrbanPop 48
2 Alaska Rape 44.5
3 Arizona UrbanPop 80
4 Arizona Rape 31
5 Arkansas UrbanPop 50
6 Arkansas Rape 19.5
7 California UrbanPop 91
8 California Rape 40.6
9 Colorado UrbanPop 78
10 Colorado Rape 38.7
# ... with 84 more rows
Or a variant:
df %>%
filter(imap(l, ~ !df[[.y]] %in% .x) %>% reduce(`|`)) # can use magrittr::or
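Counting the rows each version keeps makes the difference between & and | concrete. This is only an illustrative sketch using the df and l from the question. The names conditions, keep_all and keep_any are introduced here and do not appear in the original answer:

```r
library(purrr)

df <- tidyr::pivot_longer(
  tibble::rownames_to_column(USArrests, "State"),
  cols = -State
)

l <- list(State = c("Alabama", "Pennsylvania", "Texas"),
          name  = c("Murder", "Assault"))

# One logical vector per list element: TRUE where the row is NOT excluded
conditions <- imap(l, ~ !df[[.y]] %in% .x)

# `&`: keep a row only if it passes every condition
keep_all <- reduce(conditions, `&`)
# `|`: keep a row if it passes at least one condition
keep_any <- reduce(conditions, `|`)

sum(keep_all)  # 94  (47 remaining states x 2 remaining measures)
sum(keep_any)  # 194 (only the 3 x 2 rows matching both lists are dropped)
```

Since the lambdas index df directly, the logical vectors can be computed outside filter() as well. This assumes df is the same ungrouped data frame that is being filtered.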