Add column to dataframe to show if an element in that row is in a certain list in R

Question

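I have a data frame df in R (a tibble, in my case), and a given directory contains several files that loosely correspond to the elements of one of df's columns. I want to track which rows of df correspond to these files by adding a column has_file.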

Here is what I tried:

library(tidyverse)  # attaches %>%, str_c(), str_remove(), tibble(), mutate()

# SETUP
dir.create("temp")
setwd("temp")
LETTERS[1:4] %>% 
  str_c(., ".png") %>% 
  file.create()

df <- tibble(x = LETTERS[3:6])

file_list <- list.files()

# ATTEMPT
df %>% 
  mutate(
    has_file = file_list %>% 
      str_remove(".png") %>% 
      is.element(x, .) %>% 
      any()
  )

# RESULT
# A tibble: 4 x 2
  x     has_file
  <chr> <lgl>   
1 C     TRUE    
2 D     TRUE    
3 E     TRUE    
4 F     TRUE

I expected only the rows with C and D to get TRUE in has_file, but the rows with E and F get TRUE as well.

What is going on here, and how can I generate this correspondence in a column?

(Tidyverse solutions preferred.)

Answer 1

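We need to add rowwise() at the top. Without it, any() is evaluated over the entire column: is.element(x, .) returns one TRUE/FALSE per row, any() collapses that to a single TRUE (two of the elements are already TRUE), and mutate() recycles that single value down the whole column. With rowwise(), any() is no longer needed, because is.element() returns one TRUE/FALSE for each element of column 'x'.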

df %>% 
  rowwise() %>%
  mutate(
    has_file = file_list %>% 
      str_remove(".png") %>% 
      is.element(x, .)
  ) %>%  
  ungroup()
# A tibble: 4 × 2
  x     has_file
  <chr> <lgl>   
1 C     TRUE    
2 D     TRUE    
3 E     FALSE   
4 F     FALSE

That is, compare the results with and without any():

> is.element(df$x,  LETTERS[1:4])
[1]  TRUE  TRUE FALSE FALSE
> any(is.element(df$x,  LETTERS[1:4]))
[1] TRUE
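To see the recycling outside of mutate(), here is a minimal base R sketch. The vectors x and stems are hypothetical stand-ins for df$x and the file names with ".png" stripped. any() collapses the per-row vector to a single value, and assigning a length-one value to a data frame column recycles it to every row, which is the same thing mutate() does with a length-one result.

```r
# Hypothetical stand-ins for df$x and the file stems (file names without ".png")
x     <- c("C", "D", "E", "F")
stems <- c("A", "B", "C", "D")

df <- data.frame(x = x)

# Element-wise membership test: one TRUE/FALSE per row
per_row <- is.element(df$x, stems)
print(per_row)

# any() collapses the whole vector to a single TRUE
collapsed <- any(per_row)
print(collapsed)

# A length-one value is recycled to every row on assignment,
# so every row ends up TRUE; the per-row vector keeps the right answer
df$has_file_wrong <- collapsed
df$has_file_right <- per_row
print(df)
```

Dropping any() is therefore enough on its own, since is.element() is already vectorized over its first argument.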

We can also do this with map_lgl() from purrr:

library(purrr)
df %>% 
   mutate(has_file = map_lgl(x, ~ file_list %>% 
                str_remove(".png") %>% 
                is.element(.x, .)))
# A tibble: 4 × 2
  x     has_file
  <chr> <lgl>   
1 C     TRUE    
2 D     TRUE    
3 E     FALSE   
4 F     FALSE
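map_lgl() runs the check on one element of x at a time and requires each call to return a single logical. A rough base R analogue, shown here as a sketch with the same hypothetical x and stems vectors, is vapply() with FUN.VALUE = logical(1), which likewise fails if any call returns something other than one logical value.

```r
# Hypothetical stand-ins for df$x and the file stems
x     <- c("C", "D", "E", "F")
stems <- c("A", "B", "C", "D")

# Apply the membership test to each element separately, like map_lgl()
has_file <- vapply(x, function(el) is.element(el, stems), logical(1),
                   USE.NAMES = FALSE)
print(has_file)
```

Because is.element() is vectorized, this loop gives the same result as a single x %in% stems call; the element-by-element form only pays off when the per-row check cannot be vectorized.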

Or, for a vectorized option without rowwise() or map, use %in% directly (x %in% table gives the same result as is.element(x, table)):

df %>% 
   mutate(has_file = x %in% str_remove(file_list, ".png"))
# A tibble: 4 × 2
  x     has_file
  <chr> <lgl>   
1 C     TRUE    
2 D     TRUE    
3 E     FALSE   
4 F     FALSE
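If the goal is specifically to know whether a matching file exists on disk, another option (not part of the answer above) is to build the expected file name for each row and test it with base R's vectorized file.exists(). The sketch assumes every file follows the "<value>.png" pattern from the question and uses a temporary directory instead of setwd(); the name dir is hypothetical. It also shows a stricter way to strip the extension: in str_remove(".png") the pattern is a regular expression, so the dot matches any character, while an anchored "\\.png$" only matches a literal ".png" suffix.

```r
# Throwaway directory holding A.png ... D.png, as in the question's setup
dir <- file.path(tempdir(), "png_demo")
dir.create(dir, showWarnings = FALSE)
invisible(file.create(file.path(dir, paste0(LETTERS[1:4], ".png"))))

df <- data.frame(x = LETTERS[3:6])

# Assumption: the file for a row is named exactly "<x>.png"
df$has_file <- file.exists(file.path(dir, paste0(df$x, ".png")))

# Alternative: list only .png files and strip the extension before matching
stems <- tools::file_path_sans_ext(list.files(dir, pattern = "\\.png$"))
df$has_file2 <- df$x %in% stems
print(df)
```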
