How to filter a nested tibble when one row contains list() / no nested tibble

I'm struggling to filter a nested tibble when one of the rows does not contain a nested tibble.

my_df contains a nested tibble in the column products. I want to filter the nested tibble so that it only keeps rows with the value apple in its column food.

I can do this with mutate(products=map(products, ~filter(.x, str_detect(food, "apple")))). However, this fails when a row in my_df contains no nested tibble or an empty one (list()).

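I tried to work around this by creating a helper column that counts the rows in each nested tibble, and then applying the search only to those rows where nrow > 0. However, my attempt with case_when fails, and I don't understand why.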

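I'd be grateful for any hint. Note that I know I could split my_df into two separate dfs (one with list(), one with nested tibbles) and then row-bind them. The case_when approach seems more convenient for my use case, and I'd like to understand why it doesn't work. Reprex below. Many thanks!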

library(tidyverse)


my_df <- structure(list(branch_name = c("basket1", "basket2"), products = list(
  structure(list(), class = c(
    "tbl_df", "tbl",
    "data.frame"
  ), row.names = integer(0), .Names = character(0)),
  structure(list(
    food = c(
      "apple",
      "grape"
    ),
    supplier = c("john", "jack")),
  class = c("tbl_df", "tbl", "data.frame"),
  row.names = c(NA, -2L)
  )
)), row.names = c(NA, -2L), class = c(
  "tbl_df",
  "tbl", "data.frame"
))
my_df
#> # A tibble: 2 x 2
#>   branch_name products        
#>   <chr>       <list>          
#> 1 basket1     <tibble [0 x 0]>
#> 2 basket2     <tibble [2 x 2]>


#Try to filter the nested df 'products', keep only rows where str_detect(food, "apple")==T
#fails
x <- my_df %>% 
  mutate(products=map(products, ~filter(.x, str_detect(food, "apple"))))
#> Error in `mutate_cols()`:
#> ! Problem with `mutate()` column `products`.
#> i `products = map(products, ~filter(.x, str_detect(food, "apple")))`.
#> x Problem with `filter()` input `..1`.
#> i Input `..1` is `str_detect(food, "apple")`.
#> x object 'food' not found
#> Caused by error in `h()`:
#> ! Problem with `filter()` input `..1`.
#> i Input `..1` is `str_detect(food, "apple")`.
#> x object 'food' not found

  
#filter works if no row has an empty nested df (list())
y <- my_df %>% 
  mutate(products_nrow=map_dbl(products, nrow)) %>% 
  filter(products_nrow>0) %>% 
  mutate(products=map(products, ~filter(.x, str_detect(food, "apple"))))

#correct result
y  
#> # A tibble: 1 x 3
#>   branch_name products         products_nrow
#>   <chr>       <list>                   <dbl>
#> 1 basket2     <tibble [1 x 2]>             2
y$products
#> [[1]]
#> # A tibble: 1 x 2
#>   food  supplier
#>   <chr> <chr>   
#> 1 apple john


#account for nrows of nested df and use case_when; fails
my_df %>% 
  mutate(products_nrow=map_dbl(products, nrow)) %>% 
  mutate(products=case_when(
    products_nrow>0 ~ map(products, ~filter(.x, str_detect(food, "apple"))),
    TRUE ~ products))
#> Error in `mutate_cols()`:
#> ! Problem with `mutate()` column `products`.
#> i `products = case_when(...)`.
#> x Problem with `filter()` input `..1`.
#> i Input `..1` is `str_detect(food, "apple")`.
#> x object 'food' not found
#> Caused by error in `h()`:
#> ! Problem with `filter()` input `..1`.
#> i Input `..1` is `str_detect(food, "apple")`.
#> x object 'food' not found

Created on 2022-03-18 by the reprex package (v2.0.1)

You can use an if condition, e.g. check whether the column food is present in the nested dataset, and only filter when it is:

library(dplyr)
library(purrr)
library(stringr)

my_df %>% 
  mutate(products = map(products, ~ if ("food" %in% names(.x)) filter(.x, str_detect(food, "apple")) else .x))
#> # A tibble: 2 × 2
#>   branch_name products        
#>   <chr>       <list>          
#> 1 basket1     <tibble [0 × 0]>
#> 2 basket2     <tibble [1 × 2]>

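A hacky solution that doesn't directly answer your question, but probably the simplest approach: unnest (which drops the empty tibbles) and nest again before applying your filter. Note that the row with the empty tibble (basket1) is dropped from the result.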

 my_df %>% 
   unnest(products) %>%
   nest(products = -branch_name) %>%
   mutate(products=map(products, ~filter(.x, str_detect(food, "apple"))))

This results in:

# A tibble: 1 × 2
  branch_name products        
  <chr>       <list>          
1 basket2     <tibble [1 × 2]>

Another possible solution:

library(tidyverse)

my_df[["products"]] <-
 map(my_df[["products"]], ~ if (nrow(.x) != 0) 
     {filter(.x, food == "apple")} else {.x})

my_df

#> # A tibble: 2 × 2
#>   branch_name products        
#>   <chr>       <list>          
#> 1 basket1     <tibble [0 × 0]>
#> 2 basket2     <tibble [1 × 2]>
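
On the "why" part of the question: as far as I understand it, case_when() evaluates every right-hand side in full, for all rows, and only afterwards picks values according to the conditions. So map(products, ~filter(...)) still runs on the empty tibble in row 1, and the same 'food' not found error comes up. A minimal sketch of one way around this is purrr::map_if(), which only calls the function on elements where the predicate holds. The my_df below is a simplified re-creation of the one in the question (built with tibble(), which is an assumption of this sketch), and res is just an illustrative name.

```r
library(dplyr)
library(purrr)
library(stringr)
library(tibble)

# Simplified re-creation of my_df from the question (assumption: tibble()
# gives the same 0 x 0 empty tibble as the structure() call in the reprex)
my_df <- tibble(
  branch_name = c("basket1", "basket2"),
  products = list(
    tibble(),
    tibble(food = c("apple", "grape"), supplier = c("john", "jack"))
  )
)

# map_if() applies the filter only where the nested tibble has rows;
# all other elements are returned unchanged.
res <- my_df %>%
  mutate(products = map_if(
    products,
    ~ nrow(.x) > 0,
    ~ filter(.x, str_detect(food, "apple"))
  ))

res$products
```

This keeps both rows of my_df, like the if/else variants above, but the condition and the filtering function are passed as separate arguments.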