Adding a column to a df based on comparison with a list through strsplit() in R

Question

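I've been working on this for a while but still haven't figured out how to get it to work the way I'd like. Hopefully someone can help me:

I have a data frame with a lot of data (5000+ obs) about city budgets, so one of the variables is, unsurprisingly, named 'city'. I have a separate list of 40 cities that I want to attach to this data frame: essentially, for each city name in the df, check whether it also appears in the separate list (i.e. code it as 1 if it does, 0 otherwise). I made an example below with a smaller data set: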

city <- c(rep("city_a", 8), rep("city_b", 5), rep("city_c", 4), rep("city_d", 7), 
rep("city_e", 3), rep("city_f", 9), rep("city_g", 4)) 
school <- c(1:8, 1:5, 1:4, 1:7,1:3, 1:9, 1:4)
df <- data.frame(city, school)

seperate_list <- tolower("City_A, City_B, City_E, City_G")
seperate_list <- gsub('[,]', '', seperate_list)
seperate_list <- strsplit(seperate_list, " ")[[1]]

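Note: you might ask why the second part is written that way. My real data set is much larger and I wanted to find a way to make the process more automated, so that I don't have to manually remove all the commas and separate the city names from each other. Now that I have the df and seperate_list, I want to combine them by adding a third column to the df that specifies whether (1) or not (0) each city is in the separate list. I've tried using a for loop and lapply, but without success, as I'm not very skilled with either.

I'd appreciate any hints so I can work it out myself!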

Answer 1

library(tidyverse)

city <- c(rep("city_a", 8), rep("city_b", 5), rep("city_c", 4), rep("city_d", 7), 
          rep("city_e", 3), rep("city_f", 9), rep("city_g", 4)) 
school <- c(1:8, 1:5, 1:4, 1:7,1:3, 1:9, 1:4)
df <- data.frame(city, school)

seperate_list <- tolower("City_A, City_B, City_E, City_G")
seperate_list <- gsub('[,]', '', seperate_list)
seperate_list <- strsplit(seperate_list, " ")[[1]]


df %>%
  mutate(
    in_list = city %in% seperate_list
  ) %>%
  as_tibble()
#> # A tibble: 40 x 3
#>    city   school in_list
#>    <chr>   <int> <lgl>  
#>  1 city_a      1 TRUE   
#>  2 city_a      2 TRUE   
#>  3 city_a      3 TRUE   
#>  4 city_a      4 TRUE   
#>  5 city_a      5 TRUE   
#>  6 city_a      6 TRUE   
#>  7 city_a      7 TRUE   
#>  8 city_a      8 TRUE   
#>  9 city_b      1 TRUE   
#> 10 city_b      2 TRUE   
#> # … with 30 more rows

Created on 2021-09-09 by the reprex package (v2.0.1)
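Since the question asks for a 1/0 coding rather than TRUE/FALSE, here is a minimal base R sketch of the same `%in%` idea without the tidyverse. The names `raw_cities` and `city_list` are made up for this illustration, and splitting on a comma followed by optional whitespace is just one possible way to avoid the separate `gsub()` step; it assumes the city names themselves contain no commas.

```r
# Same toy data as in the question.
city <- c(rep("city_a", 8), rep("city_b", 5), rep("city_c", 4), rep("city_d", 7),
          rep("city_e", 3), rep("city_f", 9), rep("city_g", 4))
school <- c(1:8, 1:5, 1:4, 1:7, 1:3, 1:9, 1:4)
df <- data.frame(city, school)

# Hypothetical input string; split on "," plus any following whitespace,
# trim stray spaces, then lower-case so it matches the df spelling.
raw_cities <- "City_A, City_B, City_E, City_G"
city_list <- tolower(trimws(strsplit(raw_cities, ",\\s*")[[1]]))

# %in% is vectorised: it returns one TRUE/FALSE per row of df$city,
# and as.integer() turns that into 1/0.
df$in_list <- as.integer(df$city %in% city_list)

# Quick check: cross-tabulate city against the new indicator.
table(df$city, df$in_list)
```

No for loop or `lapply()` is needed here, because `%in%` already compares every element of `df$city` against the whole list in one call.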

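I think you could also consider joining tables, with the list of interest stored as a column of another table. That approach carries over to databases and relational algebra.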
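A rough sketch of what that join could look like with dplyr, assuming the list is first turned into a small lookup table. The names `lookup` and `joined` are hypothetical, and this is only one way to set it up:

```r
library(dplyr)

# Same toy data as in the question.
city <- c(rep("city_a", 8), rep("city_b", 5), rep("city_c", 4), rep("city_d", 7),
          rep("city_e", 3), rep("city_f", 9), rep("city_g", 4))
school <- c(1:8, 1:5, 1:4, 1:7, 1:3, 1:9, 1:4)
df <- data.frame(city, school)

# Hypothetical lookup table: one row per listed city, flagged with 1.
lookup <- data.frame(
  city = c("city_a", "city_b", "city_e", "city_g"),
  in_list = 1L
)

# left_join() keeps every row of df; cities missing from lookup get NA,
# which coalesce() then replaces with 0.
joined <- df %>%
  left_join(lookup, by = "city") %>%
  mutate(in_list = coalesce(in_list, 0L))

head(joined)
```

The lookup table should have one row per city; duplicated cities in `lookup` would duplicate the matching rows of `df` in the result.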
