Rename & Reduce multiple similar observations in R

Question

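I have a categorical variable with 169 levels. I want to reduce it to a manageable 7-10 categories, such as "Religion", "Culture and Arts", "Education", "Animal Protection", "Emergency", "Environmental Protection", "Social Services", and so on.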

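I know I could rename all 169 levels by hand with the levels() function, but I am looking for a smarter option. For example, could I use "Religion" or "Culture" as a filter to group every matching level under one code?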

Answer 1

You can do it like this. See the documentation for str_detect for other options.

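If you can provide some minimal usable example data and the code you have tried, along the lines of the reproducible example below, it will be easier for others to help you. We can then run it and suggest improvements.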

library(tidyverse)

data_df <- tribble(
  ~ label,
  "Culture and Arts",
  "Education in Japanese",
  "Culture and Recreation",
  "culture & Environment",
  "Environmental Activities",
  "Education & research"
) 

data_df2 <- data_df |>
  mutate(category = case_when(
    str_detect(label, "Cultu")   ~ "Culture & Arts",
    str_detect(label, "Educ")    ~ "Education",
    str_detect(label, "Environ") ~ "Environment",
    TRUE ~ "Other"
  ) |> factor())

data_df2
#> # A tibble: 6 × 2
#>   label                    category      
#>   <chr>                    <fct>         
#> 1 Culture and Arts         Culture & Arts
#> 2 Education in Japanese    Education     
#> 3 Culture and Recreation   Culture & Arts
#> 4 culture & Environment    Environment   
#> 5 Environmental Activities Environment   
#> 6 Education & research     Education

levels(data_df2$category)
#> [1] "Culture & Arts" "Education"      "Environment"

Created on 2022-04-23 by the reprex package (v2.0.1)
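
Note that in the output above, "culture & Environment" ends up under "Environment" because str_detect is case-sensitive by default, so "Cultu" does not match the lowercase "culture". If that is not what you want, a minimal sketch of a case-insensitive variant follows. It also works on the levels of a factor rather than on each row, which may be closer to the 169-level variable in the question. The names org_type, new_levels and org_collapsed are made up for illustration, and whether a mixed label belongs under culture or environment is your call; case_when simply takes the first pattern that matches.

library(dplyr)
library(stringr)

# Hypothetical stand-in for the 169-level factor from the question
org_type <- factor(c(
  "Culture and Arts", "Education in Japanese", "Culture and Recreation",
  "culture & Environment", "Environmental Activities", "Education & research",
  "Culture and Arts"
))

# Map each existing level (not each row) to a broader category;
# regex(..., ignore_case = TRUE) makes the match case-insensitive
new_levels <- case_when(
  str_detect(levels(org_type), regex("cultu", ignore_case = TRUE))   ~ "Culture & Arts",
  str_detect(levels(org_type), regex("educ", ignore_case = TRUE))    ~ "Education",
  str_detect(levels(org_type), regex("environ", ignore_case = TRUE)) ~ "Environment",
  TRUE ~ "Other"
)

# Assigning repeated names to levels() merges the corresponding levels
org_collapsed <- org_type
levels(org_collapsed) <- new_levels

table(org_collapsed)

Because the patterns are checked once per level, the order of the case_when branches decides where an ambiguous level lands. Putting the "environ" line first would send "culture & Environment" to "Environment" instead.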

