Create new variable based on a condition across multiple columns

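I have a binary variable ("Penalty") and 30 factors that share the same levels: "Discharge", "Suspended", "Fine", "Community order", and "Imprisonment".

A small example: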

ID Possession   Importation Production      Penalty
1  Fine         NA          Fine            Yes
2  NA           NA          Community order No
3  Discharge    Discharge   NA              No
4  NA           NA          Suspended       Yes
5  Imprisonment NA          NA              No
6  Fine         NA          Imprisonment    No

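I want to create a new factor based on the same conditions across these columns plus the binary variable. Where a row contains different levels, I would like the new variable 'sentence' to keep the level with the highest priority, in this order: Imprisonment > Community order, Suspended > Fine > Discharge. For example, Discharge should only appear in the new column if no other level appears in that row.

Desired output: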

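ID Possession   Importation Production      Penalty Sentence
1  Fine         NA          Fine            Yes     Fine
2  NA           NA          Community order No      Community order
3  Discharge    Discharge   NA              No      Discharge
4  NA           NA          Suspended       Yes     Suspended
5  Imprisonment NA          NA              No      Imprisonment
6  Fine         NA          Imprisonment    No      Imprisonment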

Here is what I tried (where "vec" is a vector of the factor column indices):

data <- data %>%
  mutate(
    crim_sanct = case_when(
      (if_any(vec) == "Discharge") ~ "Discharge",
      (if_any(vec) == "Fine") | (data$Penalty == "Yes") ~ "Fine",
      (if_any(vec) ==  "Suspended") ~ "Suspended",
      (if_any(vec) ==  "Community order") ~ "Community order",
      (if_any(vec) ==  "Imprisonment") ~ "imprisonment"))

Since I don't know how to handle the Penalty column, let's ignore it for now. The Sentence column can be created from the Possession, Importation and Production columns with:

library(dplyr)

data %>%
  mutate(across(
    Possession:Production,
    ~ factor(.x, 
             c("Imprisonment", "Community order", "Suspended", "Fine", "Discharge"),
             ordered = TRUE))) %>% 
  rowwise() %>% 
  mutate(Sentence = min(c_across(Possession:Production), na.rm = TRUE)) %>% 
  ungroup()

which returns

# A tibble: 6 x 6
     ID Possession   Importation Production      Penalty Sentence       
  <dbl> <ord>        <ord>       <ord>           <chr>   <ord>          
1     1 Fine         NA          Fine            Yes     Fine           
2     2 NA           NA          Community order No      Community order
3     3 Discharge    Discharge   NA              No      Discharge      
4     4 NA           NA          Suspended       Yes     Suspended      
5     5 Imprisonment NA          NA              No      Imprisonment   
6     6 Fine         NA          Imprisonment    No      Imprisonment   

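The main idea here is to create ordered factors whose levels are listed from highest to lowest priority, and then use a row-wise min() to pick the sentence with the highest priority.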
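
To see why min() picks the highest-priority sentence, here is a minimal base R sketch on a single made-up row. The names `lv`, `one_row` and `top` are only illustrative and not part of the answer above.

```r
# Levels listed from highest to lowest priority, as in the answer above
lv <- c("Imprisonment", "Community order", "Suspended", "Fine", "Discharge")

# One illustrative row: Fine, missing, Imprisonment
one_row <- factor(c("Fine", NA, "Imprisonment"), levels = lv, ordered = TRUE)

# For an ordered factor, min() returns the level that comes first in `lv`
top <- min(one_row, na.rm = TRUE)
as.character(top)  # "Imprisonment"
```

Because "Imprisonment" is the first level, it counts as the smallest value, so min() returns it even though "Fine" is also present in the row.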

Data

data <- structure(list(ID = c(1, 2, 3, 4, 5, 6), Possession = c("Fine", 
NA, "Discharge", NA, "Imprisonment", "Fine"), Importation = c(NA, 
NA, "Discharge", NA, NA, NA), Production = c("Fine", "Community order", 
NA, "Suspended", NA, "Imprisonment"), Penalty = c("Yes", "No", 
"No", "Yes", "No", "No")), problems = structure(list(row = 6L, 
    col = "Penalty", expected = "", actual = "embedded null", 
    file = "literal data"), row.names = c(NA, -1L), class = c("tbl_df", 
"tbl", "data.frame")), class = "data.frame", row.names = c(NA, 
-6L), spec = structure(list(cols = list(ID = structure(list(), class = c("collector_double", 
"collector")), Possession = structure(list(), class = c("collector_character", 
"collector")), Importation = structure(list(), class = c("collector_character", 
"collector")), Production = structure(list(), class = c("collector_character", 
"collector")), Penalty = structure(list(), class = c("collector_character", 
"collector"))), default = structure(list(), class = c("collector_guess", 
"collector")), skip = 1L), class = "col_spec"))

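You were on the right track, but there were a few small syntax issues in how you used if_any.

Also, in case_when the conditions need to be placed in order of priority, because the first condition that matches wins. So if Imprisonment > Community order, the Imprisonment condition should come before the Community order one.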

library(dplyr)

data <- data %>%
  mutate(
    crim_sanct = 
      case_when(
      # case_when() uses the first match, so test from highest to lowest priority
      if_any(Possession:Production, ~ . == "Imprisonment") ~ "imprisonment",
      if_any(Possession:Production, ~ . == "Community order") ~ "Community order",
      if_any(Possession:Production, ~ . == "Suspended") ~ "Suspended",
      if_any(Possession:Production, ~ . == "Fine") | (Penalty == "Yes") ~ "Fine",
      if_any(Possession:Production, ~ . == "Discharge") ~ "Discharge")
)
data

#  ID   Possession Importation      Production Penalty      crim_sanct
#1  1         Fine        <NA>            Fine     Yes            Fine
#2  2         <NA>        <NA> Community order      No Community order
#3  3    Discharge   Discharge            <NA>      No       Discharge
#4  4         <NA>        <NA>       Suspended     Yes       Suspended
#5  5 Imprisonment        <NA>            <NA>      No    imprisonment
#6  6         Fine        <NA>    Imprisonment      No    imprisonment