Create new variable based on a condition across multiple columns
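I have a binary variable ("Penalty") and 30 factors that all share the same levels: "Discharge", "Suspended", "Fine", "Community order", and "Imprisonment".
A small example: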
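ID | Possession | Importation | Production | Penalty |
---|---|---|---|---|
1 | Fine | NA | Fine | Yes |
2 | NA | NA | Community order | No |
3 | Discharge | Discharge | NA | No |
4 | NA | NA | Suspended | Yes |
5 | Imprisonment | NA | NA | No |
6 | Fine | NA | Imprisonment | No |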
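I want to create a new factor by applying the same condition across these columns, plus the binary variable. Where a row contains different levels, the new variable 'sentence' should keep the level with the highest priority, in this order: Imprisonment > Community order, Suspended > Fine > Discharge. For example, Discharge should appear in the new column only when no other level is present in that row.
Desired output: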
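ID | Possession | Importation | Production | Penalty | Sentence |
---|---|---|---|---|---|
1 | Fine | NA | Fine | Yes | Fine |
2 | NA | NA | Community order | No | Community order |
3 | Discharge | Discharge | NA | No | Discharge |
4 | NA | NA | Suspended | Yes | Suspended |
5 | Imprisonment | NA | NA | No | Imprisonment |
6 | Fine | NA | Imprisonment | No | Imprisonment |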
This is what I have tried (where "vec" is a vector of the indices of the factor columns):
data <- data %>%
mutate(
crim_sanct = case_when(
(if_any(vec) == "Discharge") ~ "Discharge",
(if_any(vec) == "Fine") | (data$Penalty == "Yes") ~ "Fine",
(if_any(vec) == "Suspended") ~ "Suspended",
(if_any(vec) == "Community order") ~ "Community order",
(if_any(vec) == "Imprisonment") ~ "imprisonment"))
Since I don't know how the Penalty column should be handled, let's ignore it for now. The Sentence column can be created from the Possession, Importation, and Production columns with:
library(dplyr)

data %>%
  mutate(across(
    Possession:Production,
    ~ factor(.x,
             levels = c("Imprisonment", "Community order", "Suspended", "Fine", "Discharge"),
             ordered = TRUE)
  )) %>%
  rowwise() %>%
  mutate(Sentence = min(c_across(Possession:Production), na.rm = TRUE)) %>%
  ungroup()
which returns
# A tibble: 6 x 6
ID Possession Importation Production Penalty Sentence
<dbl> <ord> <ord> <ord> <chr> <ord>
1 1 Fine NA Fine Yes Fine
2 2 NA NA Community order No Community order
3 3 Discharge Discharge NA No Discharge
4 4 NA NA Suspended Yes Suspended
5 5 Imprisonment NA NA No Imprisonment
6 6 Fine NA Imprisonment No Imprisonment
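The main idea here is to convert the columns to ordered factors, with the levels listed from highest to lowest priority, and then use a row-wise min() to pick the sentence with the highest priority.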
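The same idea can also be sketched in base R without `rowwise()`, which may matter with 30 columns. This is only an illustration under some assumptions: `df` is a small stand-in for the question's data with three sentence columns, and the names `priority`, `ranks`, and `best` are made up for this example. Each value is replaced by its priority rank, and `pmin()` takes the minimum across columns in one vectorised call.

```r
# Levels listed from highest to lowest priority, so the smallest level wins.
priority <- c("Imprisonment", "Community order", "Suspended", "Fine", "Discharge")

# The idea on a single row: min() of an ordered factor returns its
# highest-priority level.
one_row <- factor(c("Fine", NA, "Imprisonment"), levels = priority, ordered = TRUE)
top <- as.character(min(one_row, na.rm = TRUE))

# Stand-in for the question's data (three sentence columns instead of 30).
df <- data.frame(
  Possession  = c("Fine", NA, "Discharge", NA, "Imprisonment", "Fine"),
  Importation = c(NA, NA, "Discharge", NA, NA, NA),
  Production  = c("Fine", "Community order", NA, "Suspended", NA, "Imprisonment")
)

# Replace every value by its priority rank (1 = highest priority, NA stays NA).
ranks <- lapply(df, function(col) match(col, priority))

# Element-wise minimum across all columns; a row that is NA everywhere stays NA.
best <- do.call(pmin, c(ranks, na.rm = TRUE))

# Map the winning rank back to its label.
df$Sentence <- factor(priority[best], levels = priority, ordered = TRUE)
as.character(df$Sentence)
```

For the six example rows this gives Fine, Community order, Discharge, Suspended, Imprisonment, Imprisonment, matching the desired output in the question.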
Data
data <- structure(list(ID = c(1, 2, 3, 4, 5, 6), Possession = c("Fine",
NA, "Discharge", NA, "Imprisonment", "Fine"), Importation = c(NA,
NA, "Discharge", NA, NA, NA), Production = c("Fine", "Community order",
NA, "Suspended", NA, "Imprisonment"), Penalty = c("Yes", "No",
"No", "Yes", "No", "No")), problems = structure(list(row = 6L,
col = "Penalty", expected = "", actual = "embedded null",
file = "literal data"), row.names = c(NA, -1L), class = c("tbl_df",
"tbl", "data.frame")), class = "data.frame", row.names = c(NA,
-6L), spec = structure(list(cols = list(ID = structure(list(), class = c("collector_double",
"collector")), Possession = structure(list(), class = c("collector_character",
"collector")), Importation = structure(list(), class = c("collector_character",
"collector")), Production = structure(list(), class = c("collector_character",
"collector")), Penalty = structure(list(), class = c("collector_character",
"collector"))), default = structure(list(), class = c("collector_guess",
"collector")), skip = 1L), class = "col_spec"))
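You were on the right track, but there were a few small syntax issues in your use of if_any.
Also, in case_when the conditions have to be listed in order of priority, because the first condition that matches a row wins. So if Imprisonment > Community order, the Imprisonment condition should come before the Community order condition.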
library(dplyr)

# Conditions are listed from highest to lowest priority:
# case_when() uses the first condition that is TRUE for each row.
data <- data %>%
  mutate(
    crim_sanct = case_when(
      if_any(Possession:Production, ~ . == "Imprisonment") ~ "Imprisonment",
      if_any(Possession:Production, ~ . == "Community order") ~ "Community order",
      if_any(Possession:Production, ~ . == "Suspended") ~ "Suspended",
      if_any(Possession:Production, ~ . == "Fine") | (Penalty == "Yes") ~ "Fine",
      if_any(Possession:Production, ~ . == "Discharge") ~ "Discharge"
    )
  )

data
# ID Possession Importation Production Penalty crim_sanct
#1 1 Fine <NA> Fine Yes Fine
#2 2 <NA> <NA> Community order No Community order
#3 3 Discharge Discharge <NA> No Discharge
#4 4 <NA> <NA> Suspended Yes Suspended
#5 5 Imprisonment <NA> <NA> No Imprisonment
#6 6 Fine <NA> Imprisonment No Imprisonment
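It may help to see why the NA cells do not break these conditions. The base R sketch below assumes, as I understand it, that if_any() combines the per-column comparisons with `|`; the variable names are made up for illustration. A comparison against NA yields NA, and OR-ing gives TRUE as soon as one column matches, otherwise NA if a cell was missing. case_when() treats an NA condition as not matched and moves on to the next one.

```r
# Row 3 of the example data: Discharge, Discharge, NA
row <- c("Discharge", "Discharge", NA)

# Element-wise comparison keeps the NA for the missing cell.
cmp <- row == "Imprisonment"

# OR-ing FALSE, FALSE, NA gives NA rather than FALSE, because the missing
# cell could have been "Imprisonment".
any_imprisonment <- Reduce(`|`, cmp)

# A single TRUE is enough to win over NA.
any_discharge <- Reduce(`|`, row == "Discharge")

# Combining with the Penalty test works the same way: NA | FALSE is NA,
# NA | TRUE is TRUE.
fine_or_penalty_no  <- Reduce(`|`, row == "Fine") | FALSE
fine_or_penalty_yes <- Reduce(`|`, row == "Fine") | TRUE
```

So for this row only the Discharge condition evaluates to TRUE, which is why it ends up as "Discharge" even though every other condition is NA rather than FALSE.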