How to create a new variable within a data frame based on multiple binary variables using R?
The data frame df has 13 variables, as follows:
column 1-id
column 2-pr_1 (pr_1 to pr_12 are binary variables)
column 3-pr_2
column 4-pr_3
column 5-pr_4
column 6-pr_5
column 7-pr_6
column 8-pr_7
column 9-pr_8
column 10-pr_9
column 11-pr_10
column 12-pr_11
column 13-pr_12
Now a variable "try" needs to be created within the data frame,
for each observation, according to the following rules:
1) The value of pr_1 always equals 1.
2) If all elements from pr_1 to pr_12 are 1, then try = 13.
3) If there is a missing value (NA) anywhere from pr_1 to pr_12, then try = NA.
4) If the first 0 occurs right after the last 1, for example the first 0 is in pr_6 and the last 1 is in pr_5, then "try" should equal 6 (6 = 5 + 1).
In other words, the value of "try" should equal the number of consecutive 1s (a run containing no 0 or NA) plus 1.
The new dataset with the new variable "try" looks like this:
id pr_1 pr_2 pr_3 pr_4 pr_5 pr_6 pr_7 pr_8 pr_9 pr_10 pr_11 pr_12 try
j01 1 1 1 1 1 0 0 0 0 0 0 0 6
j02 1 1 1 0 0 0 0 0 0 0 0 0 4
j03 1 0 0 0 0 0 0 0 0 0 0 0 2
j04 1 1 1 1 1 1 1 1 1 1 1 1 13
j05 1 1 1 1 1 1 1 1 NA 1 1 NA NA
j06 1 1 1 1 1 NA NA NA NA NA NA NA NA
j07 1 0 NA 0 0 0 0 0 0 0 0 0 NA
j08 1 NA 0 0 0 0 0 0 0 0 0 0 NA
j09 1 NA 0 NA NA 1 NA NA NA NA 1 1 NA
j10 1 NA 1 1 1 1 1 1 1 1 1 0 NA
The structure of the original dataset is as follows:
structure(list(id = c("j01", "j02", "j03", "j04", "j05", "j06",
"j07", "j08", "j09", "j10"), pr_1 = c(1, 1, 1, 1, 1, 1, 1, 1,
1, 1), pr_2 = c(1, 1, 0, 1, 1, 1, 0, NA, NA, NA), pr_3 = c(1,
1, 0, 1, 1, 1, NA, 0, 0, 1), pr_4 = c(1, 0, 0, 1, 1, NA, 0,
0, NA, 1), pr_5 = c(1, 0, 0, 1, 1, NA, 0, 0, 1, 1), pr_6 = c(0,
0, 0, 1, 1, NA, 0, 0, NA, 1), pr_7 = c(0, 0, 0, 1, 1, NA, 0,
0, NA, 1), pr_8 = c(0, 0, 0, 1, 1, NA, 0, 0, NA, 1), pr_9 = c(0,
0, 0, 1, NA, NA, 0, 0, 1, 1), pr_10 = c(0, 0, 0, 1, 1, NA, 0,
0, NA, 1), pr_11 = c(0, 0, 0, 1, 1, NA, 0, 0, NA, 1), pr_12 = c(0,
0, 0, 1, 0, NA, 0, 0, NA, 0)), row.names = c(NA, -10L), class = c("tbl_df",
"tbl", "data.frame"))->df
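I believe this will give you the "try" column: rowSums(apply(df[, 2:13], 2, function(x) (x == 1))) + 1
The idea is to check, column by column, whether each element equals 1, and then sum by row. Note that the pr_3
column in the dataset you provided differs from the one shown above. To get the result you want, I assumed this was a typo and changed it from pr_3 = c(1, 1, 0, 1, 1, 1, NA, 0, NA, 1)
to pr_3 = c(1,1, 0, 1, 1, 1, 0, 0, NA, 1).
You can also use the code below. However, I suspect you want NA when a pattern such as 101111 occurs? In any case, the code below does not work that way; it still just counts the 1s.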
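As a small illustration of why rows containing NA come out as NA here: rowSums() keeps NA by default (na.rm = FALSE), so any missing value in a row propagates to the sum. The sketch below uses a made-up three-column data frame (toy is not the data from the question) and also shows that the apply() step should be equivalent to comparing the whole data frame with 1 directly.

```r
# Made-up data for illustration only (not the df from the question)
toy <- data.frame(
  pr_1 = c(1, 1, 1),
  pr_2 = c(1, 0, NA),
  pr_3 = c(1, 0, 0)
)

# Column-wise comparison via apply(), as in the answer above
a <- rowSums(apply(toy, 2, function(x) x == 1)) + 1

# Comparing the whole data frame at once gives the same logical matrix
b <- rowSums(toy == 1) + 1

# unname() drops the row names that the second form carries along
unname(a)
unname(b)
```

The third row contains an NA, so its sum is NA in both versions.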
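library(dplyr)  # provides the %>% pipe
df %>%
  # reshape to one row per id/variable pair
  tidyr::pivot_longer(-id) %>%
  dplyr::group_by(id) %>%
  # sum of the pr values per id, plus 1 (NA if any value is NA)
  dplyr::mutate(try = sum(value) + 1) %>%
  dplyr::ungroup() %>%
  # reshape back to one row per id
  tidyr::pivot_wider(names_from = name, values_from = value)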
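For a pattern such as 101111, a plain sum counts every 1 in the row. If what is wanted is the length of the leading run of 1s plus 1, one possible sketch uses cumprod(), which stays at 1 until the first 0 and is 0 from then on. The helper name leading_ones_plus_one and the toy data are made up for illustration, and returning NA for any row with a missing value is an assumption based on rule 3 of the question.

```r
# Hypothetical helper: length of the leading run of 1s in each row, plus 1.
# Assumes the columns hold only 0, 1 or NA.
leading_ones_plus_one <- function(m) {
  apply(as.matrix(m), 1, function(x) {
    if (anyNA(x)) return(NA_real_)  # any NA in the row gives NA
    sum(cumprod(x)) + 1             # cumprod is 1 up to the first 0, then 0
  })
}

# Made-up data for illustration only (not the df from the question)
toy <- data.frame(
  id   = c("a", "b", "c", "d"),
  pr_1 = c(1, 1, 1, 1),
  pr_2 = c(1, 0, 1, NA),
  pr_3 = c(0, 1, 1, 0),
  pr_4 = c(0, 1, 1, 0)
)

toy$try <- leading_ones_plus_one(toy[-1])
toy$try
```

Row "b" (1, 0, 1, 1) gets 2 here, whereas the sum-based approach would give 4. Whether such a row should instead be NA is left open in the question.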
You can take the row-wise sum of all the pr columns and add 1.
df$try <- rowSums(df[-1]) + 1
# id pr_1 pr_2 pr_3 pr_4 pr_5 pr_6 pr_7 pr_8 pr_9 pr_10 pr_11 pr_12 try
# <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 j01 1 1 1 1 1 0 0 0 0 0 0 0 6
# 2 j02 1 1 1 0 0 0 0 0 0 0 0 0 4
# 3 j03 1 0 0 0 0 0 0 0 0 0 0 0 2
# 4 j04 1 1 1 1 1 1 1 1 1 1 1 1 13
# 5 j05 1 1 1 1 1 1 1 1 NA 1 1 0 NA
# 6 j06 1 1 1 NA NA NA NA NA NA NA NA NA NA
# 7 j07 1 0 0 0 0 0 0 0 0 0 0 0 2
# 8 j08 1 NA 0 0 0 0 0 0 0 0 0 0 NA
# 9 j09 1 NA NA NA 1 NA NA NA 1 NA NA NA NA
#10 j10 1 NA 1 1 1 1 1 1 1 1 1 0 NA
Or using dplyr:
library(dplyr)
df %>% mutate(try = rowSums(select(., starts_with('pr'))) + 1)