Dummy variable columns based on strings from other columns
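I have a database containing patient ID numbers and the treatments each patient received. I would like a dummy column for every distinct individual treatment (i.e., did the patient receive treatment A, B, C, D).
The example below is simplified: the real data has more than 20 treatments and thousands of patients, and I can't think of a simple way to do this.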
example <- data.frame(id_number = c(0, 1, 2, 3, 4),
treatment = c("A", "A+B+C+D", "C+B", "B+A", "C"))
I would like something like this:
desired_result <- data.frame(id_number = c(0, 1, 2, 3, 4),
treatment = c("A", "A+B+C+D", "C+B", "B+A","C"),
A=c(1,1,0,1,0),
B=c(0,1,1,1,0),
C=c(0,1,1,0,1),
D=c(0,1,0,0,0))
One tidyverse possibility is:
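library(dplyr)
library(tidyr)

example %>%
  # split "A+B" into a list column of individual treatments
  mutate(treatment2 = strsplit(treatment, "+", fixed = TRUE)) %>%
  # one row per patient/treatment pair
  unnest(cols = treatment2) %>%
  spread(treatment2, treatment2) %>%
  # non-missing cell means the treatment was received
  mutate_at(vars(-id_number, -treatment), ~ (!is.na(.)) * 1)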
  id_number treatment A B C D
1         0         A 1 0 0 0
2         1   A+B+C+D 1 1 1 1
3         2       C+B 0 1 1 0
4         3       B+A 1 1 0 0
5         4         C 0 0 1 0
Or:
example %>%
  mutate(treatment2 = strsplit(treatment, "+", fixed = TRUE)) %>%
  unnest(cols = treatment2) %>%
  # mark every patient/treatment pair with 1, then fill the gaps with 0
  mutate(val = 1) %>%
  spread(treatment2, val, fill = 0)
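In current tidyr releases spread() is superseded by pivot_wider(), so the same idea can be sketched with the newer verbs. This is only an illustration under the assumption that dplyr and tidyr (1.0 or later) are installed; the helper column names code and val and the object name result are made up for the example.

```r
library(dplyr)
library(tidyr)

example <- data.frame(
  id_number = c(0, 1, 2, 3, 4),
  treatment = c("A", "A+B+C+D", "C+B", "B+A", "C"),
  stringsAsFactors = FALSE
)

result <- example %>%
  # copy the column so the original treatment string is kept
  mutate(code = treatment) %>%
  # one row per patient/treatment pair; sep is a regex, so "+" is escaped
  separate_rows(code, sep = "\\+") %>%
  mutate(val = 1) %>%
  # one column per treatment code, 0 where the patient did not receive it
  pivot_wider(names_from = code, values_from = val,
              values_fill = list(val = 0))

result
```

Because the treatment columns are taken from the data, nothing has to be changed when more treatment codes appear.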
A base R version:
example["A"] <- as.numeric(grepl("A", example[,"treatment"]))
example["B"] <- as.numeric(grepl("B", example[,"treatment"]))
example["C"] <- as.numeric(grepl("C", example[,"treatment"]))
example["D"] <- as.numeric(grepl("D", example[,"treatment"]))
example
  id_number treatment A B C D
1         0         A 1 0 0 0
2         1   A+B+C+D 1 1 1 1
3         2       C+B 0 1 1 0
4         3       B+A 1 1 0 0
5         4         C 0 0 1 0
The grepl function tests whether each pattern is present in each row, and as.numeric converts the logical TRUE/FALSE to 1/0.
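With more than 20 treatments, writing one line per treatment gets tedious. A possible way to generalize the base R approach is to derive the list of treatment codes from the data and loop over it. The sketch below is one way this might look, not the answer's original code; the names parts and codes are illustrative. It compares whole codes after splitting on "+", so a code such as "A" would not accidentally match a longer code that merely contains the letter A.

```r
example <- data.frame(
  id_number = c(0, 1, 2, 3, 4),
  treatment = c("A", "A+B+C+D", "C+B", "B+A", "C"),
  stringsAsFactors = FALSE
)

# Split each treatment string into its individual codes.
parts <- strsplit(example$treatment, "+", fixed = TRUE)

# Collect every distinct code that appears anywhere in the column.
codes <- sort(unique(unlist(parts)))

# For each code, flag the rows whose split codes contain it exactly.
for (code in codes) {
  example[[code]] <- as.numeric(
    vapply(parts, function(p) code %in% p, logical(1))
  )
}

example
```

For the sample data this adds the columns A, B, C and D, matching desired_result.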