How to create a new variable that shows different combinations of 4 dummy variables?
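I have 4 dummy variables that take the value 0 or 1 depending on whether a given technology was adopted. The data frame has more than 14,000 rows.
I want to go through these 4 columns and put the different combinations of == 1 into a new variable.
Data: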
df <- structure(list(
  tech1 = structure(c(2L, 1L, 1L, 1L), .Label = c("0", "1"), class = "factor"),
  tech2 = structure(c(2L, 2L, 2L, 2L), .Label = c("0", "1"), class = "factor"),
  tech3 = structure(c(1L, 1L, 2L, 1L), .Label = c("0", "1"), class = "factor"),
  tech4 = structure(c(1L, 1L, 2L, 1L), .Label = c("0", "1"), class = "factor")
), row.names = c(NA, 4L), class = "data.frame")
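Because different combinations are possible, the new variable should record, for each row, which of the 4 technologies were adopted.
This is how the first four rows of the new variable should look (where "12" means technologies 1 and 2 were adopted, and so on):
Variable "Tech":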
structure(list(Tech = structure(c(1L, 2L, 3L, 4L), .Label = c("12", "2", "234", "2"), class = "factor")),row.names = c(NA, 4L), class = "data.frame")
I have seen some functions that might work (e.g. aggregate), but so far I have not found a solution.
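Without knowing exactly what end result you want: with the apply function you can get, for each row, the columns that contain a 1, and for each column, the rows that contain a 1.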
m <- matrix(sample(0:1, 100, replace = TRUE), ncol = 4)
rows <- apply(m, 1, function(x) which(x == 1))
cols <- apply(m, 2, function(x) which(x == 1))
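One thing to watch for, shown here as a minimal sketch on a small made-up matrix: `apply` simplifies its result, so it returns a list only when the per-row results differ in length. If every row happened to contain the same number of 1s, it would return a matrix (or a plain vector) instead. The names `by_row`, `by_col`, and `by_row_list` are illustrative, and the `lapply` variant is just one possible way to always get a list.

```r
# Made-up 3 x 4 matrix, filled row by row so the layout is easy to read
m <- matrix(c(1, 1, 0, 0,
              0, 1, 1, 1,
              0, 0, 1, 0), ncol = 4, byrow = TRUE)

# For each row: which columns hold a 1
by_row <- apply(m, 1, function(x) which(x == 1))

# For each column: which rows hold a 1
by_col <- apply(m, 2, function(x) which(x == 1))

# lapply() never simplifies, so this is a list regardless of the data
by_row_list <- lapply(seq_len(nrow(m)), function(i) which(m[i, ] == 1))
```

Here the rows contain 2, 3, and 1 ones respectively, so `by_row` is also a list and holds the same indices as `by_row_list`.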
Following on from SteveM's answer:
data.frame(tech=apply(df, 1, function(x) paste(which(x==1), collapse="")))
tech
#1 12
#2 2
#3 234
#4 2
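Note that `apply` first converts the data frame to a character matrix, and the comparison `x==1` then works through coercion. As a hedged alternative, the sketch below builds the same string column by column instead of row by row. It assumes the columns are factors with levels "0" and "1", as in the question's sample data, and that the position of a column is the technology number; `parts` is an illustrative name.

```r
# The question's sample data, rebuilt with factor() for readability
df <- data.frame(
  tech1 = factor(c("1", "0", "0", "0"), levels = c("0", "1")),
  tech2 = factor(c("1", "1", "1", "1"), levels = c("0", "1")),
  tech3 = factor(c("0", "0", "1", "0"), levels = c("0", "1")),
  tech4 = factor(c("0", "0", "1", "0"), levels = c("0", "1"))
)

# For each column, emit its position where the value is "1", otherwise ""
parts <- Map(function(col, i) ifelse(col == "1", as.character(i), ""),
             df, seq_along(df))

# Paste the per-column pieces together element-wise, one string per row
df$Tech <- do.call(paste0, unname(parts))
```

A row in which no technology is adopted ends up as an empty string, which matches what the `apply` version returns for such a row.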
Or a tidyverse approach:
library(tidyverse)
df %>%
  mutate(id = row_number()) %>%
  pivot_longer(tech1:tech4) %>%
  filter(value == 1) %>%
  group_by(id) %>%
  summarise(Tech = paste(gsub("tech", "", name), collapse = ""))
# A tibble: 4 x 2
# id Tech
# <int> <chr>
#1 1 12
#2 2 2
#3 3 234
#4 4 2
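One caveat with this pipeline: `filter(value==1)` removes every 0 entry before grouping, so a row that adopted no technology at all disappears from the result. With 14,000 rows that may well happen. The sketch below is one possible variant that keeps such rows by subsetting inside `summarise()` instead. The data frame `df0` is made up for illustration, and the factor levels "0" and "1" are assumed from the question.

```r
library(dplyr)
library(tidyr)

# Hypothetical data: row 2 adopts no technology at all
df0 <- data.frame(
  tech1 = factor(c("1", "0", "0"), levels = c("0", "1")),
  tech2 = factor(c("1", "0", "1"), levels = c("0", "1")),
  tech3 = factor(c("0", "0", "1"), levels = c("0", "1")),
  tech4 = factor(c("0", "0", "1"), levels = c("0", "1"))
)

res <- df0 %>%
  mutate(id = row_number()) %>%
  pivot_longer(tech1:tech4) %>%
  group_by(id) %>%
  # Subset inside summarise() rather than filter(), so empty groups survive
  summarise(Tech = paste(gsub("tech", "", name[value == "1"]), collapse = ""))
```

Here `res` keeps all three ids, and the row without any technology gets an empty string in `Tech`.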
library(tidyverse)
(df <- tribble(
  ~dum1, ~dum2, ~dum3, ~dum4, ~value,
  FALSE, TRUE,  FALSE, TRUE,  12,
  TRUE,  TRUE,  FALSE, FALSE, 20,
  FALSE, TRUE,  FALSE, TRUE,  32,
  TRUE,  FALSE, TRUE,  FALSE, 27))
(
  df %>%
    mutate(dum1 = ifelse(dum1, "1", ""),
           dum2 = ifelse(dum2, "2", ""),
           dum3 = ifelse(dum3, "3", ""),
           dum4 = ifelse(dum4, "4", ""),
           which_tech = paste0(dum1, dum2, dum3, dum4))
)
Output:
# A tibble: 4 x 6
dum1 dum2 dum3 dum4 value which_tech
<chr> <chr> <chr> <chr> <dbl> <chr>
1 "" "2" "" "4" 12 24
2 "1" "2" "" "" 20 12
3 "" "2" "" "4" 32 24
4 "1" "" "3" "" 27 13