Convert various dummy/logical variables into a single categorical variable/factor from their name in R
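My question is very similar to this one and another one, but my dataset is a little different and I cannot seem to make those solutions work. Apologies if I have misunderstood something and this question turns out to be redundant.
I have a dataset like this: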
df <- data.frame(
id = c(1:5),
conditionA = c(1, NA, NA, NA, 1),
conditionB = c(NA, 1, NA, NA, NA),
conditionC = c(NA, NA, 1, NA, NA),
conditionD = c(NA, NA, NA, 1, NA)
)
# id conditionA conditionB conditionC conditionD
# 1 1 1 NA NA NA
# 2 2 NA 1 NA NA
# 3 3 NA NA 1 NA
# 4 4 NA NA NA 1
# 5 5 1 NA NA NA
(Note that besides these columns I have many other columns that should not be affected by the current operation.)
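So, I observed that conditionA, conditionB, conditionC and conditionD are mutually exclusive and would be better represented as a single categorical variable, i.e. a factor, which should look like this: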
# id type
# 1 1 conditionA
# 2 2 conditionB
# 3 3 conditionC
# 4 4 conditionD
# 5 5 conditionA
I have looked into using gather or unite from tidyr, but they do not fit this case (with unite, we lose the information carried by the variable names).
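I tried using kimisc::coalesce.na, as suggested in an answer to the first referenced question, but 1. I first need to set a factor value based on each column's name, and 2. it did not work as expected, since only the first column was included: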
library(kimisc)
library(magrittr) # provides the %>% pipe used below
# first, factor each condition with a specific label
df$conditionA <- df$conditionA %>%
factor(levels = 1, labels = "conditionA")
df$conditionB <- df$conditionB %>%
factor(levels = 1, labels = "conditionB")
df$conditionC <- df$conditionC %>%
factor(levels = 1, labels = "conditionC")
df$conditionD <- df$conditionD %>%
factor(levels = 1, labels = "conditionD")
# now coalesce.na to merge into a single variable
df$type <- coalesce.na(df$conditionA, df$conditionB, df$conditionC, df$conditionD)
df
# id conditionA conditionB conditionC conditionD type
# 1 1 conditionA <NA> <NA> <NA> conditionA
# 2 2 <NA> conditionB <NA> <NA> <NA>
# 3 3 <NA> <NA> conditionC <NA> <NA>
# 4 4 <NA> <NA> <NA> conditionD <NA>
# 5 5 conditionA <NA> <NA> <NA> conditionA
I tried the other suggestions from the second question, but did not find one that gives me the expected result...
library(tidyr)
library(dplyr)
df <- df %>%
gather(type, count, -id)
df <- df[complete.cases(df),][,-3]
df[order(df$id),]
id type
1 1 conditionA
7 2 conditionB
13 3 conditionC
19 4 conditionD
5 5 conditionA
You can also try:
colnames(df)[2:5][max.col(!is.na(df[,2:5]))]
#[1] "conditionA" "conditionB" "conditionC" "conditionD" "conditionA"
The above works if exactly one column in each row has a non-NA value. If a row can have all of its values NA, you can try:
mat<-!is.na(df[,2:5])
colnames(df)[2:5][max.col(mat)*(NA^!rowSums(mat))]
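To make the NA handling above easier to follow, here is a small sketch in base R. In the one-liner, !rowSums(mat) is TRUE for a row with no condition set, and NA^TRUE is NA while NA^FALSE is 1, so multiplying by it turns the index of such a row into NA. The data frame df2 and its all-NA sixth row are made up for illustration, and ties.method = "first" is an addition so the intermediate index is deterministic (the default picks a column at random for an all-FALSE row before it is masked out).

```r
# Hypothetical data: same layout as df, plus a row (id 6) with no condition set
df2 <- data.frame(
  id = 1:6,
  conditionA = c(1, NA, NA, NA, 1, NA),
  conditionB = c(NA, 1, NA, NA, NA, NA),
  conditionC = c(NA, NA, 1, NA, NA, NA),
  conditionD = c(NA, NA, NA, 1, NA, NA)
)

# Select the condition columns by name rather than by position
cond_cols <- grep("^condition", names(df2), value = TRUE)

# Logical matrix: TRUE where a condition is set
mat <- !is.na(df2[, cond_cols])

# Column index of the first TRUE in each row
idx <- max.col(mat, ties.method = "first")

# Rows without any TRUE get NA instead of a spurious index
idx[rowSums(mat) == 0] <- NA

# Map indices back to column names and store the result as a factor
df2$type <- factor(cond_cols[idx], levels = cond_cols)
df2[, c("id", "type")]
```

Selecting the columns with grep() instead of 2:5 is a design choice for data frames where the condition columns are not contiguous; the explicit masking step does the same job as the NA^!rowSums(mat) trick.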
Try:
library(dplyr)
library(tidyr)
df %>% gather(type, value, -id) %>% na.omit() %>% select(-value) %>% arrange(id)
Which gives:
# id type
#1 1 conditionA
#2 2 conditionB
#3 3 conditionC
#4 4 conditionD
#5 5 conditionA
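Update
To handle the situation you detailed in the comments, you can operate on the relevant part of the data frame and then left_join() the other columns: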
df %>%
select(starts_with("condition"), id) %>%
gather(type, value, -id) %>%
na.omit() %>%
select(-value) %>%
left_join(df %>% select(-starts_with("condition")), by = "id") %>%
arrange(id)
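In current tidyr releases gather() is superseded by pivot_longer(), so the same idea can also be sketched as follows. This assumes tidyr 1.0.0 or later; the score column is a hypothetical stand-in for the "other columns" mentioned in the question. Because only the condition columns are pivoted, the remaining columns are carried along and no join is needed.

```r
library(dplyr)
library(tidyr)

# Hypothetical data: df from the question plus one unrelated column
df3 <- data.frame(
  id = 1:5,
  conditionA = c(1, NA, NA, NA, 1),
  conditionB = c(NA, 1, NA, NA, NA),
  conditionC = c(NA, NA, 1, NA, NA),
  conditionD = c(NA, NA, NA, 1, NA),
  score = c(10, 20, 30, 40, 50)
)

cond_cols <- grep("^condition", names(df3), value = TRUE)

res <- df3 %>%
  pivot_longer(
    cols = starts_with("condition"),
    names_to = "type",
    values_to = "value",
    values_drop_na = TRUE  # drop the NA cells, like na.omit() above
  ) %>%
  select(-value) %>%
  mutate(type = factor(type, levels = cond_cols)) %>%
  arrange(id)

res
```

Note that, like the na.omit() version, this drops any row in which every condition column is NA; if such rows must be kept, an index-based approach such as max.col() may be the better fit.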