Converting columns into rows without specifying the column names
I have a data frame with the following structure:
bad_df <- data.frame(
  id = c("id001", "id002", "id003"),
  participant.1 = c("Jana", "Marina", "Vasilei"),
  participant.2 = c("Niko", "Micha", "Niko"),
  role.1 = c("writer", "writer", "speaker"),
  role.2 = c("observer", "observer", "observer"),
  stringsAsFactors = FALSE
)
bad_df
I need to gather it into something like this. Each row should contain one id, one participant, and one role.
good_df <- data.frame(
  id = c("id001", "id001", "id002", "id002", "id003", "id003"),
  participant = c("Jana", "Niko", "Marina", "Micha", "Vasilei", "Niko"),
  role = c("writer", "observer", "writer", "observer", "speaker", "observer"),
  stringsAsFactors = FALSE
)
good_df
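I have seen countless similar questions, but I find it hard to understand how to apply tidyr or reshape2 to this case. I know this must somehow be possible with gather().
However, the data frame can contain more participants and corresponding roles, so ideally the approach should not require specifying the column names. Below is one solution I came up with, but I don't think it is the most elegant one, and I still need to handle data frames that contain participant.3, role.3, and so on.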
library(dplyr)
# Stack the ".1" columns on top of the ".2" columns after renaming both sets
good_df2 <- rbind(
  bad_df %>% select(id, participant.1, role.1) %>%
    rename(participant = participant.1, role = role.1),
  bad_df %>% select(id, participant.2, role.2) %>%
    rename(participant = participant.2, role = role.2)
)
good_df2
Thanks!
You could try the development version of data.table, i.e. v1.9.5. Installation instructions are here.
library(data.table)
melt(setDT(bad_df), measure = list(grep('participant', names(bad_df)),
                                   grep('role', names(bad_df))))[order(id)][, variable := NULL]
#       id  value1   value2
#1:  id001    Jana   writer
#2:  id001    Niko observer
#3:  id002  Marina   writer
#4:  id002   Micha observer
#5:  id003 Vasilei  speaker
#6:  id003    Niko observer
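The same reshape can probably be written without the two grep() calls. The following is a minimal sketch, assuming a data.table release that provides patterns() inside melt() (1.9.6 or later, as far as I know); the name long_dt is only illustrative. It also names the value columns directly and works on a copy, so bad_df stays a plain data frame.

```r
library(data.table)

bad_df <- data.frame(
  id = c("id001", "id002", "id003"),
  participant.1 = c("Jana", "Marina", "Vasilei"),
  participant.2 = c("Niko", "Micha", "Niko"),
  role.1 = c("writer", "writer", "speaker"),
  role.2 = c("observer", "observer", "observer"),
  stringsAsFactors = FALSE
)

# as.data.table() makes a copy, unlike setDT(), which converts in place
long_dt <- melt(
  as.data.table(bad_df),
  id.vars = "id",
  # one regular expression per output column; matches any number of suffixes
  measure.vars = patterns("^participant", "^role"),
  value.name = c("participant", "role")
)

# Sort by id and keep only the columns of interest (drops the "variable" index)
long_dt <- long_dt[order(id), .(id, participant, role)]
long_dt
```

Because the patterns match every column starting with participant or role, data frames with participant.3, role.3, and so on should need no changes.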
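Or we can use merged.stack, where we only need to supply the unique column prefixes. Based on those prefixes, it groups together the columns that share the same prefix.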
library(splitstackshape)
merged.stack(bad_df, var.stubs = c('participant', 'role'),
             sep = 'var.stubs')[, 2 := NULL]
#       id participant     role
#1:  id001        Jana   writer
#2:  id001        Niko observer
#3:  id002      Marina   writer
#4:  id002       Micha observer
#5:  id003     Vasilei  speaker
#6:  id003        Niko observer
Or using dplyr/tidyr:
library(dplyr)
library(tidyr)
gather(bad_df, Var, Val, -id) %>%
  separate(Var, into = c('Var1', 'Var2')) %>%
  spread(Var1, Val) %>%
  select(-Var2)
#      id participant     role
#1  id001        Jana   writer
#2  id001        Niko observer
#3  id002      Marina   writer
#4  id002       Micha observer
#5  id003     Vasilei  speaker
#6  id003        Niko observer
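With newer tidyr releases the gather/separate/spread chain can likely be collapsed into a single call. This is a sketch only, assuming tidyr 1.0.0 or later (where pivot_longer() is available); the names long_df and n are illustrative and not part of the original answer. The special ".value" entry tells pivot_longer() to use the part of each column name before the dot as the output column name.

```r
library(tidyr)

bad_df <- data.frame(
  id = c("id001", "id002", "id003"),
  participant.1 = c("Jana", "Marina", "Vasilei"),
  participant.2 = c("Niko", "Micha", "Niko"),
  role.1 = c("writer", "writer", "speaker"),
  role.2 = c("observer", "observer", "observer"),
  stringsAsFactors = FALSE
)

long_df <- pivot_longer(
  bad_df,
  cols = -id,                    # reshape everything except the id column
  names_to = c(".value", "n"),   # "participant.1" -> column participant, n = "1"
  names_sep = "\\."              # split the column names at the literal dot
)

# Drop the helper index and return a plain data frame
long_df <- as.data.frame(long_df[, c("id", "participant", "role")])
long_df
```

No column names are spelled out, so additional participant.N / role.N pairs should be picked up automatically.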
I would go this route in base R:
# Find the participant columns
partCol <- grep("part", colnames(bad_df))
# ... and the role columns
roleCol <- grep("role", colnames(bad_df))
# t() turns each set of columns into a matrix; reading it column by column
# yields the values row by row of the original data frame
data.frame(id = rep(bad_df$id, each = length(partCol)),
           participant = as.vector(t(bad_df[, partCol])),
           role = as.vector(t(bad_df[, roleCol])))
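Base R also ships reshape(), which may be an alternative when the columns follow a name.number pattern as they do here. The sketch below assumes exactly that naming scheme; long_df and the timevar name n are illustrative choices. Given sep = ".", reshape() splits each column name into a variable name and an index by itself.

```r
bad_df <- data.frame(
  id = c("id001", "id002", "id003"),
  participant.1 = c("Jana", "Marina", "Vasilei"),
  participant.2 = c("Niko", "Micha", "Niko"),
  role.1 = c("writer", "writer", "speaker"),
  role.2 = c("observer", "observer", "observer"),
  stringsAsFactors = FALSE
)

long_df <- reshape(
  bad_df,
  direction = "long",
  varying = names(bad_df)[-1],  # every column except id
  sep = ".",                    # "participant.1" -> variable "participant", index 1
  idvar = "id",
  timevar = "n"                 # column that receives the numeric suffix
)

# reshape() stacks all ".1" rows before the ".2" rows, so sort by id and index,
# then drop the index column and the generated row names
long_df <- long_df[order(long_df$id, long_df$n), c("id", "participant", "role")]
rownames(long_df) <- NULL
long_df
```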