How can I combine two rows and avoid duplicating the string entries in the row variables?
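I would like to combine two variables of my data frame (RowA and RowB below) into one, while avoiding duplicate entries. That is, I want to go from this: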
RowA RowB
A,B A,B,C
A A
to this:
RowA RowB RowC
A,B A,B,C A,B,C
A A A
Using unite from tidyr, what I actually get is:
RowA RowB RowC
A,B A,B,C A,B,A,B,C
A A A,A
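# Sample data
df <- read.table(text='colA colB
A,B A,B,C
A A', header=TRUE, stringsAsFactors=FALSE)
library(dplyr)  # provides %>%
library(tidyr)  # provides unite()
# Paste the two columns together; the result still contains repeated entries
temp <- df %>% unite(colC, colA, colB, sep=',')
# Split each combined string on commas, drop repeats, and re-join
df$colC <- sapply(strsplit(temp$colC, ","), function(x) paste(unique(x), collapse=","))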
The output is:
colA colB colC
1 A,B A,B,C A,B,C
2 A A A
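The de-duplication step can also be looked at on its own, without tidyr. The sketch below uses only base R; the helper name dedupe_tokens is made up for illustration and is not part of any package. It assumes entries are separated by a plain comma with no surrounding spaces.

```r
# Hypothetical helper: remove repeated comma-separated entries from each string.
dedupe_tokens <- function(x, sep = ",") {
  vapply(
    strsplit(x, sep, fixed = TRUE),                          # one token vector per string
    function(tokens) paste(unique(tokens), collapse = sep),  # keep first occurrences, re-join
    character(1)
  )
}

# The strings that unite() produced in the question
united <- c("A,B,A,B,C", "A,A")
result <- dedupe_tokens(united)
print(result)
```

unique() keeps the first occurrence of each entry, so the original order is preserved. A missing value would come back as the literal string "NA", so NAs may need separate handling.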
Base R:
df <- read.table(text="RowA RowB
A,B A,B,C
A A", header=TRUE, stringsAsFactors=FALSE)
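myfun <- function(dfrow) {
  # Join the row's cells, split on commas, keep unique entries, re-join
  paste(unique(unlist(strsplit(paste(dfrow, collapse=","), ","))), collapse=",")
}
# Apply the function to each row in turn
df$RowC <- sapply(seq_len(nrow(df)), function(i) myfun(df[i,]))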
# RowA RowB RowC
# 1 A,B A,B,C A,B,C
# 2 A A A
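A variation on the base R approach, sketched under the assumption that only the two named columns are to be combined: mapply walks the columns in parallel, which avoids indexing the data frame row by row. The function name merge_unique is hypothetical.

```r
df <- read.table(text = "RowA RowB
A,B A,B,C
A A", header = TRUE, stringsAsFactors = FALSE)

# Hypothetical helper: pool the entries of several strings and keep each once.
merge_unique <- function(..., sep = ",") {
  tokens <- unlist(strsplit(c(...), sep, fixed = TRUE))
  paste(unique(tokens), collapse = sep)
}

# mapply pairs RowA[i] with RowB[i]; unname drops the names it attaches
df$RowC <- unname(mapply(merge_unique, df$RowA, df$RowB))
print(df)
```

Because merge_unique takes its inputs through ..., further columns can be passed to mapply as extra arguments without changing the helper.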