Assigning results of strsplit to multiple columns of data frame
I am trying to split a character vector in a data frame into three different vectors.
My data looks like this:
> df <- data.frame(filename = c("Author1 (2010) Title of paper",
                                "Author2 et al (2009) Title of paper",
                                "Author3 & Author4 (2004) Title of paper"),
                   stringsAsFactors = FALSE)
I would like to split these 3 pieces of information (authors, year, title) into three different columns, so that the result would be:
> df
                         filename            author year  title
1           Author1 (2010) Title1           Author1 2010 Title1
2     Author2 et al (2009) Title2     Author2 et al 2009 Title2
3 Author3 & Author4 (2004) Title3 Author3 & Author4 2004 Title3
I have used strsplit to split each filename into a vector of 3 elements:
df$temp <- strsplit(df$filename, " \\(|\\) ")
But now I cannot find a way to put each element in a separate column. I can access a specific piece of information like this:
> df$temp[[2]][1]
[1] "Author2 et al"
but I cannot find how to put it into another column:
> df$author <- df$temp[[]][1]
Error
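You could try tstrsplit from the devel version of data.table:
library(data.table) # v1.9.5+
# setDT converts df to a data.table by reference; := adds the three columns in place
setDT(df)[, c('author', 'year', 'title') := tstrsplit(filename, ' \\(|\\) ')]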
df
#                                  filename            author year
#1:           Author1 (2010) Title of paper           Author1 2010
#2:     Author2 et al (2009) Title of paper     Author2 et al 2009
#3: Author3 & Author4 (2004) Title of paper Author3 & Author4 2004
#            title
#1: Title of paper
#2: Title of paper
#3: Title of paper
Edit: Included the OP's split pattern to remove the spaces.
# rbind the split pieces into a 3-column matrix, then bind it to df
result <- cbind(df, do.call("rbind", strsplit(df$filename, " \\(|\\) ")))
colnames(result)[2:4] <- c("author", "year", "title")
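The attempt in the question, df$temp[[]][1], fails because [[ needs an index and selects only one list element at a time. As a minimal sketch of the element-wise equivalent (an illustration, not part of the answers here), vapply can apply the `[` function to every element of the strsplit result. The name parts and the integer conversion of year are assumptions added for the example.

```r
df <- data.frame(filename = c("Author1 (2010) Title of paper",
                              "Author2 et al (2009) Title of paper",
                              "Author3 & Author4 (2004) Title of paper"),
                 stringsAsFactors = FALSE)

# One character vector of pieces per filename ('parts' is a hypothetical name)
parts <- strsplit(df$filename, " \\(|\\) ")

# vapply calls `[`(x, i) on each list element and checks that the
# result is a single character string
df$author <- vapply(parts, `[`, character(1), 1)
df$year   <- as.integer(vapply(parts, `[`, character(1), 2))  # assumed conversion
df$title  <- vapply(parts, `[`, character(1), 3)
```

Unlike the rbind approach, this assigns each column by name, so it does not depend on how many columns df already has.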
Using the tidyr package, here is a separate solution:
library(tidyr)
separate(df, "filename", c("Author","Year","Title"), sep=" \\(|\\) ", remove=FALSE)
#                                  filename            Author
# 1           Author1 (2010) Title of paper           Author1
# 2     Author2 et al (2009) Title of paper     Author2 et al
# 3 Author3 & Author4 (2004) Title of paper Author3 & Author4
#   Year          Title
# 1 2010 Title of paper
# 2 2009 Title of paper
# 3 2004 Title of paper
Leading and trailing spaces have been accounted for.
There is a base t method (transpose) for data frames:
res <- t( data.frame( strsplit(df$filename, " \\(|\\) ") ))
colnames(res) <- c("author", "year", "title")
rownames(res) <- seq_along(rownames(res) )
res
#--------------
  author              year   title
1 "Author1"           "2010" "Title of paper"
2 "Author2 et al"     "2009" "Title of paper"
3 "Author3 & Author4" "2004" "Title of paper"
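Note that res above is a character matrix, not a data frame. A possible follow-up, sketched here under the assumption that the new columns should end up next to filename, is to convert the matrix before binding it. The name out and the integer conversion of year are illustrative choices, not part of the answer.

```r
df <- data.frame(filename = c("Author1 (2010) Title of paper",
                              "Author2 et al (2009) Title of paper",
                              "Author3 & Author4 (2004) Title of paper"),
                 stringsAsFactors = FALSE)

res <- t(data.frame(strsplit(df$filename, " \\(|\\) ")))
colnames(res) <- c("author", "year", "title")
rownames(res) <- NULL  # drop the auto-generated row names

# Convert the character matrix to a data frame, then bind it to df
out <- cbind(df, as.data.frame(res, stringsAsFactors = FALSE))
out$year <- as.integer(out$year)  # assumed: year is wanted as a number
```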