Splitting a column in R

Question

I'm facing the following problem. I have a table with a column named title.

The title column contains values like To kill a mockingbird (1960).

So the column basically has the format [title] ([year]). What I need is two columns, title and year, with the year stored without brackets.

Another problem is that some titles contain brackets themselves. But in every row, the last 6 characters are the year enclosed in brackets.

How can I create the two columns, title and year?

What I have:

Books$title <- c("To kill a mockingbird (1960)", "Harry Potter and the order of the phoenix (2003)", "Of mice and men (something something) (1937)")

title
To kill a mockingbird (1960)
Harry Potter and the order of the phoenix (2003)
Of mice and men (something something) (1937)

What I need:

Books$title <- c("To kill a mockingbird", "Harry Potter and the order of the phoenix", "Of mice and men (something something)")
Books$year <- c("1960", "2003", "1937")

title                                             year
To kill a mockingbird                             1960
Harry Potter and the order of the phoenix         2003
Of mice and men (something something)             1937
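Since the year is always the final bracketed group, an end-anchored regular expression is one alternative to counting characters. The sketch below is not taken from any of the answers; it assumes every title ends with a single space followed by a four-digit year in brackets. A title that does not match is returned unchanged by sub(), so such rows would need separate handling.

```r
# Sample titles from the question; the third one has brackets inside the title.
titles <- c("To kill a mockingbird (1960)",
            "Harry Potter and the order of the phoenix (2003)",
            "Of mice and men (something something) (1937)")

# Greedy title, one space, then a 4-digit year in brackets anchored at the end
# of the string ($), so any inner brackets stay part of the title.
pattern <- "^(.*) \\(([0-9]{4})\\)$"

Books <- data.frame(
  title = sub(pattern, "\\1", titles),  # capture group 1: the title
  year  = sub(pattern, "\\2", titles),  # capture group 2: the year digits
  stringsAsFactors = FALSE
)
print(Books)
```

Anchoring on the end of the string is what lets "Of mice and men (something something)" keep its inner brackets: the greedy (.*) backtracks only as far as the last " (dddd)" group.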

Answer 1

We can work with substr on the last characters of each string.

First we recreate your data.frame:

df <- read.table(header = TRUE, sep = "\n", stringsAsFactors = FALSE,
text="
Title
To kill a mockingbird (1960)
Harry Potter and the order of the phoenix (2003)
Of mice and men (something something) (1937)")

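Then we build a new data.frame. The first column, Title, is everything in df$Title except the last 7 characters (so the space before the bracket is dropped too). The second column, Year, is the last 7 characters of df$Title, from which we remove any space, opening bracket, or closing bracket. (gsub("[[:punct:][:space:]]", "", ...) would also work; [[:punct:]] alone would leave the leading space in place.)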

data.frame(Title = substr(df$Title, 1, nchar(df$Title) - 7),   # drop the trailing " (yyyy)"
           Year  = gsub(" |\\(|\\)", "",                         # strip space and brackets
                        substr(df$Title, nchar(df$Title) - 6, nchar(df$Title))))


                                      Title Year
1                     To kill a mockingbird 1960
2 Harry Potter and the order of the phoenix 2003
3     Of mice and men (something something) 1937
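To see where the offsets 7 and 6 come from, here is the same arithmetic on a single title. This is only an illustrative sketch; the variable names (x, n, tail_part, ...) are hypothetical and not part of the answer.

```r
x <- "To kill a mockingbird (1960)"
n <- nchar(x)                          # 28 characters in total

title_part <- substr(x, 1, n - 7)      # everything except the last 7 characters
tail_part  <- substr(x, n - 6, n)      # the last 7 characters: " (1960)"
year_part  <- gsub(" |\\(|\\)", "", tail_part)   # remove space and brackets

# Character-class variants: [[:punct:]] alone keeps the leading space,
# adding [:space:] removes it as well.
punct_only  <- gsub("[[:punct:]]", "", tail_part)
punct_space <- gsub("[[:punct:][:space:]]", "", tail_part)
```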

Does this solve your problem?

Answer 2

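Try extracting the last 6 characters, i.e. the year with its brackets, with substrRight(df$Title, 6) in a loop and saving the result as a new column. substrRight() is not a base R function; see: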

Extracting the last n characters from a string in R
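Because substrRight() is not part of base R, here is a minimal sketch of such a helper, assuming the usual definition built on substr(). The name and the sample df are illustrative assumptions, not a library API.

```r
# Hypothetical helper: return the last n characters of each string in x.
substrRight <- function(x, n) {
  substr(x, nchar(x) - n + 1, nchar(x))
}

df <- data.frame(Title = c("To kill a mockingbird (1960)",
                           "Harry Potter and the order of the phoenix (2003)",
                           "Of mice and men (something something) (1937)"),
                 stringsAsFactors = FALSE)

# substr() and nchar() are vectorised, so the whole column is handled at once.
df$Year <- substrRight(df$Title, 6)
```

Since substr() works element-wise on a character vector, an explicit loop over rows is not needed; the brackets are still part of the result and can be stripped afterwards if required.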

Answer 3

Similar to @Vincent Bonhomme:

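I assume the data is in a text file, which I call so.dat, and read it into a data.frame that also has two columns for the title and the year to be extracted. Then I use substring() to split the title from the fixed-length year at the end, leaving the () in place, since the OP apparently wants them: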

titles      <- data.frame( orig = readLines( "so.dat" ), 
               text = "", yr = "", stringsAsFactors = FALSE )
titles$text <- substring( titles[ , 1 ], 
               1, nchar( titles[ , 1 ] ) - 7 )
titles$yr   <- substring( titles[ , 1 ], 
               nchar( titles[ , 1 ] ) - 5, nchar( titles[ , 1 ] ) )

You can drop the original column afterwards or keep it, depending on your needs.
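A self-contained variant of this approach, as a sketch: it assumes the lines come from a character vector rather than the file so.dat, and adds an optional step (not in the answer) that strips the brackets and converts the year to an integer.

```r
# Assumption: the input lines are given directly instead of read from "so.dat".
lines <- c("To kill a mockingbird (1960)",
           "Harry Potter and the order of the phoenix (2003)",
           "Of mice and men (something something) (1937)")

titles <- data.frame(orig = lines, text = "", yr = "", stringsAsFactors = FALSE)
n <- nchar(titles$orig)
titles$text <- substring(titles$orig, 1, n - 7)   # title without " (yyyy)"
titles$yr   <- substring(titles$orig, n - 5, n)   # last 6 chars, brackets kept

# Optional: strip the brackets and store the year as an integer.
titles$year <- as.integer(gsub("[()]", "", titles$yr))

# Drop the original column once it is no longer needed.
titles$orig <- NULL
```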
