Split String In R Based On Character Location
I am trying to split these strings (column entries) in R into three separate columns:
João Moutinho Monaco, 30, M(C)
Clinton N'Jie Marseille, 23, FW
Frederic Sammaritano Dijon, 30, AM(LR)
into

Player                Team       Pos
João Moutinho         Monaco     30, M(C)
Clinton N'Jie         Marseille  23, FW
Frederic Sammaritano  Dijon      30, AM(LR)
I can find the character positions using gregexpr and nchar, but I'm not sure how to use strsplit. Or is there perhaps another package that makes this easier?
After creating a delimiter with gsub, we can use read.csv to read the vector into a data.frame:
# Backslashes must be doubled inside R string literals ("\\S", "\\1", ...)
read.csv(text = gsub("^(\\S+\\s+\\S+)\\s+(\\S+),\\s+(.*)",
                     "\\1;\\2;\\3", v1), sep = ";", header = FALSE,
         col.names = c("Player", "Team", "Pos"), stringsAsFactors = FALSE)
#                Player      Team        Pos
#1        João Moutinho    Monaco   30, M(C)
#2        Clinton N'Jie Marseille     23, FW
#3 Frederic Sammaritano     Dijon 30, AM(LR)
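To see why this works, it can help to look at the intermediate strings that gsub hands to read.csv. The snippet below is only an illustrative sketch: the variable name delimited is not part of the original answer, and v1 is restated so the snippet runs on its own.

```r
v1 <- c("João Moutinho Monaco, 30, M(C)",
        "Clinton N'Jie Marseille, 23, FW",
        "Frederic Sammaritano Dijon, 30, AM(LR)")

# Group 1: first two words (Player); group 2: the word before the first
# comma (Team); group 3: everything after that comma (Pos).
delimited <- gsub("^(\\S+\\s+\\S+)\\s+(\\S+),\\s+(.*)", "\\1;\\2;\\3", v1)

writeLines(delimited)
# João Moutinho;Monaco;30, M(C)
# Clinton N'Jie;Marseille;23, FW
# Frederic Sammaritano;Dijon;30, AM(LR)
```

Because ";" does not occur anywhere else in the data, read.csv with sep = ";" can then split each line into exactly three fields.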
Update
If we have more patterns and the "Team" name is always a single word (i.e., the word right before the first ','):
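# strip.white = TRUE drops the leading spaces kept by the capture groups
read.csv(text = sub("(\\s+[A-Za-z]+),(\\s+\\d+),(.*)", ";\\1;\\2\\3", v2),
         header = FALSE, sep = ";", strip.white = TRUE,
         col.names = c("Player", "Team", "Pos"), stringsAsFactors = FALSE)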
#                Player      Team       Pos
#1        João Moutinho    Monaco   30 M(C)
#2        Clinton N'Jie Marseille     23 FW
#3 Frederic Sammaritano     Dijon 30 AM(LR)
#4       Angel Di María       PSG 28 M(CLR)
#5    Jean Michael Seri      Nice   25 M(C)
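When player names can have any number of words, one possible variation (not from the original answer) is to capture the three fields directly with base R's strcapture(), available since R 3.4.0. The pattern and the names pattern, proto and res are assumptions made for illustration; the sketch assumes the team is the single word right before the first comma and keeps the comma inside Pos.

```r
v2 <- c("João Moutinho Monaco, 30, M(C)",
        "Clinton N'Jie Marseille, 23, FW",
        "Frederic Sammaritano Dijon, 30, AM(LR)",
        "Angel Di María PSG, 28, M(CLR)",
        "Jean Michael Seri Nice, 25, M(C)")

# Group 1: everything before the last word that precedes the first comma;
# group 2: that last word (Team); group 3: the rest of the string (Pos).
pattern <- "^([^,]*)[[:space:]]+([^,[:space:]]+),[[:space:]]*(.*)$"

# The prototype fixes the column names and types of the result.
proto <- data.frame(Player = character(), Team = character(),
                    Pos = character(), stringsAsFactors = FALSE)

res <- strcapture(pattern, v2, proto, perl = TRUE)
print(res)
```

Rows that do not match the pattern come back as NA in every column, which makes malformed entries easy to spot.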
Data
v1 <- c("João Moutinho Monaco, 30, M(C)", "Clinton N'Jie Marseille, 23, FW",
"Frederic Sammaritano Dijon, 30, AM(LR)")
v2 <- c(v1, "Angel Di María PSG, 28, M(CLR)","Jean Michael Seri Nice, 25, M(C)")
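For the position-based route the question hints at, a minimal base R sketch could look like the following. It is not part of the original answers; it assumes the first comma always follows the team name and that the team is a single word, and the names comma, head, last_space, player, team and pos are made up for illustration.

```r
v1 <- c("João Moutinho Monaco, 30, M(C)",
        "Clinton N'Jie Marseille, 23, FW",
        "Frederic Sammaritano Dijon, 30, AM(LR)")

# Position of the first comma in each string
comma <- as.integer(regexpr(",", v1, fixed = TRUE))

# Text before the comma holds "Player Team"; text after it is Pos
head <- substr(v1, 1, comma - 1)
pos  <- trimws(substr(v1, comma + 1, nchar(v1)))

# Position of the last space in head separates Player from Team
last_space <- as.integer(regexpr(" [^ ]*$", head))
player <- substr(head, 1, last_space - 1)
team   <- substr(head, last_space + 1, nchar(head))

data.frame(Player = player, Team = team, Pos = pos, stringsAsFactors = FALSE)
```

Since it splits at the last space before the comma, this version does not depend on how many words the player name has.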
Using the word method from stringr:
library(stringr)
data.frame(Player = word(v1, 1, 2),
           Team = sub(",", "", word(v1, 3)),
           Pos = word(v1, 4, -1),  # from the 4th word through the last word
           stringsAsFactors = FALSE)
#                Player      Team        Pos
#1        João Moutinho    Monaco   30, M(C)
#2        Clinton N'Jie Marseille     23, FW
#3 Frederic Sammaritano     Dijon 30, AM(LR)
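word() also accepts negative positions, which count back from the last word. The sketch below is a variation on the answer above rather than part of it: it assumes the last two words always form Pos and the word just before them is the team, so it also copes with the longer player names in v2 (restated here so the snippet is self-contained). The name res is illustrative.

```r
library(stringr)

v2 <- c("João Moutinho Monaco, 30, M(C)",
        "Clinton N'Jie Marseille, 23, FW",
        "Frederic Sammaritano Dijon, 30, AM(LR)",
        "Angel Di María PSG, 28, M(CLR)",
        "Jean Michael Seri Nice, 25, M(C)")

res <- data.frame(
  Player = word(v2, 1, -4),               # first word up to the 4th from last
  Team   = sub(",", "", word(v2, -3)),    # 3rd word from last, comma removed
  Pos    = word(v2, -2, -1),              # last two words
  stringsAsFactors = FALSE
)
print(res)
```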