Splitting numerals from a string in a data frame
I have a data frame in R with a column that looks like this:
Venue
AAA 2001
BBB 2016
CCC 1996
... ....
ZZZ 2007
To make the data frame a bit easier to work with, I'd like to split the Venue column into two columns, Location and Year, like this:
Location Year
AAA 2001
BBB 2016
CCC 1996
... ....
ZZZ 2007
I have tried several variations of the cSplit() function to do this:
library(splitstackshape)  # cSplit() comes from the splitstackshape package
df = cSplit(df, "Venue", " ")  # worked somewhat, but breaks on places with multiple words (e.g. Los Angeles, Rio de Janeiro)
df = cSplit(df, "Venue", "[:digit:]")
df = cSplit(df, "Venue", "[0-9]+")
So far, none of these has worked for me. I'd be grateful if anyone could point me in the right direction.
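To see why a plain space separator runs into trouble, here is a small base R sketch. It uses strsplit() rather than cSplit() to keep it dependency-free, and the venue names "Los Angeles 1984" and "Rio de Janeiro 2016" are hypothetical sample rows, not data from the question.

```r
# Hypothetical sample venues; the multi-word names are illustrative only.
venues <- c("AAA 2001", "Los Angeles 1984", "Rio de Janeiro 2016")

# Splitting on every single space gives a different number of pieces per row,
# so multi-word place names end up spread across several columns.
pieces <- strsplit(venues, " ", fixed = TRUE)
lengths(pieces)
# [1] 2 3 4
```

Only the last space (the one right before the year) should act as the separator, which is what the answers below target.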
The simplest way is to use stringr, whose functions are vectorized:
library(stringr)
# split on the whitespace that directly precedes a digit
df[,1:2] <- str_split(df$Venue, pattern = "\\s+(?=\\d)", simplify = TRUE)
colnames(df) <- c('Location', 'Year')
Or with str_split_fixed:
str_split_fixed(df$Venue, pattern = "\\s+(?=\\d)", 2)
You can also do it in base R:
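# perl = TRUE is required for the (?=\\d) lookahead; as.character() guards against a factor column
df[,1:2] <- do.call(rbind, strsplit(as.character(df$Venue), split = "\\s+(?=\\d)", perl = TRUE))
colnames(df) <- c('Location', 'Year')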
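A self-contained version of the base R approach, sketched on hypothetical sample data that includes multi-word locations. It also converts Year to integer, since every split method above returns character columns; building new columns and dropping Venue (instead of overwriting df[,1:2]) is a stylistic choice, not part of the original answer.

```r
# Hypothetical sample data; the multi-word names are illustrative only.
df <- data.frame(Venue = c("AAA 2001", "Los Angeles 1984", "Rio de Janeiro 2016"),
                 stringsAsFactors = FALSE)

# Split on the whitespace that directly precedes a digit (lookahead needs perl = TRUE).
# Each row yields exactly two pieces, so rbind() gives an n x 2 character matrix.
parts <- do.call(rbind, strsplit(as.character(df$Venue), "\\s+(?=\\d)", perl = TRUE))

# Add the two new columns; Year is converted from character to integer.
df$Location <- parts[, 1]
df$Year <- as.integer(parts[, 2])
df$Venue <- NULL
df
```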
How about this?
d <- data.frame(Venue = c("AAA 2001", "BBB 2016", "CCC 1996", "cc d 2001"),
                stringsAsFactors = FALSE)
# remove all digits, then trim the leftover trailing space
d$Location <- trimws(gsub("[[:digit:]]", "", d$Venue))
# remove everything that is not a digit
d$Year <- gsub("[^[:digit:]]", "", d$Venue)
d
#       Venue Location Year
# 1  AAA 2001      AAA 2001
# 2  BBB 2016      BBB 2016
# 3  CCC 1996      CCC 1996
# 4 cc d 2001     cc d 2001
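One caveat with the gsub() approach: it strips every digit, so a location name that itself contains digits would be mangled. A possible variant, assuming the year is always the last whitespace-separated token, anchors the patterns at the end of the string with sub(). "Studio 54 1977" is a hypothetical row used only to show the difference.

```r
# Hypothetical sample; "Studio 54 1977" is an illustrative name containing digits.
d <- data.frame(Venue = c("AAA 2001", "cc d 2001", "Studio 54 1977"),
                stringsAsFactors = FALSE)

# Year: capture only the trailing run of digits after the last whitespace.
d$Year <- as.integer(sub("^.*\\s(\\d+)$", "\\1", d$Venue))

# Location: drop the final whitespace + digits, leaving digits inside the name intact.
d$Location <- sub("\\s+\\d+$", "", d$Venue)
d
```

With the unanchored gsub("[^[:digit:]]", "", ...), the third row's year would come out as "541977" instead of 1977.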