Extract last word in a string after comma if there are multiple words else the first word

Question

I have data containing strings like the following:

 location<- c("xyz, sss, New Zealand", "USA", "Pris,France")
 id<- c(1,2,3)
 df<-data.frame(location,id)

I want to extract the country name from the data. The tricky part is that if I extract only the last word, I will get only one record (France).

library(stringr)
df$country<- word(df$location,-1)

Any ideas on how to extract the country from this data? The desired output is:

 id  location                      country
  1   xyz, sss, New Zealand        New Zealand
  2   USA                          USA
  3   Pris,France                  France

Answer 1

You can try sub:

 df$country <- sub('.*,\\s*', '', df$location)
 df$country
 #[1] "New Zealand" "USA"         "France"

Or

 library(stringr)
 str_extract(df$location, '\\b[^,]+$')
 #[1] "New Zealand" "USA"         "France"
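
For anyone wondering why a single sub() call covers both the one-word and the multi-word case, here is a minimal base R sketch that breaks the pattern into steps. The names pattern, has_comma and country are only illustrative and are not part of the answer above.

```r
location <- c("xyz, sss, New Zealand", "USA", "Pris,France")

# '.*,' is greedy, so it consumes everything up to and including the LAST comma;
# '\\s*' then swallows any whitespace that follows that comma.
pattern <- ".*,\\s*"

# Which strings contain a comma at all? (illustrative check only)
has_comma <- grepl(",", location, fixed = TRUE)

# sub() returns its input unchanged when the pattern does not match,
# which is why "USA" passes through untouched.
country <- sub(pattern, "", location)
print(country)
```

Note that the backslash has to be doubled inside an R string literal: a single '\s' is rejected by the parser as an unrecognized escape.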

Answer 2

A stringi solution:

require(stringi)
location<- c("xyz, sss, New Zealand", "USA", "Pris,France")
stri_trim(stri_match_first_regex(location, "(^|,)([^,]*?)$")[,3])
## [1] "New Zealand" "USA"         "France"

stri_trim removes unnecessary whitespace before/after the country name.
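
The same "take the last comma-separated piece, then trim it" idea can also be sketched in base R without extra packages. The helper name last_field is hypothetical, and the sketch assumes every input is a non-empty, non-NA string.

```r
location <- c("xyz, sss, New Zealand", "USA", "Pris,France")

# Hypothetical helper (not from the answers above): split each string on
# commas, keep the last piece, and strip surrounding whitespace.
# Assumes non-empty, non-NA input strings.
last_field <- function(x) {
  pieces <- strsplit(x, ",", fixed = TRUE)
  vapply(pieces, function(p) trimws(p[length(p)]), character(1))
}

result <- last_field(location)
print(result)
```

A string without a comma yields a single piece, so the first (and only) word is returned, which matches the behaviour asked for in the question.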


r

string-matching

stringr

stringi