How do I create new column data based on a regex match in R?

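I have some tweet author location data that I would like to reclassify by country. For example, given a vector of US 'states', I want to check for (regex) matches and add a "USA" entry to a country column.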

Sample data:

states <- c("CA", "OH", "FL", "TX", "MN") # all the states
tweets <- data.frame(location = c("my bed", "Minneapolis, MN", "Paris, France", "Los Angeles, CA"))

What I have tried:

library(dplyr)
library(stringr)
# This seems to do the matching part well
tweets %>% filter(str_detect(location, paste(states, collapse = "|")))

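# nested for loop
tweets$country <- NA_character_ # initialize the column before filling it
for (i in seq_along(tweets$location)) { # seq_along() visits every row, not only the last one
  for (state in states) {
    if (grepl(state, tweets$location[i])) {
      tweets$country[i] <- "USA"
      break
    }
  }
}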

Desired output (based on the sample input):

tweets$country <- c(NA, "USA", NA, "USA")

I'm fairly new to R, so any help is much appreciated.

We can use grepl together with ifelse for a base R solution:

states <- c("CA", "OH", "FL", "TX", "MN") # all the states
tweets <- data.frame(location = c("my bed", "Minneapolis, MN", "Paris, France", "Los Angeles, CA"))
# "\\b" is a regex word boundary; a single "\b" in an R string would be a backspace character
regex <- paste0("\\b(?:", paste(states, collapse = "|"), ")\\b")
tweets$country <- ifelse(grepl(regex, tweets$location), "USA", NA)
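To see why the word boundaries matter, here is a small sketch comparing the bounded pattern with a plain alternation. The extra location strings ("CAIRO, Egypt", "OHIO STATE FAIR") are made-up test inputs, not part of the original data, and perl = TRUE is passed here only as a cautious choice so that the boundary and the non-capturing group are handled by PCRE.

```r
states <- c("CA", "OH", "FL", "TX", "MN")

# Hypothetical locations chosen so that state codes appear inside longer words
locations <- c("Minneapolis, MN", "CAIRO, Egypt", "OHIO STATE FAIR", "Los Angeles, CA")

plain   <- paste(states, collapse = "|")
bounded <- paste0("\\b(?:", plain, ")\\b")

# Without boundaries, "CA" matches inside "CAIRO" and "OH" inside "OHIO"
plain_hits <- grepl(plain, locations)

# With a word boundary on both sides, only standalone two-letter codes match
bounded_hits <- grepl(bounded, locations, perl = TRUE)

print(data.frame(locations, plain_hits, bounded_hits))
```

The matching is still case-sensitive and cannot tell a state code from any other standalone two-letter uppercase token, so real location data may need further checks.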

If you prefer a dplyr solution, here is one, though it is very similar to Tim's answer:

library(dplyr)
states <- c("CA", "OH", "FL", "TX", "MN") # all the states


tweets <- tibble(location = c(
  "my bed", "Minneapolis, MN", "Paris, France",
  "Los Angeles, CA"
))

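# "\\b" marks a word boundary, so only standalone state codes match
pattern <- paste0("\\b(?:", paste(states, collapse = "|"), ")\\b")

tweets %>%
  mutate(country = if_else(
    stringr::str_detect(location, pattern),
    "United States", NA_character_ # a real missing value rather than the string "NA"
  ))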