Extract subnational division from string and convert it into country name in R

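I have a series of strings that mention only province or other subnational division names, and I would like to convert them into a vector of country names in R. Extracting country names with the countrycode package is relatively easy, but I don't see a way to use that package to convert province names to countries.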

For example:

provinces <- c("The governor of Florida", "The Premier of Ontario", "Jalisco has a province-wide policy")

I would like a way to convert the provinces vector into something like c("United States of America", "Canada", "Mexico").

From the comments above, I realized that you can use a custom dictionary in countrycode, which lets you incorporate subnational data.
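
To illustrate the idea without any extra data package, here is a minimal sketch using a hand-made dictionary. The toy_dict data frame and its three rows are made up for illustration; a real dictionary would need one row per subnational division. It assumes each string mentions exactly one region from the dictionary.

```r
library(countrycode)

# Hypothetical toy dictionary: one regex per region, one country per region.
toy_dict <- data.frame(
  region  = c("Florida", "Ontario", "Jalisco"),
  country = c("United States of America", "Canada", "Mexico"),
  stringsAsFactors = FALSE
)

provinces <- c("The governor of Florida",
               "The Premier of Ontario",
               "Jalisco has a province-wide policy")

# origin and destination are column names of the custom dictionary;
# origin_regex = TRUE treats the values in the origin column as regular
# expressions to search for inside each input string.
countries <- countrycode(provinces,
                         origin = "region",
                         destination = "country",
                         custom_dict = toy_dict,
                         origin_regex = TRUE)

print(countries)
```

Strings that match none of the patterns should come back as NA, so a larger dictionary is needed for real data.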

Edit:

Here is a fully reproducible example, because the previous example did not fully work:

library(countrycode)
library(choroplethrAdmin1)

# example data
provinces <- c("The governor of Florida", "Tim Stevenson leads Oxfordshire", "Gobierno del Estado de Hidalgo")

# remove punctuation
provinces <- gsub("[[:punct:]\n]", "", provinces)

# load administrative division dictionary
data(admin1.regions)

# remove duplicate region names (countrycode function only accepts unique names)
admin1.regions <- admin1.regions[!duplicated(admin1.regions$region),]

# convert provinces to country
provinces_to_country <- countrycode(provinces, "region", "country", custom_dict = admin1.regions, origin_regex = TRUE)

Old, non-reproducible example:

library(countrycode)
library(choroplethrAdmin1)

# example data
provinces <- c("The governor of Florida", "The Premier of Ontario", "Jalisco has a province-wide policy")

# remove punctuation
provinces <- gsub("[[:punct:]\n]", "", provinces)

# load administrative division dictionary
data(admin1.regions)

# remove duplicate region names (countrycode function only accepts unique names)
admin1.regions <- admin1.regions[!duplicated(admin1.regions$region),]

# convert provinces to country
provinces_to_country <- countrycode(provinces, "region", "country", custom_dict = admin1.regions, origin_regex = TRUE)