How to extract countries from a text?

Question

I am using Python 3 (I also have Python 2 installed), and I want to extract countries or cities from a short text. For example, text = "I live in Spain" or text = "United States (New York), United Kingdom (London)".

Expected output for countries:

Spain
[United States, United Kingdom]

I tried to install geography, but pip install geography fails with this error:

Collecting geography
  Could not find a version that satisfies the requirement geography (from versions: )
No matching distribution found for geography

It looks like geography only works with Python 2.

I also have geopandas, but I don't know how to use it to extract the required information from text.

Answer 1

You can use pycountry for this task (it also works with Python 3):

pip install pycountry

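import pycountry

text = "United States (New York), United Kingdom (London)"
# Check every country name known to pycountry against the text
for country in pycountry.countries:
    # Plain, case-sensitive substring match
    if country.name in text:
        print(country.name)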
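
One caveat with the substring check above: a name that is contained in a longer name also matches, so a text mentioning "Nigeria" would report "Niger" as well. Below is a minimal sketch of a stricter variant that matches whole words and prefers the longest name. The COUNTRY_NAMES list and the extract_countries helper are hypothetical names introduced only for illustration; in practice the list could presumably be built from [c.name for c in pycountry.countries].

```python
import re

# Hypothetical sample list for illustration; with pycountry installed it
# could be replaced by [c.name for c in pycountry.countries].
COUNTRY_NAMES = [
    "Spain", "United States", "United Kingdom",
    "Niger", "Nigeria", "Guinea", "Papua New Guinea",
]


def extract_countries(text, names=COUNTRY_NAMES):
    """Return the country names found in text, in order of first appearance."""
    # Try longer names first so "Papua New Guinea" wins over "Guinea".
    ordered = sorted(names, key=len, reverse=True)
    # \b word boundaries keep "Niger" from matching inside "Nigeria".
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(name) for name in ordered) + r")\b"
    )
    found = []
    for match in pattern.finditer(text):
        name = match.group(0)
        if name not in found:  # skip repeated mentions
            found.append(name)
    return found


print(extract_countries("I live in Spain"))
print(extract_countries("United States (New York), United Kingdom (London)"))
print(extract_countries("Flights from Nigeria to Papua New Guinea"))
```

The match is still case-sensitive and only finds names spelled exactly as in the list, so abbreviations such as "USA" or "UK" would need extra entries.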

Answer 2

This library has a newer version that supports Python 3, named geograpy3:

pip install geograpy3

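It lets you extract place names from a URL or from text and adds context to those names, for example distinguishing between countries, regions, and cities.

Example: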

import geograpy
import nltk
nltk.download('punkt')
nltk.download('averaged_perceptron_tagger')
nltk.download('maxent_ne_chunker')
nltk.download('words')
url = 'http://www.bbc.com/news/world-europe-26919928'
places = geograpy.get_place_context(url=url)

You can find more details at this link:

python

geography

nltk

python-3.x