stanfordnlp.download() fails: [Errno -2] Name or service not known

Question

I'm just trying to run the example that stanfordnlp itself provides:

>>> import stanfordnlp
>>> stanfordnlp.download('en')   # This downloads the English models for the neural pipeline
>>> nlp = stanfordnlp.Pipeline() # This sets up a default neural pipeline in English
>>> doc = nlp("Barack Obama was born in Hawaii.  He was elected president in 2008.")
>>> doc.sentences[0].print_dependencies()

However, it doesn't work. I get the following error:

ConnectionError: HTTPSConnectionPool(host='nlp.stanford.edu', port=443): Max retries exceeded with url: /software/stanfordnlp_models/latest/en_ewt_models.zip (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7f8f5dba7f10>: Failed to establish a new connection: [Errno -2] Name or service not known'))

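Why is this happening? I saw that this was reported as an issue on their GitHub, but they said it was caused by a server problem and had already been fixed. How can I resolve this error? Thanks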

Answer 1

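The stanfordnlp package is now deprecated. We renamed it Stanza, and newer releases are published under that name. You should follow the instructions here: https://stanfordnlp.github.io/stanza/. Following those steps worked fine for me just now: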

>>> import stanza
>>> stanza.download('en') # download English model
>>> nlp = stanza.Pipeline('en') # initialize English neural pipeline
>>> doc = nlp("Barack Obama was born in Hawaii.") # run annotation over a sentence
>>> print(doc.entities)

That said, here are some more details:

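- This error occurs because the model data files could not be downloaded from our lab machines. Those machines are sometimes down; try again the next day. Trying it now, the model download succeeded (if a bit slowly).
- stanfordnlp is not compatible with recent versions of PyTorch. If you see the error RuntimeError: Integer division of tensors using div or / is no longer supported, then you need to either switch to Stanza or downgrade your PyTorch version to 1.5 or earlier.
- Stanza downloads the large model data files from GitHub rather than from our lab machines, so Stanza model downloads should be more reliable. But if you have trouble accessing GitHub, see https://stanfordnlp.github.io/stanza/faq.html#getting-requestsexceptionsconnectionerror-when-downloading-models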
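Since the first point above boils down to "try again later", here is a minimal sketch of automating that from your own script. It is an assumption-laden illustration, not part of stanfordnlp or Stanza: the helper names `host_resolves` and `retry` and their default parameters are hypothetical. On Linux, "[Errno -2] Name or service not known" is the message for a failed name lookup (`socket.gaierror`), so `host_resolves` returning False means the failure happens at DNS resolution, before any connection to the server is attempted. `retry` re-calls a download function with a growing delay; it catches `OSError`, which also covers the `requests` `ConnectionError` shown in the question.

```python
import socket
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def host_resolves(host: str, port: int = 443) -> bool:
    """Hypothetical helper: return True if `host` resolves to an address.

    A failed lookup raises socket.gaierror, which is what produces
    "[Errno -2] Name or service not known" on Linux.
    """
    try:
        socket.getaddrinfo(host, port)
        return True
    except socket.gaierror:
        return False


def retry(
    func: Callable[[], T],
    attempts: int = 3,
    delay: float = 5.0,
    backoff: float = 2.0,
    exceptions: Tuple[Type[BaseException], ...] = (OSError,),
) -> T:
    """Hypothetical helper: call `func` up to `attempts` times.

    Sleeps `delay` seconds after each failure, multiplying the delay by
    `backoff` each time, and re-raises the last error if every attempt fails.
    """
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except exceptions as exc:
            if attempt == attempts:
                raise  # out of attempts: surface the original error
            print(f"attempt {attempt} failed ({exc!r}); retrying in {delay:.0f}s")
            time.sleep(delay)
            delay *= backoff
    raise AssertionError("unreachable")


# Hypothetical usage (needs stanfordnlp or stanza installed and network access):
#   print(host_resolves("nlp.stanford.edu"))  # host from the error message
#   import stanza
#   retry(lambda: stanza.download("en"))
```

Retrying like this only helps with short outages. If the lab server stays down for longer, waiting or switching to Stanza, as described above, is still the fix.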

