Wikipedia API for Python

I am trying to view the table of contents of a Wikipedia page using the Wikipedia API for Python. Here is my code:

>>> import wikipedia
>>> ny = wikipedia.page("New York")
>>> ny.sections

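But the result is an empty list, []. When I open the page itself, I can see that the table of contents has entries. Apart from this, everything else described in the documentation seems to work. I am new to Python and come from a Java background.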

The current version of the Wikipedia API Python library has a bug. You can work around it by installing a fork by lucasdnd on GitHub:

pip install git+https://github.com/lucasdnd/Wikipedia.git

(If you already have the library installed, you can add --upgrade.)

Now:

>>> import wikipedia
>>> ny = wikipedia.page("New York")
>>> ny.sections
[u'History', u'16th century', u'17th century', u'18th century, the American Revolution, and statehood', u'19th century', u'Immigration', u'September 11, 2001 attacks', u'Hurricane Sandy, 2012', u'Geography', u'Climate', u'Statescape', u'Regions', u'Adjacent geographic entities', u'State parks', u'National parks', u'Administrative divisions', u'Demographics', u'Population', u'Most populous counties', u'Major cities', u'Metropolitan areas', u'Racial and ancestral makeup', u'Languages', u'Religion', u'LGBT', u'Economy', u'Wall Street', u'Silicon Alley', u'Microelectronic hardware and photographic processing', u'Media and entertainment', u'Tourism', u'Exports', u'Education', u'Transportation', u'Government and politics', u'Government', u'Capital punishment', u'Federal representation', u'Politics', u'Sports', u'See also', u'References', u'Further reading', u'External links'] 

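Hopefully this will be fixed in the main library soon.

I ran into the same problem. Since almost 3 years have passed and it does not look like it will be fixed, I created another simple library, Wikipedia-API.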

import wikipediaapi

wiki = wikipediaapi.Wikipedia('en')
mutcd = wiki.page('Comparison of MUTCD-Influenced Traffic Signs')
print("\n".join([s.title for s in mutcd.sections]))

Output:

Places
Media and entertainment
Sports
Ships
Other uses
See also
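
If neither a fork nor a second library is an option, the section list can probably also be read straight from the MediaWiki API with the standard library alone. The sketch below is not taken from either library; it assumes the `action=parse` endpoint with `prop=sections`, and the helper names (`build_sections_url`, `section_titles`, `fetch_section_titles`) and the User-Agent string are made up for illustration.

```python
import json
import urllib.parse
import urllib.request

API_URL = "https://en.wikipedia.org/w/api.php"


def build_sections_url(title):
    """Build an action=parse request URL that asks only for the section list."""
    params = {
        "action": "parse",
        "page": title,
        "prop": "sections",
        "format": "json",
        "redirects": 1,  # follow redirects to the target page
    }
    return API_URL + "?" + urllib.parse.urlencode(params)


def section_titles(payload):
    """Extract section headings from a decoded action=parse response.

    Returns an empty list when the response has no "parse" key
    (for example, when the API reports an error instead).
    Headings are returned as the API gives them and may contain HTML markup.
    """
    sections = payload.get("parse", {}).get("sections", [])
    return [section["line"] for section in sections]


def fetch_section_titles(title):
    """Fetch and return the section headings for one page (needs network access)."""
    request = urllib.request.Request(
        build_sections_url(title),
        # Hypothetical User-Agent; replace with one that identifies your tool.
        headers={"User-Agent": "toc-example/0.1 (illustrative sketch)"},
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return section_titles(json.load(response))
```

Calling `fetch_section_titles("New York City")` should then return the headings as a plain list of strings. Parsing is kept separate from fetching so that `section_titles` can be checked against a saved response without touching the network.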

The latest version has a similar bug:

>>> wikipedia.summary('Creativity')
PageError: Page id "creatity" does not match any pages. Try another id!
>>> wikipedia.page('Creativity')
PageError: Page id "creatity" does not match any pages. Try another id!
>>> wikipedia.suggest('Creativity')
'creatity'
>>> wikipedia.search('Creativity')
['Creativity',
 'Creativity (religion)',
 'Creativity and mental health',
...
PageError: Page id "creatity" does not match any pages. Try another id!
>>> wikipedia.page('creativity')
PageError: Page id "creatity" does not match any pages. Try another id!

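Lowercasing and similar tweaks do not help. Adding the "(religion)" qualifier does, but that is no use unless the religion page is the one you are looking for.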

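Digging into the source code and the Wikipedia API, I found that Wikipedia's suggest API is returning invalid page title suggestions. If you are sure your page title ("New York") exists, you can turn off auto_suggest: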

>>> wikipedia.page('Creativity', auto_suggest=False)
<WikipediaPage 'Creativity'>
>>> wikipedia.page('New York', auto_suggest=False)
DisambiguationError: "New York" may refer to: 
New York City
New York (state)
...
>>> wikipedia.page('New York City', auto_suggest=False)
<WikipediaPage 'New York City'>
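
The session above can be wrapped in a small helper that always skips the suggest step and turns a disambiguation error into a list of candidate titles. This is only a sketch: `load_exact_page` and its `page_loader` parameter are hypothetical names, not part of the library, and the helper relies on the library's DisambiguationError exposing its candidates as an `options` attribute.

```python
def load_exact_page(title, page_loader=None):
    """Load a page by its exact title without consulting the suggest API.

    Returns (page, []) on success, or (None, candidates) when the title
    is ambiguous. Any other error, such as PageError, propagates unchanged.

    page_loader is a hypothetical injection point for testing; by default
    it is wikipedia.page from the third-party "wikipedia" package.
    """
    if page_loader is None:
        import wikipedia  # third-party: pip install wikipedia
        page_loader = wikipedia.page

    try:
        return page_loader(title, auto_suggest=False), []
    except Exception as err:
        # Assumption: DisambiguationError stores the candidate titles in
        # .options. Checking for the attribute (rather than the class)
        # keeps this helper usable with a stand-in loader.
        options = getattr(err, "options", None)
        if options is None:
            raise
        return None, list(options)
```

With this in place, `load_exact_page('Creativity')` should return the page with an empty candidate list, while `load_exact_page('New York')` should return `None` together with the titles that the DisambiguationError listed, so the caller can pick one and call the helper again.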

There have also been several pull requests implementing fixes over the past 6 months, but none have been reviewed: https://github.com/goldsmith/Wikipedia/pull/305