Using the wikipedia module in Python

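I am using the wikipedia module in my Python code. I want to take input from the user, search Wikipedia for it, and print 2 lines from the summary. Since there can be many topics with the same name, I wrote it like this.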

import wikipedia
value=input("Enter what u want to search")
m=wikipedia.search(value,3)
print(wikipedia.summary(m[0],sentences=2))

Running this prints about 3 pages of exceptions. What is wrong with it? Edit: As @Ruperto suggested, I changed the code.

import wikipedia
import random
value=input("Enter the words: ")
try:
    p=wikipedia.page(value)
    print(p)
except wikipedia.exceptions.DisambiguationError as e:
    s=random.choice(e.options)
    p=wikipedia.summary(s,sentences=2)
    print(p)

Now the error I get is:

Traceback (most recent call last):
  File "C:\Users\vdhan\AppData\Local\Programs\Python\Python37-32\lib\site-packages\urllib3\connection.py", line 160, in _new_conn
    (self._dns_host, self.port), self.timeout, **extra_kw
  File "C:\Users\vdhan\AppData\Local\Programs\Python\Python37-32\lib\site-packages\urllib3\util\connection.py", line 84, in create_connection
    raise err
  File "C:\Users\vdhan\AppData\Local\Programs\Python\Python37-32\lib\site-packages\urllib3\util\connection.py", line 74, in create_connection
    sock.connect(sa)
TimeoutError: [WinError 10060] A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\vdhan\AppData\Local\Programs\Python\Python37-32\lib\site-packages\urllib3\connectionpool.py", line 677, in urlopen
    chunked=chunked,
urllib3.exceptions.NewConnectionError: <urllib3.connection.HTTPConnection object at 0x03AEEAF0>: Failed to establish a new connection: [WinError 10060] A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond

What should I do now?

This is probably caused by no or a poor internet connection, as your error says:

A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond

You can check or change your internet connection and try again. Otherwise, it may be a problem with your Python environment. My implementation is:

import warnings
warnings.filterwarnings("ignore")

import wikipedia
import random


value=input("Enter the words: ")
try:
    m=wikipedia.search(value,3)
    print(wikipedia.summary(m[0],sentences=2))
    # print(p)
except wikipedia.exceptions.DisambiguationError as e:
    s=random.choice(e.options)
    p=wikipedia.summary(s,sentences=2)
    print(p)

Output:

Enter the words: programming
Program management or programme management is the process of managing several related projects, often with the intention of improving an organization's performance. In practice and in its aims, program management is often closely related to systems engineering, industrial engineering, change management, and business transformation.

It works fine in Google Colab; you can find my Colab notebook with this implementation here
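Since the traceback in the question points at a connection timeout, one option is to retry the request a few times before giving up. The following is only a minimal sketch: `call_with_retries` and `fetch_summary` are hypothetical helper names, and it assumes the network errors raised underneath `wikipedia` derive from `OSError` (as `TimeoutError` and the `requests` connection errors do).

```python
import time


def call_with_retries(fetch, retries=3, delay=1.0, retry_on=(OSError,)):
    """Call fetch() and retry when it raises one of the retry_on exceptions.

    Re-raises the last error if every attempt fails.
    """
    last_error = None
    for attempt in range(1, retries + 1):
        try:
            return fetch()
        except retry_on as err:
            last_error = err
            # Wait before the next attempt, but not after the final one
            if attempt < retries:
                time.sleep(delay)
    raise last_error


def fetch_summary(title, sentences=2):
    """Hypothetical wrapper: fetch a Wikipedia summary with retries."""
    # Imported lazily so call_with_retries can be used without the package
    import wikipedia
    return call_with_retries(
        lambda: wikipedia.summary(title, sentences=sentences)
    )
```

Calling `fetch_summary("Python (programming language)")` would then retry up to three times on a timeout; a retry will not help if the machine has no connection at all or a proxy blocks the request.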

The error above is caused by an internet connection problem. However, the code below works:

import wikipedia
import random

value=input("Enter the words: ")
try:
    m=wikipedia.search(value,3)
    print(wikipedia.summary(m[0],sentences=2))
except wikipedia.exceptions.DisambiguationError as e:
    s=random.choice(e.options)
    p=wikipedia.summary(s,sentences=2)
    print(p)

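However, note that since this is part of a larger block of code, it would be better to use an NLP library for abstractive or extractive summarization, because the wikipedia package only uses beautifulsoup and soupsieve for web scraping and returns just the first few lines, without actually summarizing them. Also, the content on Wikipedia changes every 2 hours.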
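To illustrate what extractive summarization means here, the sketch below scores each sentence by the average frequency of its words and keeps the top-scoring ones in their original order. It is a toy version using only the standard library, not a substitute for a real NLP library; `extractive_summary` is a hypothetical name, and the regex-based sentence splitting and the "ignore words of 3 letters or fewer" rule are simplifying assumptions.

```python
import re
from collections import Counter


def extractive_summary(text, max_sentences=2):
    """Return the highest-scoring sentences of text, in original order."""
    # Naive sentence split: break after ., ! or ? followed by whitespace
    sentences = [s.strip() for s in re.split(r'(?<=[.!?])\s+', text) if s.strip()]

    # Count word frequencies, ignoring very short words as a crude stop list
    words = re.findall(r'[a-z]+', text.lower())
    freq = Counter(w for w in words if len(w) > 3)

    def score(sentence):
        tokens = re.findall(r'[a-z]+', sentence.lower())
        # Average frequency, so long sentences are not favoured automatically
        return sum(freq[t] for t in tokens) / (len(tokens) or 1)

    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:max_sentences])
    return ' '.join(sentences[i] for i in keep)
```

The input could be the full `content` of a page rather than its lead section, so the result is chosen by relevance instead of simply being the first lines.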

I ran into a similar problem and, after a lot of head-scratching and googling, found this solution:

import wikipediaapi as api
import wikipedia as wk

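# Wikipediaapi 'initialization'
# Note: recent wikipedia-api releases also require a user_agent argument here.
wiki_wiki = api.Wikipedia('en')


# Get a fixed number of sentences from the page summary
def summary(pg, sentences=5):
    summ = pg.summary.split('. ')
    summ = '. '.join(summ[:sentences])
    # Avoid a doubled period when the last kept sentence already ends with one
    if not summ.endswith('.'):
        summ += '.'
    return summ


s_term = 'apple'  # Any term, ambiguous or not
wk_res = wk.search(s_term)  # Titles matching the search term
page = wiki_wiki.page(wk_res[0])  # Fetch the first result by exact title
print("Page summary", summary(page))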

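Basically, as far as I can tell, there is no good solution using the wikipedia module alone. For example, if I search for 'India', I can never get the page for India the country, which is what I want. This happens because the title of the Wikipedia page for India (the country) is simply 'India'. However, because of the number of things it could refer to, that title is rejected as ambiguous. The same applies to many other topics.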

However, wiki_wiki.page can fetch a page with an ambiguous title, and that is what this code relies on.
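One gap in the snippet above is that it assumes the search returns at least one title and that the first title resolves to a real page. A small guard, sketched below, walks the search results and returns the first page that exists. `first_existing_page` is a hypothetical helper; it only assumes that the page objects expose an `exists()` method, as wikipediaapi pages do.

```python
def first_existing_page(titles, get_page):
    """Return the first page whose exists() is true, or None if there is none.

    titles   -- iterable of candidate titles, e.g. the result of wk.search(term)
    get_page -- callable mapping a title to a page object, e.g. wiki_wiki.page
    """
    for title in titles:
        page = get_page(title)
        if page.exists():
            return page
    return None


# Intended use with the objects from the answer above (not executed here):
#   page = first_existing_page(wk.search('India'), wiki_wiki.page)
#   if page is not None:
#       print("Page summary", summary(page))
```

Passing the lookup function in as a parameter keeps the helper independent of either library, so it can be tried out with stand-in objects before going online.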