Why is this returning a NoneType?
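I'm trying to scrape information from Wikipedia using the function below, but I'm running into an AttributeError because a function call is returning None. Can someone explain why None is being returned?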
import wikipedia as wp
import string

def add_section_info(search):
    HTML = wp.page(search).html().encode("UTF-8")  # gets HTML source from Wikipedia
    with open("temp.xml", 'w') as t:  # write HTML to xml format
        t.write(HTML)
    table_of_contents = []
    dict_of_section_info = {}
    # This extracts the info in the table of contents
    with open("temp.xml", 'r') as r:
        for line in r:
            if "toclevel" in line:
                new_string = line.partition("#")[2]
                content_title = new_string.partition("\"")[0]
                tbl = string.maketrans("_", " ")
                content_title = content_title.translate(tbl)
                table_of_contents.append(content_title)
    print wp.page(search).section("Aortic rupture")  # this is None, but shouldn't be
    for item in table_of_contents:
        section = wp.page(search).section(item).encode("UTF-8")
        print section
        if section == "":
            continue
        else:
            dict_of_section_info[item] = section
    with open("Section_Info.txt", 'a') as sect:
        sect.write(search)
        sect.write("------------------------------------------\n")
        for item in dict_of_section_info:
            sect.write(item)
            sect.write("\n\n")
            sect.write(dict_of_section_info[item])
            sect.write("####################################\n\n")

add_section_info("Abdominal aortic aneurysm")
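What I don't understand is that if I run add_section_info("HIV"), for example, it works perfectly.
The source code for the wikipedia module I'm importing is here.
My output for the code above is: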
Abdominal aortic aneurysm
Signs and symptoms
Traceback (most recent call last):
  File "/home/pharoslabsllc/Documents/wikitest.py", line 79, in <module>
    add_section_info(line)
  File "/home/pharoslabsllc/Documents/wikitest.py", line 30, in add_section_info
    section = wp.page(search).section(item).encode("UTF-8")
AttributeError: 'NoneType' object has no attribute 'encode'
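wp.page(search).section(item) doesn't find the section you're looking for and returns None. You don't check for that and go on to treat the value as a string, so the failure is expected.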
The page method never returns None (you can easily check this in the source code), but the section method does return None if the title can't be found. See the documentation:
section(section_title)
Get the plain text content of a section from self.sections. Returns None if section_title isn't found, otherwise returns a whitespace stripped string.
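Given that documented behavior, the caller has to check for None before calling .encode() or any other string method. The following is a minimal sketch of such a check, written for Python 3. FakePage is a hypothetical stand-in for a real page object so the example runs offline, and collect_sections is an illustrative helper name, not part of the wikipedia library.

```python
class FakePage:
    """Hypothetical stand-in for a wikipedia page object (assumption, offline use)."""

    def __init__(self, sections):
        self._sections = sections

    def section(self, title):
        # Mirrors the documented behavior: None when the title is not found.
        return self._sections.get(title)


def collect_sections(page, titles):
    """Return {title: text} for the titles that resolve to non-empty text."""
    found = {}
    for title in titles:
        text = page.section(title)
        if text is None:
            # Section not found: skip it instead of calling a method on None.
            continue
        if text == "":
            # Section exists but has no text of its own.
            continue
        found[title] = text
    return found


page = FakePage({"Signs and symptoms": "Usually none.", "Empty": ""})
result = collect_sections(page, ["Signs and symptoms", "Aortic rupture", "Empty"])
print(result)
```

This only avoids the AttributeError; it does not explain why a section that exists on the site is reported as missing.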
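So the answer is that, as far as the library is concerned, the Wikipedia page you're referring to has no section titled Aortic rupture.
Looking at Wikipedia itself, the page Abdominal aortic aneurysm does appear to have such a section.
Note that if you check the value of wp.page(search).sections, you get: []. That is, the library doesn't seem to be parsing the sections correctly.
In the library source code, found here, you can see this test: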
section = u"== {} ==".format(section_title)
try:
    index = self.content.index(section) + len(section)
except ValueError:
    return None
However:
In [14]: p.content.find('Aortic')
Out[14]: 3223
In [15]: p.content[3220:3220+50]
Out[15]: '== Aortic ruptureEdit ===\n\nThe signs and symptoms '
In [16]: p.section('Aortic ruptureEdit')
Out[16]: "The signs and symptoms of a ruptured AAA may includes severe pain in the lower back, flank, abdomen or groin. A mass that pulses with the heart beat may also be felt. The bleeding can leads to a hypovolemic shock with low blood pressure and a fast heart rate. This may lead to brief passing out.\nThe mortality of AAA rupture is up to 90%. 65–75% of patients die before they arrive at hospital and up to 90% die before they reach the operating room. The bleeding can be retroperitoneal or into the abdominal cavity. Rupture can also create a connection between the aorta and intestine or inferior vena cava. Flank ecchymosis (appearance of a bruise) is a sign of retroperitoneal bleeding, and is also called Grey Turner's sign.\nAortic aneurysm rupture may be mistaken for the pain of kidney stones, muscle related back pain."
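Note the Edit == at the end of the heading. In other words, the library has a bug: it doesn't account for the text of the edit link, which ends up appended to the section title in the page content.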
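The mismatch can be reproduced offline. The sketch below, in Python 3, re-creates only the string test quoted from the library source (it is not the library's full method) and runs it on a hand-written content string modeled on the Out[15] output; section_lookup is a hypothetical name.

```python
def section_lookup(content, section_title):
    """Simplified re-creation of the quoted lookup test, not the library's full method."""
    marker = "== {} ==".format(section_title)
    try:
        # Offset just past the heading marker, as in the quoted source.
        return content.index(marker) + len(marker)
    except ValueError:
        # The exact "== title ==" string does not occur in the content.
        return None


# Hand-written sample modeled on the heading seen in p.content.
content = "Intro.\n\n=== Aortic ruptureEdit ===\n\nThe signs and symptoms "

print(section_lookup(content, "Aortic rupture"))      # None: "rupture ==" never occurs
print(section_lookup(content, "Aortic ruptureEdit"))  # an integer offset
```

Because the heading text is followed directly by "Edit", the substring "== Aortic rupture ==" is never present, so the lookup takes the ValueError branch.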
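The same code works for the HIV page because on that page the headings have no edit link next to them. I don't know why that is; either way it looks like a bug or shortcoming in the library, so you should open a ticket on its issue tracker.
In the meantime, you can use a simple workaround like this: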
def find_section(page, title):
    res = page.section(title)
    if res is None:
        res = page.section(title + 'Edit')
    return res
and call this function instead of the .section method. However, this can only be a temporary fix.
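An alternative is to parse the headings out of the page's plain-text content yourself. The following Python 3 sketch assumes headings look like the ones shown in the session above ("== Title ==" or "=== TitleEdit ==="). find_section_text and the sample text are hypothetical, and unlike the library it returns only the text up to the next heading of any level.

```python
import re

# A heading line: one or more '=', the title, one or more '='.
HEADING = re.compile(r"^=+ *(.*?) *=+ *$", re.MULTILINE)


def find_section_text(content, title):
    """Return the text under `title`, tolerating a trailing 'Edit' in the heading.

    Hypothetical helper; returns None if no heading matches.
    """
    headings = list(HEADING.finditer(content))
    for i, match in enumerate(headings):
        if match.group(1) in (title, title + "Edit"):
            start = match.end()
            # The section ends where the next heading starts, or at end of text.
            end = headings[i + 1].start() if i + 1 < len(headings) else len(content)
            return content[start:end].strip()
    return None


sample = (
    "Intro text.\n\n"
    "== Signs and symptomsEdit ==\nUsually none.\n\n"
    "=== Aortic ruptureEdit ===\n\nSevere pain.\n\n"
    "== Causes ==\nSmoking."
)
print(find_section_text(sample, "Aortic rupture"))
print(find_section_text(sample, "Missing"))
```

With a real page object one would presumably pass page.content as the first argument. A section whose real title ends in "Edit" would be ambiguous under this scheme, so this is also a stopgap rather than a proper fix.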