Python lxml error when iterating through xml file
I have an XML file like this:
<location type="journal">
???INSERT location???
<journal title="J. Gen. Virol.">
<volumn> 84 </volumn>
<page start="2305" end="2315"/>
<year> 2003 </year>
</journal>
</location>
I am iterating through the file like this:
import re
import networkx as nx
from lxml import etree

tree_out = etree.parse('xmlfile.xml')
updatedtext_head = '???UPDATE FROM '
insert_head = '???INSERT '
delete_head = '???DELETE '
updatedattrib_head = '???UPDATE '
updatedattrib_mid = ' FROM '
mark_end = '???'
every = 60
G = nx.DiGraph()
color_list = []
node_text = []
inserted_out = []
deleted_out = []
updatedtext_out = []
others_out = []
updatedattrib_out = []
old_new_attrib_pairs = []
full_texts = []
for x in tree_out.iter():
    # An element counts as deleted if any ancestor carries the DELETE mark ...
    for y in x.iterancestors():
        if '???DELETE' in y.text and x not in deleted_out:
            deleted_out.append(x)
    # ... or if it carries the mark itself (this is the line that raises).
    if '???DELETE' in x.text and x not in deleted_out:
        deleted_out.append(x)
    # Same rule for the INSERT mark.
    for y in x.iterancestors():
        if '???INSERT' in y.text and x not in inserted_out:
            inserted_out.append(x)
    if '???INSERT' in x.text and x not in inserted_out:
        inserted_out.append(x)
    if '???UPDATE FROM' in x.text and x not in updatedtext_out:
        updatedtext_out.append(x)
    if '???UPDATE ' in x.text and ' FROM ' in x.text and '???' in x.text and x not in updatedattrib_out and x not in updatedtext_out:
        updatedattrib_out.append(x)
    # Whitespace-only elements that fell into no other list.
    if (re.search(r'^\s+$', x.text)) and x not in others_out and x not in deleted_out and x not in inserted_out and x not in updatedtext_out and x not in updatedattrib_out:
        others_out.append(x)
But when I reach an element like this:
<page start="2305" end="2315"/>
I get this error:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-68-b66a7d063b5b> in <module>
121 deleted_out.append(x)
122
--> 123 if '???DELETE' in x.text and x not in deleted_out:
124 deleted_out.append(x)
125
TypeError: argument of type 'NoneType' is not iterable
The end result I am after is to sort the elements into separate lists, as the code above tries to do. Why does this error occur, and how can I fix it?
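Edit
The TypeError is caused by the attribute-only element. Specifically, the element is held in the variable x, and the code tests whether '???DELETE' occurs in x.text, but x.text is None, because the text attribute is where an element's content is stored and this element has no content. For reference, an XML element has the following structure: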
<element-name attribute1 attribute2>content</element-name>
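The error carries the message argument of type 'NoneType' is not iterable because the in operator has the form value in iterable. Here x.text is the right-hand operand, so it must be an iterable (such as a str), and None is not one.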
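The behaviour described above can be reproduced in a few lines. This is a minimal sketch, not the asker's code: it assumes the standard library's xml.etree.ElementTree instead of lxml so that it runs without extra dependencies (lxml also reports None as the text of an empty element), and the names page, year and raised are made up for illustration.

```python
import xml.etree.ElementTree as ET

# The attribute-only element from the question, and one that has content.
page = ET.fromstring('<page start="2305" end="2315"/>')
year = ET.fromstring('<year> 2003 </year>')

print(page.text)        # None: there is nothing between the tags
print(repr(year.text))  # ' 2003 '

# A membership test against None raises TypeError, as in the traceback.
raised = False
try:
    '???DELETE' in page.text
except TypeError as exc:
    raised = True
    print(type(exc).__name__, exc)
```

The exact wording of the message may vary between Python versions, so the sketch only relies on the exception type.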
Before using x.text like a str, you should test that it is not None.
if x.text is not None and '???DELETE' in x.text and x not in deleted_out:
    deleted_out.append(x)
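The same guard is needed wherever the loop reads .text, including the ancestor checks on y.text and the re.search call. One way to avoid repeating it is sketched below. This is a simplified illustration under stated assumptions, not the asker's full logic: text_of and classify are hypothetical helpers, only the INSERT and DELETE marks are handled, a mark inherited from an ancestor takes precedence over the element's own, and everything unmarked goes to an others list. It replaces the repeated iterancestors() scans with a single recursive walk that passes the mark down to the children. The demo parses with the standard library's xml.etree.ElementTree to stay dependency-free; the helpers only use .text and child iteration, which lxml elements also provide.

```python
import xml.etree.ElementTree as ET

XML = """<location type="journal">
???INSERT location???
<journal title="J. Gen. Virol.">
<volumn> 84 </volumn>
<page start="2305" end="2315"/>
<year> 2003 </year>
</journal>
</location>"""


def text_of(element):
    """Return element.text, or '' when the element has no text (None)."""
    return element.text or ''


def classify(element, inherited=None, result=None):
    """Walk the tree once; children inherit the mark found on an ancestor."""
    if result is None:
        result = {'inserted': [], 'deleted': [], 'others': []}
    text = text_of(element)
    mark = inherited
    if mark is None:
        # Assumption for this sketch: DELETE is checked before INSERT.
        if '???DELETE' in text:
            mark = 'deleted'
        elif '???INSERT' in text:
            mark = 'inserted'
    result[mark or 'others'].append(element)
    for child in element:
        classify(child, mark, result)
    return result


groups = classify(ET.fromstring(XML))
inserted_tags = [el.tag for el in groups['inserted']]
print(inserted_tags)
```

With the sample file, the root carries the INSERT mark, so every element including the attribute-only page ends up in the inserted list and no TypeError is raised.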
Original
You never declared deleted_out. Try this:
from lxml import etree

tree_out = etree.parse('xmlfile.xml')
deleted_out = []
for x in tree_out.iter():
    for y in x.iterancestors():
        if '???DELETE' in y.text and x not in deleted_out:
            deleted_out.append(x)