Python lxml error when iterating through xml file

Question

I have an XML file like this:

<location type="journal">
???INSERT location???
    <journal title="J. Gen. Virol.">
        <volumn> 84 </volumn>
        <page start="2305" end="2315"/>
        <year> 2003 </year>
    </journal>
</location>

I am iterating over the file like this:

import re

import networkx as nx
from lxml import etree

tree_out = etree.parse('xmlfile.xml')
updatedtext_head = '???UPDATE FROM '
insert_head = '???INSERT '
delete_head = '???DELETE '

updatedattrib_head = '???UPDATE '
updatedattrib_mid = ' FROM '
mark_end = '???'

every = 60

G = nx.DiGraph()

color_list = []
node_text = []
inserted_out = []
deleted_out = []
updatedtext_out = []
others_out = []
updatedattrib_out = []
old_new_attrib_pairs = []
full_texts = []

for x in tree_out.iter():

    for y in x.iterancestors():
        if '???DELETE' in y.text and x not in deleted_out:
            deleted_out.append(x)

    if '???DELETE' in x.text and x not in deleted_out:
        deleted_out.append(x)

    for y in x.iterancestors():
        if '???INSERT' in y.text and x not in inserted_out:
            inserted_out.append(x)

    if '???INSERT' in x.text and x not in inserted_out:
        inserted_out.append(x)

    if '???UPDATE FROM' in x.text and x not in updatedtext_out:
        updatedtext_out.append(x)

    if '???UPDATE ' in x.text and ' FROM ' in x.text and '???' in x.text and x not in updatedattrib_out and x not in updatedtext_out:
        updatedattrib_out.append(x)

    if (re.search(r'^\s+$', x.text)) and x not in others_out and x not in deleted_out and x not in inserted_out and x not in updatedtext_out and x not in updatedattrib_out:
        others_out.append(x)

But when I reach an element like this:

<page start="2305" end="2315"/>

I get this error:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-68-b66a7d063b5b> in <module>
    121             deleted_out.append(x)
    122 
--> 123     if '???DELETE' in x.text and x not in deleted_out:
    124             deleted_out.append(x)
    125 

TypeError: argument of type 'NoneType' is not iterable

The end result I want is to sort the elements into separate lists, as I do in the code snippet above. Why does this error occur, and how can I fix it?

Answer 1

Edit

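The TypeError is caused by attribute-only elements, that is, elements with no text content such as <page start="2305" end="2315"/>. The element is bound to the variable x, and the code tests whether '???DELETE' occurs in x.text, but x.text is None because the text attribute holds the element's content and this element has none. For reference, an XML element has the following structure: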

<element-name attribute1 attribute2>content</element-name>

The error message reads argument of type 'NoneType' is not iterable because the syntax of in is value in iterable. The right-hand operand, here x.text, must be an iterable, and None is not.
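The behavior described above can be reproduced in a few lines. This is a minimal sketch that assumes the standard library's xml.etree.ElementTree as a stand-in so it runs without lxml installed; lxml elements likewise report None as the text of an element with no content. The variable names page, year and error_message are illustrative only.

```python
import xml.etree.ElementTree as ET  # stdlib stand-in, used only so this runs without lxml

# An attribute-only element has no content, so its .text is None.
page = ET.fromstring('<page start="2305" end="2315"/>')
# An element with content exposes that content as a str.
year = ET.fromstring('<year> 2003 </year>')

print(page.text)        # None
print(repr(year.text))  # ' 2003 '

# The membership test needs an iterable on the right-hand side; None is not one.
try:
    '???DELETE' in page.text
except TypeError as exc:
    error_message = str(exc)
    print(error_message)
```

The exact wording of the message can differ between Python versions, but the exception type is TypeError in each case.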

Before using x.text as a str, you should check that it is not None.

if x.text is not None and '???DELETE' in x.text and x not in deleted_out:
    deleted_out.append(x)
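Because every test in the loop reads x.text or y.text, the same guard is needed in each of them. One way to avoid repeating it is a small helper that substitutes an empty string for None. The sketch below is only one possible arrangement, not the asker's code: safe_text and collect_marked are hypothetical names, the ancestor check is done by passing a flag down the tree instead of calling iterancestors(), and the fallback import is an assumption made so the example also runs where lxml is not installed.

```python
try:
    from lxml import etree  # the library used in the question
except ImportError:
    # Fallback so the sketch still runs without lxml installed.
    import xml.etree.ElementTree as etree

XML = """<location type="journal">
???INSERT location???
    <journal title="J. Gen. Virol.">
        <volumn> 84 </volumn>
        <page start="2305" end="2315"/>
        <year> 2003 </year>
    </journal>
</location>"""


def safe_text(elem):
    """Return elem.text, or '' when the element has no text (elem.text is None)."""
    return elem.text or ''


def collect_marked(elem, marker, inherited=False, found=None):
    """Collect, in document order, elements whose own text or an ancestor's text contains marker."""
    if found is None:
        found = []
    # An element is marked if any ancestor was marked or its own text has the marker.
    marked = inherited or marker in safe_text(elem)
    if marked:
        found.append(elem)
    for child in elem:
        collect_marked(child, marker, marked, found)
    return found


root = etree.fromstring(XML)
inserted_out = collect_marked(root, '???INSERT')
deleted_out = collect_marked(root, '???DELETE')

print([e.tag for e in inserted_out])  # ['location', 'journal', 'volumn', 'page', 'year']
print([e.tag for e in deleted_out])   # []
```

The attribute-only page element is classified like its siblings and no TypeError is raised, because safe_text never hands None to the in operator.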

Original

You never declared deleted_out. Try this:

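from lxml import etree

tree_out = etree.parse('xmlfile.xml')
deleted_out = []

for x in tree_out.iter():
    for y in x.iterancestors():
        # y.text is None for elements without text, so guard before using "in"
        if y.text is not None and '???DELETE' in y.text and x not in deleted_out:
            deleted_out.append(x)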
