lxml library not adding newlines or indentation to tree after adding new element

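The title is self-explanatory. Before marking this as a duplicate, please note that I have already checked this answer, but it did not work for me: I don't get the correct formatting even on sys.stdout, not only when writing to a file. I have the following XML (test.xml):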

<?xml version="1.0" encoding="UTF-8"?>
<soap:Envelope xmlns:soap="http://www...">
  <soap:Body>
    <SubmitTransaction xmlns="http://www.">
      <Authentication>
      </Authentication>
      <Transaction>
        <DataFields>
        </DataFields>
      </Transaction>
    </SubmitTransaction>
  </soap:Body>
</soap:Envelope>

and the following code:

from lxml import etree

parser = etree.XMLParser(remove_blank_text=True)
tree = etree.parse("test.xml", parser)

def get_data_fields():
    for node in tree.iter():
        if 'DataFields' in node.tag:
            return node
a = get_data_fields()
field = etree.Element('Field_1')
child_1 = etree.Element('FieldName')
child_2 = etree.Element('FieldValue')
child_3 = etree.Element('FieldIndex')
child_1.text = 'dateTime'
child_2.text = '2016-07-29T12:00:00'
child_3.text = '1'

for i in [child_1, child_2, child_3]:
    field.append(i)
a.append(field)

s = etree.tostring(tree, pretty_print=True)
print(s.decode('utf-8'))

Output:

<soap:Envelope xmlns:soap="http://www...">
  <soap:Body>
    <SubmitTransaction xmlns="http://www.">
      <Authentication>
      </Authentication>
      <Transaction>
        <DataFields>
        <Field_1><FieldName>dateTime</FieldName><FieldValue>2016-07-29T12:00:00</FieldValue><FieldIndex>1</FieldIndex></Field_1></DataFields>
      </Transaction>
    </SubmitTransaction>
  </soap:Body>
</soap:Envelope>

Expected:

<soap:Envelope xmlns:soap="http://www...">
  <soap:Body>
    <SubmitTransaction xmlns="http://www.">
      <Authentication>
      </Authentication>
      <Transaction>
        <DataFields>
          <Field_1>
            <FieldName>dateTime</FieldName>
            <FieldValue>2016-07-29T12:00:00</FieldValue>
            <FieldIndex>1</FieldIndex>
          </Field_1>
        </DataFields>
      </Transaction>
    </SubmitTransaction>
  </soap:Body>
</soap:Envelope>

I really don't understand why the new field I added is not formatted as expected, because if I print only field, everything looks fine:

s = etree.tostring(field, pretty_print=True)
print(s.decode('utf-8'))

#<Field_1 xmlns="http://www." xmlns:soap="http://www...">
#  <FieldName>dateTime</FieldName>
#  <FieldValue>2016-07-29T12:00:00</FieldValue>
#  <FieldIndex>1</FieldIndex>
#</Field_1>

Note: I am using Python 3.4 (which is why I have to call .decode('utf-8'); otherwise I would just get a bytes literal).

It works if you add this line after a = get_data_fields():

a.text = None

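lxml cannot always determine which whitespace is ignorable, so in some cases you have to remove it manually. Here, DataFields contains nothing but whitespace in test.xml, and the parser keeps that whitespace as the element's text even with remove_blank_text=True; pretty_print does not re-indent an element that already contains text.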
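To see the effect in isolation, here is a minimal sketch. It uses a small made-up document instead of test.xml, and the variable names (text_before, before_fix, after_fix) are only for illustration; the behaviour shown is what I would expect from lxml given the output in the question.

```python
from lxml import etree

# Hypothetical stand-in for test.xml: DataFields holds only whitespace.
XML = b"""<root>
  <DataFields>
  </DataFields>
</root>"""

parser = etree.XMLParser(remove_blank_text=True)
root = etree.fromstring(XML, parser)
data_fields = root.find("DataFields")

# The whitespace inside the childless element is kept as its .text.
text_before = data_fields.text

field = etree.SubElement(data_fields, "Field_1")
etree.SubElement(field, "FieldName").text = "dateTime"

# With the leftover whitespace in place, the new subtree stays on one line.
before_fix = etree.tostring(root, pretty_print=True).decode("utf-8")

# Dropping the whitespace lets pretty_print indent the new children.
data_fields.text = None
after_fix = etree.tostring(root, pretty_print=True).decode("utf-8")

print(repr(text_before))
print(before_fix)
print(after_fix)
```

Printing repr(text_before) makes the leftover newline and spaces visible, which is usually the quickest way to confirm that this is what is blocking the indentation.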

See http://lxml.de/FAQ.html#why-doesn-t-the-pretty-print-option-reformat-my-xml-output

If you want to be sure all blank text is removed from an XML document (or just more blank text than the parser does by itself), you have to use either a DTD to tell the parser which whitespace it can safely ignore, or remove the ignorable whitespace manually after parsing, e.g. by setting all tail text to None:
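A rough sketch of that manual cleanup follows. It goes one step further than the FAQ sentence and also clears whitespace-only .text, since that is what gets in the way for DataFields. The helper name strip_ignorable_whitespace and the sample document are my own, not part of lxml, and the helper assumes that whitespace-only content is never meaningful data in your document.

```python
from lxml import etree


def strip_ignorable_whitespace(root):
    """Clear whitespace-only .text and .tail on every element under root.

    Hypothetical helper, not an lxml API. It treats all whitespace-only
    content as ignorable, which is an assumption about the document.
    """
    # Passing etree.Element restricts iteration to elements
    # (comments and processing instructions are skipped).
    for element in root.iter(etree.Element):
        if element.text is not None and not element.text.strip():
            element.text = None
        if element.tail is not None and not element.tail.strip():
            element.tail = None
    return root


# Made-up sample parsed without remove_blank_text, so all whitespace is kept.
doc = etree.fromstring(b"<a>\n  <b>\n  </b>\n  <c>keep me</c>\n</a>")
etree.SubElement(doc.find("b"), "d").text = "x"

strip_ignorable_whitespace(doc)
result = etree.tostring(doc, pretty_print=True).decode("utf-8")
print(result)
```

Calling the helper once on tree.getroot() just before etree.tostring(tree, pretty_print=True) should have the same effect as a.text = None, without having to know which elements carry stray whitespace.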