Multiple duplication when using copy.deepcopy() on lxml tree

Suppose I have an original lxml tree like this:

my_data.xml

<?xml version="1.0" encoding="UTF-8"?>
<data>
  <country name="Liechtenstein" xmlns="aaa:bbb:ccc:liechtenstein:eee">
    <rank updated="yes">2</rank>
    <holidays>
      <christmas>Yes</christmas>
    </holidays>
    <year>2008</year>
    <gdppc>141100</gdppc>
    <neighbor name="Austria" direction="E"/>
    <neighbor name="Switzerland" direction="W"/>
  </country>
  <country name="Singapore" xmlns="aaa:bbb:ccc:singapore:eee">
    <continent>Asia</continent>
    <holidays>
      <christmas>Yes</christmas>
    </holidays>
    <rank updated="yes">5</rank>
    <year>2011</year>
    <gdppc>59900</gdppc>
    <neighbor name="Malaysia" direction="N"/>
  </country>
  <country name="Panama" xmlns="aaa:bbb:ccc:panama:eee">
    <rank updated="yes">69</rank>
    <year>2011</year>
    <gdppc>13600</gdppc>
    <neighbor name="Costa Rica" direction="W"/>
    <neighbor name="Colombia" direction="E"/>
  </country>
  <ethnicity xmlns="aaa:bbb:ccc:ethnicity:eee">
    <malay>
      <holidays>
        <ramadan>Yes</ramadan>
      </holidays>
    </malay>
  </ethnicity>
</data>

Parsing it:

xt = etree.parse("my_data.xml")
xr = xt.getroot()

Now I want to create a list of duplicate trees. In this example, I create a list containing 3 duplicates of the tree:

f_list = [1, 2, 3]

xtrees = [copy.deepcopy(xt)] * len(f_list)
xroots = [xtree.getroot() for xtree in xtrees]
ramadan_nodes = [xtree.find('.//{*}ramadan') for xtree in xtrees]

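Besides these 3 trees, I also have a list of ramadan nodes, each of which should belong to a separate tree. Now I want to duplicate the ramadan node in each of the 3 new trees and insert the copy right after the original in its own tree.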

for i in range(3):
    new_ramadan_node = copy.deepcopy(ramadan_nodes[i])
    ramadan_parent = ramadan_nodes[i].getparent()
    position = ramadan_parent.index(ramadan_nodes[i]) + 1
    ramadan_parent.insert(position, new_ramadan_node)

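As described above, I intend each tree to end up with only one duplicated ramadan node. However, after running this code, each of the 3 duplicate trees contains four ramadan nodes (1 original and 3 added by the for loop above).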

Why does this happen? Also, I noticed that if I print the list of ramadan nodes:

print(ramadan_nodes)

I get the same entry, <Element {aaa:bbb:ccc:ethnicity:eee}ramadan at 0x203b4f849c0>, repeated 3 times:

[<Element {aaa:bbb:ccc:ethnicity:eee}ramadan at 0x203b4f849c0>, 
<Element {aaa:bbb:ccc:ethnicity:eee}ramadan at 0x203b4f849c0>, 
<Element {aaa:bbb:ccc:ethnicity:eee}ramadan at 0x203b4f849c0>]

What is this number 0x203b4f849c0? I suspect it is related to the multiple duplication here. I would appreciate it if someone could explain. Thanks.

Here is the complete code in one piece:

import copy
import lxml.etree as etree

file_path = "my_data.xml"
xt = etree.parse(file_path)
xr = xt.getroot()

f_list = [1, 2, 3]

xtrees = [copy.deepcopy(xt)] * len(f_list)
xroots = [xtree.getroot() for xtree in xtrees]
ramadan_nodes = [xtree.find('.//{*}ramadan') for xtree in xtrees]

for i in range(3):
    new_ramadan_node = copy.deepcopy(ramadan_nodes[i])
    ramadan_parent = ramadan_nodes[i].getparent()
    position = ramadan_parent.index(ramadan_nodes[i]) + 1
    ramadan_parent.insert(position, new_ramadan_node)

print(ramadan_nodes)
etree.dump(xroots[0])
etree.dump(xroots[1])
etree.dump(xroots[2])

Update:

If I replace these two lines:

xtrees = [copy.deepcopy(xt)] * len(f_list)
xroots = [xtree.getroot() for xtree in xtrees]

with:

xtrees = []
xroots = []
xtrees.append(copy.deepcopy(xt))
xroots.append(xtrees[-1].getroot())
xtrees.append(copy.deepcopy(xt))
xroots.append(xtrees[-1].getroot())
xtrees.append(copy.deepcopy(xt))
xroots.append(xtrees[-1].getroot())

I get the expected output. It seems that copy.deepcopy does not produce distinct objects when used inside a list? Why is that?

You are using the * operator:

xtrees = [copy.deepcopy(xt)] * len(f_list)

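This does not create three copies. copy.deepcopy(xt) is called only once, and the * operator then repeats a reference to that single copied tree, so all three list entries are the same object. Your loop therefore inserts three new nodes into one tree, which is why you see four ramadan nodes (1 original plus 3 copies) in every "tree".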
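
As for the number 0x203b4f849c0: as far as I know, lxml prints the element's id() in hexadecimal in its repr, and in CPython that is the object's memory address. Seeing the same value three times means the list holds one object three times. The sketch below shows the same aliasing with a hypothetical Tree class (a stand-in for illustration, not part of lxml), so it runs with the standard library only:

```python
import copy


class Tree:
    """Hypothetical stand-in for a parsed tree; not part of lxml."""

    def __init__(self):
        self.children = []


original = Tree()

# deepcopy runs once; "* 3" then repeats the reference to that one copy.
aliased = [copy.deepcopy(original)] * 3

# Mutating one entry is visible through every entry, because they are one object.
aliased[0].children.append("ramadan")

same_object = aliased[0] is aliased[1] is aliased[2]
is_original = aliased[0] is original
lengths = [len(t.children) for t in aliased]
addresses = {hex(id(t)) for t in aliased}

print(same_object)     # all three entries are the same object
print(is_original)     # the copy is still distinct from the original
print(lengths)
print(len(addresses))  # number of distinct addresses in the list
```

Note that the shared object is the copy, not the original xt: the original stays untouched, but the three list slots all point at the same copy.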

To get actual copies, call copy.deepcopy once per list entry, for example with a list comprehension:

xtrees = [copy.deepcopy(xt) for _ in range(len(f_list))]
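
To check the fix, here is a minimal sketch of your script with the list comprehension in place. It assumes lxml is installed and uses a trimmed-down inline XML string instead of my_data.xml (that string and the counts and distinct_ids variables are my additions for illustration):

```python
import copy

import lxml.etree as etree

# Trimmed-down stand-in for my_data.xml (an assumption for this sketch).
XML = b"""<data>
  <ethnicity xmlns="aaa:bbb:ccc:ethnicity:eee">
    <malay>
      <holidays>
        <ramadan>Yes</ramadan>
      </holidays>
    </malay>
  </ethnicity>
</data>"""

xt = etree.ElementTree(etree.fromstring(XML))
f_list = [1, 2, 3]

# One deepcopy call per list entry, so every tree is a separate object.
xtrees = [copy.deepcopy(xt) for _ in f_list]
ramadan_nodes = [xtree.find('.//{*}ramadan') for xtree in xtrees]

for node in ramadan_nodes:
    parent = node.getparent()
    # Insert a copy of the node right after the original in its own tree.
    parent.insert(parent.index(node) + 1, copy.deepcopy(node))

# Count ramadan nodes per tree and count distinct node objects in the list.
counts = [len(xtree.findall('.//{*}ramadan')) for xtree in xtrees]
distinct_ids = len({id(node) for node in ramadan_nodes})

print(counts)
print(distinct_ids)
```

With separate copies, each tree should end up with two ramadan nodes, and print(ramadan_nodes) should show three different addresses instead of one repeated three times.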
