Multiple duplication when using copy.deepcopy() on lxml tree
Suppose I have an original lxml tree parsed from the following file:
my_data.xml
<?xml version="1.0" encoding="UTF-8"?>
<data>
  <country name="Liechtenstein" xmlns="aaa:bbb:ccc:liechtenstein:eee">
    <rank updated="yes">2</rank>
    <holidays>
      <christmas>Yes</christmas>
    </holidays>
    <year>2008</year>
    <gdppc>141100</gdppc>
    <neighbor name="Austria" direction="E"/>
    <neighbor name="Switzerland" direction="W"/>
  </country>
  <country name="Singapore" xmlns="aaa:bbb:ccc:singapore:eee">
    <continent>Asia</continent>
    <holidays>
      <christmas>Yes</christmas>
    </holidays>
    <rank updated="yes">5</rank>
    <year>2011</year>
    <gdppc>59900</gdppc>
    <neighbor name="Malaysia" direction="N"/>
  </country>
  <country name="Panama" xmlns="aaa:bbb:ccc:panama:eee">
    <rank updated="yes">69</rank>
    <year>2011</year>
    <gdppc>13600</gdppc>
    <neighbor name="Costa Rica" direction="W"/>
    <neighbor name="Colombia" direction="E"/>
  </country>
  <ethnicity xmlns="aaa:bbb:ccc:ethnicity:eee">
    <malay>
      <holidays>
        <ramadan>Yes</ramadan>
      </holidays>
    </malay>
  </ethnicity>
</data>
Parsing:
xt = etree.parse("my_data.xml")
xr = xt.getroot()
Now I want to create a list of duplicate trees. In this example, I create a list containing 3 duplicate trees:
f_list = [1, 2, 3]
xtrees = [copy.deepcopy(xt)] * len(f_list)
xroots = [xtree.getroot() for xtree in xtrees]
ramadan_nodes = [xtree.find('.//{*}ramadan') for xtree in xtrees]
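Besides these 3 trees, I also have a list of ramadan nodes, each of which is meant to belong to a separate tree.
Now I want to duplicate the ramadan node in each of these 3 new trees and insert the copy right after the original node in its own tree.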
for i in range(3):
    new_ramadan_node = copy.deepcopy(ramadan_nodes[i])
    ramadan_parent = ramadan_nodes[i].getparent()
    # Insert the copy directly after the original ramadan node.
    position = ramadan_parent.index(ramadan_nodes[i]) + 1
    ramadan_parent.insert(position, new_ramadan_node)
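As described above, I intend to end up with exactly one duplicated ramadan node in each tree. However, after running this code, each of the 3 duplicated trees contains four ramadan nodes (1 original and 3 added by the for loop above).
Why is that? Also, I noticed that if I print the list of ramadan nodes: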
print(ramadan_nodes)
I get the same entry, <Element {aaa:bbb:ccc:ethnicity:eee}ramadan at 0x203b4f849c0>, repeated 3 times:
[<Element {aaa:bbb:ccc:ethnicity:eee}ramadan at 0x203b4f849c0>,
<Element {aaa:bbb:ccc:ethnicity:eee}ramadan at 0x203b4f849c0>,
<Element {aaa:bbb:ccc:ethnicity:eee}ramadan at 0x203b4f849c0>]
What is this number, 0x203b4f849c0? I suspect it is related to the multiple duplication here. I would appreciate it if someone could explain. Thanks.
Here is the complete code:
import copy
import lxml.etree as etree
file_path = "my_data.xml"
xt = etree.parse(file_path)
xr = xt.getroot()
f_list = [1, 2, 3]
xtrees = [copy.deepcopy(xt)] * len(f_list)
xroots = [xtree.getroot() for xtree in xtrees]
ramadan_nodes = [xtree.find('.//{*}ramadan') for xtree in xtrees]
for i in range(3):
    new_ramadan_node = copy.deepcopy(ramadan_nodes[i])
    ramadan_parent = ramadan_nodes[i].getparent()
    # Insert the copy directly after the original ramadan node.
    position = ramadan_parent.index(ramadan_nodes[i]) + 1
    ramadan_parent.insert(position, new_ramadan_node)
print(ramadan_nodes)
etree.dump(xroots[0])
etree.dump(xroots[1])
etree.dump(xroots[2])
Update:
If I replace these two lines:
xtrees = [copy.deepcopy(xt)] * len(f_list)
xroots = [xtree.getroot() for xtree in xtrees]
with
xtrees = []
xroots = []
xtrees.append(copy.deepcopy(xt))
xroots.append(xtrees[-1].getroot())
xtrees.append(copy.deepcopy(xt))
xroots.append(xtrees[-1].getroot())
xtrees.append(copy.deepcopy(xt))
xroots.append(xtrees[-1].getroot())
I get the expected output. It seems that copy.deepcopy does not produce distinct objects when used inside a list? Why is that?
You are using the * operator:
xtrees = [copy.deepcopy(xt)] * len(f_list)
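This does not create three copies. copy.deepcopy(xt) is evaluated only once, and the * operator then repeats a reference to that single copied tree, so every entry in xtrees (and therefore every entry in xroots and ramadan_nodes) is the same object. Your loop inserts into that one tree three times, which is why you see four ramadan nodes. The number 0x203b4f849c0 in the printed output is the object's identity (its memory address in CPython); seeing the same value three times confirms that the list holds one node three times.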
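The difference can be checked without lxml. The following is a minimal sketch that uses a plain dictionary as a stand-in for the tree (the names original, shared, and separate are hypothetical, chosen only for illustration). It shows that list multiplication repeats one reference, while a list comprehension calls copy.deepcopy once per entry:

```python
import copy

# A stand-in for the parsed tree; any mutable object behaves the same way.
original = {"ramadan": ["Yes"]}

# deepcopy runs once; * repeats a reference to that single copy.
shared = [copy.deepcopy(original)] * 3

# The comprehension calls deepcopy again on every iteration.
separate = [copy.deepcopy(original) for _ in range(3)]

# Mutate only the first entry of each list.
shared[0]["ramadan"].append("Yes")
separate[0]["ramadan"].append("Yes")

# Count how many distinct objects each list really holds.
print(len({id(obj) for obj in shared}))    # 1
print(len({id(obj) for obj in separate}))  # 3

# The change is visible through every entry of the shared list only.
print(shared[2])    # {'ramadan': ['Yes', 'Yes']}
print(separate[2])  # {'ramadan': ['Yes']}
```

Note that shared[0] is still a copy of original, not original itself; the problem is only that all three entries point to that one copy.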
To get separate copies, call copy.deepcopy once per list entry, for example with a list comprehension:
xtrees = [copy.deepcopy(xt) for _ in range(len(f_list))]
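As a rough end-to-end check, here is a sketch of the corrected loop. To keep it runnable without third-party packages it swaps lxml for the standard library's xml.etree.ElementTree and uses a trimmed-down inline copy of the XML, so this is not your exact code: ElementTree has no getparent(), so the parent is looked up directly, and the {*} namespace wildcard assumes Python 3.8 or later. With lxml, only the list comprehension needs to change.

```python
import copy
import xml.etree.ElementTree as ET

# Trimmed-down version of my_data.xml, inlined so the sketch is self-contained.
XML = """<data>
  <ethnicity xmlns="aaa:bbb:ccc:ethnicity:eee">
    <malay>
      <holidays>
        <ramadan>Yes</ramadan>
      </holidays>
    </malay>
  </ethnicity>
</data>"""

root = ET.fromstring(XML)
f_list = [1, 2, 3]

# One deepcopy call per iteration, so each entry is an independent tree.
xroots = [copy.deepcopy(root) for _ in f_list]

for xroot in xroots:
    # No getparent() in the standard library, so find the parent first.
    parent = xroot.find(".//{*}holidays")
    node = parent.find("{*}ramadan")
    # Insert the copy directly after the original ramadan node.
    position = list(parent).index(node) + 1
    parent.insert(position, copy.deepcopy(node))

# Each tree now holds the original node plus exactly one copy.
counts = [len(xroot.findall(".//{*}ramadan")) for xroot in xroots]
print(counts)  # [2, 2, 2]
```

The tree that was parsed first (root here) is left untouched, because every insertion happens in one of the copies.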
Related:
- https://docs.python.org/3/faq/programming.html#how-do-i-create-a-multidimensional-list
- Repeat a list within a list X number of times