Copy an XML element multiple times within a given range

Question

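I need to copy XML elements multiple times based on data received from a CSV file. The CSV records are stored in a list of dicts, and we know both the number of records and how many copies of the element may go into a single root tag. For example, with 14 records (len_csv = 14) and 4 elements per root tag, 14/4 = 3.5, so 4 root tags are needed to hold all 14 elements: the first three hold 4 each, and the last root tag should hold only 2. Sample XML: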

<head_tag>
    <root xmlns:ns="http://schemas.org/">
        <body>
            <values>
                <ns:value>
                    <a>HI1</a>
                </ns:value>     
            </values>
        </body>
    </root>
</head_tag>

Expected output:

<head_tag>
    <root xmlns:ns="http://schemas.org/">
        <body>
            <values>
                <ns:value>
                    <a>HI1</a>
                    <a>HI1</a>
                    <a>HI1</a>
                    <a>HI1</a>
                </ns:value>     
            </values>
        </body>
    </root>
    <root xmlns:ns="http://schemas.org/">
        <body>
            <values>
                <ns:value>
                    <a>HI1</a>
                    <a>HI1</a>
                    <a>HI1</a>
                    <a>HI1</a>
                </ns:value>     
            </values>
        </body>
    </root>
    <root xmlns:ns="http://schemas.org/">
        <body>
            <values>
                <ns:value>
                    <a>HI1</a>
                    <a>HI1</a>
                    <a>HI1</a>
                    <a>HI1</a>
                </ns:value>     
            </values>
        </body>
    </root>
    <root xmlns:ns="http://schemas.org/">
        <body>
            <values>
                <ns:value>
                    <a>HI1</a>
                    <a>HI1</a>
                </ns:value>     
            </values>
        </body>
    </root>
</head_tag>

What I actually get:

<head_tag>
    <root xmlns:ns="http://schemas.org/">
        <body>
            <values>
                <ns:value>
                    <a>HI1</a>
                    <a>HI1</a>
                    <a>HI1</a>
                    <a>HI1</a>
                </ns:value>     
            </values>
        </body>
    </root>
    <root xmlns:ns="http://schemas.org/">
        <body>
            <values>
                <ns:value>
                    <a>HI1</a>
                    <a>HI1</a>
                    <a>HI1</a>
                    <a>HI1</a>
                </ns:value>     
            </values>
        </body>
    </root>
    <root xmlns:ns="http://schemas.org/">
        <body>
            <values>
                <ns:value>
                    <a>HI1</a>
                    <a>HI1</a>
                    <a>HI1</a>
                    <a>HI1</a>
                </ns:value>     
            </values>
        </body>
    </root>
    <root xmlns:ns="http://schemas.org/">
        <body>
            <values>
                <ns:value>
                    <a>HI1</a>
                    <a>HI1</a>
                    <a>HI1</a>
                    <a>HI1</a>
                </ns:value>     
            </values>
        </body>
    </root>
</head_tag>

Here we should have 14 <a> tags in total: 4 copies in each of the first 3 root tags, while the last root tag should contain only 2 copies, as described above.
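
The arithmetic above can be sketched on its own, before touching any XML. This is only a minimal illustration: the function name batch_sizes and its parameters are hypothetical and do not appear in the original code.

```python
import math


def batch_sizes(total, per_root):
    """Return how many <a> elements each <root> tag should hold."""
    # Round up so that a partial batch still gets its own <root> tag
    num_roots = math.ceil(total / per_root)
    # Every batch is full except possibly the last one
    return [min(per_root, total - i * per_root) for i in range(num_roots)]


print(batch_sizes(14, 4))  # [4, 4, 4, 2]
```

The length of the returned list is the number of root tags required, and each entry is the number of <a> copies that root tag should end up with.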

Code:

from lxml import etree
from copy import deepcopy
src='abc.xml'
tree = etree.parse(src)
#Get the root element
root=tree.getroot()
#Get the namespace
nsmap = {k if k is not None else 'default':v for k,v in root.nsmap.items()}
check_value= lambda x: int(x) if x==int(x) else int(x)+1 
# Suppose csv_records (a list of dicts) has length 14
len_csv=14
def copy_element(tag_name, len_csv, num_of_copies, batches):

    # Copying the head tag as per the calculation, in this case 4
    for name in root:
        for i in range(num_of_copies-1):
            new_name = deepcopy(name)
            name.addnext(new_name)

    k=batches-1
    #Copying the tag for xpath given in this case <a> tag as per the calculation
    for name in root.findall(tag_name, namespaces=nsmap):
        get_last_val=len_csv%num_of_copies
        for i in range(num_of_copies):
             if k==num_of_copies-1 and get_last_val!=0:
                batches=get_last_val
             if len_csv!=1:
                for j in range(batches-1):
                    if len_csv<=1:
                        break                        
                    new_name = deepcopy(name)
                    name.addnext(new_name)
                len_csv-=batches
                break
        k-=1

# Calculate how many head tags are required for the batch process
head_tag_required=len_csv/4
copy_element('root/body/values/ns:value/a', len_csv, check_value(head_tag_required),4)
tree.write(src)

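The code above correctly creates the 4 root tags, but it creates the <a> tag 4 times in every root tag. I tried breaking out of the findall() loop, but none of my attempts worked.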

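So, my questions are:

Is there any way to solve the problem above? How can we copy the same element multiple times for a given XPath so that the condition above is satisfied?
The XPath we pass also doesn't work: because of the namespace, it cannot find the element. How can we solve this? The output shown above is for elements without any namespace.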
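
On the namespace point, one likely cause, judging from the sample XML, is that the ns prefix is declared on <root> rather than on <head_tag>, so the nsmap of the document's top element does not contain it. The sketch below illustrates this with lxml; the explicit NS dictionary is an assumption about how the prefix mapping could be supplied, not part of the original code.

```python
from lxml import etree

xml = b"""<head_tag>
    <root xmlns:ns="http://schemas.org/">
        <body><values><ns:value><a>HI1</a></ns:value></values></body>
    </root>
</head_tag>"""

head = etree.fromstring(xml)

# nsmap only lists prefixes in scope at that element, not those of descendants
print(head.nsmap)               # {}
print(head.find("root").nsmap)  # {'ns': 'http://schemas.org/'}

# Assumption: the prefix-to-URI mapping is written out explicitly
NS = {"ns": "http://schemas.org/"}

# <a> has no prefix and no default namespace applies, so it is matched by plain name
matches = head.findall("root/body/values/ns:value/a", namespaces=NS)
print(len(matches))
```

With an empty prefix map, lxml has no way to resolve ns:value, which would explain why the path in the question finds nothing.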

Answer 1

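Your post asks several questions, which isn't a good fit for a Stack Overflow question, so I'll address just one thing here:

Generating multiple branches in a tree structure: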

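import copy
import math

from lxml import etree

ns_dict = {"ns": "http://schemas.org/"}

s = '''
<head_tag>
    <root xmlns:ns="http://schemas.org/">
        <body>
            <values>
                <ns:value>
                    <a>HI1</a>
                </ns:value>
            </values>
        </body>
    </root>
</head_tag>
'''

n = 14  # total number of <a> elements required
l = 4   # maximum number of <a> elements per <root> branch

num_branches = math.ceil(n / l)          # 4 branches for 14 entries
last_count = n - l * (num_branches - 1)  # 2 entries left for the last branch

head = etree.fromstring(s)
first_root = head.find("root")

# fill the first branch with l copies of <a>
value = first_root.xpath("body/values/ns:value", namespaces=ns_dict)[0]
for _ in range(l - 1):
    value.append(copy.deepcopy(value[0]))

# duplicate the filled branch until there are num_branches of them
for _ in range(num_branches - 1):
    head.append(copy.deepcopy(first_root))

# remove the additional <a> elements from the last branch
last_value = head.xpath("root[last()]/body/values/ns:value", namespaces=ns_dict)[0]
for extra in last_value[last_count:]:
    last_value.remove(extra)

print(etree.tostring(head).decode())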

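You can then compute the stop value as needed. If you don't want an exact copy at the end of the tree, I suggest modifying the last branch after making a plain copy. You can see where that goes in the code here: # remove the addtional <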
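
Since the question mentions that the CSV records are held in a list of dicts, the same idea can also be sketched by slicing the records directly, so each branch receives only its own share and nothing has to be trimmed afterwards. This is an illustration under assumptions: the function build_batches, the record key "value", and the sample records are all hypothetical.

```python
import copy

from lxml import etree

NS = {"ns": "http://schemas.org/"}
TEMPLATE = b"""<head_tag><root xmlns:ns="http://schemas.org/"><body><values>
<ns:value><a>HI1</a></ns:value></values></body></root></head_tag>"""


def build_batches(records, per_root, key="value"):
    """Build one <root> per slice of records, with one <a> per record."""
    head = etree.fromstring(TEMPLATE)
    template_root = head.find("root")
    head.remove(template_root)  # keep it only as a template to copy from
    for start in range(0, len(records), per_root):
        branch = copy.deepcopy(template_root)
        value = branch.find("body/values/ns:value", namespaces=NS)
        template_a = value[0]
        value.remove(template_a)
        # One <a> per record in this slice; the last slice may be shorter
        for record in records[start:start + per_root]:
            a = copy.deepcopy(template_a)
            a.text = str(record[key])
            value.append(a)
        head.append(branch)
    return head


# Hypothetical records standing in for the parsed CSV rows
records = [{"value": f"HI{i}"} for i in range(1, 15)]
head = build_batches(records, per_root=4)
counts = [len(r.findall("body/values/ns:value/a", namespaces=NS)) for r in head]
print(counts)  # [4, 4, 4, 2]
```

Slicing the list avoids the separate "copy, then trim the last branch" step, at the cost of rebuilding every branch from the template.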

python

xml

lxml