Replace elements in XML with reference to pandas dataframe
I am reading my XML file with lxml:
tree = etree.parse(r'C:\Users\xxx\Desktop\misc work\xmledit\SalesTransactionCustom.xml')
and get an XML file like this:
<?xml version="1.0" encoding="UTF-8"?>
<ProcessSalesTransactionCustom xmlns="http://schema.xxxx.com/xxxxx/2" releaseID="9.2">
<ApplicationArea>
<Sender>
<LogicalID>xxxxxx.file.syncxxxxx5salesinvoice</LogicalID>
<ComponentID>External</ComponentID>
<ConfirmationCode>OnError</ConfirmationCode>
</Sender>
<CreationDateTime>2020-04-16T14:50:26.976Z</CreationDateTime>
<BODID>xxxx-nid:xxxxx:1001::Default_1001#320000:?SalesTransactionCustom&amp;verb=Process</BODID>
</ApplicationArea>
<DataArea>
<Process>
<TenantID>xxx</TenantID>
<AccountingEntityID>4710</AccountingEntityID>
<LocationID>S_4710</LocationID>
<ActionCriteria>
<ActionExpression actionCode="Add"/>
</ActionCriteria>
</Process>
<SalesTransactionCustom>
<FinancialBatch>
<TransactionDate>2019-09-27T00:00:00</TransactionDate>
<BatchReference>KUKS_20190928052427</BatchReference>
</FinancialBatch>
<TransactionHeader>
<TransactionType>HEI</TransactionType>
<SalesInvoice>
<Invoice>19001160</Invoice>
<BusinessPartner>417B00</BusinessPartner>
<DocumentDate>2019-09-27T00:00:00</DocumentDate>
<DueDate>2019-11-20T00:00:00</DueDate>
<Amount>152248.80</Amount>
<Currency>EUR</Currency>
<TaxCountry>DK</TaxCountry>
<TaxCode>BESIT</TaxCode>
<NonFinalizedTransaction>
<TransactionReference>417B00 PC210LCI-11</TransactionReference>
<LedgerAccount>50000400</LedgerAccount>
<Dimension1>100</Dimension1>
<Dimension2>KUK</Dimension2>
<Dimension3/>
<Dimension4/>
<Dimension5/>
<Dimension6/>
<Dimension7/>
<Dimension8/>
<TaxAmount>0.00</TaxAmount>
<DebitCreditFlag>credit</DebitCreditFlag>
<Amount>152248.80</Amount>
</NonFinalizedTransaction>
</SalesInvoice>
</TransactionHeader>
<TransactionHeader>
<TransactionType>HEI</TransactionType>
<SalesInvoice>
<Invoice>19001161</Invoice>
<BusinessPartner>412600</BusinessPartner>
<DocumentDate>2019-09-27T00:00:00</DocumentDate>
<DueDate>2019-11-20T00:00:00</DueDate>
<Amount>113848.17</Amount>
<Currency>EUR</Currency>
<TaxCountry>AT</TaxCountry>
<TaxCode>GBSI</TaxCode>
<NonFinalizedTransaction>
<TransactionReference>412600 PC210NLC-11</TransactionReference>
<LedgerAccount>50000400</LedgerAccount>
<Dimension1>100</Dimension1>
<Dimension2>KUK</Dimension2>
<Dimension3/>
<Dimension4/>
<Dimension5/>
<Dimension6/>
<Dimension7/>
<Dimension8/>
<TaxAmount>0.00</TaxAmount>
<DebitCreditFlag>credit</DebitCreditFlag>
<Amount>113848.17</Amount>
</NonFinalizedTransaction>
</SalesInvoice>
</TransactionHeader>
</SalesTransactionCustom>
</DataArea>
</ProcessSalesTransactionCustom>
I have a pandas DataFrame (the first row here is the column names):
Tag Old Value New Value
BusinessPartner 417B00 BPE000104
BusinessPartner 412600 BPE000153
LedgerAccount 50000400 108092200
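I want to replace the values (text content) of elements in the XML by referring to this pandas DataFrame. I want to find each combination of tag and old value, and replace that value with the new value. I also need to write the edited result back to disk as XML.
How can this be done using lxml and pandas?
Thanks in advance.
EDIT: Thanks to @Partha Mandal, here is the code that works: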
import pandas as pd
from lxml import etree

# Load the mapping table and normalise column names and types
df = pd.read_excel("Sample.xlsx")
df.columns = ['Tag', 'Old', 'New']
df['Old'] = df['Old'].astype(str)
df['New'] = df['New'].astype(str)

# Parse the XML and serialise it to a string
parser = etree.XMLParser(remove_blank_text=True)
tree = etree.parse(r'C:\Users\xxx\Desktop\misc work\xmledit\testxml2.xml', parser)
string = etree.tostring(tree).decode()

# Replace every <Tag>Old</Tag> with <Tag>New</Tag>
for tag, old, new in zip(df['Tag'], df['Old'], df['New']):
    string = string.replace(f"<{tag}>{old}</{tag}>", f"<{tag}>{new}</{tag}>")

# Re-parse to confirm the result is still well-formed, then write it to disk
root = etree.fromstring(string.encode())
my_tree = etree.ElementTree(root)
with open('testxml2.xml', 'wb') as f:
    f.write(etree.tostring(my_tree))
Why not just read the XML as a string and do a str.replace?
tag = df.Tag; old = df.Old; new = df.New
for i in range(len(tag)):
    _str = _str.replace("<"+tag[i]+">"+old[i]+"</"+tag[i]+">","<"+tag[i]+">"+new[i]+"</"+tag[i]+">")
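Since the document is already parsed with lxml, the same tag/old-value matching can also be done on the element tree instead of on the serialised string. The following is a minimal sketch, not the answerer's method: the miniature XML and the in-line DataFrame are stand-ins for the question's data, and building a `(tag, old) -> new` dictionary is an assumption made here so that each element is rewritten at most once.

```python
import pandas as pd
from lxml import etree

# Hypothetical miniature of the question's document (same default namespace).
XML = b"""<ProcessSalesTransactionCustom xmlns="http://schema.xxxx.com/xxxxx/2">
  <SalesInvoice>
    <BusinessPartner>417B00</BusinessPartner>
    <LedgerAccount>50000400</LedgerAccount>
  </SalesInvoice>
  <SalesInvoice>
    <BusinessPartner>412600</BusinessPartner>
    <LedgerAccount>50000400</LedgerAccount>
  </SalesInvoice>
</ProcessSalesTransactionCustom>"""

# Stand-in for the DataFrame shown in the question.
df = pd.DataFrame(
    {
        "Tag": ["BusinessPartner", "BusinessPartner", "LedgerAccount"],
        "Old Value": ["417B00", "412600", "50000400"],
        "New Value": ["BPE000104", "BPE000153", "108092200"],
    }
)

root = etree.fromstring(XML)

# (tag, old value) -> new value lookup built from the DataFrame rows
mapping = {(t, o): n for t, o, n in df.itertuples(index=False, name=None)}

for elem in root.iter():
    # Skip comments and processing instructions, whose tag is not a string
    if not isinstance(elem.tag, str):
        continue
    # QName strips the namespace so the key matches the plain tag in the table
    key = (etree.QName(elem).localname, elem.text)
    if key in mapping:
        elem.text = mapping[key]

result = etree.tostring(root, xml_declaration=True, encoding="UTF-8")
print(result.decode())
```

To write the result to disk, `etree.ElementTree(root).write("Output.xml", xml_declaration=True, encoding="UTF-8")` should work; the file name is only an example.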
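Since you are using lxml, consider XSLT, the special-purpose language designed to transform XML files into different XML, which supports passing parameters from a top-level layer such as Python. You can therefore integrate the parameterization into a loop across the DataFrame records.
The only challenge is that all unique values of Tag must be hard-coded into the second template match of the XSLT (line breaks after the pipes are fine):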
doc:BusinessPartner|doc:LedgerAccount
which you can build with either of:
"|".join(['doc:'+ val for val in df['Tag'].unique()])
"|\n".join(['doc:'+ val for val in df['Tag'].unique()])
XSLT (save as an .xsl file, which is a special .xml file)
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:doc="http://schema.xxxx.com/xxxxx/2">
<xsl:output indent="yes"/>
<xsl:strip-space elements="*"/>
<!-- PARAMETERS -->
<xsl:param name="tag" />
<xsl:param name="old_value" />
<xsl:param name="new_value" />
<!-- IDENTITY TRANSFORM -->
<xsl:template match="@*|node()">
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:template>
<!-- CONDITIONALLY ASSIGN PARAMS -->
<xsl:template match="doc:BusinessPartner|doc:LedgerAccount">
<xsl:choose>
<xsl:when test="local-name() = $tag and text() = $old_value">
<xsl:copy>
<xsl:value-of select="$new_value"/>
</xsl:copy>
</xsl:when>
<xsl:otherwise>
<xsl:copy>
<xsl:value-of select="text()"/>
</xsl:copy>
</xsl:otherwise>
</xsl:choose>
</xsl:template>
</xsl:stylesheet>
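The match pattern of the second template does not have to be typed by hand. As a rough sketch, it can be generated from the DataFrame and substituted into the stylesheet text before compiling. The `__MATCH__` placeholder and the trimmed-down stylesheet below are assumptions made for illustration, not part of the original answer.

```python
import pandas as pd
from lxml import etree

# Trimmed-down stylesheet with a hypothetical __MATCH__ placeholder
XSL_TEMPLATE = """<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:doc="http://schema.xxxx.com/xxxxx/2">
  <xsl:template match="@*|node()">
    <xsl:copy><xsl:apply-templates select="@*|node()"/></xsl:copy>
  </xsl:template>
  <xsl:template match="__MATCH__">
    <xsl:copy><xsl:value-of select="text()"/></xsl:copy>
  </xsl:template>
</xsl:stylesheet>"""

# Stand-in for the Tag column of the question's DataFrame
df = pd.DataFrame({"Tag": ["BusinessPartner", "BusinessPartner", "LedgerAccount"]})

# unique() keeps values in order of first appearance
match_pattern = "|".join("doc:" + tag for tag in df["Tag"].unique())
print(match_pattern)

# Substitute the pattern and compile; etree.XSLT raises if the stylesheet is invalid
xsl_text = XSL_TEMPLATE.replace("__MATCH__", match_pattern)
transform = etree.XSLT(etree.fromstring(xsl_text))
```

This only helps when the tag names in the table are valid XML element names; anything else would make the stylesheet fail to compile.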
Python
import pandas as pd
import lxml.etree as et

df = pd.read_csv(...)

# LOAD XML AND XSL SCRIPT
xml = et.parse('Input.xml')
xsl = et.parse('Script.xsl')
transform = et.XSLT(xsl)

# PASS PARAMETER TO XSLT
df_list = df.to_dict('records')
for v in df_list:
    result = transform(xml, tag = et.XSLT.strparam(v['Tag']),
                            old_value = et.XSLT.strparam(v['Old Value']),
                            new_value = et.XSLT.strparam(v['New Value']))
    # Feed each result into the next pass so the replacements accumulate
    xml = result

# SAVE TO NEW XML
with open("Output.xml", 'wb') as f:
    f.write(result)
XML Output
<?xml version="1.0"?>
<ProcessSalesTransactionCustom xmlns="http://schema.xxxx.com/xxxxx/2" releaseID="9.2">
<ApplicationArea>
<Sender>
<LogicalID>xxxxxx.file.syncxxxxx5salesinvoice</LogicalID>
<ComponentID>External</ComponentID>
<ConfirmationCode>OnError</ConfirmationCode>
</Sender>
<CreationDateTime>2020-04-16T14:50:26.976Z</CreationDateTime>
<BODID>xxxx-nid:xxxxx:1001::Default_1001#320000:?SalesTransactionCustom&amp;verb=Process</BODID>
</ApplicationArea>
<DataArea>
<Process>
<TenantID>infor</TenantID>
<AccountingEntityID>4710</AccountingEntityID>
<LocationID>S_4710</LocationID>
<ActionCriteria>
<ActionExpression actionCode="Add"/>
</ActionCriteria>
</Process>
<SalesTransactionCustom>
<FinancialBatch>
<TransactionDate>2019-09-27T00:00:00</TransactionDate>
<BatchReference>KUKS_20190928052427</BatchReference>
</FinancialBatch>
<TransactionHeader>
<TransactionType>HEI</TransactionType>
<SalesInvoice>
<Invoice>19001160</Invoice>
<BusinessPartner>BPE000104</BusinessPartner>
<DocumentDate>2019-09-27T00:00:00</DocumentDate>
<DueDate>2019-11-20T00:00:00</DueDate>
<Amount>152248.80</Amount>
<Currency>EUR</Currency>
<TaxCountry>DK</TaxCountry>
<TaxCode>BESIT</TaxCode>
<NonFinalizedTransaction>
<TransactionReference>417B00 PC210LCI-11</TransactionReference>
<LedgerAccount>108092200</LedgerAccount>
<Dimension1>100</Dimension1>
<Dimension2>KUK</Dimension2>
<Dimension3/>
<Dimension4/>
<Dimension5/>
<Dimension6/>
<Dimension7/>
<Dimension8/>
<TaxAmount>0.00</TaxAmount>
<DebitCreditFlag>credit</DebitCreditFlag>
<Amount>152248.80</Amount>
</NonFinalizedTransaction>
</SalesInvoice>
</TransactionHeader>
<TransactionHeader>
<TransactionType>HEI</TransactionType>
<SalesInvoice>
<Invoice>19001161</Invoice>
<BusinessPartner>BPE000153</BusinessPartner>
<DocumentDate>2019-09-27T00:00:00</DocumentDate>
<DueDate>2019-11-20T00:00:00</DueDate>
<Amount>113848.17</Amount>
<Currency>EUR</Currency>
<TaxCountry>AT</TaxCountry>
<TaxCode>GBSI</TaxCode>
<NonFinalizedTransaction>
<TransactionReference>412600 PC210NLC-11</TransactionReference>
<LedgerAccount>108092200</LedgerAccount>
<Dimension1>100</Dimension1>
<Dimension2>KUK</Dimension2>
<Dimension3/>
<Dimension4/>
<Dimension5/>
<Dimension6/>
<Dimension7/>
<Dimension8/>
<TaxAmount>0.00</TaxAmount>
<DebitCreditFlag>credit</DebitCreditFlag>
<Amount>113848.17</Amount>
</NonFinalizedTransaction>
</SalesInvoice>
</TransactionHeader>
</SalesTransactionCustom>
</DataArea>
</ProcessSalesTransactionCustom>
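To try the parameterized transform without any files on disk, the loop can be run against in-memory strings. This is a small sketch under stated assumptions: the two-element XML is a made-up stand-in for the real document, `records` mimics what `df.to_dict('records')` would return, and the `local-name() = $tag` check in the stylesheet is an addition here so that the `tag` parameter actually restricts which element is rewritten.

```python
import lxml.etree as et

XSL = b"""<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:doc="http://schema.xxxx.com/xxxxx/2">
  <xsl:param name="tag"/>
  <xsl:param name="old_value"/>
  <xsl:param name="new_value"/>
  <xsl:template match="@*|node()">
    <xsl:copy><xsl:apply-templates select="@*|node()"/></xsl:copy>
  </xsl:template>
  <xsl:template match="doc:BusinessPartner|doc:LedgerAccount">
    <xsl:copy>
      <xsl:choose>
        <xsl:when test="local-name() = $tag and text() = $old_value">
          <xsl:value-of select="$new_value"/>
        </xsl:when>
        <xsl:otherwise><xsl:value-of select="text()"/></xsl:otherwise>
      </xsl:choose>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>"""

# Hypothetical miniature input using the question's namespace
XML = b"""<Root xmlns="http://schema.xxxx.com/xxxxx/2">
  <BusinessPartner>417B00</BusinessPartner>
  <LedgerAccount>50000400</LedgerAccount>
</Root>"""

# Same shape as df.to_dict('records') for the question's table
records = [
    {"Tag": "BusinessPartner", "Old Value": "417B00", "New Value": "BPE000104"},
    {"Tag": "LedgerAccount", "Old Value": "50000400", "New Value": "108092200"},
]

transform = et.XSLT(et.fromstring(XSL))
doc = et.fromstring(XML)

# One pass per record; each pass works on the previous pass's result
for rec in records:
    doc = transform(
        doc,
        tag=et.XSLT.strparam(rec["Tag"]),
        old_value=et.XSLT.strparam(rec["Old Value"]),
        new_value=et.XSLT.strparam(rec["New Value"]),
    )

output = et.tostring(doc)
print(output.decode())
```

Note that this runs one full transform per row, so for a large mapping table a single pass over the tree may be cheaper.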