Parsing: String to XML
My API is supposed to take a string and convert it to XML.
But I keep getting this error:
ParseError: mismatched tag: line 1, column 764
XML
<?xml version="1.0" encoding="utf-8" ?>
<MasterDetails IssuerId="5" Version="12.2">
<XMLRequest />
<BookingDetails Amount="768" Comment="Hotel Travel Purchase" CurrencyCode="INR" PurchaseType="Hotel" SupplierName="SomeHotel" CardAlias="C_ALIAS" ValidFor="-1D" CurrencyType="B" />
<CDFs>
<CDF FieldName="Order Date" FieldValue="2015-01-01" />
</CDFs>
<SomeTag>
<Rule Action="A" Alias="MyAlias">
<Controls>
<OPMCCControl Negate="False"/>
<OPMIDControl />
<SomeControlsTags CumulativeLimit="768" MaxTrans="None" Period="C" />
<ValidityPeriod ValidFrom="2015-01-01 00:00:00.0 +0000" ValidTo="2015-01-11 00:00:00.0 +0000" />
</Controls>
</Rule>
</SomeTag>
</BookingDetails>
<Email EmailAddress="T@J.COM"/>
<MasterDetails />
Implementation:
import xml.etree.ElementTree as ET
tree = ET.ElementTree(ET.fromstring(kk.strip()))  # kk holds the incoming XML string
I'm sure my XML string has all its tags matched and is well-formed, but I may still be missing something right in front of my eyes!
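To see exactly what the parser is complaining about, you can print the markup around the position it reports. This is a minimal sketch using only the standard library's xml.etree.ElementTree; show_parse_error is a hypothetical helper name, and it relies on ParseError.position, the (line, column) pair behind messages like "line 1, column 764". Because the question's string is evidently a single line, the snippet should land on the offending tag.

```python
import xml.etree.ElementTree as ET


def show_parse_error(text: str, context: int = 40) -> str:
    """Hypothetical helper: parse text and, on failure, show the markup around the error."""
    try:
        ET.fromstring(text)
    except ET.ParseError as err:
        # err.position is (line, column): line counts from 1, column is an offset into that line
        line, column = err.position
        lines = text.splitlines()
        bad_line = lines[line - 1] if 0 < line <= len(lines) else ""
        start = max(column - context, 0)
        return f"{err}\n...{bad_line[start:column + context]}..."
    return "well-formed"


# Usage with a shortened, one-line version of the question's structure
broken = '<MasterDetails><BookingDetails Amount="768" /><CDFs></CDFs></BookingDetails></MasterDetails>'
print(show_parse_error(broken))
```

The printed snippet contains the stray </BookingDetails>, which is the first tag the parser cannot match.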
The BookingDetails tag is self-closed on this line:
<BookingDetails Amount="768" Comment="Hotel Travel Purchase" CurrencyCode="INR" PurchaseType="Hotel" SupplierName="SomeHotel" CardAlias="C_ALIAS" ValidFor="-1D" CurrencyType="B" />
but there is also a separate closing BookingDetails element:
</BookingDetails>
Also, MasterDetails is not closed properly on the last line: it should be </MasterDetails> instead of <MasterDetails />.
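As a quick check, here is the question's document with both fixes applied, parsed with the same xml.etree.ElementTree call the question uses. This sketch makes one assumption: that CDFs and SomeTag are meant to be children of BookingDetails, so the "/" is removed from the BookingDetails start tag. If they were meant to be siblings, you would delete the stray </BookingDetails> instead.

```python
import xml.etree.ElementTree as ET

# The question's XML with both fixes applied.
# Assumption: CDFs and SomeTag belong inside BookingDetails, so its start tag is no longer self-closed.
FIXED = """<?xml version="1.0" encoding="utf-8" ?>
<MasterDetails IssuerId="5" Version="12.2">
<XMLRequest />
<BookingDetails Amount="768" Comment="Hotel Travel Purchase" CurrencyCode="INR" PurchaseType="Hotel" SupplierName="SomeHotel" CardAlias="C_ALIAS" ValidFor="-1D" CurrencyType="B">
<CDFs>
<CDF FieldName="Order Date" FieldValue="2015-01-01" />
</CDFs>
<SomeTag>
<Rule Action="A" Alias="MyAlias">
<Controls>
<OPMCCControl Negate="False"/>
<OPMIDControl />
<SomeControlsTags CumulativeLimit="768" MaxTrans="None" Period="C" />
<ValidityPeriod ValidFrom="2015-01-01 00:00:00.0 +0000" ValidTo="2015-01-11 00:00:00.0 +0000" />
</Controls>
</Rule>
</SomeTag>
</BookingDetails>
<Email EmailAddress="T@J.COM"/>
</MasterDetails>"""

tree = ET.ElementTree(ET.fromstring(FIXED.strip()))  # no ParseError now
root = tree.getroot()
print(root.tag)                                                # MasterDetails
print([child.tag for child in root])                           # ['XMLRequest', 'BookingDetails', 'Email']
print(root.find("BookingDetails/CDFs/CDF").get("FieldValue"))  # 2015-01-01
```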
Note that if you use lxml.etree, you can parse this XML in "recover" mode:
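import lxml.etree as ET
# lxml rejects a str that carries an encoding declaration, so pass bytes
data = kk.strip().encode("utf-8")
parser = ET.XMLParser(recover=True)  # try hard to parse through broken XML instead of raising
tree = ET.ElementTree(ET.fromstring(data, parser=parser))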
Alternatively, use BeautifulSoup with the "xml" feature (this parser relies on lxml being installed):
from bs4 import BeautifulSoup
soup = BeautifulSoup(data, "xml")
print(soup.prettify())