lxml: get element names dynamically, even for nested elements
I have the following XML:
<?xml version="1.0" encoding="UTF-8"?><gudid xmlns="http://www.fda.gov/cdrh/gudid" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" version="1.0" xsi:schemaLocation="http://www.fda.gov/cdrh/gudid gudid.xsd">
<device xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://www.fda.gov/cdrh/gudid">
<publicDeviceRecordKey>7c36b446-020c-44ab-9ce7-a85387467e0f</publicDeviceRecordKey>
<publicVersionStatus>New</publicVersionStatus>
<deviceRecordStatus>Published</deviceRecordStatus>
<identifiers>
<identifier>
<deviceId>M930756120810</deviceId>
<deviceIdType>Primary</deviceIdType>
<deviceIdIssuingAgency>HIBCC</deviceIdIssuingAgency>
<containsDINumber xsi:nil="true"></containsDINumber>
<pkgQuantity xsi:nil="true"></pkgQuantity>
<pkgDiscontinueDate xsi:nil="true"></pkgDiscontinueDate>
<pkgStatus xsi:nil="true"></pkgStatus>
<pkgType xsi:nil="true"></pkgType>
</identifier>
</identifiers>
<brandName>Life Instruments</brandName>
<gmdnTerms>
<gmdn>
<gmdnPTName>Orthopaedic knife</gmdnPTName>
<gmdnPTDefinition>A hand-held manual surgical instrument designed for cutting/shaping bone during an orthopaedic surgical intervention. It is typically a heavy, one-piece instrument with a sharp, single-edged, strong cutting blade at the distal end available in various shapes and sizes, with a handle at the proximal end. It is normally made of high-grade stainless steel. This is a reusable device.</gmdnPTDefinition>
</gmdn>
</gmdnTerms>
<productCodes>
<fdaProductCode>
<productCode>LXH</productCode>
<productCodeName>Orthopedic Manual Surgical Instrument</productCodeName>
</fdaProductCode>
</productCodes>
<deviceSizes/>
<environmentalConditions/>
</device>
</gudid>
I parse this XML with lxml:
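from lxml import etree, objectify

with open("sample.xml", encoding="utf-8") as f:
    # lxml rejects str input that has an XML encoding declaration, so pass bytes
    xml = f.read().encode()
root = objectify.fromstring(xml)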
When I walk the tree, I get output like this:
for event, element in etree.iterwalk(root, events=("start", "end")):
    if event == "start":
        print(event, element.tag, element.text)
start {http://www.fda.gov/cdrh/gudid}publicDeviceRecordKey 7c36b446-020c-44ab-9ce7-a85387467e0f
start {http://www.fda.gov/cdrh/gudid}publicVersionStatus New
start {http://www.fda.gov/cdrh/gudid}deviceRecordStatus Published
start {http://www.fda.gov/cdrh/gudid}identifiers None
....
start {http://www.fda.gov/cdrh/gudid}device None
start {http://www.fda.gov/cdrh/gudid}publicDeviceRecordKey be401033-96bf-46ec-8ac0-b2ce302d2b11
start {http://www.fda.gov/cdrh/gudid}sterilizationMethod Moist Heat or Steam Sterilization
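The {http://www.fda.gov/cdrh/gudid} part of each tag is lxml's Clark notation, {namespace-URI}localname. The root gudid element declares a default namespace with xmlns="...". Every descendant inherits that namespace, so every tag carries the same prefix.

The sketch below uses a trimmed copy of the sample (trimmed only to keep it self-contained). It shows that only one namespace URI occurs in the whole tree:

```python
from lxml import etree, objectify

# Trimmed copy of the question's sample (assumption: same default namespace).
XML = b"""<?xml version="1.0" encoding="UTF-8"?>
<gudid xmlns="http://www.fda.gov/cdrh/gudid"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <device>
    <publicDeviceRecordKey>7c36b446-020c-44ab-9ce7-a85387467e0f</publicDeviceRecordKey>
    <identifiers><identifier><deviceId>M930756120810</deviceId></identifier></identifiers>
  </device>
</gudid>"""

root = objectify.fromstring(XML)

# Every tag comes back in Clark notation: "{namespace-URI}localname".
tags = [el.tag for el in root.iter()]
print(tags[0])  # {http://www.fda.gov/cdrh/gudid}gudid

# The default xmlns on <gudid> is inherited, so all elements share one URI.
namespaces = {etree.QName(el).namespace for el in root.iter()}
print(namespaces)  # {'http://www.fda.gov/cdrh/gudid'}

# In the prefix-to-URI mapping, the default namespace is keyed by None.
print(root.nsmap[None])
```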
Every element seems to carry a namespace, which I don't want. I would rather get a flat list from the XML, like this:
publicDeviceRecordKey deviceId gmdnPTName .....
7c36b446-020c-44ab-9ce7-a85387467e0f M930756120810 Orthopaedic knife
How can I remove the namespace from element.tag?
If you want the tag name without the namespace part, you can get the element's local name like this:
etree.QName(element).localname
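etree.QName accepts either an element or a Clark-notation string, so the same call also works on raw tag strings.

One caveat is worth guarding against; this is an addition, not part of the original answer. Comments and processing instructions have a non-string .tag. The hypothetical helper local_name below therefore skips them explicitly. The inline XML is made up for illustration:

```python
from lxml import etree

# QName splits a Clark-notation string into namespace and local name.
q = etree.QName("{http://www.fda.gov/cdrh/gudid}deviceId")
print(q.namespace)  # http://www.fda.gov/cdrh/gudid
print(q.localname)  # deviceId

# A tag without a namespace passes through unchanged.
print(etree.QName("deviceId").localname)  # deviceId


def local_name(el):
    """Hypothetical helper: tag without namespace, or None for comments/PIs."""
    # Comments and processing instructions have a non-string .tag.
    if not isinstance(el.tag, str):
        return None
    return etree.QName(el).localname


# Made-up sample: root.iter() also yields the comment node.
root = etree.fromstring(b"<a xmlns='urn:x'><!-- c --><b/></a>")
names = [local_name(el) for el in root.iter()]
print(names)  # ['a', None, 'b']
```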
You can print the local name to check:
print(event, etree.QName(element).localname, element.text)
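To get the flat layout asked for in the question, one option is to collect the local name and text of every leaf element inside each device element. This is a sketch under a few assumptions:

- Leaf names are unique within a device. A repeated tag, such as several identifier blocks, would overwrite earlier values.
- The input is a trimmed copy of the sample.
- The helper flatten and the column list are hypothetical.

It uses iterchildren() for the leaf test, because on objectify elements len() and iteration count same-tag siblings rather than children.

```python
from lxml import etree, objectify

NS = "http://www.fda.gov/cdrh/gudid"

# Trimmed copy of the question's sample (assumption, for a self-contained run).
XML = b"""<?xml version="1.0" encoding="UTF-8"?>
<gudid xmlns="http://www.fda.gov/cdrh/gudid"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <device>
    <publicDeviceRecordKey>7c36b446-020c-44ab-9ce7-a85387467e0f</publicDeviceRecordKey>
    <identifiers>
      <identifier>
        <deviceId>M930756120810</deviceId>
        <pkgQuantity xsi:nil="true"></pkgQuantity>
      </identifier>
    </identifiers>
    <gmdnTerms><gmdn><gmdnPTName>Orthopaedic knife</gmdnPTName></gmdn></gmdnTerms>
  </device>
</gudid>"""


def flatten(parent):
    """Hypothetical helper: map local name -> text for each leaf under parent."""
    row = {}
    for el in parent.iter():
        if not isinstance(el.tag, str):
            continue  # skip comments / processing instructions
        # Leaf test via iterchildren(): on objectify elements, len() and
        # iteration count same-tag siblings, not children.
        if next(el.iterchildren(), None) is not None:
            continue
        row[etree.QName(el).localname] = el.text  # later duplicates overwrite
    return row


root = objectify.fromstring(XML)
rows = [flatten(dev) for dev in root.iterchildren(f"{{{NS}}}device")]

columns = ["publicDeviceRecordKey", "deviceId", "gmdnPTName"]  # hypothetical choice
print("\t".join(columns))
for row in rows:
    print("\t".join(row.get(c) or "" for c in columns))
```

Elements marked xsi:nil="true" come out as None, and so do empty containers such as deviceSizes. If some tags do repeat within a device, collecting a list per name instead of a single value would be the natural next step.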