lxml: get element names dynamically, even for nested elements

Question

I have the following XML:

<?xml version="1.0" encoding="UTF-8"?><gudid xmlns="http://www.fda.gov/cdrh/gudid" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" version="1.0" xsi:schemaLocation="http://www.fda.gov/cdrh/gudid gudid.xsd">
<device xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://www.fda.gov/cdrh/gudid">
  <publicDeviceRecordKey>7c36b446-020c-44ab-9ce7-a85387467e0f</publicDeviceRecordKey>
  <publicVersionStatus>New</publicVersionStatus>
  <deviceRecordStatus>Published</deviceRecordStatus>
  <identifiers>
    <identifier>
      <deviceId>M930756120810</deviceId>
      <deviceIdType>Primary</deviceIdType>
      <deviceIdIssuingAgency>HIBCC</deviceIdIssuingAgency>
      <containsDINumber xsi:nil="true"></containsDINumber>
      <pkgQuantity xsi:nil="true"></pkgQuantity>
      <pkgDiscontinueDate xsi:nil="true"></pkgDiscontinueDate>
      <pkgStatus xsi:nil="true"></pkgStatus>
      <pkgType xsi:nil="true"></pkgType>
    </identifier>
  </identifiers>
  <brandName>Life Instruments</brandName>
  <gmdnTerms>
    <gmdn>
      <gmdnPTName>Orthopaedic knife</gmdnPTName>
      <gmdnPTDefinition>A hand-held manual surgical instrument designed for cutting/shaping bone during an orthopaedic surgical intervention. It is typically a heavy, one-piece instrument with a sharp, single-edged, strong cutting blade at the distal end available in various shapes and sizes, with a handle at the proximal end. It is normally made of high-grade stainless steel. This is a reusable device.</gmdnPTDefinition>
    </gmdn>
  </gmdnTerms>
  <productCodes>
    <fdaProductCode>
      <productCode>LXH</productCode>
      <productCodeName>Orthopedic Manual Surgical Instrument</productCodeName>
    </fdaProductCode>
  </productCodes>
  <deviceSizes/>
  <environmentalConditions/>
</device>
</gudid>

I parse this XML with lxml:

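from lxml import etree, objectify

# Read the file as bytes: lxml rejects str input that carries an encoding declaration.
with open("sample.xml", "rb") as f:
    xml = f.read()

root = objectify.fromstring(xml)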

When I walk the XML, I run into the following problem:

for event, element in etree.iterwalk(root, events=("start", "end")):
    if event == "start":
        print(event, element.tag, element.text)

                 
start {http://www.fda.gov/cdrh/gudid}publicDeviceRecordKey 7c36b446-020c-44ab-9ce7-a85387467e0f
start {http://www.fda.gov/cdrh/gudid}publicVersionStatus New
start {http://www.fda.gov/cdrh/gudid}deviceRecordStatus Published
start {http://www.fda.gov/cdrh/gudid}identifiers None
....
start {http://www.fda.gov/cdrh/gudid}device None
start {http://www.fda.gov/cdrh/gudid}publicDeviceRecordKey be401033-96bf-46ec-8ac0-b2ce302d2b11
start {http://www.fda.gov/cdrh/gudid}sterilizationMethod Moist Heat or Steam Sterilization

Every element's tag seems to carry a namespace prefix that I don't want. I would rather get a flat list from the XML, like this:

publicDeviceRecordKey                 deviceId             gmdnPtName .....
7c36b446-020c-44ab-9ce7-a85387467e0f  M930756120810        Orthopaedic knife

How can I remove the namespace from element.tag?

Answer 1

If you want the tag name without the namespace, you can get the element's local name like this:

etree.QName(element).localname

You can print the local name to check:

print(event, etree.QName(element).localname, element.text)
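
To get from there to the flat list the question asks for, one option is to collect the local name and text of every leaf element under each device. The following is only a sketch under a few assumptions: the XML is a trimmed-down stand-in for the sample in the question, flatten_device is a hypothetical helper name, and the tree is parsed with plain etree rather than objectify, because len() on an objectify element counts same-named siblings rather than children.

```python
from lxml import etree

# Trimmed-down stand-in for the document in the question (assumption: same namespace).
XML = b"""<?xml version="1.0" encoding="UTF-8"?>
<gudid xmlns="http://www.fda.gov/cdrh/gudid">
  <device>
    <publicDeviceRecordKey>7c36b446-020c-44ab-9ce7-a85387467e0f</publicDeviceRecordKey>
    <identifiers>
      <identifier>
        <deviceId>M930756120810</deviceId>
      </identifier>
    </identifiers>
    <gmdnTerms>
      <gmdn>
        <gmdnPTName>Orthopaedic knife</gmdnPTName>
      </gmdn>
    </gmdnTerms>
    <deviceSizes/>
  </device>
</gudid>"""


def flatten_device(device):
    """Map each non-empty leaf element's local name to its text (hypothetical helper)."""
    record = {}
    for element in device.iter():
        if not isinstance(element.tag, str):
            continue  # skip comments and processing instructions
        # A leaf has no child elements; ignore leaves without text.
        if len(element) == 0 and element.text and element.text.strip():
            record[etree.QName(element).localname] = element.text.strip()
    return record


root = etree.fromstring(XML)
ns = {"g": "http://www.fda.gov/cdrh/gudid"}
records = [flatten_device(device) for device in root.findall("g:device", ns)]
for record in records:
    print(record)
```

Each record is a dict keyed by the bare tag name, so it can be written out as one row per device. Note that a leaf name that occurs more than once inside a device (for example several identifier entries) would overwrite the earlier value in this sketch; collecting values into lists would avoid that.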


python

lxml