使用 lxml 解析 XML 中的多嵌套子项时出现问题
Issues Parsing Multi Nested Childs in XML using lxml
我在解析 xml 文件中的每个子节点时遇到问题。每个 Instrument_Root 的节点数可以更改。例如,Instrument_Watch 在这里是 NULL,但会在这之后的其他实例中填充。我的目标是让每个子节点单独解析(Instrument_Ratings、Instrument_Attribute_Ratings、Instrument_Organization、Instrument_Supports 等)
我尝试执行以下操作,但它只是重复返回第一个实例 - 文件中有 3700 个 Instrument_Root,Instrument_Rating 这个 Instrument_Root 重复了 3700 次.由于命名空间,我也 运行 遇到了 etree 的错误。
from lxml import objectify
xml = objectify.parse(file)
root = xml.getroot()
tree1 = []
tree2 = []
tree3 = []
tree4 = []
for children in range(len(root.getchildren())):
tree1.append([child.text for child in root.getchildren()[children].iterchildren()])
for children2 in root.Instrument_Root.Instrument_Ratings.Instrument_Rating.getchildren():
tree2.append([child2.text for child2 in root.Instrument_Root.Instrument_Ratings.Instrument_Rating.getchildren()])
for children3 in root.Instrument_Root.Instrument_Ratings.Instrument_Rating.Instrument_Rating_Attributes.Instrument_Rating_Attribute.getchildren():
tree3.append([child3.text for child3 in root.Instrument_Root.Instrument_Ratings.Instrument_Rating.Instrument_Rating_Attributes.Instrument_Rating_Attribute.getchildren()])
for children4 in root.Instrument_Root.Instrument_Organizations.Instrument_Organization.getchildren():
tree4.append([child4.text for child4 in root.Instrument_Root.Instrument_Organizations.Instrument_Organization.getchildren()])
XML:
<?xml version="1.0" encoding="utf-8"?>
<Instrument_Roots xmlns="http://www.XXXXX.com" xmlns:xsi="http://www.XXXXXXX.XMLSchema-instance" file_type="Baseline" frequency="Hourly-12" generation_time="2020-04-06T12:00:00Z">
<Instrument_Root>
<Instrument_ID>831295951</Instrument_ID>
<Deal_number>831275547</Deal_number>
<Class_Code>18705</Class_Code>
<Class_Text>Pass-Through</Class_Text>
<Class_Short_Description>PAS</Class_Short_Description>
<Dated_Date>2020-03-21T00:00:00</Dated_Date>
<ISO_Currency_Code>AUD</ISO_Currency_Code>
<Currency_Multiple_Indicator>N</Currency_Multiple_Indicator>
<Maturity_Date>2051-03-21T00:00:00</Maturity_Date>
<Maturity_Year>2051</Maturity_Year>
<Sale_Date>2020-03-21T00:00:00</Sale_Date>
<Face_Amount_USD>7.2534316791</Face_Amount_USD>
<Credit_Linked_Indicator>N</Credit_Linked_Indicator>
<Takedown_Indicator>N</Takedown_Indicator>
<Security_Description>Class B</Security_Description>
<Instrument_Type_Code>24657</Instrument_Type_Code>
<Instrument_Type_Text>PASS-THRU CTFS</Instrument_Type_Text>
<Private_Placement_Code>24922</Private_Placement_Code>
<Private_Placement_Text>Not Applicable</Private_Placement_Text>
<Coupon_Type_Code>26</Coupon_Type_Code>
<Coupon_Type_Text>Floating</Coupon_Type_Text>
<Coupon_Type_Short_Description>FLT</Coupon_Type_Short_Description>
<Coupon_Frequency_Code>16</Coupon_Frequency_Code>
<Coupon_Frequency_Text>Monthly</Coupon_Frequency_Text>
<Coupon_Frequency_Short_Description>MON</Coupon_Frequency_Short_Description>
<Coupon_Rate xsi:nil="true"/>
<Instrument_Description xsi:nil="true"/>
<Product_Line_Description>MBS - Prime</Product_Line_Description>
<Series_Class_Text>Class B</Series_Class_Text>
<Market_Description>AUSTRALIA</Market_Description>
<Face_Amount>11.2500000000</Face_Amount>
<Structured_Finance_Indicator>Y</Structured_Finance_Indicator>
<Structured_Finance_Sequence_Number>3</Structured_Finance_Sequence_Number>
<Instrument_Ratings>
<Instrument_Rating>
<Instrument_ID>831295951</Instrument_ID>
<Rating_Class_Number>37203</Rating_Class_Number>
<Rating_Date>2020-03-02T01:30:03</Rating_Date>
<XXXX_Rating_ID>831295958</XXXX_Rating_ID>
<Rating_Level>I</Rating_Level>
<Rating_Class_Text>Senior Secured</Rating_Class_Text>
<Security_Class_Code>18705</Security_Class_Code>
<Security_Class_Text>Pass-Through</Security_Class_Text>
<Security_Class_Short_Description>PAS</Security_Class_Short_Description>
<Duration_Code>25636</Duration_Code>
<Duration_Text>Long-Term Debt Rating</Duration_Text>
<Duration_Short_Description>LT</Duration_Short_Description>
<Seniority_Code>18743</Seniority_Code>
<Seniority_Text>Senior Secured</Seniority_Text>
<Seniority_Short_Description>SS</Seniority_Short_Description>
<Evaluation_Type_Code>25648</Evaluation_Type_Code>
<Evaluation_Type_Text>Credit Risk</Evaluation_Type_Text>
<Shadow_Code>5734</Shadow_Code>
<Shadow_Text>Enhanced</Shadow_Text>
<Shadow_Short_Description>ENH</Shadow_Short_Description>
<Rating_Subclass_Code xsi:nil="true"/>
<Rating_Subclass_Text xsi:nil="true"/>
<Currency_Capd_Code>19142</Currency_Capd_Code>
<Currency_Capd_Text>Local Currency</Currency_Capd_Text>
<Rating_Text>NR</Rating_Text>
<Credit_Grade xsi:nil="true"/>
<Rating_Rank>0</Rating_Rank>
<Rating_Direction_Code>19102</Rating_Direction_Code>
<Rating_Direction_Text>DECISION NOT TO RATE</Rating_Direction_Text>
<Rating_Direction_Short_Description>NR</Rating_Direction_Short_Description>
<Rating_Type_Code>534</Rating_Type_Code>
<Rating_Type_Text>Long-Term Debt Rating</Rating_Type_Text>
<Rating_Type_Short_Description>LT</Rating_Type_Short_Description>
<Rating_Enhancement_Level>ENH</Rating_Enhancement_Level>
<Rating_Local_Date>2020-03-02T17:30:03</Rating_Local_Date>
<Rating_Termination_Date xsi:nil="true"/>
<Rating_Termination_Local_Date xsi:nil="true"/>
<Rating_Reason_Code>25530</Rating_Reason_Code>
<Rating_Reason_Text>DECISION NOT TO RATE</Rating_Reason_Text>
<Rating_Currency_Code>20525</Rating_Currency_Code>
<Rating_Currency_Text>Australian Dollar</Rating_Currency_Text>
<Rating_Currency_ISO_Code>AUD</Rating_Currency_ISO_Code>
<Rating_Monitor_Indicator>1</Rating_Monitor_Indicator>
<Initial_Rating_Indicator>Y</Initial_Rating_Indicator>
<Instrument_Watchlist xsi:nil="true"/>
<Instrument_Rating_Attributes>
</Instrument_Rating_Attributes>
</Instrument_Rating>
</Instrument_Ratings>
<Instrument_Supports xsi:nil="true"/>
<Instrument_Organizations>
<Instrument_Organization>
<Organization_ID>88100</Organization_ID>
<Instrument_ID>831295951</Instrument_ID>
<Organization_Role_Code>5156484</Organization_Role_Code>
<Organization_Role_Text>Issuer Account Bank</Organization_Role_Text>
<Effective_Date>2020-03-02T01:30:03</Effective_Date>
<Termination_Date xsi:nil="true"/>
</Instrument_Organization>
<Instrument_Organization>
<Organization_ID>435700</Organization_ID>
<Instrument_ID>831295951</Instrument_ID>
<Organization_Role_Code>141</Organization_Role_Code>
<Organization_Role_Text>Underwriter</Organization_Role_Text>
<Effective_Date>2020-03-02T01:30:03</Effective_Date>
<Termination_Date xsi:nil="true"/>
</Instrument_Organization>
<Instrument_Organization>
<Organization_ID>600045234</Organization_ID>
<Instrument_ID>831295951</Instrument_ID>
<Organization_Role_Code>25017</Organization_Role_Code>
<Organization_Role_Text>Seller</Organization_Role_Text>
<Effective_Date>2020-03-02T01:30:03</Effective_Date>
<Termination_Date xsi:nil="true"/>
</Instrument_Organization>
<Instrument_Organization>
<Organization_ID>600045234</Organization_ID>
<Instrument_ID>831295951</Instrument_ID>
<Organization_Role_Code>25020</Organization_Role_Code>
<Organization_Role_Text>Servicer</Organization_Role_Text>
<Effective_Date>2020-03-02T01:30:03</Effective_Date>
<Termination_Date xsi:nil="true"/>
</Instrument_Organization>
<Instrument_Organization>
<Organization_ID>530750</Organization_ID>
<Instrument_ID>831295951</Instrument_ID>
<Organization_Role_Code>24998</Organization_Role_Code>
<Organization_Role_Text>Arranger</Organization_Role_Text>
<Effective_Date>2020-03-02T01:30:03</Effective_Date>
<Termination_Date xsi:nil="true"/>
</Instrument_Organization>
<Instrument_Organization>
<Organization_ID>540500</Organization_ID>
<Instrument_ID>831295951</Instrument_ID>
<Organization_Role_Code>141</Organization_Role_Code>
<Organization_Role_Text>Underwriter</Organization_Role_Text>
<Effective_Date>2020-03-02T01:30:03</Effective_Date>
<Termination_Date xsi:nil="true"/>
</Instrument_Organization>
<Instrument_Organization>
<Organization_ID>714325</Organization_ID>
<Instrument_ID>831295951</Instrument_ID>
<Organization_Role_Code>141</Organization_Role_Code>
<Organization_Role_Text>Underwriter</Organization_Role_Text>
<Effective_Date>2020-03-02T01:30:03</Effective_Date>
<Termination_Date xsi:nil="true"/>
</Instrument_Organization>
<Instrument_Organization>
<Organization_ID>192835</Organization_ID>
<Instrument_ID>831295951</Instrument_ID>
<Organization_Role_Code>141</Organization_Role_Code>
<Organization_Role_Text>Underwriter</Organization_Role_Text>
<Effective_Date>2020-03-02T01:30:03</Effective_Date>
<Termination_Date xsi:nil="true"/>
</Instrument_Organization>
<Instrument_Organization>
<Organization_ID>600020914</Organization_ID>
<Instrument_ID>831295951</Instrument_ID>
<Organization_Role_Code>141</Organization_Role_Code>
<Organization_Role_Text>Underwriter</Organization_Role_Text>
<Effective_Date>2020-03-02T01:30:03</Effective_Date>
<Termination_Date xsi:nil="true"/>
</Instrument_Organization>
<Instrument_Organization>
<Organization_ID>831295948</Organization_ID>
<Instrument_ID>831295951</Instrument_ID>
<Organization_Role_Code>129</Organization_Role_Code>
<Organization_Role_Text>Issuer</Organization_Role_Text>
<Effective_Date>2020-03-02T01:30:03</Effective_Date>
<Termination_Date xsi:nil="true"/>
</Instrument_Organization>
<Instrument_Organization>
<Organization_ID>600018753</Organization_ID>
<Instrument_ID>831295951</Instrument_ID>
<Organization_Role_Code>140</Organization_Role_Code>
<Organization_Role_Text>Trustee</Organization_Role_Text>
<Effective_Date>2020-03-02T01:30:03</Effective_Date>
<Termination_Date xsi:nil="true"/>
</Instrument_Organization>
<Instrument_Organization>
<Organization_ID>600045234</Organization_ID>
<Instrument_ID>831295951</Instrument_ID>
<Organization_Role_Code>25002</Organization_Role_Code>
<Organization_Role_Text>Custodian</Organization_Role_Text>
<Effective_Date>2020-03-02T01:30:03</Effective_Date>
<Termination_Date xsi:nil="true"/>
</Instrument_Organization>
<Instrument_Organization>
<Organization_ID>806942842</Organization_ID>
<Instrument_ID>831295951</Instrument_ID>
<Organization_Role_Code>140</Organization_Role_Code>
<Organization_Role_Text>Trustee</Organization_Role_Text>
<Effective_Date>2020-03-02T01:30:03</Effective_Date>
<Termination_Date xsi:nil="true"/>
</Instrument_Organization>
<Instrument_Organization>
<Organization_ID>820688318</Organization_ID>
<Instrument_ID>831295951</Instrument_ID>
<Organization_Role_Code>5155837</Organization_Role_Code>
<Organization_Role_Text>Cash Manager</Organization_Role_Text>
<Effective_Date>2020-03-02T01:30:03</Effective_Date>
<Termination_Date xsi:nil="true"/>
</Instrument_Organization>
<Instrument_Organization>
<Organization_ID>530750</Organization_ID>
<Instrument_ID>831295951</Instrument_ID>
<Organization_Role_Code>141</Organization_Role_Code>
<Organization_Role_Text>Underwriter</Organization_Role_Text>
<Effective_Date>2020-03-02T01:30:03</Effective_Date>
<Termination_Date xsi:nil="true"/>
</Instrument_Organization>
<Instrument_Organization>
<Organization_ID>600045234</Organization_ID>
<Instrument_ID>831295951</Instrument_ID>
<Organization_Role_Code>25012</Organization_Role_Code>
<Organization_Role_Text>Originator</Organization_Role_Text>
<Effective_Date>2020-03-02T01:30:03</Effective_Date>
<Termination_Date xsi:nil="true"/>
</Instrument_Organization>
</Instrument_Organizations>
<Instrument_Identifiers xsi:nil="true"/>
<Instrument_Attributes>
<Instrument_Attribute>
<Instrument_ID>831295951</Instrument_ID>
<Rating_Attribute_Type_Code>5156438</Rating_Attribute_Type_Code>
<Rating_Attribute_Type_Text>SF Indicator</Rating_Attribute_Type_Text>
<Effective_Date>2020-03-02T01:30:03</Effective_Date>
<Termination_Date xsi:nil="true"/>
<Rating_Attribute_Code>5156439</Rating_Attribute_Code>
<Rating_Attribute_Text>(sf)</Rating_Attribute_Text>
</Instrument_Attribute>
<Instrument_Attribute>
<Instrument_ID>831295951</Instrument_ID>
<Rating_Attribute_Type_Code>5168187</Rating_Attribute_Type_Code>
<Rating_Attribute_Type_Text>SEC Exempt</Rating_Attribute_Type_Text>
<Effective_Date>2020-03-02T01:30:03</Effective_Date>
<Termination_Date xsi:nil="true"/>
<Rating_Attribute_Code>5168185</Rating_Attribute_Code>
<Rating_Attribute_Text>Exempt</Rating_Attribute_Text>
</Instrument_Attribute>
</Instrument_Attributes>
<Instrument_Markets>
<Instrument_Market>
<Instrument_ID>831295951</Instrument_ID>
<Domain_Number>75300</Domain_Number>
<Effective_Date>2020-03-21T00:00:00</Effective_Date>
<Termination_Date xsi:nil="true"/>
</Instrument_Market>
</Instrument_Markets>
</Instrument_Root>
</Instrument_Roots>
任何关于如何解决这个问题的想法都将不胜感激。谢谢。
问题的根源是你的 XML 有一个 默认命名空间
(http://www.XXXXX.com
),所以每次尝试定位元素必须
包含此命名空间(您的代码在此细节上失败)。
为了处理您的 XML 文件,我使用了以下代码:
进口:
from lxml import etree as et
读取XML文件:
parser = et.XMLParser(remove_blank_text=True)
tree = et.parse('Instrum.xml', parser)
root = tree.getroot()
定义使用的命名空间:
ns = {'xx': 'http://www.XXXXX.com'}
(下面会用到)
用每个Instrument_Root的children的文本内容填充tree1:
tree1 = []
for elem in root.findall('xx:Instrument_Root/*', ns):
txt = elem.text
if txt is not None:
tree1.append(txt)
注意Instrument_Root是根节点的直系后代,
所以只输入节点名称就足够了。
用每个Instrument_Rating的children的文本内容填充tree2:
tree2 = []
for elem in root.findall('.//xx:Instrument_Rating/*', ns):
txt = elem.text
if txt is not None and len(txt.strip()) > 0:
tree2.append(txt)
这次 Instrument_Rating 位于 XML 树的更深处,
所以 XPath 必须包含 // 才能执行 "all levels" 搜索。
我还添加了一些逻辑以避免附加 non-existing 文本
或仅包含 "while" 个字符的文本(如果您不想,请将其删除
跳过它们)。
对于你的 XML 输入样本,我得到了:
tree1:
['831295951', '831275547', '18705', 'Pass-Through', 'PAS', '2020-03-21T00:00:00',
'AUD', 'N', '2051-03-21T00:00:00', '2051', '2020-03-21T00:00:00', '7.2534316791',
'N', 'N', 'Class B', '24657', 'PASS-THRU CTFS', '24922', 'Not Applicable',
'26', 'Floating', 'FLT', '16', 'Monthly', 'MON', 'MBS - Prime', 'Class B',
'AUSTRALIA', '11.2500000000', 'Y', '3']
tree2:
['831295951', '37203', '2020-03-02T01:30:03', '831295958', 'I', 'Senior Secured',
'18705', 'Pass-Through', 'PAS', '25636', 'Long-Term Debt Rating', 'LT',
'18743', 'Senior Secured', 'SS', '25648', 'Credit Risk', '5734', 'Enhanced',
'ENH', '19142', 'Local Currency', 'NR', '0', '19102', 'DECISION NOT TO RATE',
'NR', '534', 'Long-Term Debt Rating', 'LT', 'ENH', '2020-03-02T17:30:03', '25530',
'DECISION NOT TO RATE', '20525', 'Australian Dollar', 'AUD', '1', 'Y']
请注意,不需要任何嵌套循环。
我想根据上面的代码你就会知道如何提取内容了
填充 tree3 和 tree4.
我在解析 xml 文件中的每个子节点时遇到问题。每个 Instrument_Root 的节点数可以更改。例如,Instrument_Watch 在这里是 NULL,但会在这之后的其他实例中填充。我的目标是让每个子节点单独解析(Instrument_Ratings、Instrument_Attribute_Ratings、Instrument_Organization、Instrument_Supports 等)
我尝试执行以下操作,但它只是重复返回第一个实例 - 文件中有 3700 个 Instrument_Root,Instrument_Rating 这个 Instrument_Root 重复了 3700 次.由于命名空间,我也 运行 遇到了 etree 的错误。
from lxml import objectify
xml = objectify.parse(file)
root = xml.getroot()
tree1 = []
tree2 = []
tree3 = []
tree4 = []
for children in range(len(root.getchildren())):
tree1.append([child.text for child in root.getchildren()[children].iterchildren()])
for children2 in root.Instrument_Root.Instrument_Ratings.Instrument_Rating.getchildren():
tree2.append([child2.text for child2 in root.Instrument_Root.Instrument_Ratings.Instrument_Rating.getchildren()])
for children3 in root.Instrument_Root.Instrument_Ratings.Instrument_Rating.Instrument_Rating_Attributes.Instrument_Rating_Attribute.getchildren():
tree3.append([child3.text for child3 in root.Instrument_Root.Instrument_Ratings.Instrument_Rating.Instrument_Rating_Attributes.Instrument_Rating_Attribute.getchildren()])
for children4 in root.Instrument_Root.Instrument_Organizations.Instrument_Organization.getchildren():
tree4.append([child4.text for child4 in root.Instrument_Root.Instrument_Organizations.Instrument_Organization.getchildren()])
XML:
<?xml version="1.0" encoding="utf-8"?>
<Instrument_Roots xmlns="http://www.XXXXX.com" xmlns:xsi="http://www.XXXXXXX.XMLSchema-instance" file_type="Baseline" frequency="Hourly-12" generation_time="2020-04-06T12:00:00Z">
<Instrument_Root>
<Instrument_ID>831295951</Instrument_ID>
<Deal_number>831275547</Deal_number>
<Class_Code>18705</Class_Code>
<Class_Text>Pass-Through</Class_Text>
<Class_Short_Description>PAS</Class_Short_Description>
<Dated_Date>2020-03-21T00:00:00</Dated_Date>
<ISO_Currency_Code>AUD</ISO_Currency_Code>
<Currency_Multiple_Indicator>N</Currency_Multiple_Indicator>
<Maturity_Date>2051-03-21T00:00:00</Maturity_Date>
<Maturity_Year>2051</Maturity_Year>
<Sale_Date>2020-03-21T00:00:00</Sale_Date>
<Face_Amount_USD>7.2534316791</Face_Amount_USD>
<Credit_Linked_Indicator>N</Credit_Linked_Indicator>
<Takedown_Indicator>N</Takedown_Indicator>
<Security_Description>Class B</Security_Description>
<Instrument_Type_Code>24657</Instrument_Type_Code>
<Instrument_Type_Text>PASS-THRU CTFS</Instrument_Type_Text>
<Private_Placement_Code>24922</Private_Placement_Code>
<Private_Placement_Text>Not Applicable</Private_Placement_Text>
<Coupon_Type_Code>26</Coupon_Type_Code>
<Coupon_Type_Text>Floating</Coupon_Type_Text>
<Coupon_Type_Short_Description>FLT</Coupon_Type_Short_Description>
<Coupon_Frequency_Code>16</Coupon_Frequency_Code>
<Coupon_Frequency_Text>Monthly</Coupon_Frequency_Text>
<Coupon_Frequency_Short_Description>MON</Coupon_Frequency_Short_Description>
<Coupon_Rate xsi:nil="true"/>
<Instrument_Description xsi:nil="true"/>
<Product_Line_Description>MBS - Prime</Product_Line_Description>
<Series_Class_Text>Class B</Series_Class_Text>
<Market_Description>AUSTRALIA</Market_Description>
<Face_Amount>11.2500000000</Face_Amount>
<Structured_Finance_Indicator>Y</Structured_Finance_Indicator>
<Structured_Finance_Sequence_Number>3</Structured_Finance_Sequence_Number>
<Instrument_Ratings>
<Instrument_Rating>
<Instrument_ID>831295951</Instrument_ID>
<Rating_Class_Number>37203</Rating_Class_Number>
<Rating_Date>2020-03-02T01:30:03</Rating_Date>
<XXXX_Rating_ID>831295958</XXXX_Rating_ID>
<Rating_Level>I</Rating_Level>
<Rating_Class_Text>Senior Secured</Rating_Class_Text>
<Security_Class_Code>18705</Security_Class_Code>
<Security_Class_Text>Pass-Through</Security_Class_Text>
<Security_Class_Short_Description>PAS</Security_Class_Short_Description>
<Duration_Code>25636</Duration_Code>
<Duration_Text>Long-Term Debt Rating</Duration_Text>
<Duration_Short_Description>LT</Duration_Short_Description>
<Seniority_Code>18743</Seniority_Code>
<Seniority_Text>Senior Secured</Seniority_Text>
<Seniority_Short_Description>SS</Seniority_Short_Description>
<Evaluation_Type_Code>25648</Evaluation_Type_Code>
<Evaluation_Type_Text>Credit Risk</Evaluation_Type_Text>
<Shadow_Code>5734</Shadow_Code>
<Shadow_Text>Enhanced</Shadow_Text>
<Shadow_Short_Description>ENH</Shadow_Short_Description>
<Rating_Subclass_Code xsi:nil="true"/>
<Rating_Subclass_Text xsi:nil="true"/>
<Currency_Capd_Code>19142</Currency_Capd_Code>
<Currency_Capd_Text>Local Currency</Currency_Capd_Text>
<Rating_Text>NR</Rating_Text>
<Credit_Grade xsi:nil="true"/>
<Rating_Rank>0</Rating_Rank>
<Rating_Direction_Code>19102</Rating_Direction_Code>
<Rating_Direction_Text>DECISION NOT TO RATE</Rating_Direction_Text>
<Rating_Direction_Short_Description>NR</Rating_Direction_Short_Description>
<Rating_Type_Code>534</Rating_Type_Code>
<Rating_Type_Text>Long-Term Debt Rating</Rating_Type_Text>
<Rating_Type_Short_Description>LT</Rating_Type_Short_Description>
<Rating_Enhancement_Level>ENH</Rating_Enhancement_Level>
<Rating_Local_Date>2020-03-02T17:30:03</Rating_Local_Date>
<Rating_Termination_Date xsi:nil="true"/>
<Rating_Termination_Local_Date xsi:nil="true"/>
<Rating_Reason_Code>25530</Rating_Reason_Code>
<Rating_Reason_Text>DECISION NOT TO RATE</Rating_Reason_Text>
<Rating_Currency_Code>20525</Rating_Currency_Code>
<Rating_Currency_Text>Australian Dollar</Rating_Currency_Text>
<Rating_Currency_ISO_Code>AUD</Rating_Currency_ISO_Code>
<Rating_Monitor_Indicator>1</Rating_Monitor_Indicator>
<Initial_Rating_Indicator>Y</Initial_Rating_Indicator>
<Instrument_Watchlist xsi:nil="true"/>
<Instrument_Rating_Attributes>
</Instrument_Rating_Attributes>
</Instrument_Rating>
</Instrument_Ratings>
<Instrument_Supports xsi:nil="true"/>
<Instrument_Organizations>
<Instrument_Organization>
<Organization_ID>88100</Organization_ID>
<Instrument_ID>831295951</Instrument_ID>
<Organization_Role_Code>5156484</Organization_Role_Code>
<Organization_Role_Text>Issuer Account Bank</Organization_Role_Text>
<Effective_Date>2020-03-02T01:30:03</Effective_Date>
<Termination_Date xsi:nil="true"/>
</Instrument_Organization>
<Instrument_Organization>
<Organization_ID>435700</Organization_ID>
<Instrument_ID>831295951</Instrument_ID>
<Organization_Role_Code>141</Organization_Role_Code>
<Organization_Role_Text>Underwriter</Organization_Role_Text>
<Effective_Date>2020-03-02T01:30:03</Effective_Date>
<Termination_Date xsi:nil="true"/>
</Instrument_Organization>
<Instrument_Organization>
<Organization_ID>600045234</Organization_ID>
<Instrument_ID>831295951</Instrument_ID>
<Organization_Role_Code>25017</Organization_Role_Code>
<Organization_Role_Text>Seller</Organization_Role_Text>
<Effective_Date>2020-03-02T01:30:03</Effective_Date>
<Termination_Date xsi:nil="true"/>
</Instrument_Organization>
<Instrument_Organization>
<Organization_ID>600045234</Organization_ID>
<Instrument_ID>831295951</Instrument_ID>
<Organization_Role_Code>25020</Organization_Role_Code>
<Organization_Role_Text>Servicer</Organization_Role_Text>
<Effective_Date>2020-03-02T01:30:03</Effective_Date>
<Termination_Date xsi:nil="true"/>
</Instrument_Organization>
<Instrument_Organization>
<Organization_ID>530750</Organization_ID>
<Instrument_ID>831295951</Instrument_ID>
<Organization_Role_Code>24998</Organization_Role_Code>
<Organization_Role_Text>Arranger</Organization_Role_Text>
<Effective_Date>2020-03-02T01:30:03</Effective_Date>
<Termination_Date xsi:nil="true"/>
</Instrument_Organization>
<Instrument_Organization>
<Organization_ID>540500</Organization_ID>
<Instrument_ID>831295951</Instrument_ID>
<Organization_Role_Code>141</Organization_Role_Code>
<Organization_Role_Text>Underwriter</Organization_Role_Text>
<Effective_Date>2020-03-02T01:30:03</Effective_Date>
<Termination_Date xsi:nil="true"/>
</Instrument_Organization>
<Instrument_Organization>
<Organization_ID>714325</Organization_ID>
<Instrument_ID>831295951</Instrument_ID>
<Organization_Role_Code>141</Organization_Role_Code>
<Organization_Role_Text>Underwriter</Organization_Role_Text>
<Effective_Date>2020-03-02T01:30:03</Effective_Date>
<Termination_Date xsi:nil="true"/>
</Instrument_Organization>
<Instrument_Organization>
<Organization_ID>192835</Organization_ID>
<Instrument_ID>831295951</Instrument_ID>
<Organization_Role_Code>141</Organization_Role_Code>
<Organization_Role_Text>Underwriter</Organization_Role_Text>
<Effective_Date>2020-03-02T01:30:03</Effective_Date>
<Termination_Date xsi:nil="true"/>
</Instrument_Organization>
<Instrument_Organization>
<Organization_ID>600020914</Organization_ID>
<Instrument_ID>831295951</Instrument_ID>
<Organization_Role_Code>141</Organization_Role_Code>
<Organization_Role_Text>Underwriter</Organization_Role_Text>
<Effective_Date>2020-03-02T01:30:03</Effective_Date>
<Termination_Date xsi:nil="true"/>
</Instrument_Organization>
<Instrument_Organization>
<Organization_ID>831295948</Organization_ID>
<Instrument_ID>831295951</Instrument_ID>
<Organization_Role_Code>129</Organization_Role_Code>
<Organization_Role_Text>Issuer</Organization_Role_Text>
<Effective_Date>2020-03-02T01:30:03</Effective_Date>
<Termination_Date xsi:nil="true"/>
</Instrument_Organization>
<Instrument_Organization>
<Organization_ID>600018753</Organization_ID>
<Instrument_ID>831295951</Instrument_ID>
<Organization_Role_Code>140</Organization_Role_Code>
<Organization_Role_Text>Trustee</Organization_Role_Text>
<Effective_Date>2020-03-02T01:30:03</Effective_Date>
<Termination_Date xsi:nil="true"/>
</Instrument_Organization>
<Instrument_Organization>
<Organization_ID>600045234</Organization_ID>
<Instrument_ID>831295951</Instrument_ID>
<Organization_Role_Code>25002</Organization_Role_Code>
<Organization_Role_Text>Custodian</Organization_Role_Text>
<Effective_Date>2020-03-02T01:30:03</Effective_Date>
<Termination_Date xsi:nil="true"/>
</Instrument_Organization>
<Instrument_Organization>
<Organization_ID>806942842</Organization_ID>
<Instrument_ID>831295951</Instrument_ID>
<Organization_Role_Code>140</Organization_Role_Code>
<Organization_Role_Text>Trustee</Organization_Role_Text>
<Effective_Date>2020-03-02T01:30:03</Effective_Date>
<Termination_Date xsi:nil="true"/>
</Instrument_Organization>
<Instrument_Organization>
<Organization_ID>820688318</Organization_ID>
<Instrument_ID>831295951</Instrument_ID>
<Organization_Role_Code>5155837</Organization_Role_Code>
<Organization_Role_Text>Cash Manager</Organization_Role_Text>
<Effective_Date>2020-03-02T01:30:03</Effective_Date>
<Termination_Date xsi:nil="true"/>
</Instrument_Organization>
<Instrument_Organization>
<Organization_ID>530750</Organization_ID>
<Instrument_ID>831295951</Instrument_ID>
<Organization_Role_Code>141</Organization_Role_Code>
<Organization_Role_Text>Underwriter</Organization_Role_Text>
<Effective_Date>2020-03-02T01:30:03</Effective_Date>
<Termination_Date xsi:nil="true"/>
</Instrument_Organization>
<Instrument_Organization>
<Organization_ID>600045234</Organization_ID>
<Instrument_ID>831295951</Instrument_ID>
<Organization_Role_Code>25012</Organization_Role_Code>
<Organization_Role_Text>Originator</Organization_Role_Text>
<Effective_Date>2020-03-02T01:30:03</Effective_Date>
<Termination_Date xsi:nil="true"/>
</Instrument_Organization>
</Instrument_Organizations>
<Instrument_Identifiers xsi:nil="true"/>
<Instrument_Attributes>
<Instrument_Attribute>
<Instrument_ID>831295951</Instrument_ID>
<Rating_Attribute_Type_Code>5156438</Rating_Attribute_Type_Code>
<Rating_Attribute_Type_Text>SF Indicator</Rating_Attribute_Type_Text>
<Effective_Date>2020-03-02T01:30:03</Effective_Date>
<Termination_Date xsi:nil="true"/>
<Rating_Attribute_Code>5156439</Rating_Attribute_Code>
<Rating_Attribute_Text>(sf)</Rating_Attribute_Text>
</Instrument_Attribute>
<Instrument_Attribute>
<Instrument_ID>831295951</Instrument_ID>
<Rating_Attribute_Type_Code>5168187</Rating_Attribute_Type_Code>
<Rating_Attribute_Type_Text>SEC Exempt</Rating_Attribute_Type_Text>
<Effective_Date>2020-03-02T01:30:03</Effective_Date>
<Termination_Date xsi:nil="true"/>
<Rating_Attribute_Code>5168185</Rating_Attribute_Code>
<Rating_Attribute_Text>Exempt</Rating_Attribute_Text>
</Instrument_Attribute>
</Instrument_Attributes>
<Instrument_Markets>
<Instrument_Market>
<Instrument_ID>831295951</Instrument_ID>
<Domain_Number>75300</Domain_Number>
<Effective_Date>2020-03-21T00:00:00</Effective_Date>
<Termination_Date xsi:nil="true"/>
</Instrument_Market>
</Instrument_Markets>
</Instrument_Root>
</Instrument_Roots>
任何关于如何解决这个问题的想法都将不胜感激。谢谢。
问题的根源是你的 XML 有一个 默认命名空间
(http://www.XXXXX.com
),所以每次尝试定位元素必须
包含此命名空间(您的代码在此细节上失败)。
为了处理您的 XML 文件,我使用了以下代码:
进口:
from lxml import etree as et
读取XML文件:
parser = et.XMLParser(remove_blank_text=True) tree = et.parse('Instrum.xml', parser) root = tree.getroot()
定义使用的命名空间:
ns = {'xx': 'http://www.XXXXX.com'}
(下面会用到)
用每个Instrument_Root的children的文本内容填充tree1:
tree1 = [] for elem in root.findall('xx:Instrument_Root/*', ns): txt = elem.text if txt is not None: tree1.append(txt)
注意Instrument_Root是根节点的直系后代, 所以只输入节点名称就足够了。
用每个Instrument_Rating的children的文本内容填充tree2:
tree2 = [] for elem in root.findall('.//xx:Instrument_Rating/*', ns): txt = elem.text if txt is not None and len(txt.strip()) > 0: tree2.append(txt)
这次 Instrument_Rating 位于 XML 树的更深处, 所以 XPath 必须包含 // 才能执行 "all levels" 搜索。
我还添加了一些逻辑以避免附加 non-existing 文本 或仅包含 "while" 个字符的文本(如果您不想,请将其删除 跳过它们)。
对于你的 XML 输入样本,我得到了:
tree1:
['831295951', '831275547', '18705', 'Pass-Through', 'PAS', '2020-03-21T00:00:00', 'AUD', 'N', '2051-03-21T00:00:00', '2051', '2020-03-21T00:00:00', '7.2534316791', 'N', 'N', 'Class B', '24657', 'PASS-THRU CTFS', '24922', 'Not Applicable', '26', 'Floating', 'FLT', '16', 'Monthly', 'MON', 'MBS - Prime', 'Class B', 'AUSTRALIA', '11.2500000000', 'Y', '3']
tree2:
['831295951', '37203', '2020-03-02T01:30:03', '831295958', 'I', 'Senior Secured', '18705', 'Pass-Through', 'PAS', '25636', 'Long-Term Debt Rating', 'LT', '18743', 'Senior Secured', 'SS', '25648', 'Credit Risk', '5734', 'Enhanced', 'ENH', '19142', 'Local Currency', 'NR', '0', '19102', 'DECISION NOT TO RATE', 'NR', '534', 'Long-Term Debt Rating', 'LT', 'ENH', '2020-03-02T17:30:03', '25530', 'DECISION NOT TO RATE', '20525', 'Australian Dollar', 'AUD', '1', 'Y']
请注意,不需要任何嵌套循环。
我想根据上面的代码你就会知道如何提取内容了 填充 tree3 和 tree4.