Python XML parser not able to return nested branches

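I'm currently trying to parse a downloaded XML file and write its contents to a CSV file, but I'm a little inexperienced with XML. I can return the attributes of the first branch, Filing, but I can't return anything from the branches nested inside it, such as Registrant.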

Here is the XML I'm trying to work through:

<?xml version='1.0' encoding='UTF-16'?>
<PublicFilings>
<Filing ID="146DC558-FB00-4BAB-A393-EC50483FB7A9" Year="2014" Received="2014-10-08T12:31:59.127" Amount="10000" Type="THIRD QUARTER REPORT" Period="3rd Quarter (July 1 - Sep 30)">
<Registrant xmlns="" RegistrantID="36366" RegistrantName="Sports &amp; Fitness Industry Association" GeneralDescription="Representing Manufacturers, retailers and other interests in sports and fitness business" Address="8505 Fenton Street&#xD;&#xA;Suite 211&#xD;&#xA;Silver Spring, MD 20910" RegistrantCountry="USA" RegistrantPPBCountry="USA" />
<Client xmlns="" ClientName="Sports &amp; Fitness Industry Association" GeneralDescription="" ClientID="12" SelfFiler="TRUE" ContactFullname="WILLIAM H. SELLS III" IsStateOrLocalGov="TRUE" ClientCountry="USA" ClientPPBCountry="USA" ClientState="MARYLAND" ClientPPBState="MARYLAND" />
<Lobbyists>
<Lobbyist xmlns="" LobbyistName="Sells, William Howard III" LobbyistCoveredGovPositionIndicator="NOT COVERED" OfficialPosition="" /></Lobbyists>
<GovernmentEntities>
<GovernmentEntity xmlns="" GovEntityName="Education, Dept of" />
<GovernmentEntity xmlns="" GovEntityName="Health &amp; Human Services, Dept of (HHS)" />
<GovernmentEntity xmlns="" GovEntityName="U.S. Trade Representative (USTR)" /><GovernmentEntity xmlns="" GovEntityName="Consumer Product Safety Commission (CPSC)" />
<GovernmentEntity xmlns="" GovEntityName="Internal Revenue Service (IRS)" />
<GovernmentEntity xmlns="" GovEntityName="Federal Trade Commission (FTC)" />
<GovernmentEntity xmlns="" GovEntityName="Intl Trade Administration (ITA)" />
<GovernmentEntity xmlns="" GovEntityName="Interior, Dept of (DOI)" />
<GovernmentEntity xmlns="" GovEntityName="Centers For Disease Control &amp; Prevention (CDC)" />
<GovernmentEntity xmlns="" GovEntityName="Transportation, Dept of (DOT)" />
<GovernmentEntity xmlns="" GovEntityName="Natl Institutes of Health (NIH)" />
<GovernmentEntity xmlns="" GovEntityName="Justice, Dept of (DOJ)" />
<GovernmentEntity xmlns="" GovEntityName="Commerce, Dept of (DOC)" />
<GovernmentEntity xmlns="" GovEntityName="HOUSE OF REPRESENTATIVES" />
<GovernmentEntity xmlns="" GovEntityName="SENATE" />
<GovernmentEntity xmlns="" GovEntityName="U.S. Customs &amp; Border Protection" />
</GovernmentEntities>
<Issues>
<Issue xmlns="" Code="SPORTS/ATHLETICS" SpecificIssue="Physical Activity, Sports, Recreation, Exercise &amp; Fitness, Sedentary Lifestyles, Pay-to-Play, Title IX, Sports Injuries &amp; Concussions" />
<Issue xmlns="" Code="HEALTH ISSUES" SpecificIssue="Childhood Obesity, Obesity, Chronic Disease, Prevention via Physical Activity, Wellness Benefits of Physical Activity" />
<Issue xmlns="" Code="TRANSPORTATION" SpecificIssue="Trail Development, Park &amp; Recreation Access, Highway Fees, Safe Routes to School" />
<Issue xmlns="" Code="TAXATION/INTERNAL REVENUE CODE" SpecificIssue="Physical Activity Tax Incentives, Duties &amp; Tariffs, Tax Relief, Tax Reform, Internet Sales Tax" />
<Issue xmlns="" Code="COPYRIGHT/PATENT/TRADEMARK" SpecificIssue="Intellectual Property Rights, Rogue Websites, False Markings, Counterfeit &amp; Fake Products, Patent Reform" />
<Issue xmlns="" Code="TRADE (DOMESTIC/FOREIGN)" SpecificIssue="Shipping Act Reform, Intellectual Property Rights Enforcement, Free Trade Agreements, Tariffs, Duties, Quotas, Market Access" />
<Issue xmlns="" Code="TORTS" SpecificIssue="Product Liability, Intellectual Property Rights" />
<Issue xmlns="" Code="REAL ESTATE/LAND USE/CONSERVATION" SpecificIssue="Park &amp; Recreation Development &amp; Maintenance, Land &amp; Water Conservation Fund, Urban Planning, Park &amp; Recreation Access, National Park Preservation" />
<Issue xmlns="" Code="TARIFF (MISCELLANEOUS TARIFF BILLS)" SpecificIssue="Tariffs &amp; Duties on Sporting Goods &amp; Ftiness Products and Equipment, Trade Agreements" />
<Issue xmlns="" Code="EDUCATION" SpecificIssue="Phyical Education Funding, ESEA Reauthorization, Physical Activity, Pay-to-Play School Sports, School Sports Injuries" />
<Issue xmlns="" Code="APPAREL/CLOTHING INDUSTRY/TEXTILES" SpecificIssue="Tariffs, Duties, Free Trade Agreements, Chinese Currency Valuation, Market Access, TPP, TTIP, TPA" />
<Issue xmlns="" Code="CONSUMER ISSUES/SAFETY/PRODUCTS" SpecificIssue="CPSIA Compliance, Product Testing, Product Safety Database, Sports Equipment &amp; Helmet Safety" />
<Issue xmlns="" Code="MANUFACTURING" SpecificIssue="Trade Agreements, Product Safety, Domestic Job Creation, Access to Raw Materials, Restrictions on Product Content, Outsourcing" /></Issues></Filing>

Here is my current Python code, which uses ElementTree from the standard library:
import xml.etree.ElementTree as ET
import xml
import csv
import datetime

e = xml.etree.ElementTree.parse('/Users/Ryan/Downloads/2015_1/2015_1_1_1.xml').getroot()

#filing_elements = ['filing_ID', 'Year', 'Amount', 'Type', 'Period']
#Filing

IDs = []
for atype in e.findall('Filing'):
    IDs.append(atype.get('ID'))
Year = []
for atype in e.findall('Filing'):
    Year.append(atype.get('Year'))
Amount = []
for atype in e.findall('Filing'):
    Amount.append(atype.get('Amount'))
Type = []
for atype in e.findall('Filing'):
    Type.append(atype.get('Type'))
Period = []
for atype in e.findall('Filing'):
    Period.append(atype.get('Period'))
#Registrant 
RegistrantID = []
for ty in e.findall('Registrant'):
    RegistrantID.append(ty.get('RegistrantID'))
RegistrantName = []
for atype in e.findall('Registrant'):
    RegistrantName.append(atype.get('RegistrantName'))
GeneralDescription = []
for atype in e.findall('Registrant'):
    GeneralDescription.append(atype.get('GeneralDescription'))
ClientName = []
for atype in e.findall('Client'):
    ClientName.append(atype.get('ClientName'))

With this code, every loop that searches for Registrant (or Client) elements returns an empty list.

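What you're doing only works for direct children of the root (i.e. "PublicFilings"). The tag you're after, "Registrant", is a grandchild (PublicFilings > Filing > Registrant), not a direct child, so a bare findall('Registrant') on the root never sees it.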
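A minimal sketch of that behaviour, assuming a trimmed-down stand-in for the downloaded file (one Filing with only a few attributes, cut down here to keep it short). It compares which ElementTree paths reach Registrant from the root:

```python
import xml.etree.ElementTree as ET

# Trimmed-down stand-in for the downloaded file: PublicFilings -> Filing -> Registrant
SAMPLE = """<PublicFilings>
  <Filing ID="146DC558-FB00-4BAB-A393-EC50483FB7A9" Year="2014">
    <Registrant RegistrantID="36366" RegistrantName="Sports &amp; Fitness Industry Association" />
    <Client ClientName="Sports &amp; Fitness Industry Association" ClientID="12" />
  </Filing>
</PublicFilings>"""

root = ET.fromstring(SAMPLE)  # root is the <PublicFilings> element

# A bare tag name only matches direct children of root, so this finds nothing
direct = root.findall("Registrant")
# Spelling out the full path from the root works
by_path = root.findall("Filing/Registrant")
# ".//" matches the tag at any depth below root
anywhere = root.findall(".//Registrant")
# iter() walks the whole subtree and yields every element with that tag
iterated = list(root.iter("Registrant"))

print(len(direct), len(by_path), len(anywhere), len(iterated))  # 0 1 1 1
```

Note that the &amp; entity is already decoded by the parser, so RegistrantName comes back as "Sports & Fitness Industry Association".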

To find a tag anywhere in the tree below the root, use the following in each search loop:

e.findall(".//Registrant")

For example:

RegistrantID = []
for ty in e.findall(".//Registrant"):
    RegistrantID.append(ty.get('RegistrantID'))
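Since the end goal is a CSV file, it may be simpler to loop over each Filing once and look up its children with find(), so each row keeps a filing's own Registrant and Client values together instead of relying on several parallel lists staying aligned (if one filing lacked a Registrant, the separate ".//" lists would drift out of step). This is a sketch under assumptions: the column selection, the helper names filings_to_rows and write_csv, and the output name filings.csv are hypothetical; only the input path comes from the question.

```python
import csv
import xml.etree.ElementTree as ET

# Hypothetical column selection: (CSV column, child tag or None for Filing itself, attribute)
COLUMNS = [
    ("filing_ID", None, "ID"),
    ("Year", None, "Year"),
    ("Amount", None, "Amount"),
    ("Type", None, "Type"),
    ("Period", None, "Period"),
    ("RegistrantID", "Registrant", "RegistrantID"),
    ("RegistrantName", "Registrant", "RegistrantName"),
    ("GeneralDescription", "Registrant", "GeneralDescription"),
    ("ClientName", "Client", "ClientName"),
]


def filings_to_rows(root):
    """Yield one dict per <Filing>, reading attributes from the Filing and its direct children."""
    for filing in root.iter("Filing"):
        row = {}
        for column, child_tag, attr in COLUMNS:
            source = filing if child_tag is None else filing.find(child_tag)
            # A missing child element gives an empty cell instead of an AttributeError
            row[column] = source.get(attr, "") if source is not None else ""
        yield row


def write_csv(root, out_path):
    """Write one CSV row per filing; newline="" is what the csv module expects."""
    with open(out_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=[name for name, _, _ in COLUMNS])
        writer.writeheader()
        writer.writerows(filings_to_rows(root))


# Usage with the file from the question (not executed here):
# root = ET.parse('/Users/Ryan/Downloads/2015_1/2015_1_1_1.xml').getroot()
# write_csv(root, "filings.csv")
```

The "is not None" check matters: an Element with no children is falsy, so "if source:" would wrongly skip a Registrant that has attributes but no child elements. The ".//" form from the answer is still the right tool when you just want every match in the file regardless of which filing it belongs to.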