Python parser on XML not able to return branches
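I am currently trying to parse a downloaded XML file and write it to a CSV file, but I am a bit inexperienced with XML formatting. I am able to return elements in the first branch, Filing, but I cannot return anything from the branches below it, such as Registrant.
Here is the XML I am trying to sift through: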
<?xml version='1.0' encoding='UTF-16'?>
<PublicFilings>
<Filing ID="146DC558-FB00-4BAB-A393-EC50483FB7A9" Year="2014" Received="2014-10-08T12:31:59.127" Amount="10000" Type="THIRD QUARTER REPORT" Period="3rd Quarter (July 1 - Sep 30)">
<Registrant xmlns="" RegistrantID="36366" RegistrantName="Sports & Fitness Industry Association" GeneralDescription="Representing Manufacturers, retailers and other interests in sports and fitness business" Address="8505 Fenton Street
Suite 211
Silver Spring, MD 20910" RegistrantCountry="USA" RegistrantPPBCountry="USA" />
<Client xmlns="" ClientName="Sports & Fitness Industry Association" GeneralDescription="" ClientID="12" SelfFiler="TRUE" ContactFullname="WILLIAM H. SELLS III" IsStateOrLocalGov="TRUE" ClientCountry="USA" ClientPPBCountry="USA" ClientState="MARYLAND" ClientPPBState="MARYLAND" />
<Lobbyists>
<Lobbyist xmlns="" LobbyistName="Sells, William Howard III" LobbyistCoveredGovPositionIndicator="NOT COVERED" OfficialPosition="" /></Lobbyists>
<GovernmentEntities>
<GovernmentEntity xmlns="" GovEntityName="Education, Dept of" />
<GovernmentEntity xmlns="" GovEntityName="Health & Human Services, Dept of (HHS)" />
<GovernmentEntity xmlns="" GovEntityName="U.S. Trade Representative (USTR)" /><GovernmentEntity xmlns="" GovEntityName="Consumer Product Safety Commission (CPSC)" />
<GovernmentEntity xmlns="" GovEntityName="Internal Revenue Service (IRS)" />
<GovernmentEntity xmlns="" GovEntityName="Federal Trade Commission (FTC)" />
<GovernmentEntity xmlns="" GovEntityName="Intl Trade Administration (ITA)" />
<GovernmentEntity xmlns="" GovEntityName="Interior, Dept of (DOI)" />
<GovernmentEntity xmlns="" GovEntityName="Centers For Disease Control & Prevention (CDC)" />
<GovernmentEntity xmlns="" GovEntityName="Transportation, Dept of (DOT)" />
<GovernmentEntity xmlns="" GovEntityName="Natl Institutes of Health (NIH)" />
<GovernmentEntity xmlns="" GovEntityName="Justice, Dept of (DOJ)" />
<GovernmentEntity xmlns="" GovEntityName="Commerce, Dept of (DOC)" />
<GovernmentEntity xmlns="" GovEntityName="HOUSE OF REPRESENTATIVES" />
<GovernmentEntity xmlns="" GovEntityName="SENATE" />
<GovernmentEntity xmlns="" GovEntityName="U.S. Customs & Border Protection" />
</GovernmentEntities>
<Issues>
<Issue xmlns="" Code="SPORTS/ATHLETICS" SpecificIssue="Physical Activity, Sports, Recreation, Exercise & Fitness, Sedentary Lifestyles, Pay-to-Play, Title IX, Sports Injuries & Concussions" />
<Issue xmlns="" Code="HEALTH ISSUES" SpecificIssue="Childhood Obesity, Obesity, Chronic Disease, Prevention via Physical Activity, Wellness Benefits of Physical Activity" />
<Issue xmlns="" Code="TRANSPORTATION" SpecificIssue="Trail Development, Park & Recreation Access, Highway Fees, Safe Routes to School" />
<Issue xmlns="" Code="TAXATION/INTERNAL REVENUE CODE" SpecificIssue="Physical Activity Tax Incentives, Duties & Tariffs, Tax Relief, Tax Reform, Internet Sales Tax" />
<Issue xmlns="" Code="COPYRIGHT/PATENT/TRADEMARK" SpecificIssue="Intellectual Property Rights, Rogue Websites, False Markings, Counterfeit & Fake Products, Patent Reform" />
<Issue xmlns="" Code="TRADE (DOMESTIC/FOREIGN)" SpecificIssue="Shipping Act Reform, Intellectual Property Rights Enforcement, Free Trade Agreements, Tariffs, Duties, Quotas, Market Access" />
<Issue xmlns="" Code="TORTS" SpecificIssue="Product Liability, Intellectual Property Rights" />
<Issue xmlns="" Code="REAL ESTATE/LAND USE/CONSERVATION" SpecificIssue="Park & Recreation Development & Maintenance, Land & Water Conservation Fund, Urban Planning, Park & Recreation Access, National Park Preservation" />
<Issue xmlns="" Code="TARIFF (MISCELLANEOUS TARIFF BILLS)" SpecificIssue="Tariffs & Duties on Sporting Goods & Ftiness Products and Equipment, Trade Agreements" />
<Issue xmlns="" Code="EDUCATION" SpecificIssue="Phyical Education Funding, ESEA Reauthorization, Physical Activity, Pay-to-Play School Sports, School Sports Injuries" />
<Issue xmlns="" Code="APPAREL/CLOTHING INDUSTRY/TEXTILES" SpecificIssue="Tariffs, Duties, Free Trade Agreements, Chinese Currency Valuation, Market Access, TPP, TTIP, TPA" />
<Issue xmlns="" Code="CONSUMER ISSUES/SAFETY/PRODUCTS" SpecificIssue="CPSIA Compliance, Product Testing, Product Safety Database, Sports Equipment & Helmet Safety" />
<Issue xmlns="" Code="MANUFACTURING" SpecificIssue="Trade Agreements, Product Safety, Domestic Job Creation, Access to Raw Materials, Restrictions on Product Content, Outsourcing" /></Issues></Filing>
Here is my current Python code, using ElementTree:
import xml.etree.ElementTree as ET
import xml
import csv
import datetime
e = xml.etree.ElementTree.parse('/Users/Ryan/Downloads/2015_1/2015_1_1_1.xml').getroot()
#filing_elements = ['filing_ID', 'Year', 'Amount', 'Type', 'Period']
#Filing
IDs = []
for atype in e.findall('Filing'):
    IDs.append(atype.get('ID'))
Year = []
for atype in e.findall('Filing'):
    Year.append(atype.get('Year'))
Amount = []
for atype in e.findall('Filing'):
    Amount.append(atype.get('Amount'))
Type = []
for atype in e.findall('Filing'):
    Type.append(atype.get('Type'))
Period = []
for atype in e.findall('Filing'):
    Period.append(atype.get('Period'))
#Registrant
RegistrantID = []
for ty in e.findall('Registrant'):
    RegistrantID.append(ty.get('RegistrantID'))
RegistrantName = []
for atype in e.findall('Registrant'):
    RegistrantName.append(atype.get('RegistrantName'))
GeneralDescription = []
for atype in e.findall('Registrant'):
    GeneralDescription.append(atype.get('GeneralDescription'))
ClientName = []
for atype in e.findall('Client'):
    ClientName.append(atype.get('ClientName'))
In this case, all of the loops that search for elements in Registrant return empty lists.
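What you are doing only works for direct children of the root (i.e. "PublicFilings"). The tag you are after, "Registrant", is a grandchild of the root, not a direct child, so findall('Registrant') on the root matches nothing.
To find a tag anywhere in the tree below the root, use the following in each search loop:
findall(".//Registrant"):
For example: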
RegistrantID = []
for ty in e.findall(".//Registrant"):
    RegistrantID.append(ty.get('RegistrantID'))
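Since the goal is a CSV file, it may also help to walk the tree one Filing at a time, so that each row keeps a filing's attributes together with its own Registrant and Client. The following is only a minimal sketch under assumptions: SAMPLE is a trimmed, made-up document with the same shape as the XML in the question, and the names FIELDS, filing_rows and write_csv are hypothetical helpers, not part of ElementTree. It also shows the difference between searching direct children and searching all descendants.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down sample shaped like the file in the question.
SAMPLE = """<PublicFilings>
  <Filing ID="A1" Year="2014" Amount="10000">
    <Registrant RegistrantID="36366" RegistrantName="Sports &amp; Fitness Industry Association" />
    <Client ClientName="Sports &amp; Fitness Industry Association" />
  </Filing>
  <Filing ID="B2" Year="2014" Amount="5000">
    <Registrant RegistrantID="777" RegistrantName="Example Registrant" />
  </Filing>
</PublicFilings>"""

root = ET.fromstring(SAMPLE)

# A bare tag name only matches direct children of the element being searched.
direct = root.findall("Registrant")
# ".//" searches every descendant, at any depth.
anywhere = root.findall(".//Registrant")
# An explicit path works too when the nesting is known.
by_path = root.findall("Filing/Registrant")

FIELDS = ["ID", "Year", "Amount", "RegistrantID", "RegistrantName", "ClientName"]


def filing_rows(root):
    """Yield one dict per Filing, combined with its own Registrant and Client."""
    for filing in root.findall("Filing"):
        registrant = filing.find("Registrant")
        client = filing.find("Client")
        yield {
            "ID": filing.get("ID"),
            "Year": filing.get("Year"),
            "Amount": filing.get("Amount"),
            # find() returns None when the child is missing, so guard each lookup.
            "RegistrantID": registrant.get("RegistrantID") if registrant is not None else "",
            "RegistrantName": registrant.get("RegistrantName") if registrant is not None else "",
            "ClientName": client.get("ClientName") if client is not None else "",
        }


def write_csv(root, out):
    """Write a header line and one CSV row per Filing to a text file object."""
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(filing_rows(root))


buffer = io.StringIO()
write_csv(root, buffer)
print(buffer.getvalue())
```

Building rows per Filing avoids the situation where separate lists drift out of alignment because one filing lacks a Client or Registrant. Note that the sample writes ampersands as &amp;, which well-formed XML requires inside attribute values. To write to a real file instead of a StringIO, a file opened with open(path, "w", newline="") can be passed as out.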