XML 文件解析 Python
XML file parsing with Python
我无法以这种方式为转换为 CSV 的 XML 文件解析数据:
对于第一列,我想获取通用名称标签(recordingSystem、Ports 等)并将其与行标签(closedFileCount、processedFileCount 等)中的子名称连接起来
subName所在的标签一直在变化,可能是“usage”、“lwGuage”、“hwGauge”等等。我也要把那些收集起来放在旁边的栏里。
请参阅下面的示例 XML:
<?xml version="1.0" encoding="UTF-8"?>
<omGroups xmlns="urn:nortel:namespaces:mcp:oms" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="urn:nortel:namespaces:mcp:oms OMSchema.xsd" >
<group>
<name>RecordingSystem</name>
<row>
<package>com.nortelnetworks.mcp.ne.base.recsystem.fw.system</package>
<class>RecSysFileOMRow</class>
<usage name="closedFileCount" hasThresholds="true">
<measures>
closed file count
</measures>
<description>
This register counts the number
of closed files in the spool directory of a
particular stream and a particular system.
Files in the spool directory store the raw
OAM records where they are sent to the
Element Manager for formatting.
</description>
<notes>
Minor and major alarms
when the value of closedFileCount
exceeds certain thresholds. Configure
the threshold values for minor and major
alarms for this OM through engineering
parameters for minorBackLogCount and
majorBackLogCount, respectively. These
engineering parameters are grouped under
the parameter group of Log, OM, and
Accounting for the logs’ corresponding
system.
</notes>
</usage>
<usage name="processedFileCount" hasThresholds="true">
<measures>
Processed file count
</measures>
<description>
The register counts the number
of processed files in the spool directory of
a particular stream and a particular system.
Files in the spool directory store the raw
OAM records and then send the records to
the Element Manager for formatting.
</description>
</usage>
</row>
<documentation>
<description>
Rows of this OM group provide a count of the number of files contained
within the directory (which is the OM row key value).
</description>
<rowKey>
The full name of the directory containing the files counted by this row.
</rowKey>
</documentation>
<generatedOn>
<all/>
</generatedOn>
</group>
<group traffic="true">
<name>Ports</name>
<row>
<package>com.nortelnetworks.ims.cap.mediaportal.host</package>
<class>PortsOMRow</class>
<usage name="rtpMpPortUsage">
<measures>
BCP port usage
</measures>
<description>
Meter showing number of ports in use.
</description>
</usage>
<lwGauge name="connMapEntriesLWM">
<measures>
Lowest simultaneous port usage
</measures>
<description>
Lowest number of
simultaneous ports detected to be in
use during the collection interval
</description>
</lwGauge>
<hwGauge name="connMapEntriesHWM">
<measures>
Highest simultaneous port usage
</measures>
<description>
Highest number of
simultaneous ports detected to be in
use during the collection interval.
</description>
</hwGauge>
<waterMark name="connMapEntries">
<measures>
Connections map entries
</measures>
<description>
Meter showing the number of connections in the host
CPU connection map.
</description>
<bwg lwref="connMapEntriesLWM" hwref="connMapEntriesHWM"/>
</waterMark>
<counter name="portUsageSampleCnt">
<measures>
Usage sample count
</measures>
<description>
The number of 100-second samples taken during the
collection interval contributing to the average report.
</description>
</counter>
<counter name="sampledRtpMpPortUsage">
<measures>
In-use ports usage
</measures>
<description>
Provides the sum of the in-use ports every 100 seconds.
</description>
</counter>
<precollector>
<package>com.nortelnetworks.ims.cap.mediaportal.host</package>
<class>PortsOMCenturyPrecollector</class>
<collector>centurySecond</collector>
</precollector>
</row>
<documentation>
<description>
</description>
<rowKey>
</rowKey>
</documentation>
<generatedOn>
<list>
<ne>sessmgr</ne>
<ne>rtpportal</ne>
</list>
</generatedOn>
</group>
</omGroups>
下面的代码应该获取 GeneralName 并将其在 csv 文件中显示正确的次数,但我无法让它显示任何内容。
from xml.etree import ElementTree
import csv
from copy import copy
import lxml.etree
tree = ElementTree.parse('OM.xml')
sitescope_data = open('OMFileConverted.csv', 'w', newline='', encoding='utf-8')
csvwriter = csv.writer(sitescope_data)
#Create all needed columns here in order and writes them to excel file
col_names = ['name', 'OMRegister']
csvwriter.writerow(col_names)
def recurse(root, props):
for child in root:
if child.tag == '{urn:nortel:namespaces:mcp:oms}group':
p2 = copy(props)
for event in root.findall('{urn:nortel:namespaces:mcp:oms}group'):
event_id = event.find('{urn:nortel:namespaces:mcp:oms}name')
if event_id != None:
p2['name'] = event_id.text
recurse(child, p2)
else:
recurse(child, props)
for event in root.findall('{urn:nortel:namespaces:mcp:oms}group'):
event_data = [props.get('name')]
csvwriter.writerow(event_data)
root = tree.getroot()
recurse(root,{}) #root + empty dictionary
sitescope_data.close()
如果 xml_string
是问题中的 XML 片段,则此脚本:
import csv
from bs4 import BeautifulSoup
soup = BeautifulSoup(xml_string, 'html.parser')
with open('data.csv', 'w', newline='') as f_out:
writer = csv.writer(f_out)
writer.writerow(['General name:SpecificName', 'RegisterType'])
for item in soup.select('row [name]'):
writer.writerow([item.find_previous('name').text + ':' + item['name'], item.name])
生成 data.csv
(来自 LibreOffice 的屏幕截图):
编辑:要将 measures
标记放入列中,您可以执行以下操作:
import csv
from bs4 import BeautifulSoup
soup = BeautifulSoup(xml_string, 'html.parser')
with open('data.csv', 'w', newline='') as f_out:
writer = csv.writer(f_out)
writer.writerow(['General name:SpecificName', 'RegisterType', 'Measures'])
for item in soup.select('row [name]'):
writer.writerow([item.find_previous('name').text + ':' + item['name'], item.name, item.find('measures').get_text(strip=True)])
生产:
我无法以这种方式为转换为 CSV 的 XML 文件解析数据:
对于第一列,我想获取通用名称标签(recordingSystem、Ports 等)并将其与行标签(closedFileCount、processedFileCount 等)中的子名称连接起来
subName所在的标签一直在变化,可能是“usage”、“lwGuage”、“hwGauge”等等。我也要把那些收集起来放在旁边的栏里。
请参阅下面的示例 XML:
<?xml version="1.0" encoding="UTF-8"?>
<omGroups xmlns="urn:nortel:namespaces:mcp:oms" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="urn:nortel:namespaces:mcp:oms OMSchema.xsd" >
<group>
<name>RecordingSystem</name>
<row>
<package>com.nortelnetworks.mcp.ne.base.recsystem.fw.system</package>
<class>RecSysFileOMRow</class>
<usage name="closedFileCount" hasThresholds="true">
<measures>
closed file count
</measures>
<description>
This register counts the number
of closed files in the spool directory of a
particular stream and a particular system.
Files in the spool directory store the raw
OAM records where they are sent to the
Element Manager for formatting.
</description>
<notes>
Minor and major alarms
when the value of closedFileCount
exceeds certain thresholds. Configure
the threshold values for minor and major
alarms for this OM through engineering
parameters for minorBackLogCount and
majorBackLogCount, respectively. These
engineering parameters are grouped under
the parameter group of Log, OM, and
Accounting for the logs’ corresponding
system.
</notes>
</usage>
<usage name="processedFileCount" hasThresholds="true">
<measures>
Processed file count
</measures>
<description>
The register counts the number
of processed files in the spool directory of
a particular stream and a particular system.
Files in the spool directory store the raw
OAM records and then send the records to
the Element Manager for formatting.
</description>
</usage>
</row>
<documentation>
<description>
Rows of this OM group provide a count of the number of files contained
within the directory (which is the OM row key value).
</description>
<rowKey>
The full name of the directory containing the files counted by this row.
</rowKey>
</documentation>
<generatedOn>
<all/>
</generatedOn>
</group>
<group traffic="true">
<name>Ports</name>
<row>
<package>com.nortelnetworks.ims.cap.mediaportal.host</package>
<class>PortsOMRow</class>
<usage name="rtpMpPortUsage">
<measures>
BCP port usage
</measures>
<description>
Meter showing number of ports in use.
</description>
</usage>
<lwGauge name="connMapEntriesLWM">
<measures>
Lowest simultaneous port usage
</measures>
<description>
Lowest number of
simultaneous ports detected to be in
use during the collection interval
</description>
</lwGauge>
<hwGauge name="connMapEntriesHWM">
<measures>
Highest simultaneous port usage
</measures>
<description>
Highest number of
simultaneous ports detected to be in
use during the collection interval.
</description>
</hwGauge>
<waterMark name="connMapEntries">
<measures>
Connections map entries
</measures>
<description>
Meter showing the number of connections in the host
CPU connection map.
</description>
<bwg lwref="connMapEntriesLWM" hwref="connMapEntriesHWM"/>
</waterMark>
<counter name="portUsageSampleCnt">
<measures>
Usage sample count
</measures>
<description>
The number of 100-second samples taken during the
collection interval contributing to the average report.
</description>
</counter>
<counter name="sampledRtpMpPortUsage">
<measures>
In-use ports usage
</measures>
<description>
Provides the sum of the in-use ports every 100 seconds.
</description>
</counter>
<precollector>
<package>com.nortelnetworks.ims.cap.mediaportal.host</package>
<class>PortsOMCenturyPrecollector</class>
<collector>centurySecond</collector>
</precollector>
</row>
<documentation>
<description>
</description>
<rowKey>
</rowKey>
</documentation>
<generatedOn>
<list>
<ne>sessmgr</ne>
<ne>rtpportal</ne>
</list>
</generatedOn>
</group>
</omGroups>
下面的代码应该获取 GeneralName 并将其在 csv 文件中显示正确的次数,但我无法让它显示任何内容。
from xml.etree import ElementTree
import csv
from copy import copy
import lxml.etree
tree = ElementTree.parse('OM.xml')
sitescope_data = open('OMFileConverted.csv', 'w', newline='', encoding='utf-8')
csvwriter = csv.writer(sitescope_data)
#Create all needed columns here in order and writes them to excel file
col_names = ['name', 'OMRegister']
csvwriter.writerow(col_names)
def recurse(root, props):
for child in root:
if child.tag == '{urn:nortel:namespaces:mcp:oms}group':
p2 = copy(props)
for event in root.findall('{urn:nortel:namespaces:mcp:oms}group'):
event_id = event.find('{urn:nortel:namespaces:mcp:oms}name')
if event_id != None:
p2['name'] = event_id.text
recurse(child, p2)
else:
recurse(child, props)
for event in root.findall('{urn:nortel:namespaces:mcp:oms}group'):
event_data = [props.get('name')]
csvwriter.writerow(event_data)
root = tree.getroot()
recurse(root,{}) #root + empty dictionary
sitescope_data.close()
如果 xml_string
是问题中的 XML 片段,则此脚本:
import csv
from bs4 import BeautifulSoup
soup = BeautifulSoup(xml_string, 'html.parser')
with open('data.csv', 'w', newline='') as f_out:
writer = csv.writer(f_out)
writer.writerow(['General name:SpecificName', 'RegisterType'])
for item in soup.select('row [name]'):
writer.writerow([item.find_previous('name').text + ':' + item['name'], item.name])
生成 data.csv
(来自 LibreOffice 的屏幕截图):
编辑:要将 measures
标记放入列中,您可以执行以下操作:
import csv
from bs4 import BeautifulSoup
soup = BeautifulSoup(xml_string, 'html.parser')
with open('data.csv', 'w', newline='') as f_out:
writer = csv.writer(f_out)
writer.writerow(['General name:SpecificName', 'RegisterType', 'Measures'])
for item in soup.select('row [name]'):
writer.writerow([item.find_previous('name').text + ':' + item['name'], item.name, item.find('measures').get_text(strip=True)])
生产: