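Batch export XML files to CSV using Python
I'm new to Python, so please bear with me if this is a silly question.
I have multiple XML files in the following format. I want to extract certain tags from each of them and export the values to a single CSV file.
Here is an example of the XML (c:\xml.xml):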
<?xml version='1.0' encoding='UTF-8'?>
<?xml-stylesheet type="text/xsl" href="emotionStyleSheet_template.xsl"?>
<EmotionReport>
<VersionInformation>
<Version>8.2.0</Version>
</VersionInformation>
<DateTime>
<Date>18-10-2021</Date>
<Time>14-12-26</Time>
</DateTime>
<SourceInformation>
<File>
<FilePath>//nas/emotionxml</FilePath>
<FileName>file001.mxf</FileName>
<FileSize>9972536969</FileSize>
<FileAudioInformation>
<AudioDuration>1345.0</AudioDuration>
<SampleRate>48000</SampleRate>
<NumChannels>8</NumChannels>
<BitsPerSample>24</BitsPerSample>
<AudioSampleGroups>64560000</AudioSampleGroups>
<NumStreams>8</NumStreams>
<Container>Undefined Sound</Container>
<Description>IMC Nexio
</Description>
<StreamInformation>
<Stream>
<StreamNumber>1</StreamNumber>
<NumChannelsInStream>1</NumChannelsInStream>
<Channel>
<ChannelNumber>1</ChannelNumber>
<ChannelEncoding>PCM</ChannelEncoding>
</Channel>
</Stream>
<Stream>
<StreamNumber>2</StreamNumber>
<NumChannelsInStream>1</NumChannelsInStream>
<Channel>
<ChannelNumber>1</ChannelNumber>
<ChannelEncoding>PCM</ChannelEncoding>
</Channel>
</Stream>
</StreamInformation>
<FileTimecodeInformation>
<FrameRate>25.00</FrameRate>
<DropFrame>false</DropFrame>
<StartTimecode>00:00:00:00</StartTimecode>
</FileTimecodeInformation>
</FileAudioInformation>
</File>
</SourceInformation>
</EmotionReport>
Expected output (EmotionData.csv):
,Date,Time,FileName,Description,FileSize,FilePath
0,18-10-2021,14-12-26,file001.mxf,IMC Nexio,9972536969,//nas/emotionxml
1,13-10-2021,08-12-26,file002.mxf,IMC Nexio,3566536770,//nas/emotionxml
2,03-10-2021,02-09-21,file003.mxf,IMC Nexio,46357672,//nas/emotionxml
....
Here is the code I wrote, based on what I learned from online resources (emotion_xml_parser.py):
import xml.etree.ElementTree as ET
import glob2
import pandas as pd
cols = ["Date", "Time", "FileName", "Description", "FileSize", "FilePath"]
rows = []
for filename in glob2.glob(r'C:\xml\*.xml'):
    xmlData = ET.parse(filename)
    rootXML = xmlData.getroot()
    for i in rootXML:
        Date = i.findall("Date").text
        Time = i.findall("Time").text
        FileName = i.findall("FileName").text
        Description = i.findall("Description").text
        FileSize = i.findall("FileSize").text
        FilePath = i.findall("FilePath").text
        row.append({"Date": Date,
                    "Time": Time,
                    "FileName": FileName,
                    "Description": Description,
                    "FileSize": FileSize,
                    "FilePath": FilePath,})
df = pd.DataFrame(rows,columns = cols)
# Write dataframe to csv
df.to_csv("EmotionData.csv")
I get the following error when I run the script:
File "c:\emtion_xml_parser.py", line 14, in <module>
Date = i.findall("Date").text
AttributeError: 'list' object has no attribute 'text'
TIA!
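On the error itself: `findall` returns a list of matching elements, and a list has no `.text` attribute, which is what the traceback reports. The sketch below shows the difference between `findall`, `find`, and `findtext` on a trimmed-down stand-in for the report (the `SAMPLE` string is an assumption made for illustration, not one of the real files). It also shows that a bare tag name only matches direct children, so looping over the root's children and asking each one for `Date`, `FileName`, and so on cannot find every tag.

```python
import xml.etree.ElementTree as ET

# Trimmed-down stand-in for the report above (assumption: same nesting).
SAMPLE = """<EmotionReport>
  <DateTime>
    <Date>18-10-2021</Date>
    <Time>14-12-26</Time>
  </DateTime>
</EmotionReport>"""

root = ET.fromstring(SAMPLE)
date_time = root.find("DateTime")      # first matching child Element, or None

matches = date_time.findall("Date")    # always a list, even for a single match
first = date_time.find("Date")         # a single Element (None if absent)
text = date_time.findtext("Date")      # text of the first match (None if absent)

print(type(matches).__name__, len(matches))
print(first.text, text)

# A bare tag name is matched against direct children only, so searching the
# root for a nested tag needs a path such as "DateTime/Date" or ".//Date".
print(root.findall("Date"))
print(root.findtext("DateTime/Date"))
```

So `i.findall("Date")[0].text` or `i.find("Date").text` would avoid the `AttributeError`, but only for the child that actually contains a `Date` element.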
A better approach is to give the full path to each element you need, for example:
import xml.etree.ElementTree as ET
import glob2
import pandas as pd
cols = ["Date", "Time", "FileName", "Description", "FileSize", "FilePath"]
rows = []
for filename in glob2.glob(r'*.xml'):
    xmlData = ET.parse(filename)
    root = xmlData.getroot()
    row = {
        'Date': root.findtext('DateTime/Date'),
        'Time': root.findtext('DateTime/Time'),
        'FileName': root.findtext('SourceInformation/File/FileName'),
        'Description': root.findtext('SourceInformation/File/FileAudioInformation/Description').strip(),
        'FileSize': root.findtext('SourceInformation/File/FileSize'),
        'FilePath': root.findtext('SourceInformation/File/FilePath')
    }
    rows.append(row)
df = pd.DataFrame(rows, columns=cols)
# Write dataframe to csv
df.to_csv("EmotionData.csv")
This gives you:
,Date,Time,FileName,Description,FileSize,FilePath
0,18-10-2021,14-12-26,file001.mxf,IMC Nexio,9972536969,//nas/emotionxml
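One caveat with the full-path approach: `findtext` returns `None` when an element is missing, so calling `.strip()` on the result would raise an `AttributeError` for a report without a `Description`. The following is a minimal sketch of a more defensive variant, assuming some files may omit elements. The `PATHS` mapping, the `extract_row` helper, and the incomplete sample report are hypothetical names and data introduced here for illustration, and the standard `csv` module is swapped in for pandas only to keep the sketch dependency-free.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Column name -> element path, using the same paths as the full-path approach.
PATHS = {
    "Date": "DateTime/Date",
    "Time": "DateTime/Time",
    "FileName": "SourceInformation/File/FileName",
    "Description": "SourceInformation/File/FileAudioInformation/Description",
    "FileSize": "SourceInformation/File/FileSize",
    "FilePath": "SourceInformation/File/FilePath",
}


def extract_row(root):
    """Return one CSV row as a dict; missing elements become empty strings."""
    # default="" guarantees a string, so .strip() never runs on None.
    return {col: root.findtext(path, default="").strip()
            for col, path in PATHS.items()}


# Hypothetical report that lacks <Description> and <FileSize>.
incomplete = ET.fromstring(
    "<EmotionReport>"
    "<DateTime><Date>13-10-2021</Date><Time>08-12-26</Time></DateTime>"
    "<SourceInformation><File>"
    "<FilePath>//nas/emotionxml</FilePath><FileName>file002.mxf</FileName>"
    "</File></SourceInformation>"
    "</EmotionReport>"
)
row = extract_row(incomplete)

# Write to an in-memory buffer so the sketch does not touch the file system.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(PATHS))
writer.writeheader()
writer.writerow(row)
print(buffer.getvalue())
```

The dicts returned by `extract_row` can be collected in a list and passed to `pd.DataFrame(rows, columns=cols)` in the same way as the rows built with explicit paths.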