Unable to return child values when parsing XML using etree
I have an XML file that looks like this:
<?xml version="1.0" encoding="UTF-8" ?>
<FullReport
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<firm>1426</firm>
<reportDate>07FEB2020_18:00:00.000000</reportDate>
<rooms>
<room>
<roomID>PCHAT-0x0000000000000637</roomID>
<roomTitle>FX - WBB - CTON</roomTitle>
<description>global chat</description>
<creationDate></creationDate>
<removalDate></removalDate>
<lastActivityDate>02/07/2020 12:26:24</lastActivityDate>
<status>Active</status>
<membership>Bilateral</membership>
<isAnonymous>false</isAnonymous>
<hasActiveAdmins>true</hasActiveAdmins>
<activeUserCount>17</activeUserCount>
<distinctFirmsInRoom>2</distinctFirmsInRoom>
<isInternalOnly>false</isInternalOnly>
<isIncognitoForum>false</isIncognitoForum>
</room>
<users>
<uuid>6820</uuid>
<bbgEmail>SJONES@Bloomberg.net</bbgEmail>
<fullName>SEAN JONES</fullName>
<firmName>BANK OF TEST</firmName>
<firmNumber>1400</firmNumber>
<accountNumber>51067</accountNumber>
<accountName>BANK OF TEST</accountName>
<inviteDate>01/07/2013 22:00:39</inviteDate>
<isDeleted>false</isDeleted>
<isAdmin>false</isAdmin>
<isCreator>false</isCreator>
<roomAlias>CTON</roomAlias>
<corpEmail>sean.jones@botest.com</corpEmail>
<city>LONDON</city>
</users>
<users>
<uuid>6820</uuid>
<bbgEmail>SSMITH@Bloomberg.net</bbgEmail>
<fullName>SEAN SMITH</fullName>
<firmName>BANK OF TEST</firmName>
<firmNumber>1400</firmNumber>
<accountNumber>51067</accountNumber>
<accountName>BANK OF TEST</accountName>
<inviteDate>01/07/2013 22:00:39</inviteDate>
<isDeleted>false</isDeleted>
<isAdmin>false</isAdmin>
<isCreator>false</isCreator>
<roomAlias>CTON</roomAlias>
<corpEmail>sean.smith@botest.com</corpEmail>
<city>LONDON</city>
</users>
</rooms>
</FullReport>
I am parsing the data with the following code:
import xml.etree.cElementTree as et

tree = et.parse('test.xml')
print (tree.getroot())
root = tree.getroot()
print ("tag=%s, attrib=%s" % (root.tag, root.attrib))
for child in root:
    print (child.tag, child.attrib)
    if child.tag == "room":
        for step_child in child:
            print (step_child.tag)
# get the information via the children!
print ("-" * 40)
print ("Iterating using a getchildren()")
print ("-" * 40)
rooms = root.getchildren()
for room in rooms:
    room_children = room.getchildren()
    for room_child in room_children:
        print ("%s=%s" % (room_child.tag, room_child.text))
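When I print room_child.tag and room_child.text, I don't see the values. I'm new to this, so I'm not sure what I'm missing.
The result returned is:
My end goal is to iterate over every value and convert it to CSV, but I can't access the values and I'm not sure why no data is returned. I'd like the final CSV to have one row per user.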
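When you run for child in root:, the loop iterates only over the direct children of FullReport, in your case firm, reportDate and rooms, so it never gets a chance to reach room, which sits one level deeper.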
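A minimal sketch of the difference, using a trimmed inline stand-in for the file (the SAMPLE string is an assumption that only mirrors the nesting of the real report): iterating an element yields its direct children, while iter() walks the whole subtree.

```python
import xml.etree.ElementTree as ET

# Trimmed stand-in for test.xml (assumption: same nesting as the real file).
SAMPLE = """<FullReport>
  <firm>1426</firm>
  <reportDate>07FEB2020_18:00:00.000000</reportDate>
  <rooms>
    <room>
      <roomID>PCHAT-0x0000000000000637</roomID>
    </room>
  </rooms>
</FullReport>"""

root = ET.fromstring(SAMPLE)

# Iterating an element visits only its direct children.
direct_children = [child.tag for child in root]

# iter() walks all descendants, so it does reach <room>.
room_count = len(list(root.iter('room')))

print(direct_children)
print(room_count)
```

This is also why the `child.tag == "room"` test in the first loop never matches: room is not among the tags that loop sees.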
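From your code (the first loop) I can see that you are really interested in the direct children of FullReport/rooms/room.
To print their tag names and text content, you can run, for example: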
for child in root.iter('room'):
    for step_child in child:
        print(f'{step_child.tag:20} {step_child.text}')
For your sample input, the result is:
roomID               PCHAT-0x0000000000000637
roomTitle            FX - WBB - CTON
description          global chat
creationDate         None
removalDate          None
lastActivityDate     02/07/2020 12:26:24
status               Active
membership           Bilateral
isAnonymous          false
hasActiveAdmins      true
activeUserCount      17
distinctFirmsInRoom  2
isInternalOnly       false
isIncognitoForum     false
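Empty elements such as creationDate come back with text set to None, as the output above shows. If blank strings would be preferable in the CSV, one option is to normalise the value first. This is only a sketch; the helper name text_or_empty is made up for illustration.

```python
import xml.etree.ElementTree as ET

def text_or_empty(element):
    """Return the element's text, or '' when the element has no text."""
    return element.text or ''

# Small inline stand-in for one <room> element from the report.
room = ET.fromstring(
    "<room><roomID>PCHAT-0x0000000000000637</roomID>"
    "<creationDate></creationDate></room>"
)

values = {child.tag: text_or_empty(child) for child in room}
print(values)
```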
As for your final task, you can run:
import pandas as pd

rows = []
for child in root.iter('rooms'):
    # Placeholders, used only if a users element appears before any room
    roomId, roomTitle = 'id', 'ttl'
    for it in child:
        if it.tag == 'room':
            # Remember the current room so the following users rows can reference it
            roomId = it.findtext('roomID')
            roomTitle = it.findtext('roomTitle')
        elif it.tag == 'users':
            rows.append([roomId, roomTitle, it.findtext('uuid'), it.findtext('bbgEmail'),
                         it.findtext('fullName'), it.findtext('firmName')])
df = pd.DataFrame(rows, columns=['roomId', 'roomTitle', 'uuid', 'bbgEmail',
                                 'fullName', 'firmName'])
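The code above is based on the assumption that each room element comes first and is followed by its users elements.
Also add code for any other columns not shown in your post.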
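If pandas is not needed, the same one-row-per-user idea can be sketched with the standard csv module. This is a sketch under assumptions: SAMPLE is a trimmed stand-in for the report, and the names report_to_rows and USER_FIELDS are hypothetical, not part of any library. It relies on the same ordering assumption, namely that a room element precedes its users elements.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Trimmed stand-in for test.xml (assumption: same nesting as the real file).
SAMPLE = """<FullReport>
  <rooms>
    <room>
      <roomID>PCHAT-0x0000000000000637</roomID>
      <roomTitle>FX - WBB - CTON</roomTitle>
    </room>
    <users>
      <uuid>6820</uuid>
      <fullName>SEAN JONES</fullName>
    </users>
    <users>
      <uuid>6820</uuid>
      <fullName>SEAN SMITH</fullName>
    </users>
  </rooms>
</FullReport>"""

# Hypothetical column list; extend it with the other user fields as needed.
USER_FIELDS = ['uuid', 'fullName']

def report_to_rows(root):
    """Build one row per <users> element, prefixed with the preceding room's ID and title."""
    rows = []
    for rooms in root.iter('rooms'):
        room_id = room_title = ''
        for item in rooms:
            if item.tag == 'room':
                room_id = item.findtext('roomID', default='')
                room_title = item.findtext('roomTitle', default='')
            elif item.tag == 'users':
                user_values = [item.findtext(name, default='') for name in USER_FIELDS]
                rows.append([room_id, room_title] + user_values)
    return rows

rows = report_to_rows(ET.fromstring(SAMPLE))

# Written to memory here; open('out.csv', 'w', newline='') would write a file instead.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(['roomId', 'roomTitle'] + USER_FIELDS)
writer.writerows(rows)
csv_text = buffer.getvalue()
print(csv_text)
```

With the DataFrame approach, calling df.to_csv with a file name and index=False would be the usual way to get the file.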