How to extract nested values from XML with namespaces?
I am trying to extract some data from the following XML file.
<?xml version="1.0" encoding="utf-8"?>
<go-home-1:GOHOMEV1 xmlns:go-home-1="https://sample.com/GO-HOME-V1">
<HOMEV1FileHeader>
<FileCreationTimestamp>2020-02-15T08:29:22+01:00</FileCreationTimestamp>
<FileType>AB716</FileType>
<SGO>YIFG</SGO>
</HOMEV1FileHeader>
<OI>
<ON>YIFG4</ON>
<CI>HYU</CI>
<NL>
<NT>
<GOCode>HYU34</GOCode>
<NTName>HYUFFT - 11</NTName>
<NTData>
<RIS>
<RI>
<EDC>2020-01-18</EDC>
<E4NS>
<GNS>
<RD>
<NR>
<CC>9012</CC>
<NDC>411</NDC>
<SRng>
<SRngStart>000</SRngStart>
<SRngStop>999</SRngStop>
</SRng>
</NR>
</RD>
<RD>
<NR>
<CC>834</CC>
<NDC>101</NDC>
<SRng>
<SRngStart>150</SRngStart>
<SRngStop>295</SRngStop>
</SRng>
</NR>
</RD>
</GNS>
</E4NS>
<E2NS>
<MCC>111</MCC>
<MNC>222</MNC>
</E2NS>
<E2G>
<MGT_CC>9012</MGT_CC>
<MGT_NC>4113</MGT_NC>
</E2G>
</RI>
</RIS>
</NTData>
</NT>
</NL>
</OI>
</go-home-1:GOHOMEV1>
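My expected output is as follows, with SGO as the first field.
My attempt is below (based on the idea from here), but I get either an error (for sgo = root.find()...) or an empty list (for A = root.findall()...), and I am stuck. Thanks for any help.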
import xml.etree.ElementTree as ET
import glob, os
filename = "file.xml"
namespaces = {
"go-home-1": "https://sample.com/GO-HOME-V1"
}
root = ET.parse(filename).getroot()
# For this sgo = root.find()... I get ERROR << AttributeError: 'NoneType' object has no attribute 'text'>>
sgo = root.find("go-home-1:HOMEV1FileHeader/"
"go-home-1:SGO", namespaces).text
### For below I'm getting empty list A = [] and I don't know why.
A = root.findall(
"go-home-1:OI/go-home-1:NL/go-home-1:NT[1]/go-home-1:NTData/go-home-1:RIS/go-home-1:RI/go-home-1:E4NS/"
"go-home-1:GNS/"
"go-home-1:RD/"
"go-home-1:NR", namespaces)
for item1 in A:
    Result = [sgo]
    cc = item1.find("go-home-1:CC", namespaces).text
    ndc = item1.find("go-home-1:NDC", namespaces).text
    Result.append(cc)
    Result.append(ndc)
    B = item1.findall(
        "go-home-1:OI/go-home-1:NL/go-home-1:NT[1]/go-home-1:NTData/go-home-1:RIS/go-home-1:RI/go-home-1:E4NS/"
        "go-home-1:GNS/"
        "go-home-1:RD/"
        "go-home-1:NR/"
        "go-home-1:SRng", namespaces)
    for item2 in B:
        RngStart = item2.find("go-home-1:SRngStart", namespaces).text
        RngStop = item2.find("go-home-1:SRngStop", namespaces).text
        Result.append(RngStart)
        Result.append(RngStop)
    print(Result)
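In this particular XML, given the expected output, the namespace is not really needed. Only the root element carries the go-home-1 prefix; the child elements are unprefixed and no default namespace is declared, so they belong to no namespace. That is why the prefixed paths in the question match nothing: find() returns None and findall() returns an empty list. Also, I think the best way to present the output is a pandas DataFrame.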
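One way to check this claim is to look at the tags ElementTree actually sees. The sketch below is only an illustration: it uses a trimmed copy of the question's document (the XML constant is an assumption made to keep the example self-contained, not the full file) and compares a prefixed lookup with a plain one.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the question's document (assumption: shortened for the example).
# Only the root element uses the go-home-1 prefix.
XML = """<go-home-1:GOHOMEV1 xmlns:go-home-1="https://sample.com/GO-HOME-V1">
<HOMEV1FileHeader>
<SGO>YIFG</SGO>
</HOMEV1FileHeader>
</go-home-1:GOHOMEV1>"""

root = ET.fromstring(XML)
namespaces = {"go-home-1": "https://sample.com/GO-HOME-V1"}

# ElementTree stores a namespaced tag as "{uri}localname".
root_tag = root.tag
# Children without a prefix keep their bare names.
child_tags = [child.tag for child in root]

# The prefixed path asks for {uri}HOMEV1FileHeader, which does not exist.
prefixed = root.find("go-home-1:HOMEV1FileHeader/go-home-1:SGO", namespaces)
# The plain path matches the element as it is actually stored.
plain = root.find("HOMEV1FileHeader/SGO")

print(root_tag)
print(child_tags)
print(prefixed)
print(plain.text)
```

Here the prefixed lookup comes back as None while the plain one finds the SGO element, which is consistent with the AttributeError reported in the question. The unprefixed lookups in the code that follows rely on the same observation.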
import pandas as pd
# root is the parsed tree from the question: ET.parse(filename).getroot()
columns = ['SGO', 'MCC', 'MNC', 'MGT_CC', 'MGT_NC', 'CC', 'NDC', 'SRngStart', 'SRngStop']
# These values occur once in the file and are repeated on every row
sgo = root.find('.//SGO').text
mcc = root.find('.//MCC').text
mnc = root.find('.//MNC').text
mgt_cc = root.find('.//MGT_CC').text
mgt_nc = root.find('.//MGT_NC').text
rows = []
# One row per RD element
for entry in root.findall('.//RD'):
    row = []
    cc = entry.find('.//CC').text
    ndc = entry.find('.//NDC').text
    srngstart = entry.find('.//SRngStart').text
    srngstop = entry.find('.//SRngStop').text
    row.extend([sgo, mcc, mnc, mgt_cc, mgt_nc, cc, ndc, srngstart, srngstop])
    rows.append(row)
df = pd.DataFrame(rows, columns=columns)
df
Output:
    SGO  MCC  MNC MGT_CC MGT_NC    CC  NDC SRngStart SRngStop
0  YIFG  111  222   9012   4113  9012  411       000      999
1  YIFG  111  222   9012   4113   834  101       150      295
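If pandas is not wanted, the same idea can be kept closer to the question's attempt. The following is a minimal sketch, not a definitive implementation: it assumes the list-per-row shape that the question's loop prints (SGO, CC, NDC, then the range bounds), the helper name extract_rows is hypothetical, and the XML constant is a compacted copy of the sample document. It uses explicit unprefixed paths, and inside the loop the path to SRng is relative to the NR element being searched rather than to the root.

```python
import xml.etree.ElementTree as ET

# Compacted copy of the question's sample (assumption: whitespace removed only).
XML = """<go-home-1:GOHOMEV1 xmlns:go-home-1="https://sample.com/GO-HOME-V1">
<HOMEV1FileHeader><SGO>YIFG</SGO></HOMEV1FileHeader>
<OI><NL><NT><NTData><RIS><RI>
<E4NS><GNS>
<RD><NR><CC>9012</CC><NDC>411</NDC>
<SRng><SRngStart>000</SRngStart><SRngStop>999</SRngStop></SRng></NR></RD>
<RD><NR><CC>834</CC><NDC>101</NDC>
<SRng><SRngStart>150</SRngStart><SRngStop>295</SRngStop></SRng></NR></RD>
</GNS></E4NS>
<E2NS><MCC>111</MCC><MNC>222</MNC></E2NS>
<E2G><MGT_CC>9012</MGT_CC><MGT_NC>4113</MGT_NC></E2G>
</RI></RIS></NTData></NT></NL></OI>
</go-home-1:GOHOMEV1>"""


def extract_rows(root):
    """Hypothetical helper: one list per NR element,
    [SGO, CC, NDC, SRngStart, SRngStop, ...]."""
    # findtext returns None instead of raising when the path matches nothing.
    sgo = root.findtext("HOMEV1FileHeader/SGO")
    rows = []
    # Child elements are not namespaced, so the path needs no prefixes.
    for nr in root.findall("OI/NL/NT[1]/NTData/RIS/RI/E4NS/GNS/RD/NR"):
        row = [sgo, nr.findtext("CC"), nr.findtext("NDC")]
        # Paths are relative to the element searched: from NR it is just "SRng".
        for srng in nr.findall("SRng"):
            row += [srng.findtext("SRngStart"), srng.findtext("SRngStop")]
        rows.append(row)
    return rows


rows = extract_rows(ET.fromstring(XML))
for row in rows:
    print(row)
```

For a file on disk, ET.parse(filename).getroot() from the question can be passed to the helper in place of ET.fromstring(XML).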