Why does my code produce blank rows between the XML records?
16 blank rows appear for each XML record.
I am using Python 3.7.1. I added "row += 1" inside "for k in tree.iter('bar'):", but then the second row only shows the last XML record.
XML code (estate(2).xml):
<bar>
  <F169 id="F169.777568">
    <pos>193239.0950999996 456314.7006000001</pos>
    <X_CRDNT>193239.0951</X_CRDNT>
    <Y_CRDNT>456314.7006</Y_CRDNT>
    <PNU>1138010600100330007</PNU>
    <LD_CPSG_CODE>11380</LD_CPSG_CODE>
    <LD_EMD_LI_CODE>10600</LD_EMD_LI_CODE>
    <REGSTR_SE_CODE>1</REGSTR_SE_CODE>
    <MNNM>0033</MNNM>
    <SLNO>0007</SLNO>
    <SYS_REGIST_NO>113802018000058</SYS_REGIST_NO>
    <STTUS_SE_CODE>1</STTUS_SE_CODE>
    <LAST_SEQ_NO>1</LAST_SEQ_NO>
    <BSNM_CMPNM>½ºÅ¸°øÀÎÁß°³»ç»ç¹«¼Ò</BSNM_CMPNM>
    <EMPLYM_CO>1</EMPLYM_CO>
    <FRST_REGIST_DT>2018-11-04T09:59:00</FRST_REGIST_DT>
  </F169>
  <F169 id="F169.777569">
    <pos>193239.0950999996 456314.7006000001</pos>
    <X_CRDNT>193239.0952</X_CRDNT>
    <Y_CRDNT>456314.7007</Y_CRDNT>
    <PNU>1138010600100330007</PNU>
    <LD_CPSG_CODE>11380</LD_CPSG_CODE>
    <LD_EMD_LI_CODE>10600</LD_EMD_LI_CODE>
    <REGSTR_SE_CODE>1</REGSTR_SE_CODE>
    <MNNM>0033</MNNM>
    <SLNO>0007</SLNO>
    <SYS_REGIST_NO>113802018000058</SYS_REGIST_NO>
    <STTUS_SE_CODE>1</STTUS_SE_CODE>
    <LAST_SEQ_NO>1</LAST_SEQ_NO>
    <BSNM_CMPNM>½ºÅ¸°øÀÎÁß°³»ç»ç¹«¼Ò</BSNM_CMPNM>
    <EMPLYM_CO>1</EMPLYM_CO>
    <FRST_REGIST_DT>2018-11-04T09:59:00</FRST_REGIST_DT>
  </F169>
</bar>
Python code:
import xml.etree.ElementTree as ET
import xlsxwriter
workbook = xlsxwriter.Workbook("parse.xlsx")
worksheet = workbook.add_worksheet()
bold = workbook.add_format({"bold":1})
tree = ET.parse('estate(2).xml')
col = 0
i=0
row = 0
plus_row = 1
print(tree.getiterator())
for k in tree.iter('bar'):
    for j in k.iter():
        print(j.text)
        worksheet.write(row, col, j.findtext("pos"))
        worksheet.write(row, col+1, j.findtext("X_CRDNT"))
        worksheet.write(row, col+2, j.findtext("Y_CRDNT"))
        worksheet.write(row, col+3, j.findtext("PNU"))
        worksheet.write(row, col+4, j.findtext("LD_CPSG_CODE"))
        worksheet.write(row, col+5, j.findtext("LD_EMD_LI_CODE"))
        worksheet.write(row, col+6, j.findtext("REGSTR_SE_CODE"))
        worksheet.write(row, col+7, j.findtext("MNNM"))
        worksheet.write(row, col+8, j.findtext("SLNO"))
        worksheet.write(row, col+9, j.findtext("SYS_REGIST_NO"))
        worksheet.write(row, col+10, j.findtext("BSNM_CMPNM"))
        worksheet.write(row, col+11, j.findtext("EMPLYM_CO"))
        worksheet.write(row, col+12, j.findtext("FRST_REGIST_DT"))
        if j.get("ETC_ADRES") is not "true":
            worksheet.write(row, col+13, j.findtext("ETC_ADRES"))
        row += 1
        """
        j = 0
        if i is 0:
            row += 1
            i+=1
        elif i >= 1 and i<=16:
            continue
        elif i > 16:
            i = 0
        """
    print(k.iter())
#convert to .xlsx
worksheet.write("A1", 'pos', bold)
worksheet.write("B1", 'X_CRDNT', bold)
worksheet.write("C1", 'Y_CRDNT', bold)
worksheet.write("D1", 'PNU', bold)
worksheet.write("E1", 'LD_CPSG_CODE', bold)
worksheet.write("F1", 'LD_EMD_LI_CODE', bold)
worksheet.write("G1", 'REGSTR_SE_CODE', bold)
worksheet.write("H1", 'MNNM', bold)
worksheet.write("I1", 'SLNO', bold)
worksheet.write("J1", 'SYS_REGIST_NO', bold)
worksheet.write("K1", 'BSNM_CMPNM', bold)
worksheet.write("L1", 'EMPLYM_CO', bold)
worksheet.write("M1", 'FRST_REGIST_DT', bold)
workbook.close()
This Python code writes the information from estate(2).xml into parse.xlsx.
I want the XML information to appear in the Excel file without blank rows.
excel file with blanks
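There are two problems with the code you wrote.
In the nested loop below, k iterates over all bar elements (there is only one), and j then iterates over all descendant elements of k:
    for k in tree.iter('bar'):
        for j in k.iter():
            print(j.text)
            # ...
When j is one of the <F169> elements, a row of data is read from its child elements. However, j also runs through every other descendant of <bar> (<pos>, <X_CRDNT>, <Y_CRDNT>, and so on). Those elements have no child elements of their own, so when j is one of them, j.findtext('MNNM') returns None and the row is left empty. As a result, you get a blank row for every descendant element of <F169>.
The fix is to replace "for j in k.iter():" with "for j in k.iter('F169'):". That way, j only iterates over the <F169> elements. Tag names are case-sensitive, so the tag must be written F169, exactly as it appears in the XML.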
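To make the difference between the two loops concrete, here is a small sketch that counts how many filled and blank rows each one would produce. It assumes a trimmed-down copy of the XML from the question embedded as a string, and the helper name count_rows is made up for this illustration.

```python
import xml.etree.ElementTree as ET

# Trimmed-down sample modeled on the XML in the question (assumed data).
SAMPLE = """
<bar>
  <F169 id="F169.777568">
    <pos>193239.0950999996 456314.7006000001</pos>
    <MNNM>0033</MNNM>
    <SLNO>0007</SLNO>
  </F169>
  <F169 id="F169.777569">
    <pos>193239.0950999996 456314.7006000001</pos>
    <MNNM>0033</MNNM>
    <SLNO>0007</SLNO>
  </F169>
</bar>
"""


def count_rows(root, tag=None):
    """Hypothetical helper: return (filled, blank) row counts for root.iter(tag)."""
    filled = blank = 0
    for elem in root.iter(tag):
        # findtext() only searches direct children, so it returns None
        # for <bar> itself and for leaf elements such as <pos>.
        if elem.findtext("MNNM") is None:
            blank += 1
        else:
            filled += 1
    return filled, blank


root = ET.fromstring(SAMPLE.strip())
all_counts = count_rows(root)            # iter(): every element in the tree
f169_counts = count_rows(root, "F169")   # iter('F169'): only the records
print(all_counts, f169_counts)
```

With this sample, the unfiltered loop gives 2 filled rows and 7 blank ones (the <bar> element plus the six leaf elements), while the filtered loop gives 2 filled rows and no blank ones.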
Finally, you are writing the headers over the first row of data. Avoid this by starting with row = 1 instead of row = 0.
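Putting both fixes together, a corrected version might look like the sketch below. It is only one way to structure this: the names COLUMNS, rows_from_xml and write_xlsx are invented here, and it writes only the 13 columns that have headers in the question. xlsxwriter is imported inside write_xlsx so that the parsing part can be tried without it.

```python
import xml.etree.ElementTree as ET

# Column tags, in the same order as the headers in the question.
COLUMNS = [
    "pos", "X_CRDNT", "Y_CRDNT", "PNU", "LD_CPSG_CODE", "LD_EMD_LI_CODE",
    "REGSTR_SE_CODE", "MNNM", "SLNO", "SYS_REGIST_NO", "BSNM_CMPNM",
    "EMPLYM_CO", "FRST_REGIST_DT",
]


def rows_from_xml(root):
    """Return one list of cell values per <F169> record (hypothetical helper)."""
    # iter('F169') skips <bar> and the leaf elements, so no blank rows appear.
    return [[rec.findtext(tag) for tag in COLUMNS] for rec in root.iter("F169")]


def write_xlsx(rows, path):
    """Write the header in row 0 and the records from row 1 onward."""
    import xlsxwriter  # third-party; imported lazily (assumed installed)

    workbook = xlsxwriter.Workbook(path)
    worksheet = workbook.add_worksheet()
    bold = workbook.add_format({"bold": True})
    worksheet.write_row(0, 0, COLUMNS, bold)
    # Start at row 1 so the data never overwrites the header row.
    for row, values in enumerate(rows, start=1):
        worksheet.write_row(row, 0, values)
    workbook.close()
```

For the file in the question, this would be called as write_xlsx(rows_from_xml(ET.parse('estate(2).xml').getroot()), 'parse.xlsx').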