Converting XML to CSV. Pandas to_csv is not writing some rows, but is writing others. Can't spot it
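I am converting XML to CSV.
For some columns it writes fine, but for others it writes nothing.
Below is my code. The columns that are being written are:
- brand
- line
- name
The other columns are not. You will see that for productType I do a check and print retail or usage to the console, and this does print for every item, so that part works. However, the data just isn't written to the CSV. I am completely stumped. Any advice would be greatly appreciated. Thanks.
The code is as follows: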
# Importing the required libraries
import xml.etree.ElementTree as Xet
import pandas as pd

productColumns = ["brand", "line", "name", "purpose", "retailPrice"]
rows = []

# Parsing the XML file
xmlparse = Xet.parse('xmlimportdata.xml')
root = xmlparse.getroot()
products = root.findall("productTableData")

for product in products:
    # make sure a product and not a service (P vs S)
    if product.find("productType").text == "P":
        productBrand = product.find("Make")
        if productBrand is not None:
            productBrand = productBrand.text
        else:
            productBrand = "No Data"

        productLine = product.find("Category")
        if productLine is not None:
            productLine = productLine.text
        else:
            productLine = "No Data"

        productName = product.find("Name")
        if productName is not None:
            productName = productName.text
        else:
            productName = "No Data"

        productType = product.find("usageType")
        if productType is not None:
            if productType.text == "true":
                print('usage')
                productType = "usage"
            else:
                productType = "retail"
                print('retail')
        else:
            productType = "No Data"

        productSize = product.find("Size")
        if productSize is not None:
            productSize = productSize.text
        else:
            productSize = "No Data"

        productPrice = product.find("Retail")
        if productPrice is not None:
            productPrice = productPrice.text
        else:
            productPrice = "No Data"

        productId = product.find("ID")
        if productId is not None:
            productId = productId.text
        else:
            productId = "No Data"

        rows.append({"brand": productBrand,
                     "line": productLine,
                     "name": productName,
                     "type": productType,
                     "size": productSize,
                     "price": productPrice
                     })

# add data to pandas dataframe
df = pd.DataFrame(rows, columns=productColumns)

# Writing dataframe to csv
# note the columns = This little ripper will auto sort columns for us and place in correct order
df.to_csv('Converted-Products.csv', columns = productColumns, index = False)
Update! I found it. The keys I was appending in rows.append did not match the names declared in productColumns at the start. Once I made them match, I got the output.
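A quick way to catch this kind of mismatch is to compare the keys of the row dicts against the declared column list. When `pd.DataFrame(rows, columns=productColumns)` is given a column name that no row dict contains, that column comes out empty, which matches the symptom described above. The sketch below uses only the standard library; the helper name `find_key_mismatches` and the sample row values are made up for illustration and are not part of the original code.

```python
def find_key_mismatches(rows, columns):
    """Return (missing, extra).

    missing: declared column names that no row dict provides
    extra:   row dict keys that are not declared as columns
    """
    row_keys = set()
    for row in rows:
        row_keys.update(row)  # collect every key used by any row
    missing = sorted(set(columns) - row_keys)
    extra = sorted(row_keys - set(columns))
    return missing, extra


# Column list from the question, and one row shaped like the question's rows.append call.
# The values are hypothetical sample data.
productColumns = ["brand", "line", "name", "purpose", "retailPrice"]
rows = [{"brand": "Acme", "line": "Hair", "name": "Shampoo",
         "type": "retail", "size": "250ml", "price": "9.99"}]

missing, extra = find_key_mismatches(rows, productColumns)
print("declared but never filled:", missing)
print("filled but never declared:", extra)
```

Running a check like this before building the DataFrame shows that "purpose" and "retailPrice" are never filled, while "type", "size" and "price" are filled but never declared.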
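You do not need pandas for this conversion. Leave the large library for data analysis. Simply open a text file and write the rows with the csv module. And keep your code DRY (Don't Repeat Yourself) by defining a function, using the ternary operator (i.e., if and else on the same line).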
import csv
import xml.etree.ElementTree as Xet

# XML element names, and the CSV header each one maps to (same order)
elementColumns = ["ID", "Make", "Category", "Name", "usageType", "Size", "Retail"]
productColumns = ["id", "brand", "line", "name", "purpose", "size", "retailPrice"]

def get_text(elem, colname):
    # Return the child element's text, or "No Data" if the element is missing
    colElem = elem.find(colname)
    colText = colElem.text if colElem is not None else "No Data"
    # usageType holds "true" or something else; map it to "usage" / "retail"
    if colname == "usageType" and colElem is not None:
        colText = "usage" if colElem.text == "true" else "retail"
    return colText

# PARSE XML FILE
xmlparse = Xet.parse('xmlimportdata.xml')
root = xmlparse.getroot()
products = root.findall("productTableData")

# OPEN CSV FOR WRITING (text mode with newline="", as the csv module expects in Python 3)
with open("Output.csv", "w", newline="") as f:
    writer = csv.writer(f)
    # HEADERS
    writer.writerow(productColumns)
    # ROWS
    for product in products:
        if product.find("productType").text == "P":
            writer.writerow([
                get_text(product, col)
                for col in elementColumns
            ])
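Since the actual xmlimportdata.xml is not shown, here is a minimal sketch of the same approach run against an in-memory sample, so it can be tried without any files. The XML layout in `SAMPLE_XML` is an assumption inferred from the element names used in the code (a root element with `productTableData` children), and the function name `xml_to_csv_text` is hypothetical.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical sample; the real file's structure is assumed, not known.
SAMPLE_XML = """<root>
  <productTableData>
    <productType>P</productType>
    <ID>1</ID>
    <Make>Acme</Make>
    <Category>Hair</Category>
    <Name>Shampoo</Name>
    <usageType>true</usageType>
    <Size>250ml</Size>
    <Retail>9.99</Retail>
  </productTableData>
  <productTableData>
    <productType>S</productType>
    <ID>2</ID>
    <Name>Haircut</Name>
  </productTableData>
  <productTableData>
    <productType>P</productType>
    <ID>3</ID>
    <Name>Comb</Name>
    <usageType>false</usageType>
  </productTableData>
</root>"""

element_columns = ["ID", "Make", "Category", "Name", "usageType", "Size", "Retail"]
product_columns = ["id", "brand", "line", "name", "purpose", "size", "retailPrice"]


def get_text(elem, colname):
    """Text of the named child, "No Data" if absent; usageType maps to usage/retail."""
    col_elem = elem.find(colname)
    if col_elem is None:
        return "No Data"
    if colname == "usageType":
        return "usage" if col_elem.text == "true" else "retail"
    return col_elem.text


def xml_to_csv_text(xml_text):
    """Convert product entries (productType == "P") to CSV text, header row first."""
    root = ET.fromstring(xml_text)
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(product_columns)
    for product in root.findall("productTableData"):
        # Skip services ("S") and entries without a productType
        if product.findtext("productType") == "P":
            writer.writerow([get_text(product, col) for col in element_columns])
    return buffer.getvalue()


csv_text = xml_to_csv_text(SAMPLE_XML)
print(csv_text)
```

With this sample, the service entry is skipped and the product with missing elements gets "No Data" in those columns. Writing to a `StringIO` buffer instead of a file is only a convenience for trying it out; swapping in `open("Output.csv", "w", newline="")` gives the file-based version.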