Converting XML to CSV. Pandas to_csv is not writing some rows, but is writing others. Can't spot it

I'm converting XML to CSV.

For some rows it writes fine, but for others it writes nothing.

Below is my code. The rows that are being written are:

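The other rows are not. You'll see that in productType I do a check and print retail/usage to the console, and that does print for every item, so that part works. It just doesn't write the data. I'm completely stumped. Any advice would be greatly appreciated. Thanks.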

Here is the code:

# Importing the required libraries
import xml.etree.ElementTree as Xet
import pandas as pd
  
productColumns = ["brand", "line", "name", "purpose", "retailPrice"]
rows = []
  
# Parsing the XML file
xmlparse = Xet.parse('xmlimportdata.xml')
root = xmlparse.getroot()

products = root.findall("productTableData")

for product in products:

    # make sure a product and not a service (P vs S)
    if product.find("productType").text == "P":        

        productBrand = product.find("Make")
        if productBrand is not None:
            productBrand = productBrand.text
        else:
            productBrand = "No Data"
        
        productLine = product.find("Category")
        if productLine is not None:
            productLine = productLine.text
        else:
            productLine = "No Data"
        
        productName = product.find("Name")
        if productName is not None:
            productName = productName.text
        else:
            productName = "No Data"

        productType = product.find("usageType")
        if productType is not None:
            if productType.text == "true":
                print('usage')
                productType = "usage"
            else:
                productType = "retail"
                print('retail')
        else:
            productType = "No Data"
        
        productSize = product.find("Size")
        if productSize is not None:
            productSize = productSize.text
        else:
            productSize = "No Data"
        
        productPrice = product.find("Retail")
        if productPrice is not None:
            productPrice = productPrice.text
        else:
            productPrice = "No Data"

        productId = product.find("ID")
        if productId is not None:
            productId = productId.text
        else:
            productId = "No Data"

        rows.append({"brand": productBrand,
                     "line": productLine,
                     "name": productName,
                     "type": productType,
                     "size": productSize,
                     "price": productPrice
                     })


# add data to pandas dataframe
df = pd.DataFrame(rows, columns=productColumns)
  
# Writing dataframe to csv
# note the columns =  This little ripper will auto sort columns for us and place in correct order
df.to_csv('Converted-Products.csv', columns = productColumns, index = False)

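UPDATE! I found it. The dicts passed to rows.append did not use the same key names as the columns declared in productColumns at the start. Once I made them match, I got the output.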
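The mismatch is silent in pandas: `pd.DataFrame(rows, columns=...)` keeps only the listed columns, so dict keys outside that list are dropped and listed columns with no matching key become NaN, which `to_csv` writes as empty cells. The standard-library `csv.DictWriter` makes the same mistake fail loudly, because by default (`extrasaction="raise"`) it raises `ValueError` for any dict key not in `fieldnames`. A minimal sketch; the helper `rows_to_csv` and the sample values are hypothetical.

```python
import csv
import io

# Header declared up front, as in the question's productColumns
productColumns = ["brand", "line", "name", "purpose", "retailPrice"]

def rows_to_csv(rows, fieldnames):
    """Hypothetical helper: write dict rows to a CSV string via DictWriter."""
    buf = io.StringIO()
    # extrasaction="raise" is the default; spelled out to show where the check lives
    writer = csv.DictWriter(buf, fieldnames=fieldnames, extrasaction="raise")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()

# Keys match productColumns: the row is written
good = [{"brand": "Acme", "line": "Hair", "name": "Gel",
         "purpose": "retail", "retailPrice": "9.99"}]
print(rows_to_csv(good, productColumns))

# Keys as in the question ("type", "size", "price"): DictWriter raises
bad = [{"brand": "Acme", "line": "Hair", "name": "Gel",
        "type": "retail", "size": "100ml", "price": "9.99"}]
try:
    rows_to_csv(bad, productColumns)
except ValueError as err:
    print("key mismatch:", err)
```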

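You don't need pandas for this conversion; leave that big library for data analysis. Just open a text file and use the csv module to write the rows. And keep your code DRY (Don't Repeat Yourself) with a helper function and ternary operators (i.e., if/else on a single line).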
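Part of that repetition can also be absorbed by ElementTree's `Element.findtext(tag, default)`, which returns the text of the first matching child, or the default when no such child exists. A minimal sketch, assuming the question's tag names; the helper `get_field` and the sample XML are hypothetical.

```python
import xml.etree.ElementTree as ET

def get_field(elem, tag):
    """Hypothetical helper: text of elem's first <tag> child, or "No Data"."""
    # findtext returns the default only when the child is missing;
    # a present but empty child (e.g. <Size/>) returns "", handled by the `or`.
    text = elem.findtext(tag, default="No Data")
    if tag == "usageType" and text != "No Data":
        return "usage" if text == "true" else "retail"
    return text or "No Data"

product = ET.fromstring(
    "<productTableData><Make>Acme</Make><usageType>true</usageType><Size/></productTableData>"
)
for tag in ("Make", "Category", "usageType", "Size"):
    print(tag, "->", get_field(product, tag))
```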

import csv
import xml.etree.ElementTree as Xet

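# XML tag names, in output order, and the matching CSV header names
elementColumns = ["ID", "Make", "Category", "Name", "usageType", "Size", "Retail"]
productColumns = ["id", "brand", "line", "name", "purpose", "size", "retailPrice"]

def get_text(elem, colname):
    """Return the text of elem's <colname> child, or "No Data" if it is missing."""
    colElem = elem.find(colname)

    colText = colElem.text if colElem is not None else "No Data"

    # usageType holds "true"/"false"; map it to a readable purpose
    if colname == "usageType" and colElem is not None:
        colText = "usage" if colElem.text == "true" else "retail"

    return colText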

# PARSE XML FILE
xmlparse = Xet.parse('xmlimportdata.xml')
root = xmlparse.getroot()
products = root.findall("productTableData")

# OPEN CSV FOR WRITING (Python 3: text mode with newline="", as the csv module expects)
with open("Output.csv", "w", newline="") as f:
    writer = csv.writer(f)

    # HEADERS
    writer.writerow(productColumns)

    # ROWS: products only (productType "P"), one cell per XML element
    for product in products:
        if product.findtext("productType") == "P":
            writer.writerow([
                get_text(product, col)
                for col in elementColumns
            ])
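To try this approach without the real `xmlimportdata.xml`, here is a self-contained sketch that parses an inline sample and writes every row in one `writerows` call to an in-memory buffer. The sample layout (a root holding `productTableData` records) is an assumption inferred from the code above, and the values are invented.

```python
import csv
import io
import xml.etree.ElementTree as ET

ELEMENT_COLUMNS = ["ID", "Make", "Category", "Name", "usageType", "Size", "Retail"]
PRODUCT_COLUMNS = ["id", "brand", "line", "name", "purpose", "size", "retailPrice"]

# Hypothetical sample input; the real file is assumed to follow this shape.
SAMPLE_XML = """<root>
  <productTableData>
    <productType>P</productType><ID>1</ID><Make>Acme</Make>
    <Category>Hair</Category><Name>Gel</Name><usageType>true</usageType>
    <Size>100ml</Size><Retail>9.99</Retail>
  </productTableData>
  <productTableData>
    <productType>S</productType><ID>2</ID><Name>Haircut</Name>
  </productTableData>
  <productTableData>
    <productType>P</productType><ID>3</ID><Name>Shampoo</Name>
  </productTableData>
</root>"""

def get_text(elem, colname):
    text = elem.findtext(colname)  # None if the child element is missing
    if text is None:
        return "No Data"
    if colname == "usageType":
        return "usage" if text == "true" else "retail"
    return text

def convert(root, out):
    writer = csv.writer(out)
    writer.writerow(PRODUCT_COLUMNS)
    writer.writerows(
        [get_text(p, c) for c in ELEMENT_COLUMNS]
        for p in root.findall("productTableData")
        if p.findtext("productType") == "P"  # products only, skip services
    )

buf = io.StringIO()
convert(ET.fromstring(SAMPLE_XML), buf)
print(buf.getvalue())
```

To write a real file instead, pass `open("Output.csv", "w", newline="")` in place of the `StringIO` buffer and build the root with `ET.parse('xmlimportdata.xml').getroot()`.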