How to transform this into a dataframe and save it as a csv?

Question

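This data was originally provided as a .txt file. I converted it to .csv format and tried to rearrange it into the desired form, but failed. I am trying to find a way to transform this data structure (shown below):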

bakeryA
77300 Baker Street
bun: [10,20,30,10]
donut: [20,10,40,0]
bread: [0,10,15,10]
bakery B
78100 Cerabut St
data not available
bakery C
80300 Sulkeh St
bun: [29,50,20,30]
donut: [10,10,30,10]
bread: [10,15,10,20]

into this dataframe:

Name	Address	type	salt	sugar	water	flour
Bakery A	77300 Baker Street	bun	10	20	30	10
Bakery A	77300 Baker Street	donut	20	10	40	0
Bakery A	77300 Baker Street	bread	0	10	15	10
Bakery B	78100 Cerabut St	Nan	Nan	Nan	Nan	Nan
Bakery C	80300 Sulkeh St	bun	29	50	20	30
Bakery C	80300 Sulkeh St	donut	10	10	30	10
Bakery C	80300 Sulkeh St	bread	10	15	10	20

Thanks!

Answer 1

This has less to do with pandas and more to do with parsing an unstructured source into structured data. Try this:

from ast import literal_eval
from enum import IntEnum

import pandas as pd

class LineType(IntEnum):
    BakeryName = 1
    Address = 2
    Ingredients = 3

data = []
with open('data.txt') as fp:
    line_type = LineType.BakeryName
    for line in fp:
        line = line.strip()

        if line_type == LineType.BakeryName:
            name = line # the current line contains the Bakery Name
            line_type = LineType.Address # the next line is the Bakery Address
        elif line_type == LineType.Address:
            address = line # the current line contains the Bakery Address
            line_type = LineType.Ingredients # the next line contains the Ingredients
        elif line_type == LineType.Ingredients and line == 'data not available':
            data.append({
                'Name': name,
                'Address': address
            }) # no Ingredients info available
            line_type = LineType.BakeryName # next line is Bakery Name
        elif line_type == LineType.Ingredients:
            # if the line does not follow the ingredient's format, we
            # overstepped into the Bakery Name line. Then the next line
            # is Bakery Address
            try:
                bakery_type, ingredients = line.split(':')
                ingredients = literal_eval(ingredients.strip())
                data.append({
                    'Name': name,
                    'Address': address,
                    'type': bakery_type,
                    'salt': ingredients[0],
                    'sugar': ingredients[1],
                    'water': ingredients[2],
                    'flour': ingredients[3],
                })
            except (ValueError, SyntaxError, IndexError, TypeError):
                name = line
                line_type = LineType.Address

df = pd.DataFrame(data)
df.to_csv('data.csv', index=False)  # output filename is an assumption

This assumes your data file follows the format shown. Even a slight deviation (for example, a blank line) will cause errors.
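
If blank lines are a concern, a somewhat more tolerant variant might look like the sketch below. It is not the code above. It swaps `literal_eval` for a regular expression, skips empty lines, and treats any line that is neither an item line nor "data not available" as a bakery name followed by its address. The names `parse_bakeries`, `ITEM_RE`, `COLUMNS` and the output file `bakeries.csv` are made up for illustration, and the sample text is copied from the question with one blank line added.

```python
import io
import re

import pandas as pd

# Sample input copied from the question (one blank line added on purpose).
# In practice you would pass an open file object instead.
SAMPLE = """\
bakeryA
77300 Baker Street
bun: [10,20,30,10]
donut: [20,10,40,0]
bread: [0,10,15,10]

bakery B
78100 Cerabut St
data not available
bakery C
80300 Sulkeh St
bun: [29,50,20,30]
donut: [10,10,30,10]
bread: [10,15,10,20]
"""

COLUMNS = ["Name", "Address", "type", "salt", "sugar", "water", "flour"]
# Matches item lines such as "bun: [10,20,30,10]".
ITEM_RE = re.compile(r"^(\w+):\s*\[([^\]]*)\]$")


def parse_bakeries(lines):
    """Return one dict per item; a bakery without data yields a single
    dict holding only its name and address."""
    records = []
    name = address = None
    expect_address = False  # True right after a bakery name was read
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # tolerate blank lines
        match = ITEM_RE.match(line)
        if expect_address:
            address = line
            expect_address = False
        elif match:
            values = [int(v) for v in match.group(2).split(",")]
            row = [name, address, match.group(1), *values]
            records.append(dict(zip(COLUMNS, row)))
        elif line == "data not available":
            records.append({"Name": name, "Address": address})
        else:
            # Anything else is assumed to start a new bakery.
            name = line
            expect_address = True
    return records


records = parse_bakeries(io.StringIO(SAMPLE))
# Passing columns= fixes the column order; missing keys become NaN.
df = pd.DataFrame(records, columns=COLUMNS)
# Nullable integers keep 10 from turning into 10.0 next to missing values.
numeric = ["salt", "sugar", "water", "flour"]
df[numeric] = df[numeric].astype("Int64")
df.to_csv("bakeries.csv", index=False)  # hypothetical output path
```

The sketch still assumes every bakery has exactly one name line followed by one address line, and that each item list holds four integers in the order salt, sugar, water, flour.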

python

dataframe

pandas

data-wrangling