How do I transform this into a DataFrame and save it as a CSV?

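This data was originally provided as a .txt file. I converted it to .csv and tried to sort it into the shape I need, but without success. I'm trying to find a way to transform this data structure (shown below):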

bakeryA
77300 Baker Street
bun: [10,20,30,10]
donut: [20,10,40,0]
bread: [0,10,15,10]
bakery B
78100 Cerabut St
data not available
bakery C
80300 Sulkeh St
bun: [29,50,20,30]
donut: [10,10,30,10]
bread: [10,15,10,20]

into this DataFrame:

Name      Address             type   salt  sugar  water  flour
Bakery A  77300 Baker Street  bun    10    20     30     10
Bakery A  77300 Baker Street  donut  20    10     40     0
Bakery A  77300 Baker Street  bread  0     10     15     10
Bakery B  78100 Cerabut St    NaN    NaN   NaN    NaN    NaN
Bakery C  80300 Sulkeh St     bun    29    50     20     30
Bakery C  80300 Sulkeh St     donut  10    10     30     10
Bakery C  80300 Sulkeh St     bread  10    15     10     20

Thanks!

This has less to do with pandas and more to do with parsing an unstructured source into structured data. Try this:

from ast import literal_eval
from enum import IntEnum

import pandas as pd

class LineType(IntEnum):
    BakeryName = 1
    Address = 2
    Ingredients = 3

data = []
with open('data.txt') as fp:
    line_type = LineType.BakeryName
    for line in fp:
        line = line.strip()

        if line_type == LineType.BakeryName:
            name = line # the current line contains the Bakery Name
            line_type = LineType.Address # the next line is the Bakery Address
        elif line_type == LineType.Address:
            address = line # the current line contains the Bakery Address
            line_type = LineType.Ingredients # the next line contains the Ingredients
        elif line_type == LineType.Ingredients and line == 'data not available':
            data.append({
                'Name': name,
                'Address': address
            }) # no Ingredients info available
            line_type = LineType.BakeryName # next line is Bakery Name
        elif line_type == LineType.Ingredients:
            # If the line does not match the "type: [...]" format, we have
            # stepped into the next bakery's name line, so the line after
            # it is that bakery's address.
            try:
                bakery_type, ingredients = line.split(':')
                ingredients = literal_eval(ingredients.strip())
                data.append({
                    'Name': name,
                    'Address': address,
                    'type': bakery_type,
                    'salt': ingredients[0],
                    'sugar': ingredients[1],
                    'water': ingredients[2],
                    'flour': ingredients[3],
                })
            except (ValueError, SyntaxError):  # raised by the unpacking or by literal_eval
                name = line
                line_type = LineType.Address

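df = pd.DataFrame(data)  # bakeries without data get NaN in the missing columns
df.to_csv('data.csv', index=False)  # output file name is arbitrary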

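This assumes your data file follows exactly the format shown. Even a small deviation, such as a blank line, will throw the parser off.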
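If the real file is messier than the sample, one somewhat more tolerant variant is to skip blank lines and recognise ingredient lines with a regular expression instead of relying on try/except. This is a swapped-in technique, not the answer's code, and only a sketch: the pattern `ROW`, the function `parse_bakeries`, the output name `bakeries.csv`, and the assumptions that product names are single words and that every name line is followed by its address line are all hypothetical. The sample is the question's data with a blank line added.

```python
import re

import pandas as pd

# Hypothetical pattern: a single-word product name, a colon, then a [...] list,
# e.g. "bun: [10,20,30,10]".
ROW = re.compile(r"^(\w+)\s*:\s*\[(.*)\]$")
COLUMNS = ["salt", "sugar", "water", "flour"]


def parse_bakeries(lines):
    """Turn the bakery text format into a list of per-row dicts (hypothetical helper)."""
    records = []
    name = address = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # blank lines are ignored instead of confusing the parser
        match = ROW.match(line)
        if match:
            values = [int(v) for v in match.group(2).split(",") if v.strip()]
            record = {"Name": name, "Address": address, "type": match.group(1)}
            record.update(zip(COLUMNS, values))
            records.append(record)
        elif line == "data not available":
            # only Name and Address; pandas fills the other columns with NaN
            records.append({"Name": name, "Address": address})
        elif name is None or address is not None:
            # plain text after a complete name/address pair starts a new bakery
            name, address = line, None
        else:
            # plain text right after a name is that bakery's address
            address = line
    return records


SAMPLE = """\
bakeryA
77300 Baker Street
bun: [10,20,30,10]
donut: [20,10,40,0]
bread: [0,10,15,10]

bakery B
78100 Cerabut St
data not available
bakery C
80300 Sulkeh St
bun: [29,50,20,30]
donut: [10,10,30,10]
bread: [10,15,10,20]
"""

df = pd.DataFrame(parse_bakeries(SAMPLE.splitlines()),
                  columns=["Name", "Address", "type", *COLUMNS])
df.to_csv("bakeries.csv", index=False)  # index=False keeps the row index out of the file
print(df)
```

With a real file, pass the open file object instead, e.g. `with open('data.txt') as fp: records = parse_bakeries(fp)`. Passing `columns=` explicitly fixes the column order and keeps every column even if no bakery in the file has ingredient data. Note that a bakery with neither ingredient lines nor a "data not available" line is silently skipped by this sketch.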