Convert record entries stored as single rows into separate columns with headers in Jupyter Notebook? #Python #Jupyter Notebook

I have an unstructured text file. I imported it into Jupyter Notebook, and I am trying to use pandas to turn these rows into 10 columns. For example:
0 product/productId: B000GKXY4S
1 product/title: Crazy Shape Scissor Set
2 product/price: unknown
3 review/userId: A1QA985ULVCQOB
4 review/profileName: Carleen M. Amadio "Lady Dr...
5 review/helpfulness: 2/2
6 review/score: 5.0
7 review/time: 1314057600
8 review/summary: Fun for adults too!
9 review/text: I really like these scissors...
10
11 product/productId: B000GKXY4S
12 product/title: Crazy Shape Scissor Set
13 product/price: unknown
14 review/userId: ALCX2ELNHLQA7
15 review/profileName: Barbara
16 review/helpfulness: 0/0
17 review/score: 5.0
18 review/time: 1328659200
19 review/summary: Advancing!
20 review/text: Looked around in art supplies...
21
22 product/productId: B000140KIW
23 product/title: Fiskars Softouch Multi-Purpose ...
24 product/price: unknown
25 review/userId: A2M2M4R1KG5WOL
26 review/profileName: L. Heminway
27 review/helpfulness: 1/1
28 review/score: 5.0
29 review/time: 1156636800
30 review/summary: Fiskars Softouch Multi-Purpose...
31 review/text: These are the best scissors I own...
32

Desired output: I want 10 columns (one per key, used as the header), with each record's values as a row.
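If you still have the raw text file, one alternative is to skip the one-column DataFrame entirely and parse the records directly. The following is a minimal sketch, assuming each line of the file has the form `key: value` and that a blank line separates records, as in the sample above. The helper `parse_records` and the file name `reviews.txt` are hypothetical.

```python
import io
import pandas as pd

def parse_records(lines):
    """Group 'key: value' lines into one dict per record (hypothetical helper).

    Assumes a blank line separates records, as in the sample above."""
    records, current = [], {}
    for line in lines:
        line = line.strip()
        if not line:                          # blank line -> current record is complete
            if current:
                records.append(current)
                current = {}
            continue
        key, _, value = line.partition(':')   # split on the FIRST colon only
        current[key.strip()] = value.strip()
    if current:                               # last record may not end with a blank line
        records.append(current)
    return records

# With a real file (hypothetical path):
# with open('reviews.txt', encoding='utf-8') as f:
#     df = pd.DataFrame(parse_records(f))

sample = """product/productId: B000GKXY4S
product/title: Crazy Shape Scissor Set
review/score: 5.0

product/productId: B000140KIW
product/title: Fiskars Softouch Multi-Purpose ...
review/score: 5.0
"""
df = pd.DataFrame(parse_records(io.StringIO(sample)))  # one row per record, one column per key
print(df)
```

Building the DataFrame from a list of dicts means a record that lacks a key simply gets `NaN` in that column. Splitting on the first colon only keeps values such as review text that contain `:` intact.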

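We can `split` your column on `:`, expand the result into columns, and use `groupby` to collect the values for each key. Finally, we build a `pd.DataFrame` by zipping the column names with these values: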

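import pandas as pd

# drop the blank separator rows, then split each "key: value" line on the first colon only
m = df.loc[df['COL'].str.contains(':', na=False), 'COL'].str.split(':', n=1, expand=True)
# strip surrounding whitespace, then collect all values of each key into one list
m = m.apply(lambda col: col.str.strip()).groupby(0)[1].apply(list).reset_index()

df = pd.DataFrame(dict(zip(m[0], m[1])))
# print the first 6 columns, because the rest doesn't fit on screen
print(df.iloc[:, :6])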


  product/price product/productId                        product/title review/helpfulness              review/profileName review/score
0       unknown        B000GKXY4S              Crazy Shape Scissor Set                2/2   Carleen M. Amadio "Lady Dr...          5.0
1       unknown        B000GKXY4S              Crazy Shape Scissor Set                0/0                         Barbara          5.0
2       unknown        B000140KIW   Fiskars Softouch Multi-Purpose ...                1/1                     L. Heminway          5.0
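The `dict(zip(...))` approach above assumes every key appears in every record. If one record lacks, say, `review/summary`, the lists have different lengths and `pd.DataFrame` raises a `ValueError`. A minimal sketch of a more tolerant variant assigns a record id to each row and then pivots. It assumes, as in the sample, that every record starts with a `product/productId` line; the small input DataFrame here is made up for illustration and uses the same `COL` column name as the answer.

```python
import pandas as pd

# Made-up input in the question's shape: one "key: value" string per row,
# empty strings between records, and the second record has an extra key.
df = pd.DataFrame({'COL': [
    'product/productId: B000GKXY4S', 'review/score: 5.0', '',
    'product/productId: B000140KIW', 'review/summary: Fiskars Softouch: great',
    'review/score: 5.0', '',
]})

# Keep only "key: value" rows and split on the first colon only.
kv = df.loc[df['COL'].str.contains(':', na=False), 'COL'].str.split(':', n=1, expand=True)
kv.columns = ['key', 'value']
kv['key'] = kv['key'].str.strip()
kv['value'] = kv['value'].str.strip()

# Assumption: each record starts with product/productId, so a running
# count of those rows gives every row its record id (1, 2, ...).
kv['record'] = (kv['key'] == 'product/productId').cumsum()

# One row per record, one column per key; missing keys become NaN instead of raising.
wide = kv.pivot(index='record', columns='key', values='value')
wide.columns.name = None
print(wide)
```

`pivot` sorts the columns alphabetically, like the `groupby` in the answer. It raises an error if the same key appears twice within one record; in that case `pivot_table` with an explicit `aggfunc` would be needed.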