Convert record entries stored as single rows into separate columns with headers in Jupyter Notebook? #Python #Jupyter Notebook
I have an unstructured text file. I imported it into Jupyter Notebook, and I am trying to use pandas to create 10 columns from these single-row entries. For example:
0 product/productId: B000GKXY4S
1 product/title: Crazy Shape Scissor Set
2 product/price: unknown
3 review/userId: A1QA985ULVCQOB
4 review/profileName: Carleen M. Amadio "Lady Dr...
5 review/helpfulness: 2/2
6 review/score: 5.0
7 review/time: 1314057600
8 review/summary: Fun for adults too!
9 review/text: I really enjoy these scissors...
10
11 product/productId: B000GKXY4S
12 product/title: Crazy Shape Scissor Set
13 product/price: unknown
14 review/userId: ALCX2ELNHLQA7
15 review/profileName: Barbara
16 review/helpfulness: 0/0
17 review/score: 5.0
18 review/time: 1328659200
19 review/summary: Making the cut!
20 review/text: Looked all over in art supply...
21
22 product/productId: B000140KIW
23 product/title: Fiskars Softouch Multi-Purpose ...
24 product/price: unknown
25 review/userId: A2M2M4R1KG5WOL
26 review/profileName: L. Heminway
27 review/helpfulness: 1/1
28 review/score: 5.0
29 review/time: 1156636800
30 review/summary: Fiskars Softouch Multi-Purpose...
31 review/text: These are the best scissors I have ever owned...
32
Output: I want 10 columns, one per field, with each record's values as a row.
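We can split your dataframe column on ":", expand the result into two columns (key and value), and use groupby to collect the values for each key. Finally, we build the new dataframe with pd.DataFrame by zipping the column names with their lists of values: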
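import pandas as pd

# Split each "key: value" row on the first colon only (n=1), dropping the
# whitespace after it, then collect the values of every key into a list
m = (df['COL'].str.split(r':\s*', n=1, expand=True)
     .groupby(0)[1].apply(list).reset_index())
# Zip the keys with their value lists to get one column per key
df = pd.DataFrame(dict(zip(m[0], m[1])))
# Print the first 6 columns, because the rest don't fit on screen
print(df.iloc[:, :6])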
  product/price product/productId                       product/title review/helpfulness             review/profileName review/score
0       unknown        B000GKXY4S             Crazy Shape Scissor Set                2/2  Carleen M. Amadio "Lady Dr...          5.0
1       unknown        B000GKXY4S             Crazy Shape Scissor Set                0/0                        Barbara          5.0
2       unknown        B000140KIW  Fiskars Softouch Multi-Purpose ...                1/1                    L. Heminway          5.0
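One caveat with the groupby approach: it lines values up purely by position, so it assumes every record contains every field. If one record is missing a key, the lists have different lengths and pd.DataFrame raises a ValueError. A possible variation, sketched below, uses the blank separator rows to number the records and then pivots on that number. The sample data here is a shortened, hypothetical version of the question's data (the second record deliberately lacks product/title), and the column name 'COL' plus the helper names text, is_blank, parts and wide are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical sample in the question's layout: one "key: value" string per
# row, with an empty row between records. The second record has no title.
lines = [
    "product/productId: B000GKXY4S",
    "product/title: Crazy Shape Scissor Set",
    "review/score: 5.0",
    "",
    "product/productId: B000140KIW",
    "review/score: 5.0",
]
raw = pd.DataFrame({"COL": lines})

# Normalise the text so that both empty strings and NaN count as separators.
text = raw["COL"].fillna("").str.strip()
is_blank = text.eq("")

# Split on the first colon only, so colons inside a value are preserved.
parts = text.str.split(":", n=1, expand=True)
parts.columns = ["key", "value"]
parts["value"] = parts["value"].str.strip()

# A running count of separator rows gives every line its record number.
parts["record"] = is_blank.cumsum()

# Drop the separator rows and turn the keys into columns, one row per record.
wide = parts[~is_blank].pivot(index="record", columns="key", values="value")
print(wide)
```

Fields that a record does not contain come out as NaN instead of shifting later values into the wrong row. Note that pivot raises an error if the same key appears twice within one record, so this sketch assumes each field occurs at most once per record.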