Python - 管理嵌套的 JSON 到 DataFrame
Python - Manage nested JSON to DataFrame
我正处于数据科学家之旅的开始阶段,我正在为 JSON 和 Python 苦苦挣扎。即使我知道 DataFrame 操作和 JSON 格式操作的基础知识,我也只有一个包含以下数据的 JSON 文件:
{"Orders":
[
{
"OrderID":"1000004209",
"Email":"name@mail.com",
"AnnoNascita":"-",
"Age":"-",
"Gender":"-",
"Provincia":"CR",
"OrderDate":"2019-05-02 14:05:16",
"OrderStatus":"wc-failed",
"OrderTotal":"31.90",
"TotalDiscount":"0",
"OrderSubTotal":"31.9",
"Coupon":"",
"OrderItems": {
"Item": {
"ProductName":"Eau de Parfum Zafferano",
"Sku":"44160",
"Quantity":"1",
"ItemCost":"27.00",
"ItemTotal":"27",
"Category":"ZAFFERANO;EAU DE PARFUM;LINEE UOMO;PROFUMI (UOMO)"
}
}
},
{
"OrderID":"1000004210",
"Email":"name@mail.com",
"AnnoNascita":"-",
"Age":"-",
"Gender":"-",
"Provincia":"GE",
"OrderDate":"2019-05-02 14:17:32",
"OrderStatus":"wc-cancelled",
"OrderTotal":"9.00",
"TotalDiscount":"0",
"OrderSubTotal":"9",
"Coupon":"",
"OrderItems": {
"Item": {
"ProductName":"Sapone Marsiglia 200 g",
"Sku":"01026",
"Quantity":"1",
"ItemCost":"4.10",
"ItemTotal":"4.1",
"Category":"MARSEILLE;SAPONETTE"
}
}
},
{
"OrderID":"1000004211",
"Email":"name@mail.com",
"AnnoNascita":"-",
"Age":"-",
"Gender":"-",
"Provincia":"GE",
"OrderDate":"2019-05-02 14:21:42",
"OrderStatus":"wc-cancelled",
"OrderTotal":"31.90",
"TotalDiscount":"0",
"OrderSubTotal":"31.9",
"Coupon":"",
"OrderItems": {
"Item": {
"ProductName":"Eau de Parfum Zafferano",
"Sku":"44160",
"Quantity":"1",
"ItemCost":"27.00",
"ItemTotal":"27",
"Category":"ZAFFERANO;EAU DE PARFUM;LINEE UOMO;PROFUMI (UOMO)"
}
}
},
{
"OrderID":"1000004235",
"Email":"name@mail.com",
"AnnoNascita":"-",
"Age":"-",
"Gender":"-",
"Provincia":"CR",
"OrderDate":"2019-05-03 09:37:06",
"OrderStatus":"wc-cancelled",
"OrderTotal":"31.90",
"TotalDiscount":"0",
"OrderSubTotal":"31.9",
"Coupon":"",
"OrderItems": {
"Item": [
{
"ProductName":"Eau de Parfum Zafferano",
"Sku":"44160",
"Quantity":"1",
"ItemCost":"27.00",
"ItemTotal":"27",
"Category":"ZAFFERANO;EAU DE PARFUM;LINEE UOMO;PROFUMI (UOMO)"
},
{
"ProductName":"Sapone Vegetale Lavanda Officinalis Bio",
"Sku":"01049",
"Quantity":"1",
"ItemCost":"4.90",
"ItemTotal":"4.9",
"Category":"ZAFFERANO;EAU DE PARFUM;LINEE UOMO;PROFUMI (UOMO)"
}
]
}
},
{
"OrderID":"1000004292",
"Email":"name@mail.com",
"AnnoNascita":"-",
"Age":"-",
"Gender":"-",
"Provincia":"CR",
"OrderDate":"2019-05-06 08:52:47",
"OrderStatus":"wc-failed",
"OrderTotal":"64.90",
"TotalDiscount":"0",
"OrderSubTotal":"64.9",
"Coupon":"",
"OrderItems": {
"Item": [
{
"ProductName":"Schiuma da Barba Pour Homme",
"Sku":"45396",
"Quantity":"2",
"ItemCost":"12.00",
"ItemTotal":"24",
"Category":"POUR HOMME;LINEE UOMO;RASATURA"
},
{
"ProductName":"Detergente Intimo Delicato Mamma",
"Sku":"38420",
"Quantity":"1",
"ItemCost":"11.00",
"ItemTotal":"11",
"Category":"POUR HOMME;LINEE UOMO;RASATURA"
},
{
"ProductName":"Spray per Ambiente - Preziosa",
"Sku":"44231",
"Quantity":"2",
"ItemCost":"10.00",
"ItemTotal":"20",
"Category":"POUR HOMME;LINEE UOMO;RASATURA"
},
{
"ProductName":"Cuscinetti Profumati - Preziosa",
"Sku":"45491",
"Quantity":"1",
"ItemCost":"9.90",
"ItemTotal":"9.9",
"Category":"POUR HOMME;LINEE UOMO;RASATURA"
}
]
}
},
]
}
我想做的是用 Pandas 构建一个 DataFrame,以便操作数据和收集信息。
起初,我尝试使用 Pandas 中的 pd.read_json('path_to_file')
函数,但得到的结果是:
Orders
0 {'OrderID': '1000004209', 'Email': 'name@ma...
1 {'OrderID': '1000004210', 'Email': 'name@ma...
2 {'OrderID': '1000004211', 'Email': 'name@ma...
3 {'OrderID': '1000004235', 'Email': 'name@ma...
4 {'OrderID': '1000004292', 'Email': 'name@ma...
我尝试使用 pd.DataFrame(df['Orders'])
从每一行获取一个 DatFrame,但它 returns 是同一个 DataFrame。我尝试使用 for 循环在新的 DataFrame 中附加单行,但我也走到了死胡同。
我在 Whosebug 上查看了所有与此相关的主题,却没有找到解决我的问题的方法。
实际上,我需要创建一个 DataFrame,每个主要值(如“OrderID”、“Email”、“AnnoNascita”等)都有一个列,还有一个名为“OrderItems”的列,其数组为“项目”中的所有值。我在想类似的事情:
OrderID Email AnnoNascita Age Gender Provincia OrderDate OrderStatus OrderTotal Coupon OrderItems
0 1000004209 name@mail.com - - - CR 2019-05-02 14:05:16 wc-failed 31.90 {"Items":[{"ProductName":"Schiuma da Barba Pour Homme","Sku":"45396","Quantity":"2","ItemCost":"12.00","ItemTotal":"24","Category":"POUR HOMME;LINEE UOMO;RASATURA"},{"ProductName":"Detergente Intimo Delicato Mamma","Sku":"38420","Quantity":"1","ItemCost":"11.00","ItemTotal":"11","Category":"POUR HOMME;LINEE UOMO;RASATURA"}]}
如果您对如何构建更好的 DataFrame 有任何建议,而不是我的想法,我很高兴阅读它并改变我的想法。正如我所说,我刚开始,我真的很感激任何建议。
PS:如果你也能解释一下你提供的解决方案,我会很高兴,因为我实际上正在尝试学习数据操作,不仅有一个解决方案,它真的很有帮助,还要了解一下。
感谢所有愿意花 his/her 时间帮助我的人!
使用pd.json_normalize()
,如下:
假设您的 json 文件名为 data
:
df = pd.json_normalize(data['Orders'])
pd.json_normalize()
将半结构化 JSON 数据标准化为平面 table。由于它展平了嵌套结构,您可以访问 JSON.
中的所有字段
您需要指定第一个标签 Orders
才能访问和展开其中的列内容。否则,你只会得到一列 Orders
.
结果:
print(df)
OrderID Email AnnoNascita Age Gender Provincia OrderDate OrderStatus OrderTotal TotalDiscount OrderSubTotal Coupon OrderItems.Item.ProductName OrderItems.Item.Sku OrderItems.Item.Quantity OrderItems.Item.ItemCost OrderItems.Item.ItemTotal OrderItems.Item.Category OrderItems.Item
0 1000004209 name@mail.com - - - CR 2019-05-02 14:05:16 wc-failed 31.90 0 31.9 Eau de Parfum Zafferano 44160 1 27.00 27 ZAFFERANO;EAU DE PARFUM;LINEE UOMO;PROFUMI (UOMO) NaN
1 1000004210 name@mail.com - - - GE 2019-05-02 14:17:32 wc-cancelled 9.00 0 9 Sapone Marsiglia 200 g 01026 1 4.10 4.1 MARSEILLE;SAPONETTE NaN
2 1000004211 name@mail.com - - - GE 2019-05-02 14:21:42 wc-cancelled 31.90 0 31.9 Eau de Parfum Zafferano 44160 1 27.00 27 ZAFFERANO;EAU DE PARFUM;LINEE UOMO;PROFUMI (UOMO) NaN
3 1000004235 name@mail.com - - - CR 2019-05-03 09:37:06 wc-cancelled 31.90 0 31.9 NaN NaN NaN NaN NaN NaN [{'ProductName': 'Eau de Parfum Zafferano', 'Sku': '44160', 'Quantity': '1', 'ItemCost': '27.00', 'ItemTotal': '27', 'Category': 'ZAFFERANO;EAU DE PARFUM;LINEE UOMO;PROFUMI (UOMO)'}, {'ProductName': 'Sapone Vegetale Lavanda Officinalis Bio', 'Sku': '01049', 'Quantity': '1', 'ItemCost': '4.90', 'ItemTotal': '4.9', 'Category': 'ZAFFERANO;EAU DE PARFUM;LINEE UOMO;PROFUMI (UOMO)'}]
4 1000004292 name@mail.com - - - CR 2019-05-06 08:52:47 wc-failed 64.90 0 64.9 NaN NaN NaN NaN NaN NaN [{'ProductName': 'Schiuma da Barba Pour Homme', 'Sku': '45396', 'Quantity': '2', 'ItemCost': '12.00', 'ItemTotal': '24', 'Category': 'POUR HOMME;LINEE UOMO;RASATURA'}, {'ProductName': 'Detergente Intimo Delicato Mamma', 'Sku': '38420', 'Quantity': '1', 'ItemCost': '11.00', 'ItemTotal': '11', 'Category': 'POUR HOMME;LINEE UOMO;RASATURA'}, {'ProductName': 'Spray per Ambiente - Preziosa', 'Sku': '44231', 'Quantity': '2', 'ItemCost': '10.00', 'ItemTotal': '20', 'Category': 'POUR HOMME;LINEE UOMO;RASATURA'}, {'ProductName': 'Cuscinetti Profumati - Preziosa', 'Sku': '45491', 'Quantity': '1', 'ItemCost': '9.90', 'ItemTotal': '9.9', 'Category': 'POUR HOMME;LINEE UOMO;RASATURA'}]
我正处于数据科学家之旅的开始阶段,我正在为 JSON 和 Python 苦苦挣扎。即使我知道 DataFrame 操作和 JSON 格式操作的基础知识,我也只有一个包含以下数据的 JSON 文件:
{"Orders":
[
{
"OrderID":"1000004209",
"Email":"name@mail.com",
"AnnoNascita":"-",
"Age":"-",
"Gender":"-",
"Provincia":"CR",
"OrderDate":"2019-05-02 14:05:16",
"OrderStatus":"wc-failed",
"OrderTotal":"31.90",
"TotalDiscount":"0",
"OrderSubTotal":"31.9",
"Coupon":"",
"OrderItems": {
"Item": {
"ProductName":"Eau de Parfum Zafferano",
"Sku":"44160",
"Quantity":"1",
"ItemCost":"27.00",
"ItemTotal":"27",
"Category":"ZAFFERANO;EAU DE PARFUM;LINEE UOMO;PROFUMI (UOMO)"
}
}
},
{
"OrderID":"1000004210",
"Email":"name@mail.com",
"AnnoNascita":"-",
"Age":"-",
"Gender":"-",
"Provincia":"GE",
"OrderDate":"2019-05-02 14:17:32",
"OrderStatus":"wc-cancelled",
"OrderTotal":"9.00",
"TotalDiscount":"0",
"OrderSubTotal":"9",
"Coupon":"",
"OrderItems": {
"Item": {
"ProductName":"Sapone Marsiglia 200 g",
"Sku":"01026",
"Quantity":"1",
"ItemCost":"4.10",
"ItemTotal":"4.1",
"Category":"MARSEILLE;SAPONETTE"
}
}
},
{
"OrderID":"1000004211",
"Email":"name@mail.com",
"AnnoNascita":"-",
"Age":"-",
"Gender":"-",
"Provincia":"GE",
"OrderDate":"2019-05-02 14:21:42",
"OrderStatus":"wc-cancelled",
"OrderTotal":"31.90",
"TotalDiscount":"0",
"OrderSubTotal":"31.9",
"Coupon":"",
"OrderItems": {
"Item": {
"ProductName":"Eau de Parfum Zafferano",
"Sku":"44160",
"Quantity":"1",
"ItemCost":"27.00",
"ItemTotal":"27",
"Category":"ZAFFERANO;EAU DE PARFUM;LINEE UOMO;PROFUMI (UOMO)"
}
}
},
{
"OrderID":"1000004235",
"Email":"name@mail.com",
"AnnoNascita":"-",
"Age":"-",
"Gender":"-",
"Provincia":"CR",
"OrderDate":"2019-05-03 09:37:06",
"OrderStatus":"wc-cancelled",
"OrderTotal":"31.90",
"TotalDiscount":"0",
"OrderSubTotal":"31.9",
"Coupon":"",
"OrderItems": {
"Item": [
{
"ProductName":"Eau de Parfum Zafferano",
"Sku":"44160",
"Quantity":"1",
"ItemCost":"27.00",
"ItemTotal":"27",
"Category":"ZAFFERANO;EAU DE PARFUM;LINEE UOMO;PROFUMI (UOMO)"
},
{
"ProductName":"Sapone Vegetale Lavanda Officinalis Bio",
"Sku":"01049",
"Quantity":"1",
"ItemCost":"4.90",
"ItemTotal":"4.9",
"Category":"ZAFFERANO;EAU DE PARFUM;LINEE UOMO;PROFUMI (UOMO)"
}
]
}
},
{
"OrderID":"1000004292",
"Email":"name@mail.com",
"AnnoNascita":"-",
"Age":"-",
"Gender":"-",
"Provincia":"CR",
"OrderDate":"2019-05-06 08:52:47",
"OrderStatus":"wc-failed",
"OrderTotal":"64.90",
"TotalDiscount":"0",
"OrderSubTotal":"64.9",
"Coupon":"",
"OrderItems": {
"Item": [
{
"ProductName":"Schiuma da Barba Pour Homme",
"Sku":"45396",
"Quantity":"2",
"ItemCost":"12.00",
"ItemTotal":"24",
"Category":"POUR HOMME;LINEE UOMO;RASATURA"
},
{
"ProductName":"Detergente Intimo Delicato Mamma",
"Sku":"38420",
"Quantity":"1",
"ItemCost":"11.00",
"ItemTotal":"11",
"Category":"POUR HOMME;LINEE UOMO;RASATURA"
},
{
"ProductName":"Spray per Ambiente - Preziosa",
"Sku":"44231",
"Quantity":"2",
"ItemCost":"10.00",
"ItemTotal":"20",
"Category":"POUR HOMME;LINEE UOMO;RASATURA"
},
{
"ProductName":"Cuscinetti Profumati - Preziosa",
"Sku":"45491",
"Quantity":"1",
"ItemCost":"9.90",
"ItemTotal":"9.9",
"Category":"POUR HOMME;LINEE UOMO;RASATURA"
}
]
}
},
]
}
我想做的是用 Pandas 构建一个 DataFrame,以便操作数据和收集信息。
起初,我尝试使用 Pandas 中的 pd.read_json('path_to_file')
函数,但得到的结果是:
Orders
0 {'OrderID': '1000004209', 'Email': 'name@ma...
1 {'OrderID': '1000004210', 'Email': 'name@ma...
2 {'OrderID': '1000004211', 'Email': 'name@ma...
3 {'OrderID': '1000004235', 'Email': 'name@ma...
4 {'OrderID': '1000004292', 'Email': 'name@ma...
我尝试使用 pd.DataFrame(df['Orders'])
从每一行获取一个 DatFrame,但它 returns 是同一个 DataFrame。我尝试使用 for 循环在新的 DataFrame 中附加单行,但我也走到了死胡同。
我在 Whosebug 上查看了所有与此相关的主题,却没有找到解决我的问题的方法。
实际上,我需要创建一个 DataFrame,每个主要值(如“OrderID”、“Email”、“AnnoNascita”等)都有一个列,还有一个名为“OrderItems”的列,其数组为“项目”中的所有值。我在想类似的事情:
OrderID Email AnnoNascita Age Gender Provincia OrderDate OrderStatus OrderTotal Coupon OrderItems
0 1000004209 name@mail.com - - - CR 2019-05-02 14:05:16 wc-failed 31.90 {"Items":[{"ProductName":"Schiuma da Barba Pour Homme","Sku":"45396","Quantity":"2","ItemCost":"12.00","ItemTotal":"24","Category":"POUR HOMME;LINEE UOMO;RASATURA"},{"ProductName":"Detergente Intimo Delicato Mamma","Sku":"38420","Quantity":"1","ItemCost":"11.00","ItemTotal":"11","Category":"POUR HOMME;LINEE UOMO;RASATURA"}]}
如果您对如何构建更好的 DataFrame 有任何建议,而不是我的想法,我很高兴阅读它并改变我的想法。正如我所说,我刚开始,我真的很感激任何建议。
PS:如果你也能解释一下你提供的解决方案,我会很高兴,因为我实际上正在尝试学习数据操作,不仅有一个解决方案,它真的很有帮助,还要了解一下。
感谢所有愿意花 his/her 时间帮助我的人!
使用pd.json_normalize()
,如下:
假设您的 json 文件名为 data
:
df = pd.json_normalize(data['Orders'])
pd.json_normalize()
将半结构化 JSON 数据标准化为平面 table。由于它展平了嵌套结构,您可以访问 JSON.
您需要指定第一个标签 Orders
才能访问和展开其中的列内容。否则,你只会得到一列 Orders
.
结果:
print(df)
OrderID Email AnnoNascita Age Gender Provincia OrderDate OrderStatus OrderTotal TotalDiscount OrderSubTotal Coupon OrderItems.Item.ProductName OrderItems.Item.Sku OrderItems.Item.Quantity OrderItems.Item.ItemCost OrderItems.Item.ItemTotal OrderItems.Item.Category OrderItems.Item
0 1000004209 name@mail.com - - - CR 2019-05-02 14:05:16 wc-failed 31.90 0 31.9 Eau de Parfum Zafferano 44160 1 27.00 27 ZAFFERANO;EAU DE PARFUM;LINEE UOMO;PROFUMI (UOMO) NaN
1 1000004210 name@mail.com - - - GE 2019-05-02 14:17:32 wc-cancelled 9.00 0 9 Sapone Marsiglia 200 g 01026 1 4.10 4.1 MARSEILLE;SAPONETTE NaN
2 1000004211 name@mail.com - - - GE 2019-05-02 14:21:42 wc-cancelled 31.90 0 31.9 Eau de Parfum Zafferano 44160 1 27.00 27 ZAFFERANO;EAU DE PARFUM;LINEE UOMO;PROFUMI (UOMO) NaN
3 1000004235 name@mail.com - - - CR 2019-05-03 09:37:06 wc-cancelled 31.90 0 31.9 NaN NaN NaN NaN NaN NaN [{'ProductName': 'Eau de Parfum Zafferano', 'Sku': '44160', 'Quantity': '1', 'ItemCost': '27.00', 'ItemTotal': '27', 'Category': 'ZAFFERANO;EAU DE PARFUM;LINEE UOMO;PROFUMI (UOMO)'}, {'ProductName': 'Sapone Vegetale Lavanda Officinalis Bio', 'Sku': '01049', 'Quantity': '1', 'ItemCost': '4.90', 'ItemTotal': '4.9', 'Category': 'ZAFFERANO;EAU DE PARFUM;LINEE UOMO;PROFUMI (UOMO)'}]
4 1000004292 name@mail.com - - - CR 2019-05-06 08:52:47 wc-failed 64.90 0 64.9 NaN NaN NaN NaN NaN NaN [{'ProductName': 'Schiuma da Barba Pour Homme', 'Sku': '45396', 'Quantity': '2', 'ItemCost': '12.00', 'ItemTotal': '24', 'Category': 'POUR HOMME;LINEE UOMO;RASATURA'}, {'ProductName': 'Detergente Intimo Delicato Mamma', 'Sku': '38420', 'Quantity': '1', 'ItemCost': '11.00', 'ItemTotal': '11', 'Category': 'POUR HOMME;LINEE UOMO;RASATURA'}, {'ProductName': 'Spray per Ambiente - Preziosa', 'Sku': '44231', 'Quantity': '2', 'ItemCost': '10.00', 'ItemTotal': '20', 'Category': 'POUR HOMME;LINEE UOMO;RASATURA'}, {'ProductName': 'Cuscinetti Profumati - Preziosa', 'Sku': '45491', 'Quantity': '1', 'ItemCost': '9.90', 'ItemTotal': '9.9', 'Category': 'POUR HOMME;LINEE UOMO;RASATURA'}]