Python - Manage nested JSON to DataFrame

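I'm at the start of my journey as a data scientist, and I'm struggling with JSON and Python. Although I know the basics of DataFrame manipulation and of the JSON format, I'm stuck on a JSON file that contains the following data: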

{"Orders":
    [
        {
            "OrderID":"1000004209",
            "Email":"name@mail.com",
            "AnnoNascita":"-",
            "Age":"-",
            "Gender":"-",
            "Provincia":"CR",
            "OrderDate":"2019-05-02 14:05:16",
            "OrderStatus":"wc-failed",
            "OrderTotal":"31.90",
            "TotalDiscount":"0",
            "OrderSubTotal":"31.9",
            "Coupon":"",
            "OrderItems": {
                "Item": {
                    "ProductName":"Eau de Parfum Zafferano",
                    "Sku":"44160",
                    "Quantity":"1",
                    "ItemCost":"27.00",
                    "ItemTotal":"27",
                    "Category":"ZAFFERANO;EAU DE PARFUM;LINEE UOMO;PROFUMI (UOMO)"
                }
            }
        },
        {
            "OrderID":"1000004210",
            "Email":"name@mail.com",
            "AnnoNascita":"-",
            "Age":"-",
            "Gender":"-",
            "Provincia":"GE",
            "OrderDate":"2019-05-02 14:17:32",
            "OrderStatus":"wc-cancelled",
            "OrderTotal":"9.00",
            "TotalDiscount":"0",
            "OrderSubTotal":"9",
            "Coupon":"",
            "OrderItems": {
                "Item": {
                    "ProductName":"Sapone Marsiglia 200 g",
                    "Sku":"01026",
                    "Quantity":"1",
                    "ItemCost":"4.10",
                    "ItemTotal":"4.1",
                    "Category":"MARSEILLE;SAPONETTE"
                }
            }
        },
        {
            "OrderID":"1000004211",
            "Email":"name@mail.com",
            "AnnoNascita":"-",
            "Age":"-",
            "Gender":"-",
            "Provincia":"GE",
            "OrderDate":"2019-05-02 14:21:42",
            "OrderStatus":"wc-cancelled",
            "OrderTotal":"31.90",
            "TotalDiscount":"0",
            "OrderSubTotal":"31.9",
            "Coupon":"",
            "OrderItems": {
                "Item": {
                    "ProductName":"Eau de Parfum Zafferano",
                    "Sku":"44160",
                    "Quantity":"1",
                    "ItemCost":"27.00",
                    "ItemTotal":"27",
                    "Category":"ZAFFERANO;EAU DE PARFUM;LINEE UOMO;PROFUMI (UOMO)"
                }
            }
        },
        {
            "OrderID":"1000004235",
            "Email":"name@mail.com",
            "AnnoNascita":"-",
            "Age":"-",
            "Gender":"-",
            "Provincia":"CR",
            "OrderDate":"2019-05-03 09:37:06",
            "OrderStatus":"wc-cancelled",
            "OrderTotal":"31.90",
            "TotalDiscount":"0",
            "OrderSubTotal":"31.9",
            "Coupon":"",
            "OrderItems": {
                "Item": [
                    {
                        "ProductName":"Eau de Parfum Zafferano",
                        "Sku":"44160",
                        "Quantity":"1",
                        "ItemCost":"27.00",
                        "ItemTotal":"27",
                        "Category":"ZAFFERANO;EAU DE PARFUM;LINEE UOMO;PROFUMI (UOMO)"
                    },
                    {
                        "ProductName":"Sapone Vegetale Lavanda Officinalis Bio",
                        "Sku":"01049",
                        "Quantity":"1",
                        "ItemCost":"4.90",
                        "ItemTotal":"4.9",
                        "Category":"ZAFFERANO;EAU DE PARFUM;LINEE UOMO;PROFUMI (UOMO)"
                    }
                ]
            }
        },
        {
            "OrderID":"1000004292",
            "Email":"name@mail.com",
            "AnnoNascita":"-",
            "Age":"-",
            "Gender":"-",
            "Provincia":"CR",
            "OrderDate":"2019-05-06 08:52:47",
            "OrderStatus":"wc-failed",
            "OrderTotal":"64.90",
            "TotalDiscount":"0",
            "OrderSubTotal":"64.9",
            "Coupon":"",
            "OrderItems": {
                "Item": [
                    {
                        "ProductName":"Schiuma da Barba Pour Homme",
                        "Sku":"45396",
                        "Quantity":"2",
                        "ItemCost":"12.00",
                        "ItemTotal":"24",
                        "Category":"POUR HOMME;LINEE UOMO;RASATURA"
                    },
                    {
                        "ProductName":"Detergente Intimo Delicato Mamma",
                        "Sku":"38420",
                        "Quantity":"1",
                        "ItemCost":"11.00",
                        "ItemTotal":"11",
                        "Category":"POUR HOMME;LINEE UOMO;RASATURA"
                    },
                    {
                        "ProductName":"Spray per Ambiente - Preziosa",
                        "Sku":"44231",
                        "Quantity":"2",
                        "ItemCost":"10.00",
                        "ItemTotal":"20",
                        "Category":"POUR HOMME;LINEE UOMO;RASATURA"
                    },
                    {
                        "ProductName":"Cuscinetti Profumati - Preziosa",
                        "Sku":"45491",
                        "Quantity":"1",
                        "ItemCost":"9.90",
                        "ItemTotal":"9.9",
                        "Category":"POUR HOMME;LINEE UOMO;RASATURA"
                    }
                ]
            }
        }
    ]
}

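What I want to do is build a DataFrame with Pandas so that I can manipulate the data and extract information from it.

At first, I tried the pd.read_json('path_to_file') function from Pandas, but the result I got was: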

                                            Orders
0   {'OrderID': '1000004209', 'Email': 'name@ma...
1   {'OrderID': '1000004210', 'Email': 'name@ma...
2   {'OrderID': '1000004211', 'Email': 'name@ma...
3   {'OrderID': '1000004235', 'Email': 'name@ma...
4   {'OrderID': '1000004292', 'Email': 'name@ma...

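I tried pd.DataFrame(df['Orders']) to get a DataFrame out of each row, but it returns the same DataFrame. I also tried a for loop that appends single rows to a new DataFrame, but that was a dead end too.

I looked through all the related topics on Stack Overflow without finding a solution to my problem.

In practice, I need to create a DataFrame with one column for each main value (such as "OrderID", "Email", "AnnoNascita", etc.) and one column named "OrderItems" that holds an array of all the values in "Item". I was thinking of something like this: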

        OrderID             Email   AnnoNascita   Age     Gender     Provincia               OrderDate     OrderStatus     OrderTotal     Coupon     OrderItems
0    1000004209     name@mail.com             -     -          -            CR     2019-05-02 14:05:16       wc-failed          31.90                {"Items":[{"ProductName":"Schiuma da Barba Pour Homme","Sku":"45396","Quantity":"2","ItemCost":"12.00","ItemTotal":"24","Category":"POUR HOMME;LINEE UOMO;RASATURA"},{"ProductName":"Detergente Intimo Delicato Mamma","Sku":"38420","Quantity":"1","ItemCost":"11.00","ItemTotal":"11","Category":"POUR HOMME;LINEE UOMO;RASATURA"}]}

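If you have a suggestion for a better DataFrame layout than the one I have in mind, I'd be happy to read it and change my mind. As I said, I'm just getting started, and I'd really appreciate any advice.

PS: I'd be glad if you could also explain the solution you provide. I'm trying to learn data manipulation, so having a solution is really helpful, but I also want to understand it.

Thanks to everyone who takes the time to help me!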

Use pd.json_normalize(), as follows:

Assuming the parsed content of your JSON file is stored in a variable named data:

df = pd.json_normalize(data['Orders'])
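For completeness, here is a minimal sketch of how data could be obtained before the call above. It is an illustration rather than part of the original answer: the inline string raw is a shortened stand-in for the file contents, and with a real file you would call json.load() on the opened file instead.

```python
import json

import pandas as pd

# Shortened stand-in for the JSON file (assumption for illustration only).
# With a real file: `with open(path) as f: data = json.load(f)`,
# where `path` is whatever your file is called.
raw = """
{"Orders": [
  {"OrderID": "1000004209", "Provincia": "CR",
   "OrderItems": {"Item": {"ProductName": "Eau de Parfum Zafferano", "Sku": "44160"}}},
  {"OrderID": "1000004210", "Provincia": "GE",
   "OrderItems": {"Item": {"ProductName": "Sapone Marsiglia 200 g", "Sku": "01026"}}}
]}
"""
data = json.loads(raw)  # a plain dict with a single "Orders" key

# Flatten the list of orders; nested dict keys are joined with "."
df = pd.json_normalize(data["Orders"])
print(df.columns.tolist())
```

Each nested dictionary key ends up as its own column, named after the path that leads to it (for example OrderItems.Item.Sku).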

pd.json_normalize() normalizes semi-structured JSON data into a flat table. Because it flattens the nested structure, you can access all the fields in the JSON.

You need to select the top-level key Orders so that its contents are expanded into columns. Otherwise, you will only get a single Orders column.
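The difference can be checked on a small made-up sample (the two-order dictionary below is an assumption for illustration, not the full file). Passing the whole dictionary should leave the list of orders packed into one cell, while passing data["Orders"] gives one row per order:

```python
import pandas as pd

# Tiny illustrative sample with the same top-level shape as the question's file.
data = {
    "Orders": [
        {"OrderID": "1000004209", "Provincia": "CR"},
        {"OrderID": "1000004210", "Provincia": "GE"},
    ]
}

# Whole document: the list under "Orders" is kept as a single cell value.
whole = pd.json_normalize(data)

# Only the list of orders: each order dict becomes one row.
orders = pd.json_normalize(data["Orders"])

print(whole.columns.tolist(), whole.shape)
print(orders.columns.tolist(), orders.shape)
```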

Result:

print(df)

      OrderID          Email AnnoNascita Age Gender Provincia            OrderDate   OrderStatus OrderTotal TotalDiscount OrderSubTotal Coupon OrderItems.Item.ProductName OrderItems.Item.Sku OrderItems.Item.Quantity OrderItems.Item.ItemCost OrderItems.Item.ItemTotal                           OrderItems.Item.Category                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          OrderItems.Item
0  1000004209  name@mail.com           -   -      -        CR  2019-05-02 14:05:16     wc-failed      31.90             0          31.9            Eau de Parfum Zafferano               44160                        1                    27.00                        27  ZAFFERANO;EAU DE PARFUM;LINEE UOMO;PROFUMI (UOMO)                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      NaN
1  1000004210  name@mail.com           -   -      -        GE  2019-05-02 14:17:32  wc-cancelled       9.00             0             9             Sapone Marsiglia 200 g               01026                        1                     4.10                       4.1                                MARSEILLE;SAPONETTE                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      NaN
2  1000004211  name@mail.com           -   -      -        GE  2019-05-02 14:21:42  wc-cancelled      31.90             0          31.9            Eau de Parfum Zafferano               44160                        1                    27.00                        27  ZAFFERANO;EAU DE PARFUM;LINEE UOMO;PROFUMI (UOMO)                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      NaN
3  1000004235  name@mail.com           -   -      -        CR  2019-05-03 09:37:06  wc-cancelled      31.90             0          31.9                                NaN                 NaN                      NaN                      NaN                       NaN                                                NaN                                                                                                                                                                                                                                                                                                             [{'ProductName': 'Eau de Parfum Zafferano', 'Sku': '44160', 'Quantity': '1', 'ItemCost': '27.00', 'ItemTotal': '27', 'Category': 'ZAFFERANO;EAU DE PARFUM;LINEE UOMO;PROFUMI (UOMO)'}, {'ProductName': 'Sapone Vegetale Lavanda Officinalis Bio', 'Sku': '01049', 'Quantity': '1', 'ItemCost': '4.90', 'ItemTotal': '4.9', 'Category': 'ZAFFERANO;EAU DE PARFUM;LINEE UOMO;PROFUMI (UOMO)'}]
4  1000004292  name@mail.com           -   -      -        CR  2019-05-06 08:52:47     wc-failed      64.90             0          64.9                                NaN                 NaN                      NaN                      NaN                       NaN                                                NaN  [{'ProductName': 'Schiuma da Barba Pour Homme', 'Sku': '45396', 'Quantity': '2', 'ItemCost': '12.00', 'ItemTotal': '24', 'Category': 'POUR HOMME;LINEE UOMO;RASATURA'}, {'ProductName': 'Detergente Intimo Delicato Mamma', 'Sku': '38420', 'Quantity': '1', 'ItemCost': '11.00', 'ItemTotal': '11', 'Category': 'POUR HOMME;LINEE UOMO;RASATURA'}, {'ProductName': 'Spray per Ambiente - Preziosa', 'Sku': '44231', 'Quantity': '2', 'ItemCost': '10.00', 'ItemTotal': '20', 'Category': 'POUR HOMME;LINEE UOMO;RASATURA'}, {'ProductName': 'Cuscinetti Profumati - Preziosa', 'Sku': '45491', 'Quantity': '1', 'ItemCost': '9.90', 'ItemTotal': '9.9', 'Category': 'POUR HOMME;LINEE UOMO;RASATURA'}]
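In the result above, orders with a single item are flattened into OrderItems.Item.* columns, while orders with several items keep a list in OrderItems.Item. One possible way to get a uniform table, sketched here as an optional follow-up rather than as part of the original answer, is to wrap single items in a list first and then use the record_path and meta parameters of pd.json_normalize() to get one row per item. The helper as_list and the shortened sample data are assumptions made for this illustration.

```python
import pandas as pd

# Shortened sample: one order with a single item (dict), one with two items (list).
data = {"Orders": [
    {"OrderID": "1000004209", "OrderTotal": "31.90",
     "OrderItems": {"Item": {"ProductName": "Eau de Parfum Zafferano",
                             "Sku": "44160", "Quantity": "1"}}},
    {"OrderID": "1000004235", "OrderTotal": "31.90",
     "OrderItems": {"Item": [
         {"ProductName": "Eau de Parfum Zafferano",
          "Sku": "44160", "Quantity": "1"},
         {"ProductName": "Sapone Vegetale Lavanda Officinalis Bio",
          "Sku": "01049", "Quantity": "1"},
     ]}},
]}


def as_list(item):
    """Hypothetical helper: wrap a single item dict in a list."""
    return item if isinstance(item, list) else [item]


# Copy each order, making "Item" a list in every case.
orders = [
    {**order, "OrderItems": {"Item": as_list(order["OrderItems"]["Item"])}}
    for order in data["Orders"]
]

# One row per item; the order-level fields listed in `meta` are repeated.
items = pd.json_normalize(
    orders,
    record_path=["OrderItems", "Item"],
    meta=["OrderID", "OrderTotal"],
)
print(items)
```

If you would rather keep one row per order with the items stored as a list in a single column, as sketched in the question, pd.DataFrame(orders) on the same preprocessed list is a possible starting point.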