How to handle a variable-size JSON file in Python to create a DataFrame using pandas

Question

I'm trying to build a DataFrame with pandas, but I can't handle the case where the JSON chunks I receive vary in size.

For example, the first chunk:

{'ad': 0,
 'country': 'US',
 'ver': '1.0',
 'adIdType': 2,
 'adValue': '5',
 'data': {'eventId': 99,
  'clickId': '',
  'eventType': 'PURCHASEMADE',
  'tms': '2019-12-25T09:57:04+0000',
  'productDetails': {'currency': 'DLR',
   'productList': [
    {'segment': 'Girls',
     'vertical': 'Fashion Jewellery',
     'brickname': 'Traditional Jewellery',
     'price': 8,
     'quantity': 10}]},
  'transactionId': '1254'},
 'appName': 'xer.tt',
 'appId': 'XR',
 'sdkVer': '1.0.0',
 'language': 'en',
 'tms': '2022-04-25T09:57:04+0000',
 'tid': '124'}

The second chunk:

{'ad': 0,
 'country': 'US',
 'ver': '1.0',
 'adIdType': 2,
 'adValue': '78',
 'data': {'eventId': 7,
  'clickId': '',
  'eventType': 'PURCHASEMADE',
  'tms': '20219-02-25T09:57:04+0000',
  'productDetails': {'currency': 'DLR',
   'productList': [{'segment': 'Boys',
     'vertical': 'Fashion',
     'brickname': 'Casuals',
     'price': 10,
     'quantity': 5},
    {'segment': 'Girls',
     'vertical': 'Fashion Jewellery',
     'brickname': 'Traditional Jewellery',
     'price': 8,
     'quantity': 10}]},
  'transactionId': '3258'},
 'appName': 'xer.tt',
 'appId': 'XR',
 'sdkVer': '1.0.0',
 'language': 'en',
 'tms': '2029-02-25T09:57:04+0000',
 'tid': '124'}

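The problem is that the number of products in productDetails varies: the first chunk lists a single product with its details, the second lists two, and further chunks may contain any number of products (i.e. chunk ~ record).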
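To make the target shape concrete: a chunk with n entries in productList should become n rows, with the chunk-level fields repeated on each row. Below is a minimal plain-Python sketch of that flattening (a hand-written alternative, not the pandas approach from the answer), assuming each chunk is already a Python dict shaped like the ones above. The helper name flatten_chunk and the choice of fields are hypothetical, and the two chunks are trimmed copies of the question's data.

```python
import pandas as pd


def flatten_chunk(chunk):
    """Hypothetical helper: turn one chunk into one dict per product."""
    data = chunk["data"]
    # Fields shared by every product in this chunk (chosen for illustration).
    parent = {
        "tid": chunk["tid"],
        "country": chunk["country"],
        "transactionId": data["transactionId"],
        "eventType": data["eventType"],
        "currency": data["productDetails"]["currency"],
    }
    # One output row per product; the list may have any length, including 0.
    return [{**parent, **product} for product in data["productDetails"]["productList"]]


# Trimmed copies of the two chunks from the question.
chunks = [
    {"tid": "124", "country": "US",
     "data": {"transactionId": "1254", "eventType": "PURCHASEMADE",
              "productDetails": {"currency": "DLR", "productList": [
                  {"segment": "Girls", "price": 8, "quantity": 10}]}}},
    {"tid": "124", "country": "US",
     "data": {"transactionId": "3258", "eventType": "PURCHASEMADE",
              "productDetails": {"currency": "DLR", "productList": [
                  {"segment": "Boys", "price": 10, "quantity": 5},
                  {"segment": "Girls", "price": 8, "quantity": 10}]}}},
]

# 1 + 2 products -> 3 rows in total.
rows = [row for chunk in chunks for row in flatten_chunk(chunk)]
df = pd.DataFrame(rows)
print(df)
```

Because the per-chunk list is expanded with a comprehension, the number of products per chunk never has to be known in advance.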

I tried to do this by writing some Python scripts, but couldn't find a good solution.

PS: If more details are needed, please let me know in the comments.

Thanks!

Answer 1

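What you can do is use pd.json_normalize, passing the innermost list (here productList) as your record_path and all the other data you are interested in as your meta. Each element of the record_path list becomes one row, so chunks with different numbers of products are handled automatically. Here is an in-depth example of how you could structure it: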
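To see what record_path and meta do in isolation, here is a small sketch on a deliberately tiny, hypothetical chunk that keeps only the nesting of the question's data. Nested meta paths are joined with "." (json_normalize's default sep), so ["data", "transactionId"] becomes the column data.transactionId.

```python
import pandas as pd

# A tiny, hypothetical chunk with the same nesting as the question's data.
chunk = {
    "ad": 0,
    "data": {
        "transactionId": "3258",
        "productDetails": {
            "currency": "DLR",
            "productList": [
                {"segment": "Boys", "price": 10},
                {"segment": "Girls", "price": 8},
            ],
        },
    },
}

df = pd.json_normalize(
    chunk,
    # The list whose elements become rows.
    record_path=["data", "productDetails", "productList"],
    # Fields from enclosing levels, repeated on every row of this chunk.
    meta=["ad", ["data", "transactionId"], ["data", "productDetails", "currency"]],
)
print(df.columns.tolist())
print(df)
```

The two products yield two rows, and the meta values (ad, data.transactionId, data.productDetails.currency) are copied onto both.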

In your case, for example (for a single object):

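import pandas as pd

# obj is one chunk (a dict) such as the ones shown in the question.
# Each element of data.productDetails.productList becomes one row;
# the meta fields are copied onto every row produced from that chunk.
df = pd.json_normalize(
    obj,
    record_path=["data", "productDetails", "productList"],
    meta=[
        ["data", "productDetails", "currency"],
        ["data", "transactionId"],
        ["data", "clickId"],
        ["data", "eventType"],
        ["data", "tms"],
        "ad",
        "country",
    ],
)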
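pd.json_normalize also accepts a list of dicts, so all chunks can be flattened in one call instead of one object at a time. The sketch below assumes the chunks are stored as JSON Lines (one JSON object per line); the file name events.jsonl is hypothetical, and io.StringIO stands in for it so the example runs on its own. The chunks are trimmed copies of the question's data.

```python
import io
import json

import pandas as pd

# Stand-in for a hypothetical JSON Lines file "events.jsonl": one chunk per line,
# trimmed to the fields used below. With a real file, iterate over open("events.jsonl").
raw = io.StringIO(
    '{"ad": 0, "country": "US", "data": {"transactionId": "1254", "eventType": "PURCHASEMADE", '
    '"productDetails": {"currency": "DLR", "productList": [{"segment": "Girls", "price": 8, "quantity": 10}]}}}\n'
    '{"ad": 0, "country": "US", "data": {"transactionId": "3258", "eventType": "PURCHASEMADE", '
    '"productDetails": {"currency": "DLR", "productList": [{"segment": "Boys", "price": 10, "quantity": 5}, '
    '{"segment": "Girls", "price": 8, "quantity": 10}]}}}\n'
)

# Parse every non-empty line into a dict; each chunk may hold any number of products.
chunks = [json.loads(line) for line in raw if line.strip()]

# One call flattens all chunks: 1 + 2 products -> 3 rows.
df = pd.json_normalize(
    chunks,
    record_path=["data", "productDetails", "productList"],
    meta=[
        ["data", "productDetails", "currency"],
        ["data", "transactionId"],
        ["data", "eventType"],
        "ad",
        "country",
    ],
)
print(df)
```

Note that every chunk must contain the full record_path; a chunk without data.productDetails.productList makes json_normalize raise a KeyError, while an empty productList simply contributes no rows.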

Tags: parsing, json, dataframe, python-3.x, pandas