I need to transform the jsonline to df

Question

Below is an example of my jsonline data:

{"allocation":{"duration":1,"rewards":[{"expiryDate":"2021-01-01 23:59:59","rewardSubType":"Boxes"}]},"MODULE":"a","APPLICATION":"g","GUID2":"gh-gh-gh","METHOD":"Expensive","timestamp":"2021-01-01 07:00:00.497"}
{"allocation":{"duration":2,"rewards":[{"expiryDate":"2021-01-02 23:59:59","rewardSubType":"Keys"}]},"MODULE":"a","APPLICATION":"b","GUID":"gh-gh-gh","METHOD":"Free","timestamp":"2021-01-02 07:00:00.497"}

I have run the following code:

import json

import pandas as pd

# Raw string so the backslashes in the Windows path are not read as escapes
with open(r'C:\Vodacom\KPI Dashboard\ussd_vservices_audit.jsonl') as f:
    lines = f.read().splitlines()

df_inter = pd.DataFrame(lines)
df_inter.columns = ['json_element']
# Parse each line into a dict, then flatten the nested keys into columns
df_final = pd.json_normalize(df_inter['json_element'].apply(json.loads))
df_final

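The result is a DataFrame with the nested fields flattened into columns such as allocation.duration and allocation.rewards.

However, I also need to split the newly created allocation.rewards column (the old "rewards" key) further into separate columns. Its values still carry square brackets and single quotes, so I don't know how to do that.

Please assist, thanks!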

Great answer, thank you, exactly what I needed! But now that I am testing against my full dataset, I find that some rows have no rewards section/column, for example:

{"offer":null,"MODULE":"purchase","APPLICATION":"new","GUID":"cf-83-11-001","METHOD":"Upsize","timestamp":"2021-02-01 06:00:01.158"}

This gives me the error TypeError: object of type 'float' has no len()
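
As far as I can tell, the error comes from the rows without rewards: json_normalize fills the missing allocation.rewards cell with NaN, which is a float. A minimal sketch that reproduces this, assuming trimmed-down copies of the records above (the names df_demo, missing and error_message are only for illustration):

```python
import json

import pandas as pd

# Trimmed-down copies of the sample records: one with rewards, one without.
lines = [
    '{"allocation":{"duration":1,"rewards":[{"expiryDate":"2021-01-01 23:59:59","rewardSubType":"Boxes"}]},"MODULE":"a"}',
    '{"offer":null,"MODULE":"purchase"}',
]
df_demo = pd.json_normalize([json.loads(line) for line in lines])

# The second record has no "allocation" key, so its cell is NaN (a float).
missing = df_demo.loc[1, "allocation.rewards"]
print(type(missing))

error_message = None
try:
    len(missing)
except TypeError as exc:
    error_message = str(exc)
print(error_message)
```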

Answer 1

You can continue your script with the following code to get the desired result:

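for index, row in df_final.iterrows():
    value = row["allocation.rewards"]
    # Rows without rewards hold NaN, which is not a list, so they are skipped
    if isinstance(value, list) and len(value) == 1:
        # Copy each key of the single reward dict into its own column
        for key, keyValue in value[0].items():
            df_final.loc[index, key] = keyValue
df_final.drop(columns=["allocation.rewards"], inplace=True)
df_final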

Output

	MODULE	APPLICATION	GUID2	METHOD	timestamp	allocation.duration	GUID	offer	expiryDate	rewardSubType
0	a	g	gh-gh-gh	Expensive	2021-01-01 07:00:00.497	1	nan	nan	2021-01-01 23:59:59	Boxes
1	a	b	nan	Free	2021-01-02 07:00:00.497	2	gh-gh-gh	nan	2021-01-02 23:59:59	Keys
2	purchase	new	nan	Upsize	2021-02-01 06:00:01.158	nan	cf-83-11-001	nan	nan	nan
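
The same idea can be written without iterrows. This is only a sketch under a few assumptions: the helper name expand_first_reward is made up for illustration, the sample records are trimmed-down copies of the ones in the question, and only the first reward of each row is expanded (the loop above skips rows whose list does not have exactly one element).

```python
import json

import pandas as pd


def expand_first_reward(df, column="allocation.rewards"):
    """Return a copy of df with one column per key of each row's first reward."""
    # Missing rewards show up as NaN (a float), so fall back to an empty dict.
    first_reward = df[column].apply(
        lambda value: value[0] if isinstance(value, list) and value else {}
    )
    reward_columns = pd.DataFrame(first_reward.tolist(), index=df.index)
    return df.drop(columns=[column]).join(reward_columns)


# Trimmed-down copies of the sample records from the question.
sample_lines = [
    '{"allocation":{"duration":1,"rewards":[{"expiryDate":"2021-01-01 23:59:59","rewardSubType":"Boxes"}]},"MODULE":"a"}',
    '{"allocation":{"duration":2,"rewards":[{"expiryDate":"2021-01-02 23:59:59","rewardSubType":"Keys"}]},"MODULE":"a"}',
    '{"offer":null,"MODULE":"purchase"}',
]
df_sample = pd.json_normalize([json.loads(line) for line in sample_lines])
result = expand_first_reward(df_sample)
print(result)
```

Building the reward columns in one step avoids assigning cell by cell with .loc, which tends to be slow on large frames.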

Answer 2

If each rewards value contains only one element in its list, you can expand the allocation.rewards column of df_final:

df_final['allocation.rewards'].apply(lambda row: pd.Series(row[0]))

            expiryDate rewardSubType
0  2021-01-01 23:59:59         Boxes
1  2021-01-02 23:59:59          Keys

Then concatenate it to df_final and drop the allocation.rewards column:

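# Expand the single reward dict of each row, then append the new columns
rewards = df_final['allocation.rewards'].apply(lambda row: pd.Series(row[0]))
df_ = pd.concat([df_final, rewards], axis=1)
# Use the columns keyword; a positional axis is rejected by pandas 2.0+
df_ = df_.drop(columns='allocation.rewards')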

  MODULE APPLICATION     GUID2     METHOD                timestamp  allocation.duration      GUID           expiryDate rewardSubType
0      a           g  gh-gh-gh  Expensive  2021-01-01 07:00:00.497                    1       NaN  2021-01-01 23:59:59         Boxes
1      a           b       NaN       Free  2021-01-02 07:00:00.497                    2  gh-gh-gh  2021-01-02 23:59:59          Keys
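
The approach above relies on every row holding a list with exactly one reward; row[0] fails on rows where rewards is missing. If a row may hold several rewards or none, one possible variation is to explode the list first. This is a sketch only: the helper name explode_rewards and the sample records (one with two rewards, one with none) are hypothetical and not taken from the question's data.

```python
import json

import pandas as pd


def explode_rewards(df, column="allocation.rewards"):
    """Return one row per reward; rows without rewards are kept with NaN."""
    exploded = df.explode(column, ignore_index=True)
    # After explode each cell is a single reward dict or NaN.
    reward_dicts = [v if isinstance(v, dict) else {} for v in exploded[column]]
    reward_columns = pd.DataFrame(reward_dicts, index=exploded.index)
    return exploded.drop(columns=[column]).join(reward_columns)


# Hypothetical records: the first one carries two rewards, the second none.
sample_lines = [
    '{"allocation":{"duration":1,"rewards":[{"expiryDate":"2021-01-01 23:59:59","rewardSubType":"Boxes"},{"expiryDate":"2021-01-03 23:59:59","rewardSubType":"Keys"}]},"MODULE":"a"}',
    '{"offer":null,"MODULE":"purchase"}',
]
df_sample = pd.json_normalize([json.loads(line) for line in sample_lines])
result = explode_rewards(df_sample)
print(result)
```

Note that this changes the row count: a record with two rewards becomes two rows, with the other columns repeated.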
