I need to transform JSON Lines data into a DataFrame
Below is an example of my JSON Lines data:
{"allocation":{"duration":1,"rewards":[{"expiryDate":"2021-01-01 23:59:59","rewardSubType":"Boxes"}]},"MODULE":"a","APPLICATION":"g","GUID2":"gh-gh-gh","METHOD":"Expensive","timestamp":"2021-01-01 07:00:00.497"}
{"allocation":{"duration":2,"rewards":[{"expiryDate":"2021-01-02 23:59:59","rewardSubType":"Keys"}]},"MODULE":"a","APPLICATION":"b","GUID":"gh-gh-gh","METHOD":"Free","timestamp":"2021-01-02 07:00:00.497"}
I ran the following code:
import json
import pandas as pd

# Raw string, so the backslashes in the Windows path are not read as escapes
with open(r'C:\Vodacom\KPI Dashboard\ussd_vservices_audit.jsonl') as f:
    lines = f.read().splitlines()

# One JSON document per row, then parse and flatten the nested keys
df_inter = pd.DataFrame(lines)
df_inter.columns = ['json_element']
df_final = pd.json_normalize(df_inter['json_element'].apply(json.loads))
df_final
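The result is the flattened df_final.
However, I also need to split the newly created allocation.rewards column (the old "rewards") further into separate columns. It now also contains square brackets and single quotes, so I don't know how to do that.
Please assist.
Thanks!
Great answer, thanks, exactly what I needed! But now that I run it against my full dataset, I find that some rows have no rewards section/column, like this one: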
{"offer":null,"MODULE":"purchase","APPLICATION":"new","GUID":"cf-83-11-001","METHOD":"Upsize","timestamp":"2021-02-01 06:00:01.158"}
This gives me the error:
TypeError: object of type 'float' has no len()
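The error can be reproduced in isolation. The sketch below uses two trimmed-down records (hypothetical, shortened from the samples above) to show that a row without rewards ends up holding NaN, which is a float, so calling len() on it fails:

```python
import math
import pandas as pd

# Two shortened records: the second has no "allocation" section at all
records = [
    {"allocation": {"rewards": [{"rewardSubType": "Boxes"}]}, "MODULE": "a"},
    {"offer": None, "MODULE": "purchase"},
]
df = pd.json_normalize(records)

# The missing cell is filled with NaN, which is a float
missing = df.loc[1, "allocation.rewards"]
is_float_nan = isinstance(missing, float) and math.isnan(missing)

# len() on that float raises the TypeError from the question
try:
    len(missing)
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(is_float_nan, error_message)
```

NaN is also the only value that is not equal to itself, which is why a check such as value == value can be used to skip those rows.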
You can continue your script with the following code to get the desired result:
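for index, row in df_final.iterrows():
    value = row["allocation.rewards"]
    # value == value is False only for NaN, i.e. for rows without rewards
    if value == value and len(value) == 1:
        # Copy each key of the single reward dict into its own column
        for key, keyValue in value[0].items():
            df_final.loc[index, key] = keyValue
df_final.drop(columns=["allocation.rewards"], inplace=True)
df_final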
Output:
| | MODULE | APPLICATION | GUID2 | METHOD | timestamp | allocation.duration | GUID | offer | expiryDate | rewardSubType |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | a | g | gh-gh-gh | Expensive | 2021-01-01 07:00:00.497 | 1 | nan | nan | 2021-01-01 23:59:59 | Boxes |
| 1 | a | b | nan | Free | 2021-01-02 07:00:00.497 | 2 | gh-gh-gh | nan | 2021-01-02 23:59:59 | Keys |
| 2 | purchase | new | nan | Upsize | 2021-02-01 06:00:01.158 | nan | cf-83-11-001 | nan | nan | nan |
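The same result can probably be reached without the row loop. The following is only a sketch, assuming every rewards list holds at most one element; `raw_lines` and `first_reward` are names made up for the illustration, and the three lines are the samples from the question:

```python
import json
import pandas as pd

# The three sample lines from the question, inlined instead of read from a file
raw_lines = [
    '{"allocation":{"duration":1,"rewards":[{"expiryDate":"2021-01-01 23:59:59","rewardSubType":"Boxes"}]},"MODULE":"a","APPLICATION":"g","GUID2":"gh-gh-gh","METHOD":"Expensive","timestamp":"2021-01-01 07:00:00.497"}',
    '{"allocation":{"duration":2,"rewards":[{"expiryDate":"2021-01-02 23:59:59","rewardSubType":"Keys"}]},"MODULE":"a","APPLICATION":"b","GUID":"gh-gh-gh","METHOD":"Free","timestamp":"2021-01-02 07:00:00.497"}',
    '{"offer":null,"MODULE":"purchase","APPLICATION":"new","GUID":"cf-83-11-001","METHOD":"Upsize","timestamp":"2021-02-01 06:00:01.158"}',
]
df_final = pd.json_normalize([json.loads(line) for line in raw_lines])


def first_reward(value):
    """Return the single reward dict, or an empty dict when there is none."""
    # Rows without rewards hold NaN (a float); the others hold a list of dicts
    if isinstance(value, list) and len(value) == 1:
        return value[0]
    return {}


# One row of reward columns per original row, aligned on the same index
rewards = pd.DataFrame(
    [first_reward(v) for v in df_final["allocation.rewards"]],
    index=df_final.index,
)
result = df_final.drop(columns=["allocation.rewards"]).join(rewards)
print(result)
```

Rows without rewards get NaN in expiryDate and rewardSubType, as in the output above.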
If your rewards values contain only one element in the list, you can expand the allocation.rewards column of df_final:
df_final['allocation.rewards'].apply(lambda row: pd.Series(row[0]))
expiryDate rewardSubType
0 2021-01-01 23:59:59 Boxes
1 2021-01-02 23:59:59 Keys
Then concatenate it to df_final and drop the allocation.rewards column:
df_ = pd.concat([df_final, df_final['allocation.rewards'].apply(lambda row: pd.Series(row[0]))], axis=1)
# Pass the axis by keyword; recent pandas versions reject the positional form drop(label, 1)
df_ = df_.drop('allocation.rewards', axis=1)
MODULE APPLICATION GUID2 METHOD timestamp allocation.duration GUID expiryDate rewardSubType
0 a g gh-gh-gh Expensive 2021-01-01 07:00:00.497 1 NaN 2021-01-01 23:59:59 Boxes
1 a b NaN Free 2021-01-02 07:00:00.497 2 gh-gh-gh 2021-01-02 23:59:59 Keys
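If a rewards list can hold more than one element, taking row[0] would silently drop the rest. One possible way around that, shown here only as a sketch, is to give each reward its own row with DataFrame.explode before expanding the dicts. The second reward in the first row is invented for the illustration, and the frame is cut down to two columns:

```python
import pandas as pd

# Cut-down frame: the first row has two rewards (the second one is made up),
# the second row has no rewards at all (NaN)
df = pd.DataFrame({
    "MODULE": ["a", "purchase"],
    "allocation.rewards": [
        [
            {"expiryDate": "2021-01-01 23:59:59", "rewardSubType": "Boxes"},
            {"expiryDate": "2021-01-03 23:59:59", "rewardSubType": "Keys"},
        ],
        float("nan"),
    ],
})

# One row per reward; a NaN cell stays as a single NaN row
exploded = df.explode("allocation.rewards").reset_index(drop=True)

# Expand each reward dict into columns, using an empty dict for missing rewards
details = pd.DataFrame(
    [r if isinstance(r, dict) else {} for r in exploded["allocation.rewards"]],
    index=exploded.index,
)
result = exploded.drop(columns=["allocation.rewards"]).join(details)
print(result)
```

The other columns are repeated for each reward of the same record, so whether this shape is wanted depends on how the data is used afterwards.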