How to pull a key from a dict (pandas Series) to its own row?

Here is my sample data. It has two fields, and the last one, [outbreak], is a pandas Series.

Start:

Goal (Excel mock-up):

Code to reproduce:

import pandas as pd
import json

d = {'report_id': [100, 101], 'outbreak': [
    '{"outbreak_100":{"name":"Chris","disease":"A-Pox"},"outbreak_101":{"name":"Stacy","disease": "H-Pox"}}', 
    '{"outbreak_200":{"name":"Brandon","disease":"C-Pox"},"outbreak_201":{"name":"Karen","disease": "G-Pox"},"outbreak_202":{"name":"Tim","disease": "Z-Pox"}}']}

df = pd.DataFrame(data=d)
print(type(df['outbreak']))
display(df)

#Ignore
df = pd.json_normalize(df['outbreak'].apply(json.loads), max_level=0)
display(df)

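Attempt: I considered using json_normalize() to turn each [outbreak_id] into its own field and then pandas.wide_to_long() to get my final output. It works in testing, but my concern is that my real production data is so long and nested that it ends up generating hundreds of thousands of fields before the pivot. That does not sound good to me, and it is also why I hope to avoid iterating with loops.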
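To make the concern about the wide intermediate concrete, here is a small sketch. The shortened sample strings and the variable names are mine, not taken from the production data. With max_level=0, json_normalize creates one column per distinct outbreak key, so the width grows with the total number of outbreaks across all reports:

```python
import json

import pandas as pd

# Shortened, illustrative sample (not the production data).
raw = [
    '{"outbreak_100": {"name": "Chris", "disease": "A-Pox"}, '
    '"outbreak_101": {"name": "Stacy", "disease": "H-Pox"}}',
    '{"outbreak_200": {"name": "Brandon", "disease": "C-Pox"}}',
]

parsed = [json.loads(s) for s in raw]

# max_level=0 keeps the inner dicts intact: one column per top-level key.
wide = pd.json_normalize(parsed, max_level=0)

# Every distinct outbreak key becomes its own column in the wide frame.
n_distinct_keys = len({key for record in parsed for key in record})

print(wide.shape, n_distinct_keys)
```

Most cells in such a frame are empty, because each report only fills the columns for its own outbreaks.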

I also considered using df = df.explode('outbreak'), but I get a KeyError: 0.

Maybe someone has a better idea than I do? Thanks.

You can try using ast to convert the strings to dicts, and then do the reshaping:

import ast 
out = df.pop('outbreak').map(ast.literal_eval).apply(pd.Series).stack().reset_index(level=1).join(df)
out.columns = ['outbreak_id','outbreak_value','report_id']

Output (as displayed before the columns are renamed):
        level_1                                        0  report_id
0  outbreak_100    {'name': 'Chris', 'disease': 'A-Pox'}        100
0  outbreak_101    {'name': 'Stacy', 'disease': 'H-Pox'}        100
1  outbreak_200  {'name': 'Brandon', 'disease': 'C-Pox'}        101
1  outbreak_201    {'name': 'Karen', 'disease': 'G-Pox'}        101
1  outbreak_202      {'name': 'Tim', 'disease': 'Z-Pox'}        101
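One caveat with ast.literal_eval, shown as a minimal sketch: it parses Python literals, not JSON, so it works on the sample data but raises on JSON-only tokens such as true, false, or null. The recovered and notes fields below are hypothetical, added only to trigger the difference.

```python
import ast
import json

# Hypothetical record containing JSON-only tokens (true, null).
sample = '{"outbreak_100": {"name": "Chris", "recovered": true, "notes": null}}'

# json.loads understands JSON literals and maps them to True / None.
parsed = json.loads(sample)

# ast.literal_eval only accepts Python literals, so `true` and `null`
# are rejected as unknown names.
try:
    ast.literal_eval(sample)
    literal_eval_ok = True
except ValueError:
    literal_eval_ok = False

print(parsed, literal_eval_ok)
```

If the real data may contain such values, json.loads can be passed to the .map(...) call above in place of ast.literal_eval.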

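One approach is to convert each outbreak's JSON to a dictionary, turn the dictionary into a list of key/value pairs, then explode that list and split the pairs into the two desired columns: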

df['outbreak'] = df['outbreak'].apply(lambda v:json.loads(v).items())
df = df.explode('outbreak')
df[['outbreak_id', 'outbreak_value']] = pd.DataFrame(df.pop('outbreak').tolist(), index=df.index)

Output (for your sample data):

   report_id   outbreak_id                           outbreak_value
0        100  outbreak_100    {'name': 'Chris', 'disease': 'A-Pox'}
0        100  outbreak_101    {'name': 'Stacy', 'disease': 'H-Pox'}
1        101  outbreak_200  {'name': 'Brandon', 'disease': 'C-Pox'}
1        101  outbreak_201    {'name': 'Karen', 'disease': 'G-Pox'}
1        101  outbreak_202      {'name': 'Tim', 'disease': 'Z-Pox'}

Note: if the outbreak values are already dicts rather than JSON strings, change the first line of this code to:

df['outbreak'] = df['outbreak'].apply(dict.items)
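The two variants above can be folded into one helper. This is only a sketch under some assumptions: the name explode_outbreaks is made up, every row is assumed to hold at least one outbreak, and explode(..., ignore_index=True) needs pandas 1.1 or later. It wraps the items in list(...) so that explode receives a plain list, and it resets the index so the result has no duplicate index labels:

```python
import json

import pandas as pd


def explode_outbreaks(frame, column="outbreak"):
    """Return one row per outbreak key (hypothetical helper)."""

    def to_pairs(value):
        # Accept either a JSON string or an already-parsed dict.
        mapping = json.loads(value) if isinstance(value, str) else value
        return list(mapping.items())

    out = frame.copy()
    out[column] = out[column].map(to_pairs)
    # One row per (key, value) tuple; fresh 0..n-1 index.
    out = out.explode(column, ignore_index=True)
    pairs = out.pop(column)
    out["outbreak_id"] = [key for key, _ in pairs]
    out["outbreak_value"] = [value for _, value in pairs]
    return out


# Mixed input: one JSON string and one dict, both illustrative.
frame = pd.DataFrame({
    "report_id": [100, 101],
    "outbreak": [
        '{"outbreak_100": {"name": "Chris", "disease": "A-Pox"}}',
        {"outbreak_200": {"name": "Brandon", "disease": "C-Pox"},
         "outbreak_201": {"name": "Karen", "disease": "G-Pox"}},
    ],
})
result = explode_outbreaks(frame)
print(result)
```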

Try this:

import json
import pandas as pd
d = {'report_id': [100, 101], 'outbreak': [
    '{"outbreak_100":{"name":"Chris","disease":"A-Pox"},"outbreak_101":{"name":"Stacy","disease": "H-Pox"}}', 
    '{"outbreak_200":{"name":"Brandon","disease":"C-Pox"},"outbreak_201":{"name":"Karen","disease": "G-Pox"},"outbreak_202":{"name":"Tim","disease": "Z-Pox"}}']}

df = pd.DataFrame(data=d)
# use json.loads to parse the json and construct df from it
df = (
    pd.DataFrame(df.set_index('report_id')['outbreak'].map(json.loads).to_dict())
    .stack()
    .rename_axis(['outbreak_id', 'report_id'], axis=0)
    .reset_index(name='outbreak_value')
)
print(df)
    outbreak_id  report_id                           outbreak_value
0  outbreak_100        100    {'name': 'Chris', 'disease': 'A-Pox'}
1  outbreak_101        100    {'name': 'Stacy', 'disease': 'H-Pox'}
2  outbreak_200        101  {'name': 'Brandon', 'disease': 'C-Pox'}
3  outbreak_201        101    {'name': 'Karen', 'disease': 'G-Pox'}
4  outbreak_202        101      {'name': 'Tim', 'disease': 'Z-Pox'}
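For comparison, here is a sketch that builds the long rows directly and never creates a wide intermediate. It is a Python-level loop (a comprehension), which the question hoped to avoid, so treat it as a baseline rather than a recommendation. It also does not rely on stack() dropping the empty cells, which as far as I know may behave differently depending on the pandas version. The names rows and long_df are mine:

```python
import json

import pandas as pd

d = {'report_id': [100, 101], 'outbreak': [
    '{"outbreak_100":{"name":"Chris","disease":"A-Pox"},"outbreak_101":{"name":"Stacy","disease": "H-Pox"}}',
    '{"outbreak_200":{"name":"Brandon","disease":"C-Pox"}}']}
df = pd.DataFrame(data=d)

# One output record per (report, outbreak key) pair; only the cells that
# actually exist are ever materialized.
rows = [
    {"report_id": report_id, "outbreak_id": outbreak_id, "outbreak_value": value}
    for report_id, raw in zip(df["report_id"], df["outbreak"])
    for outbreak_id, value in json.loads(raw).items()
]
long_df = pd.DataFrame(rows)
print(long_df)
```

If separate name and disease columns are wanted afterwards, pd.json_normalize(long_df["outbreak_value"].tolist()) is one way to expand the inner dicts.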