How to pull a key from a dict (pandas Series) into its own row?
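Here is my sample data. It has two fields, and the last one, [outbreak], is a pandas Series.
Start: a DataFrame with the columns report_id and outbreak (one JSON string per report).
Goal (Excel mock-up): one row per outbreak key, with the columns report_id, outbreak_id, and outbreak_value.
Code to copy: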
import pandas as pd
import json
d = {'report_id': [100, 101], 'outbreak': [
'{"outbreak_100":{"name":"Chris","disease":"A-Pox"},"outbreak_101":{"name":"Stacy","disease": "H-Pox"}}',
'{"outbreak_200":{"name":"Brandon","disease":"C-Pox"},"outbreak_201":{"name":"Karen","disease": "G-Pox"},"outbreak_202":{"name":"Tim","disease": "Z-Pox"}}']}
df = pd.DataFrame(data=d)
print(type(df['outbreak']))
display(df)
#Ignore
df = pd.json_normalize(df['outbreak'].apply(json.loads), max_level=0)
display(df)
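Attempts:
I considered using json_normalize() to turn each [outbreak_id] into its own field and then pandas.wide_to_long() to get my final output. It works in testing, but my actual production data is so long and nested that this would generate hundreds of thousands of fields before pivoting. That does not sound good to me, which is also why I would like to avoid iterating in a loop.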
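For reference, the wide-then-long route described above might look roughly like the sketch below on the sample data. It uses DataFrame.melt as a stand-in for pandas.wide_to_long (a substitution, not the code actually run in the attempt), and the names wide and long_df are illustrative. It also makes the concern concrete: the intermediate frame gets one column per distinct outbreak key.

```python
import json

import pandas as pd

df = pd.DataFrame({
    'report_id': [100, 101],
    'outbreak': [
        '{"outbreak_100":{"name":"Chris","disease":"A-Pox"},"outbreak_101":{"name":"Stacy","disease":"H-Pox"}}',
        '{"outbreak_200":{"name":"Brandon","disease":"C-Pox"},"outbreak_201":{"name":"Karen","disease":"G-Pox"},"outbreak_202":{"name":"Tim","disease":"Z-Pox"}}',
    ],
})

# Wide step: one column per outbreak key (max_level=0 keeps the inner dicts intact).
wide = pd.json_normalize(df['outbreak'].map(json.loads).tolist(), max_level=0)
wide.insert(0, 'report_id', df['report_id'].to_numpy())

# Long step: melt back to one row per (report_id, outbreak key), then drop empty cells.
long_df = (
    wide.melt(id_vars='report_id', var_name='outbreak_id', value_name='outbreak_value')
    .dropna(subset=['outbreak_value'])
    .sort_values(['report_id', 'outbreak_id'])
    .reset_index(drop=True)
)

print(wide.shape)  # report_id plus one column per distinct outbreak key
print(long_df)
```

With only five distinct keys this is harmless, but the column count of wide grows with every new outbreak key, which is the scaling problem described above.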
I also considered using df = df.explode('outbreak'), but I get a KeyError: 0.
Maybe someone has a better idea than I do? Thanks.
You can try using ast to convert the strings to dicts, and then do the transformation:
import ast
out = df.pop('outbreak').map(ast.literal_eval).apply(pd.Series).stack().reset_index(level=1).join(df)
out.columns = ['outbreak_id','outbreak_value','report_id']
Output (before the columns are renamed):
        level_1                                        0  report_id
0  outbreak_100    {'name': 'Chris', 'disease': 'A-Pox'}        100
0  outbreak_101    {'name': 'Stacy', 'disease': 'H-Pox'}        100
1  outbreak_200  {'name': 'Brandon', 'disease': 'C-Pox'}        101
1  outbreak_201    {'name': 'Karen', 'disease': 'G-Pox'}        101
1  outbreak_202      {'name': 'Tim', 'disease': 'Z-Pox'}        101
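One caveat worth checking before relying on ast.literal_eval: it parses Python literals, not JSON, so it works here only because the sample strings happen to be valid in both. A small sketch of the difference, using hypothetical "recovered" and "notes" fields that are not in the sample data:

```python
import ast
import json

# Hypothetical record: "recovered" and "notes" are NOT in the sample data. They are
# here only to show JSON literals (true, null) that are not valid Python literals.
raw = '{"outbreak_100": {"name": "Chris", "recovered": true, "notes": null}}'

parsed = json.loads(raw)  # JSON true/null become Python True/None
print(parsed)

try:
    ast.literal_eval(raw)
    literal_eval_failed = False
except (ValueError, SyntaxError) as exc:
    # literal_eval rejects the bare names true and null
    literal_eval_failed = True
    print(type(exc).__name__)
```

If the production data may contain true, false, or null, swapping ast.literal_eval for json.loads in the chain above should avoid this.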
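One approach is to convert each outbreak's JSON into a dictionary, list all of its key/value pairs, then explode that list and split the pairs into the two desired columns: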
df['outbreak'] = df['outbreak'].apply(lambda v:json.loads(v).items())
df = df.explode('outbreak')
df[['outbreak_id', 'outbreak_value']] = pd.DataFrame(df.pop('outbreak').tolist(), index=df.index)
Output (for your sample data):
   report_id   outbreak_id                           outbreak_value
0        100  outbreak_100    {'name': 'Chris', 'disease': 'A-Pox'}
0        100  outbreak_101    {'name': 'Stacy', 'disease': 'H-Pox'}
1        101  outbreak_200  {'name': 'Brandon', 'disease': 'C-Pox'}
1        101  outbreak_201    {'name': 'Karen', 'disease': 'G-Pox'}
1        101  outbreak_202      {'name': 'Tim', 'disease': 'Z-Pox'}
Note: if the outbreak values are already dicts rather than JSON strings, change the first line of this code to:
df['outbreak'] = df['outbreak'].apply(dict.items)
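If the column might hold a mix of JSON strings and already-parsed dicts, one option is a small helper that normalizes both cases before exploding. This is only a sketch: to_items is a made-up name, the mixed sample column is constructed for illustration, and returning a plain list (rather than a dict_items view) is my own choice so that explode receives an ordinary list.

```python
import json

import pandas as pd


def to_items(value):
    """Return a list of (key, value) pairs from either a JSON string or a dict."""
    if isinstance(value, str):
        value = json.loads(value)
    return list(value.items())


# Illustrative data: the first row is a JSON string, the second is already a dict.
df = pd.DataFrame({
    'report_id': [100, 101],
    'outbreak': [
        '{"outbreak_100":{"name":"Chris","disease":"A-Pox"},"outbreak_101":{"name":"Stacy","disease":"H-Pox"}}',
        {
            'outbreak_200': {'name': 'Brandon', 'disease': 'C-Pox'},
            'outbreak_201': {'name': 'Karen', 'disease': 'G-Pox'},
            'outbreak_202': {'name': 'Tim', 'disease': 'Z-Pox'},
        },
    ],
})

df['outbreak'] = df['outbreak'].map(to_items)  # each cell becomes a list of pairs
df = df.explode('outbreak')                    # one row per (key, value) pair
df[['outbreak_id', 'outbreak_value']] = pd.DataFrame(
    df.pop('outbreak').tolist(), index=df.index
)
print(df)
```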
Try this:
import pandas as pd
import json
d = {'report_id': [100, 101], 'outbreak': [
'{"outbreak_100":{"name":"Chris","disease":"A-Pox"},"outbreak_101":{"name":"Stacy","disease": "H-Pox"}}',
'{"outbreak_200":{"name":"Brandon","disease":"C-Pox"},"outbreak_201":{"name":"Karen","disease": "G-Pox"},"outbreak_202":{"name":"Tim","disease": "Z-Pox"}}']}
df = pd.DataFrame(data=d)
# use json.loads to parse the json and construct df from it
df = (
    pd.DataFrame(df.set_index('report_id')['outbreak'].map(json.loads).to_dict())
    .stack()
    .rename_axis(['outbreak_id', 'report_id'], axis=0)
    .reset_index(name='outbreak_value')
)
print(df)
    outbreak_id  report_id                           outbreak_value
0  outbreak_100        100    {'name': 'Chris', 'disease': 'A-Pox'}
1  outbreak_101        100    {'name': 'Stacy', 'disease': 'H-Pox'}
2  outbreak_200        101  {'name': 'Brandon', 'disease': 'C-Pox'}
3  outbreak_201        101    {'name': 'Karen', 'disease': 'G-Pox'}
4  outbreak_202        101      {'name': 'Tim', 'disease': 'Z-Pox'}
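One thing to keep in mind with this approach, given the concern about wide intermediates in the question: the DataFrame built before .stack() has one row per distinct outbreak key and one column per report_id, and most of its cells are NaN. It also relies on report_id being unique, since to_dict() keeps a single entry per key. A quick sketch to inspect that intermediate on the sample data (the name wide is illustrative):

```python
import json

import pandas as pd

df = pd.DataFrame({
    'report_id': [100, 101],
    'outbreak': [
        '{"outbreak_100":{"name":"Chris","disease":"A-Pox"},"outbreak_101":{"name":"Stacy","disease":"H-Pox"}}',
        '{"outbreak_200":{"name":"Brandon","disease":"C-Pox"},"outbreak_201":{"name":"Karen","disease":"G-Pox"},"outbreak_202":{"name":"Tim","disease":"Z-Pox"}}',
    ],
})

# The frame that exists just before .stack(): rows are outbreak keys, columns are report ids.
wide = pd.DataFrame(df.set_index('report_id')['outbreak'].map(json.loads).to_dict())

print(wide.shape)                    # (5, 2)
print(int(wide.isna().sum().sum()))  # 5 of the 10 cells are empty
```

On real data it may be worth checking these two numbers on a sample first, to see whether the intermediate stays a manageable size.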