将嵌套 JSON 对象规范化为 Pandas 数据框
Normalizing nested JSON object into Pandas dataframe
背景: 我正在尝试规范化 json 文件,并保存到 pandas 数据帧中,但是我在导航 json 结构和我的代码没有按预期工作。
预期数据帧输出:给定以下示例json
文件(使用随机数据,但格式与真实文件完全相同),这是我的输出我正在尝试生产 -
New Entity Group
Entity ID
Adjusted Value
(1/31/2022, No Div, USD)
Adjusted TWR
(Current Quarter No Div, USD))
Adjusted TWR
(YTD, No Div, USD)
Annualized Adjusted TWR
(Since Inception, No Div, USD)
Inception Date
Risk Target
Portfolio_1
0,786
(44.55%)
(44.55%)
(44.55%) *
Apr 7, 2021
N/A
The FW Irrev Family Tr
9552252
0,786
0.00%
0.00%
0.00% *
Jan 11, 2022
N/A
Portfolio_2
,396,664
(5.78%)
(5.78%)
(5.47%) *
Sep 3, 2021
Growth
FW DAF
10946585
,396,664
(5.78%)
(5.78%)
(5.47%) *
Sep 3, 2021
Growth
Portfolio_3
,143,818
(4.42%)
(4.42%)
7.75% *
Dec 17, 2020
-
The FW Family Trust
13014080
5,356
(6.10%)
(6.10%)
(3.97%) *
Apr 9, 2021
Aggressive
FW Liquid Fund LP
13396796
,899,527
(4.15%)
(4.15%)
(4.15%) *
Dec 30, 2021
Aggressive
FW Holdings No. 2 LLC
8413655
,768,937
(0.77%)
(0.77%)
11.84% *
Mar 5, 2021
N/A
FW and FR Joint
9957007
()
-
-
- *
Dec 21, 2021
N/A
实际数据帧输出: 尽管我尽了最大努力,但我只能将粗体行映射到数据帧中:
New Entity Group
Entity ID
Adjusted Value
(1/31/2022, No Div, USD)
Adjusted TWR
(Current Quarter No Div, USD))
Adjusted TWR
(YTD, No Div, USD)
Annualized Adjusted TWR
(Since Inception, No Div, USD)
Inception Date
Risk Target
Portfolio_1
0,786
(44.55%)
(44.55%)
(44.55%) *
Apr 7, 2021
N/A
Portfolio_2
,396,664
(5.78%)
(5.78%)
(5.47%) *
Sep 3, 2021
Growth
Portfolio_3
,143,818
(4.42%)
(4.42%)
7.75% *
Dec 17, 2020
-
JSON 文件: 这是我要规范化并映射到数据帧中的文件:
{
"meta": {
"columns": [
{
"key": "node_id",
"display_name": "Entity ID",
"output_type": "Word"
},
{
"key": "value",
"display_name": "Adjusted Value (1/31/2022, No Div, USD)",
"output_type": "Number",
"currency": "USD"
},
{
"key": "time_weighted_return",
"display_name": "Adjusted TWR (Current Quarter, No Div, USD)",
"output_type": "Percent",
"currency": "USD"
},
{
"key": "time_weighted_return_2",
"display_name": "Adjusted TWR (YTD, No Div, USD)",
"output_type": "Percent",
"currency": "USD"
},
{
"key": "time_weighted_return_3",
"display_name": "Annualized Adjusted TWR (Since Inception, No Div, USD)",
"output_type": "Percent",
"currency": "USD"
},
{
"key": "inception_event_date",
"display_name": "Inception Date",
"output_type": "Date"
},
{
"key": "_custom_portfolio_target_347209",
"display_name": "Risk Target",
"output_type": "Word"
}
],
"groupings": [
{
"key": "_custom_new_entity_group_453577",
"display_name": "NEW Entity Group"
},
{
"key": "top_level_legal_entity",
"display_name": "Top Level Legal Entity"
}
]
},
"data": {
"type": "portfolio_views",
"attributes": {
"total": {
"name": "Total",
"columns": {
"time_weighted_return": -0.05001974888806926,
"inception_event_date": "2020-12-17",
"_custom_portfolio_target_347209": null,
"time_weighted_return_3": 0.0678647066340392,
"time_weighted_return_2": -0.05001974888806926,
"value": 7.880126780581851E7,
"node_id": null
},
"children": [
{
"name": "Portfolio_3",
"grouping": "_custom_new_entity_group_453577",
"columns": {
"time_weighted_return": -0.04420061615233983,
"inception_event_date": "2020-12-17",
"_custom_portfolio_target_347209": null,
"time_weighted_return_3": 0.07748325432684622,
"time_weighted_return_2": -0.04420061615233983,
"value": 6.014381761929752E7,
"node_id": null
},
"children": [
{
"entity_id": 9957007,
"name": "FW and FR Joint",
"grouping": "top_level_legal_entity",
"columns": {
"time_weighted_return": null,
"inception_event_date": "2021-12-21",
"_custom_portfolio_target_347209": "N/A",
"time_weighted_return_3": null,
"time_weighted_return_2": null,
"value": -1.44,
"node_id": "9957007"
},
"children": []
},
{
"entity_id": 8413655,
"name": "FW Holdings No. 2 LLC",
"grouping": "top_level_legal_entity",
"columns": {
"time_weighted_return": -0.0077309266066708515,
"inception_event_date": "2021-03-05",
"_custom_portfolio_target_347209": "N/A",
"time_weighted_return_3": 0.11844843557716445,
"time_weighted_return_2": -0.0077309266066708515,
"value": 6768936.74,
"node_id": "8413655"
},
"children": []
},
{
"entity_id": 13396796,
"name": "FW Liquid Fund LP",
"grouping": "top_level_legal_entity",
"columns": {
"time_weighted_return": -0.04149769229150746,
"inception_event_date": "2021-12-30",
"_custom_portfolio_target_347209": "Aggressive",
"time_weighted_return_3": -0.041497430478377395,
"time_weighted_return_2": -0.04149769229150746,
"value": 5.289952672686747E7,
"node_id": "13396796"
},
"children": []
},
{
"entity_id": 13014080,
"name": "The FW Family Trust",
"grouping": "top_level_legal_entity",
"columns": {
"time_weighted_return": -0.06102013456998856,
"inception_event_date": "2021-04-09",
"_custom_portfolio_target_347209": "Aggressive",
"time_weighted_return_3": -0.039685671858585514,
"time_weighted_return_2": -0.06102013456998856,
"value": 475355.59242999996,
"node_id": "13014080"
},
"children": []
}
]
},
{
"name": "Portfolio_1",
"grouping": "_custom_new_entity_group_453577",
"columns": {
"time_weighted_return": -0.44554958179309,
"inception_event_date": "2021-04-07",
"_custom_portfolio_target_347209": "N/A",
"time_weighted_return_3": -0.44554958179309,
"time_weighted_return_2": -0.44554958179309,
"value": 260786.03,
"node_id": null
},
"children": [
{
"entity_id": 9552252,
"name": "The FW Irrev Family Tr",
"grouping": "top_level_legal_entity",
"columns": {
"time_weighted_return": 0.0,
"inception_event_date": "2022-01-11",
"_custom_portfolio_target_347209": "N/A",
"time_weighted_return_3": 0.0,
"time_weighted_return_2": 0.0,
"value": 260786.03,
"node_id": "9552252"
},
"children": []
}
]
},
{
"name": "Portfolio_2",
"grouping": "_custom_new_entity_group_453577",
"columns": {
"time_weighted_return": -0.05780354507057972,
"inception_event_date": "2021-09-03",
"_custom_portfolio_target_347209": "Growth",
"time_weighted_return_3": -0.05470214863844658,
"time_weighted_return_2": -0.05780354507057972,
"value": 1.8396664156520825E7,
"node_id": null
},
"children": [
{
"entity_id": 10946585,
"name": "FW DAF",
"grouping": "top_level_legal_entity",
"columns": {
"time_weighted_return": -0.05780354507057972,
"inception_event_date": "2021-09-03",
"_custom_portfolio_target_347209": "Growth",
"time_weighted_return_3": -0.05470214863844658,
"time_weighted_return_2": -0.05780354507057972,
"value": 1.8396664156520832E7,
"node_id": "10946585"
},
"children": []
}
]
}
]
}
}
},
"included": []
}
我的代码: 这是我构建的函数,用于尝试规范化 JSON 响应并保存在 pandas 数据帧中 -
def unpack_response():
while True:
try:
api_response = response_writer()
df = pd.json_normalize(api_response['data']['attributes']['total']['children'])
df.columns = df.columns.str.replace(r'columns.', '', regex=False)
column_name_mapper = {column['key']: column['display_name'] for column in api_response['meta']['columns']}
df.rename(columns=column_name_mapper, inplace=True)
break
except KeyError:
print("-----------------------------------\n","API TIMEOUT ERROR: TRYING AGAIN...", "\n-----------------------------------\n")
df.rename(columns={'name': 'New Entity Group'}, inplace=True)
column_names = ["New Entity Group", "Entity ID", "Adjusted Value (1/31/2022, No Div, USD)", "Adjusted TWR (Current Quarter, No Div, USD)", "Adjusted TWR (YTD, No Div, USD)", "Annualized Adjusted TWR (Since Inception, No Div, USD)", "Inception Date"]
df = df.reindex(columns=column_names)
return df
unpack_response()
评论我的代码:
- Portfolio_1、Portfolio_2、Portfolio_3 - 这些粗体行是
children
或 [=16= 的第一层] 并且似乎是唯一保存到 df
的行。我认为这是因为我的代码引用 df = pd.json_normalize(api_response['data']['attributes']['total']['children'])
所以只查看这些列表。我尝试将 ['children']['children']
附加到该代码片段的末尾(假定 children
有 3 倍级别,但收到 TypeError: list indices must be integers or slices, not str
.
我将不胜感激任何关于如何改进或添加我的功能的建议,这样我就可以利用 key:pair 值,这是 children
水平的 2 倍。
由于您的 children
的 children
与 children
具有相同的结构,您可以尝试分别使用 json_normalize
两次并将其附加在一起。
# For first layer that includes Portfolio_1, Portfolio_2, Portfolio_3
df = pd.json_normalize(s, record_path=['data', 'attributes', 'total', 'children'])
# For second layer that includes The FW Irrev Family Tr, etc
# Use explode to convert list into rows
df_child = pd.json_normalize(df.explode('children').children)
# Combine both
df = pd.concat([df, df_child])
# You can use your column renaming and filtering
这看起来像是您正在尝试创建然后堆叠三个数据帧,您可能真的不想这样做,或者通过将每个 Porfolio_ 映射到每个相关行可能会更好地实现然后
import itertools
...
portfolio_views_children = response['data']['attributes']['total']['children']
portfolios = []
for portfolio in portfolio_views_children:
entity_columns = []
# include portfolio itself within an iterable so the total is the header
for entity in itertools.chain([portfolio], portfolio["children"]):
entity_data = entity["columns"].copy() # don't mutate original response
entity_data["portfolio"] = portfolio["name"] # from outer
entity_data["name"] = entity["name"]
entity_columns.append(entity_data)
df = pd.DataFrame(entity_columns)
portfolios.append(df)
# combine dataframes
df = pd.concat(portfolios)
# reorder and rename
column_ordering = {"portfolio": "portfolio", "name": "name"}
column_ordering.update({c["key"]: c["display_name"] for c in response["meta"]["columns"]})
df = df[column_ordering.keys()] # beware: un-named cols will be dropped
df = df.rename(columns=column_ordering)
print(df.to_markdown(index=False)) # create output below (requires tabulate)
portfolio
name
Entity ID
Adjusted Value (1/31/2022, No Div, USD)
Adjusted TWR (Current Quarter, No Div, USD)
Adjusted TWR (YTD, No Div, USD)
Annualized Adjusted TWR (Since Inception, No Div, USD)
Inception Date
Risk Target
Portfolio_3
Portfolio_3
6.01438e+07
-0.0442006
-0.0442006
0.0774833
2020-12-17
Portfolio_3
FW and FR Joint
9957007
-1.44
nan
nan
nan
2021-12-21
N/A
Portfolio_3
FW Holdings No. 2 LLC
8413655
6.76894e+06
-0.00773093
-0.00773093
0.118448
2021-03-05
N/A
Portfolio_3
FW Liquid Fund LP
13396796
5.28995e+07
-0.0414977
-0.0414977
-0.0414974
2021-12-30
Aggressive
Portfolio_3
The FW Family Trust
13014080
475356
-0.0610201
-0.0610201
-0.0396857
2021-04-09
Aggressive
Portfolio_1
Portfolio_1
260786
-0.44555
-0.44555
-0.44555
2021-04-07
N/A
Portfolio_1
The FW Irrev Family Tr
9552252
260786
0
0
0
2022-01-11
N/A
Portfolio_2
Portfolio_2
1.83967e+07
-0.0578035
-0.0578035
-0.0547021
2021-09-03
Growth
Portfolio_2
FW DAF
10946585
1.83967e+07
-0.0578035
-0.0578035
-0.0547021
2021-09-03
Growth
就个人而言,我不会在这种情况下使用 pd.json_normalize
。您的 JSON 非常复杂,除非您真的有使用 json_normalize
的经验,否则对于普通开发人员而言,以下代码可能花费更少的时间来理解。事实上,您甚至不需要查看 JSON 即可准确理解这段代码的作用(尽管它肯定会有所帮助 ;)。
首先,我们可以将 JSON 中的对象(投资组合及其子项)提取到列表中,并使用一系列步骤以正确的形式和顺序获取它们:
def prep_obj(o):
"""Prepares an object (portfolio/child) from the JSON to be inserted into a dataframe."""
return {
'New Entity Group': o['name'],
} | o['columns']
# Get a list of lists, where each sub-list contains the portfolio object at index 0 and then the portfolio object's children:
groups = [[prep_obj(o), *[prep_obj(child) for child in o['children']]] for o in api_response['data']['attributes']['total']['children']]
# Sort the portfolio groups by their number:
groups.sort(key=lambda g: int(g[0]['New Entity Group'].split('_')[1]))
# Reverse the children of each portfolio group:
groups = [[g[0]] + g[1:][::-1] for g in groups]
# Flatten out the groups into one large list of objects:
objects = [obj for group in groups for obj in group]
# The above is exactly equivalent to the following:
# objects = []
# for group in groups:
# for obj in group:
# objects.append(obj)
接下来,创建数据框:
# Create a mapping for column names so that their display names can be used:
mapping = {col['key']: col['display_name'] for col in api_response['meta']['columns']}
# Create a dataframe from the list of objects:
df = pd.DataFrame(objects)
# Correct column names:
df = df.rename(mapping, axis=1)
# Reorder columns:
column_names = ["New Entity Group", "Entity ID", "Adjusted Value (1/31/2022, No Div, USD)", "Adjusted TWR (Current Quarter, No Div, USD)", "Adjusted TWR (YTD, No Div, USD)", "Annualized Adjusted TWR (Since Inception, No Div, USD)", "Inception Date", "Risk Target"]
df = df[column_names]
和格式:
def format_twr_col(col):
return (
col
.abs()
.mul(100)
.round(2)
.pipe(lambda s: s.where(s.eq(0) | s.isna(), '(' + s.astype(str) + '%)'))
.pipe(lambda s: s.where(s.ne(0) | s.isna(), s.astype(str) + '%'))
.fillna('-')
)
def format_value_col(col):
positive_mask = col.ge(0)
col[positive_mask] = (
col[positive_mask]
.round()
.astype(int)
.map('${:,}'.format)
)
col[~positive_mask] = (
col[~positive_mask]
.astype(float)
.round()
.astype(int)
.abs()
.map('(${:,})'.format)
)
return col
df['Adjusted TWR (Current Quarter, No Div, USD)'] = format_twr_col(df['Adjusted TWR (Current Quarter, No Div, USD)'])
df['Annualized Adjusted TWR (Since Inception, No Div, USD)'] = format_twr_col(df['Annualized Adjusted TWR (Since Inception, No Div, USD)'])
df['Adjusted TWR (YTD, No Div, USD)'] = format_twr_col(df['Adjusted TWR (YTD, No Div, USD)'])
df['Adjusted Value (1/31/2022, No Div, USD)'] = format_value_col(df['Adjusted Value (1/31/2022, No Div, USD)'].copy())
df['Inception Date'] = pd.to_datetime(df['Inception Date']).dt.strftime('%b %d, %Y')
df['Entity ID'] = df['Entity ID'].fillna('')
然后……瞧:
>>> pd.options.display.max_columns = None
>>> df
New Entity Group Entity ID Adjusted Value (1/31/2022, No Div, USD) Adjusted TWR (Current Quarter, No Div, USD) Adjusted TWR (YTD, No Div, USD) Annualized Adjusted TWR (Since Inception, No Div, USD) Inception Date Risk Target
0 Portfolio_1 0,786 (44.55%) (44.55%) (44.55%) Apr 07, 2021 N/A
1 The FW Irrev Family Tr 9552252 0,786 0.0% 0.0% 0.0% Jan 11, 2022 N/A
2 Portfolio_2 ,396,664 (5.78%) (5.78%) (5.47%) Sep 03, 2021 Growth
3 FW DAF 10946585 ,396,664 (5.78%) (5.78%) (5.47%) Sep 03, 2021 Growth
4 Portfolio_3 ,143,818 (4.42%) (4.42%) (7.75%) Dec 17, 2020 NaN
5 The FW Family Trust 13014080 5,356 (6.1%) (6.1%) (3.97%) Apr 09, 2021 Aggressive
6 FW Liquid Fund LP 13396796 ,899,527 (4.15%) (4.15%) (4.15%) Dec 30, 2021 Aggressive
7 FW Holdings No. 2 LLC 8413655 ,768,937 (0.77%) (0.77%) (11.84%) Mar 05, 2021 N/A
8 FW and FR Joint 9957007 () - - - Dec 21, 2021 N/A
我更喜欢使用 json_normalize
。以下代码不处理错误处理、详细格式化等,但我认为您最想做的事情的本质已经包含在内。
代码:
import json
import pandas as pd
# You have to change this path according to the actual json file location.
with open('./api_response.json', 'r') as f:
api_response = json.load(f)
def unpack_response(r):
df = pd.DataFrame()
df_src = pd.json_normalize(r, record_path=['data', 'attributes', 'total', 'children'])
for _, row in df_src.sort_values('name').iterrows():
df_p = pd.DataFrame(row).T
df_c = pd.json_normalize(row.children)
# I'm not sure what your expected sorting order is. Perhaps you might want to delete the next line.
df_c = df_c.sort_values(['columns._custom_portfolio_target_347209', 'columns.inception_event_date'])
df = pd.concat([df, df_p, df_c], axis=0, ignore_index=True)
column_name_mapper = {'columns.' + column['key']: column['display_name'] for column in api_response['meta']['columns']}
column_name_mapper.update({'name': 'New Entity Group'})
column_names = ["New Entity Group", "Entity ID", "Adjusted Value (1/31/2022, No Div, USD)", "Adjusted TWR (Current Quarter, No Div, USD)", "Adjusted TWR (YTD, No Div, USD)", "Annualized Adjusted TWR (Since Inception, No Div, USD)", "Inception Date", "Risk Target"]
df = df.rename(columns=column_name_mapper).reindex(columns=column_names)
return df
df = unpack_response(api_response)
输出:
New Entity Group
Entity ID
Adjusted Value (1/31/2022, No Div, USD)
Adjusted TWR (Current Quarter, No Div, USD)
Adjusted TWR (YTD, No Div, USD)
Annualized Adjusted TWR (Since Inception, No Div, USD)
Inception Date
Risk Target
Portfolio_1
260786
-0.44555
-0.44555
-0.44555
2021-04-07
N/A
The FW Irrev Family Tr
9552252
260786
0
0
0
2022-01-11
N/A
Portfolio_2
1.83967e+07
-0.0578035
-0.0578035
-0.0547021
2021-09-03
Growth
FW DAF
10946585
1.83967e+07
-0.0578035
-0.0578035
-0.0547021
2021-09-03
Growth
Portfolio_3
6.01438e+07
-0.0442006
-0.0442006
0.0774833
2020-12-17
The FW Family Trust
13014080
475356
-0.0610201
-0.0610201
-0.0396857
2021-04-09
Aggressive
FW Liquid Fund LP
13396796
5.28995e+07
-0.0414977
-0.0414977
-0.0414974
2021-12-30
Aggressive
FW Holdings No. 2 LLC
8413655
6.76894e+06
-0.00773093
-0.00773093
0.118448
2021-03-05
N/A
FW and FR Joint
9957007
-1.44
nan
nan
nan
2021-12-21
N/A
背景: 我正在尝试规范化 json 文件,并保存到 pandas 数据帧中,但是我在导航 json 结构和我的代码没有按预期工作。
预期数据帧输出:给定以下示例json
文件(使用随机数据,但格式与真实文件完全相同),这是我的输出我正在尝试生产 -
New Entity Group | Entity ID | Adjusted Value (1/31/2022, No Div, USD) |
Adjusted TWR (Current Quarter No Div, USD)) |
Adjusted TWR (YTD, No Div, USD) |
Annualized Adjusted TWR (Since Inception, No Div, USD) |
Inception Date | Risk Target |
---|---|---|---|---|---|---|---|
Portfolio_1 | 0,786 | (44.55%) | (44.55%) | (44.55%) * | Apr 7, 2021 | N/A | |
The FW Irrev Family Tr | 9552252 | 0,786 | 0.00% | 0.00% | 0.00% * | Jan 11, 2022 | N/A |
Portfolio_2 | ,396,664 | (5.78%) | (5.78%) | (5.47%) * | Sep 3, 2021 | Growth | |
FW DAF | 10946585 | ,396,664 | (5.78%) | (5.78%) | (5.47%) * | Sep 3, 2021 | Growth |
Portfolio_3 | ,143,818 | (4.42%) | (4.42%) | 7.75% * | Dec 17, 2020 | - | |
The FW Family Trust | 13014080 | 5,356 | (6.10%) | (6.10%) | (3.97%) * | Apr 9, 2021 | Aggressive |
FW Liquid Fund LP | 13396796 | ,899,527 | (4.15%) | (4.15%) | (4.15%) * | Dec 30, 2021 | Aggressive |
FW Holdings No. 2 LLC | 8413655 | ,768,937 | (0.77%) | (0.77%) | 11.84% * | Mar 5, 2021 | N/A |
FW and FR Joint | 9957007 | () | - | - | - * | Dec 21, 2021 | N/A |
实际数据帧输出: 尽管我尽了最大努力,但我只能将粗体行映射到数据帧中:
New Entity Group | Entity ID | Adjusted Value (1/31/2022, No Div, USD) |
Adjusted TWR (Current Quarter No Div, USD)) |
Adjusted TWR (YTD, No Div, USD) |
Annualized Adjusted TWR (Since Inception, No Div, USD) |
Inception Date | Risk Target |
---|---|---|---|---|---|---|---|
Portfolio_1 | 0,786 | (44.55%) | (44.55%) | (44.55%) * | Apr 7, 2021 | N/A | |
Portfolio_2 | ,396,664 | (5.78%) | (5.78%) | (5.47%) * | Sep 3, 2021 | Growth | |
Portfolio_3 | ,143,818 | (4.42%) | (4.42%) | 7.75% * | Dec 17, 2020 | - |
JSON 文件: 这是我要规范化并映射到数据帧中的文件:
{
"meta": {
"columns": [
{
"key": "node_id",
"display_name": "Entity ID",
"output_type": "Word"
},
{
"key": "value",
"display_name": "Adjusted Value (1/31/2022, No Div, USD)",
"output_type": "Number",
"currency": "USD"
},
{
"key": "time_weighted_return",
"display_name": "Adjusted TWR (Current Quarter, No Div, USD)",
"output_type": "Percent",
"currency": "USD"
},
{
"key": "time_weighted_return_2",
"display_name": "Adjusted TWR (YTD, No Div, USD)",
"output_type": "Percent",
"currency": "USD"
},
{
"key": "time_weighted_return_3",
"display_name": "Annualized Adjusted TWR (Since Inception, No Div, USD)",
"output_type": "Percent",
"currency": "USD"
},
{
"key": "inception_event_date",
"display_name": "Inception Date",
"output_type": "Date"
},
{
"key": "_custom_portfolio_target_347209",
"display_name": "Risk Target",
"output_type": "Word"
}
],
"groupings": [
{
"key": "_custom_new_entity_group_453577",
"display_name": "NEW Entity Group"
},
{
"key": "top_level_legal_entity",
"display_name": "Top Level Legal Entity"
}
]
},
"data": {
"type": "portfolio_views",
"attributes": {
"total": {
"name": "Total",
"columns": {
"time_weighted_return": -0.05001974888806926,
"inception_event_date": "2020-12-17",
"_custom_portfolio_target_347209": null,
"time_weighted_return_3": 0.0678647066340392,
"time_weighted_return_2": -0.05001974888806926,
"value": 7.880126780581851E7,
"node_id": null
},
"children": [
{
"name": "Portfolio_3",
"grouping": "_custom_new_entity_group_453577",
"columns": {
"time_weighted_return": -0.04420061615233983,
"inception_event_date": "2020-12-17",
"_custom_portfolio_target_347209": null,
"time_weighted_return_3": 0.07748325432684622,
"time_weighted_return_2": -0.04420061615233983,
"value": 6.014381761929752E7,
"node_id": null
},
"children": [
{
"entity_id": 9957007,
"name": "FW and FR Joint",
"grouping": "top_level_legal_entity",
"columns": {
"time_weighted_return": null,
"inception_event_date": "2021-12-21",
"_custom_portfolio_target_347209": "N/A",
"time_weighted_return_3": null,
"time_weighted_return_2": null,
"value": -1.44,
"node_id": "9957007"
},
"children": []
},
{
"entity_id": 8413655,
"name": "FW Holdings No. 2 LLC",
"grouping": "top_level_legal_entity",
"columns": {
"time_weighted_return": -0.0077309266066708515,
"inception_event_date": "2021-03-05",
"_custom_portfolio_target_347209": "N/A",
"time_weighted_return_3": 0.11844843557716445,
"time_weighted_return_2": -0.0077309266066708515,
"value": 6768936.74,
"node_id": "8413655"
},
"children": []
},
{
"entity_id": 13396796,
"name": "FW Liquid Fund LP",
"grouping": "top_level_legal_entity",
"columns": {
"time_weighted_return": -0.04149769229150746,
"inception_event_date": "2021-12-30",
"_custom_portfolio_target_347209": "Aggressive",
"time_weighted_return_3": -0.041497430478377395,
"time_weighted_return_2": -0.04149769229150746,
"value": 5.289952672686747E7,
"node_id": "13396796"
},
"children": []
},
{
"entity_id": 13014080,
"name": "The FW Family Trust",
"grouping": "top_level_legal_entity",
"columns": {
"time_weighted_return": -0.06102013456998856,
"inception_event_date": "2021-04-09",
"_custom_portfolio_target_347209": "Aggressive",
"time_weighted_return_3": -0.039685671858585514,
"time_weighted_return_2": -0.06102013456998856,
"value": 475355.59242999996,
"node_id": "13014080"
},
"children": []
}
]
},
{
"name": "Portfolio_1",
"grouping": "_custom_new_entity_group_453577",
"columns": {
"time_weighted_return": -0.44554958179309,
"inception_event_date": "2021-04-07",
"_custom_portfolio_target_347209": "N/A",
"time_weighted_return_3": -0.44554958179309,
"time_weighted_return_2": -0.44554958179309,
"value": 260786.03,
"node_id": null
},
"children": [
{
"entity_id": 9552252,
"name": "The FW Irrev Family Tr",
"grouping": "top_level_legal_entity",
"columns": {
"time_weighted_return": 0.0,
"inception_event_date": "2022-01-11",
"_custom_portfolio_target_347209": "N/A",
"time_weighted_return_3": 0.0,
"time_weighted_return_2": 0.0,
"value": 260786.03,
"node_id": "9552252"
},
"children": []
}
]
},
{
"name": "Portfolio_2",
"grouping": "_custom_new_entity_group_453577",
"columns": {
"time_weighted_return": -0.05780354507057972,
"inception_event_date": "2021-09-03",
"_custom_portfolio_target_347209": "Growth",
"time_weighted_return_3": -0.05470214863844658,
"time_weighted_return_2": -0.05780354507057972,
"value": 1.8396664156520825E7,
"node_id": null
},
"children": [
{
"entity_id": 10946585,
"name": "FW DAF",
"grouping": "top_level_legal_entity",
"columns": {
"time_weighted_return": -0.05780354507057972,
"inception_event_date": "2021-09-03",
"_custom_portfolio_target_347209": "Growth",
"time_weighted_return_3": -0.05470214863844658,
"time_weighted_return_2": -0.05780354507057972,
"value": 1.8396664156520832E7,
"node_id": "10946585"
},
"children": []
}
]
}
]
}
}
},
"included": []
}
我的代码: 这是我构建的函数,用于尝试规范化 JSON 响应并保存在 pandas 数据帧中 -
def unpack_response():
while True:
try:
api_response = response_writer()
df = pd.json_normalize(api_response['data']['attributes']['total']['children'])
df.columns = df.columns.str.replace(r'columns.', '', regex=False)
column_name_mapper = {column['key']: column['display_name'] for column in api_response['meta']['columns']}
df.rename(columns=column_name_mapper, inplace=True)
break
except KeyError:
print("-----------------------------------\n","API TIMEOUT ERROR: TRYING AGAIN...", "\n-----------------------------------\n")
df.rename(columns={'name': 'New Entity Group'}, inplace=True)
column_names = ["New Entity Group", "Entity ID", "Adjusted Value (1/31/2022, No Div, USD)", "Adjusted TWR (Current Quarter, No Div, USD)", "Adjusted TWR (YTD, No Div, USD)", "Annualized Adjusted TWR (Since Inception, No Div, USD)", "Inception Date"]
df = df.reindex(columns=column_names)
return df
unpack_response()
评论我的代码:
- Portfolio_1、Portfolio_2、Portfolio_3 - 这些粗体行是
children
或 [=16= 的第一层] 并且似乎是唯一保存到df
的行。我认为这是因为我的代码引用df = pd.json_normalize(api_response['data']['attributes']['total']['children'])
所以只查看这些列表。我尝试将['children']['children']
附加到该代码片段的末尾(假定children
有 3 倍级别,但收到TypeError: list indices must be integers or slices, not str
.
我将不胜感激任何关于如何改进或添加我的功能的建议,这样我就可以利用 key:pair 值,这是 children
水平的 2 倍。
由于您的 children
的 children
与 children
具有相同的结构,您可以尝试分别使用 json_normalize
两次并将其附加在一起。
# For first layer that includes Portfolio_1, Portfolio_2, Portfolio_3
df = pd.json_normalize(s, record_path=['data', 'attributes', 'total', 'children'])
# For second layer that includes The FW Irrev Family Tr, etc
# Use explode to convert list into rows
df_child = pd.json_normalize(df.explode('children').children)
# Combine both
df = pd.concat([df, df_child])
# You can use your column renaming and filtering
这看起来像是您正在尝试创建然后堆叠三个数据帧,您可能真的不想这样做,或者通过将每个 Porfolio_ 映射到每个相关行可能会更好地实现然后
import itertools
...
portfolio_views_children = response['data']['attributes']['total']['children']
portfolios = []
for portfolio in portfolio_views_children:
entity_columns = []
# include portfolio itself within an iterable so the total is the header
for entity in itertools.chain([portfolio], portfolio["children"]):
entity_data = entity["columns"].copy() # don't mutate original response
entity_data["portfolio"] = portfolio["name"] # from outer
entity_data["name"] = entity["name"]
entity_columns.append(entity_data)
df = pd.DataFrame(entity_columns)
portfolios.append(df)
# combine dataframes
df = pd.concat(portfolios)
# reorder and rename
column_ordering = {"portfolio": "portfolio", "name": "name"}
column_ordering.update({c["key"]: c["display_name"] for c in response["meta"]["columns"]})
df = df[column_ordering.keys()] # beware: un-named cols will be dropped
df = df.rename(columns=column_ordering)
print(df.to_markdown(index=False)) # create output below (requires tabulate)
portfolio | name | Entity ID | Adjusted Value (1/31/2022, No Div, USD) | Adjusted TWR (Current Quarter, No Div, USD) | Adjusted TWR (YTD, No Div, USD) | Annualized Adjusted TWR (Since Inception, No Div, USD) | Inception Date | Risk Target |
---|---|---|---|---|---|---|---|---|
Portfolio_3 | Portfolio_3 | 6.01438e+07 | -0.0442006 | -0.0442006 | 0.0774833 | 2020-12-17 | ||
Portfolio_3 | FW and FR Joint | 9957007 | -1.44 | nan | nan | nan | 2021-12-21 | N/A |
Portfolio_3 | FW Holdings No. 2 LLC | 8413655 | 6.76894e+06 | -0.00773093 | -0.00773093 | 0.118448 | 2021-03-05 | N/A |
Portfolio_3 | FW Liquid Fund LP | 13396796 | 5.28995e+07 | -0.0414977 | -0.0414977 | -0.0414974 | 2021-12-30 | Aggressive |
Portfolio_3 | The FW Family Trust | 13014080 | 475356 | -0.0610201 | -0.0610201 | -0.0396857 | 2021-04-09 | Aggressive |
Portfolio_1 | Portfolio_1 | 260786 | -0.44555 | -0.44555 | -0.44555 | 2021-04-07 | N/A | |
Portfolio_1 | The FW Irrev Family Tr | 9552252 | 260786 | 0 | 0 | 0 | 2022-01-11 | N/A |
Portfolio_2 | Portfolio_2 | 1.83967e+07 | -0.0578035 | -0.0578035 | -0.0547021 | 2021-09-03 | Growth | |
Portfolio_2 | FW DAF | 10946585 | 1.83967e+07 | -0.0578035 | -0.0578035 | -0.0547021 | 2021-09-03 | Growth |
就个人而言,我不会在这种情况下使用 pd.json_normalize
。您的 JSON 非常复杂,除非您真的有使用 json_normalize
的经验,否则对于普通开发人员而言,以下代码可能花费更少的时间来理解。事实上,您甚至不需要查看 JSON 即可准确理解这段代码的作用(尽管它肯定会有所帮助 ;)。
首先,我们可以将 JSON 中的对象(投资组合及其子项)提取到列表中,并使用一系列步骤以正确的形式和顺序获取它们:
def prep_obj(o):
"""Prepares an object (portfolio/child) from the JSON to be inserted into a dataframe."""
return {
'New Entity Group': o['name'],
} | o['columns']
# Get a list of lists, where each sub-list contains the portfolio object at index 0 and then the portfolio object's children:
groups = [[prep_obj(o), *[prep_obj(child) for child in o['children']]] for o in api_response['data']['attributes']['total']['children']]
# Sort the portfolio groups by their number:
groups.sort(key=lambda g: int(g[0]['New Entity Group'].split('_')[1]))
# Reverse the children of each portfolio group:
groups = [[g[0]] + g[1:][::-1] for g in groups]
# Flatten out the groups into one large list of objects:
objects = [obj for group in groups for obj in group]
# The above is exactly equivalent to the following:
# objects = []
# for group in groups:
# for obj in group:
# objects.append(obj)
接下来,创建数据框:
# Create a mapping for column names so that their display names can be used:
mapping = {col['key']: col['display_name'] for col in api_response['meta']['columns']}
# Create a dataframe from the list of objects:
df = pd.DataFrame(objects)
# Correct column names:
df = df.rename(mapping, axis=1)
# Reorder columns:
column_names = ["New Entity Group", "Entity ID", "Adjusted Value (1/31/2022, No Div, USD)", "Adjusted TWR (Current Quarter, No Div, USD)", "Adjusted TWR (YTD, No Div, USD)", "Annualized Adjusted TWR (Since Inception, No Div, USD)", "Inception Date", "Risk Target"]
df = df[column_names]
和格式:
def format_twr_col(col):
return (
col
.abs()
.mul(100)
.round(2)
.pipe(lambda s: s.where(s.eq(0) | s.isna(), '(' + s.astype(str) + '%)'))
.pipe(lambda s: s.where(s.ne(0) | s.isna(), s.astype(str) + '%'))
.fillna('-')
)
def format_value_col(col):
positive_mask = col.ge(0)
col[positive_mask] = (
col[positive_mask]
.round()
.astype(int)
.map('${:,}'.format)
)
col[~positive_mask] = (
col[~positive_mask]
.astype(float)
.round()
.astype(int)
.abs()
.map('(${:,})'.format)
)
return col
df['Adjusted TWR (Current Quarter, No Div, USD)'] = format_twr_col(df['Adjusted TWR (Current Quarter, No Div, USD)'])
df['Annualized Adjusted TWR (Since Inception, No Div, USD)'] = format_twr_col(df['Annualized Adjusted TWR (Since Inception, No Div, USD)'])
df['Adjusted TWR (YTD, No Div, USD)'] = format_twr_col(df['Adjusted TWR (YTD, No Div, USD)'])
df['Adjusted Value (1/31/2022, No Div, USD)'] = format_value_col(df['Adjusted Value (1/31/2022, No Div, USD)'].copy())
df['Inception Date'] = pd.to_datetime(df['Inception Date']).dt.strftime('%b %d, %Y')
df['Entity ID'] = df['Entity ID'].fillna('')
然后……瞧:
>>> pd.options.display.max_columns = None
>>> df
New Entity Group Entity ID Adjusted Value (1/31/2022, No Div, USD) Adjusted TWR (Current Quarter, No Div, USD) Adjusted TWR (YTD, No Div, USD) Annualized Adjusted TWR (Since Inception, No Div, USD) Inception Date Risk Target
0 Portfolio_1 0,786 (44.55%) (44.55%) (44.55%) Apr 07, 2021 N/A
1 The FW Irrev Family Tr 9552252 0,786 0.0% 0.0% 0.0% Jan 11, 2022 N/A
2 Portfolio_2 ,396,664 (5.78%) (5.78%) (5.47%) Sep 03, 2021 Growth
3 FW DAF 10946585 ,396,664 (5.78%) (5.78%) (5.47%) Sep 03, 2021 Growth
4 Portfolio_3 ,143,818 (4.42%) (4.42%) (7.75%) Dec 17, 2020 NaN
5 The FW Family Trust 13014080 5,356 (6.1%) (6.1%) (3.97%) Apr 09, 2021 Aggressive
6 FW Liquid Fund LP 13396796 ,899,527 (4.15%) (4.15%) (4.15%) Dec 30, 2021 Aggressive
7 FW Holdings No. 2 LLC 8413655 ,768,937 (0.77%) (0.77%) (11.84%) Mar 05, 2021 N/A
8 FW and FR Joint 9957007 () - - - Dec 21, 2021 N/A
我更喜欢使用 json_normalize
。以下代码不处理错误处理、详细格式化等,但我认为您最想做的事情的本质已经包含在内。
代码:
import json
import pandas as pd
# You have to change this path according to the actual json file location.
with open('./api_response.json', 'r') as f:
api_response = json.load(f)
def unpack_response(r):
df = pd.DataFrame()
df_src = pd.json_normalize(r, record_path=['data', 'attributes', 'total', 'children'])
for _, row in df_src.sort_values('name').iterrows():
df_p = pd.DataFrame(row).T
df_c = pd.json_normalize(row.children)
# I'm not sure what your expected sorting order is. Perhaps you might want to delete the next line.
df_c = df_c.sort_values(['columns._custom_portfolio_target_347209', 'columns.inception_event_date'])
df = pd.concat([df, df_p, df_c], axis=0, ignore_index=True)
column_name_mapper = {'columns.' + column['key']: column['display_name'] for column in api_response['meta']['columns']}
column_name_mapper.update({'name': 'New Entity Group'})
column_names = ["New Entity Group", "Entity ID", "Adjusted Value (1/31/2022, No Div, USD)", "Adjusted TWR (Current Quarter, No Div, USD)", "Adjusted TWR (YTD, No Div, USD)", "Annualized Adjusted TWR (Since Inception, No Div, USD)", "Inception Date", "Risk Target"]
df = df.rename(columns=column_name_mapper).reindex(columns=column_names)
return df
df = unpack_response(api_response)
输出:
New Entity Group | Entity ID | Adjusted Value (1/31/2022, No Div, USD) | Adjusted TWR (Current Quarter, No Div, USD) | Adjusted TWR (YTD, No Div, USD) | Annualized Adjusted TWR (Since Inception, No Div, USD) | Inception Date | Risk Target |
---|---|---|---|---|---|---|---|
Portfolio_1 | 260786 | -0.44555 | -0.44555 | -0.44555 | 2021-04-07 | N/A | |
The FW Irrev Family Tr | 9552252 | 260786 | 0 | 0 | 0 | 2022-01-11 | N/A |
Portfolio_2 | 1.83967e+07 | -0.0578035 | -0.0578035 | -0.0547021 | 2021-09-03 | Growth | |
FW DAF | 10946585 | 1.83967e+07 | -0.0578035 | -0.0578035 | -0.0547021 | 2021-09-03 | Growth |
Portfolio_3 | 6.01438e+07 | -0.0442006 | -0.0442006 | 0.0774833 | 2020-12-17 | ||
The FW Family Trust | 13014080 | 475356 | -0.0610201 | -0.0610201 | -0.0396857 | 2021-04-09 | Aggressive |
FW Liquid Fund LP | 13396796 | 5.28995e+07 | -0.0414977 | -0.0414977 | -0.0414974 | 2021-12-30 | Aggressive |
FW Holdings No. 2 LLC | 8413655 | 6.76894e+06 | -0.00773093 | -0.00773093 | 0.118448 | 2021-03-05 | N/A |
FW and FR Joint | 9957007 | -1.44 | nan | nan | nan | 2021-12-21 | N/A |