Converting a nested dictionary structure to a Pandas dataframe?
I have a list of nested dictionary structures like the following:
{'1278.1':
    {'Time Distribution': 'Exponential',
     'Time Distribution Parameters': {'Equivalent Lambda': 950.486, 'Average Packet Lambda': 0.950486, 'Exponential Max Factor': 10.0},
     'Size Distribution': 'Binomial',
     'Size Distribution Parameters': {'Average Packet Size': 1000.0, 'Packet Size 1': 300.0, 'Packet Size 2': 1700.0}}}
The first value (shown here as '1278.1') is called the max avg lambda value. I want to create a dataframe with the following columns:
Max Avg Lamba
Time Distribution
Equivalent Lambda
Average Packet Lambda
... Size Distribution
... Packet Size 2
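How can this be done? Also, the data I'm working with doesn't always have the same Time Distribution Parameters or Size Distribution Parameters. For example, sometimes there may be a Packet Size 3, but not always. When something like Packet Size 3 is absent, how can I create a dataframe in which some of these values are empty?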
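This may already have an answer here.
The answer at the link above says you can pass a dictionary directly to the pd.DataFrame function, and it will return a dataframe built from that dictionary.
The code below should reshape the dictionary above into a form that the DataFrame constructor can read correctly.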
import copy
import pandas as pd

d = {
    "1278.1": {
        "Time Distribution": "Exponential",
        "Time Distribution Parameters": {
            "Equivalent Lambda": 950.486, "Average Packet Lambda": 0.950486, "Exponential Max Factor": 10.0
        },
        "Size Distribution": "Binomial",
        "Size Distribution Parameters": {
            "Average Packet Size": 1000.0, "Packet Size 1": 300.0, "Packet Size 2": 1700.0
        }
    }
}

# Convert to a list to get the keys (the max avg lambdas)
max_avg_lambdas = list(d)
list_of_dicts = []

# Iterate over every key and build a new dict for each one
for max_avg_lambda in max_avg_lambdas:
    # Copy the contents of this key's dict so the original d is not modified
    fixed_dict = copy.deepcopy(d[max_avg_lambda])
    # Store the max avg lambda as a new key/value pair inside the Time Distribution Parameters
    fixed_dict["Time Distribution Parameters"]["Max Avg Lambda"] = max_avg_lambda
    # Append the dict to the list of dicts
    list_of_dicts.append(fixed_dict)

# Build and print one dataframe per dict, with no row or column truncation
for info_dict in list_of_dicts:
    df = pd.DataFrame(info_dict)
    with pd.option_context('display.max_rows', None, 'display.max_columns', None):
        print(df)

print(fixed_dict)
Output dictionary:
{
    "Time Distribution": "Exponential",
    "Time Distribution Parameters": {
        "Max Avg Lambda": "1278.1",
        "Equivalent Lambda": 950.486,
        "Average Packet Lambda": 0.950486,
        "Exponential Max Factor": 10.0
    },
    "Size Distribution": "Binomial",
    "Size Distribution Parameters": {
        "Average Packet Size": 1000.0,
        "Packet Size 1": 300.0,
        "Packet Size 2": 1700.0
    }
}
Output:
                       Time Distribution Time Distribution Parameters  \
Equivalent Lambda            Exponential                      950.486
Average Packet Lambda        Exponential                     0.950486
Exponential Max Factor       Exponential                         10.0
Max Avg Lambda               Exponential                       1278.1
Average Packet Size          Exponential                          NaN
Packet Size 1                Exponential                          NaN
Packet Size 2                Exponential                          NaN

                       Size Distribution  Size Distribution Parameters
Equivalent Lambda               Binomial                           NaN
Average Packet Lambda           Binomial                           NaN
Exponential Max Factor          Binomial                           NaN
Max Avg Lambda                  Binomial                           NaN
Average Packet Size             Binomial                        1000.0
Packet Size 1                   Binomial                         300.0
Packet Size 2                   Binomial                        1700.0
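Note that the frame above has one row per parameter name, not one row per max avg lambda as the question asks. A minimal sketch of one way to get a single row per entry follows. It assumes the same dictionary `d` as above and that parameter names never collide between the two parameter dicts; the helper name `flatten_entry` is hypothetical and not part of pandas.

```python
import pandas as pd

d = {
    "1278.1": {
        "Time Distribution": "Exponential",
        "Time Distribution Parameters": {
            "Equivalent Lambda": 950.486,
            "Average Packet Lambda": 0.950486,
            "Exponential Max Factor": 10.0,
        },
        "Size Distribution": "Binomial",
        "Size Distribution Parameters": {
            "Average Packet Size": 1000.0,
            "Packet Size 1": 300.0,
            "Packet Size 2": 1700.0,
        },
    }
}


def flatten_entry(max_avg_lambda, entry):
    """Hypothetical helper: merge one entry's nested dicts into a flat row."""
    row = {"Max Avg Lambda": max_avg_lambda}
    for key, value in entry.items():
        if isinstance(value, dict):
            # Promote the nested parameters to top-level columns.
            row.update(value)
        else:
            row[key] = value
    return row


# One flat dict per top-level key; pandas turns the list into one row each.
rows = [flatten_entry(key, entry) for key, entry in d.items()]
flat_df = pd.DataFrame(rows)
print(flat_df.columns.tolist())
```

Because `pd.DataFrame` takes the union of keys across a list of dicts, an entry that lacks a given parameter would simply get NaN in that column.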
pd.json_normalize() flattens nested data into pandas columns. If Packet Size 3 is present in some rows but not in others, the missing values are represented as np.nan. A possible workflow is:
import pandas as pd

data = {'1278.1': {
    'Time Distribution': 'Exponential',
    'Time Distribution Parameters': {'Equivalent Lambda': 950.486, 'Average Packet Lambda': 0.950486, 'Exponential Max Factor': 10.0},
    'Size Distribution': 'Binomial',
    'Size Distribution Parameters': {'Average Packet Size': 1000.0, 'Packet Size 1': 300.0, 'Packet Size 2': 1700.0}}}

# Read the dataframe with Max Avg Lamba as the index, then reset the index to a column
df = pd.DataFrame.from_dict(data, orient='index').reset_index().rename(columns={'index': 'Max Avg Lamba'})

# Flatten Time Distribution Parameters and Size Distribution Parameters, then join with the dataframe
df = df.join(pd.json_normalize(df['Time Distribution Parameters']))
df = df.join(pd.json_normalize(df['Size Distribution Parameters']))

# Remove the now-redundant nested columns
df = df.drop(columns=['Time Distribution Parameters', 'Size Distribution Parameters'])
Output:
| | Max Avg Lamba | Time Distribution | Size Distribution | Equivalent Lambda | Average Packet Lambda | Exponential Max Factor | Average Packet Size | Packet Size 1 | Packet Size 2 |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 1278.1 | Exponential | Binomial | 950.486 | 0.950486 | 10 | 1000 | 300 | 1700 |
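To illustrate the missing-value behaviour, here is a small sketch with two entries whose parameters differ. The second entry ('2000.0') and all of its values are invented purely for illustration. The sketch builds a list of records and normalizes it in one call, which is a variation on the workflow above, not the same code.

```python
import pandas as pd

data = {
    "1278.1": {
        "Time Distribution": "Exponential",
        "Time Distribution Parameters": {
            "Equivalent Lambda": 950.486,
            "Average Packet Lambda": 0.950486,
            "Exponential Max Factor": 10.0,
        },
        "Size Distribution": "Binomial",
        "Size Distribution Parameters": {
            "Average Packet Size": 1000.0,
            "Packet Size 1": 300.0,
            "Packet Size 2": 1700.0,
        },
    },
    # Hypothetical second entry, invented for illustration only.
    "2000.0": {
        "Time Distribution": "Exponential",
        "Time Distribution Parameters": {"Equivalent Lambda": 1500.0},
        "Size Distribution": "Binomial",
        "Size Distribution Parameters": {
            "Packet Size 1": 300.0,
            "Packet Size 3": 2500.0,
        },
    },
}

# Turn each top-level key into a field of its own record.
records = [{"Max Avg Lamba": key, **entry} for key, entry in data.items()]

# json_normalize flattens nested dicts into "Parent.Child" columns and
# fills keys that are absent from a record with NaN.
df = pd.json_normalize(records)

# Keep only the child part of each flattened column name.
# Assumes no key contains a "." and child names are unique.
df.columns = [col.split(".")[-1] for col in df.columns]

print(df[["Max Avg Lamba", "Packet Size 2", "Packet Size 3"]])
```

Here the first row has NaN for Packet Size 3 and the second row has NaN for Packet Size 2, so entries with different parameter sets can share one frame.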