How to combine and average dictionaries in lists inside a nested dict
I have a big problem. This is my data. The data structure looks like this: it contains metric_name, date_time, and the data for each date_time under each metric_name.
{
    "metric_name 1": {
        "date_time 1": [{data server 1}, ... ,{data server n}],
        "date_time 2": [{data server 1}, ... ,{data server n}],
        ...
    },
    "metric_name 2": {
        "date_time 1": [{data server 1}, ... ,{data server n}],
        "date_time 2": [{data server 1}, ... ,{data server n}],
        ...
    },
    ...
}
- Data details below: I have 3 servers and collected data for 2 days [2022-03-25, 2022-03-26]
data = {
    "cpu": {
        "2022-03-25": [
            {"cpu_usage": 0.2, "name": "server01", "timestamp": "2022-03-25"},
            {"cpu_usage": 0.3, "name": "server02", "timestamp": "2022-03-25"},
            {"cpu_usage": 0.25, "name": "server03", "timestamp": "2022-03-25"},
        ],
        "2022-03-26": [
            {"cpu_usage": 0.15, "name": "server01", "timestamp": "2022-03-26"},
            {"cpu_usage": 0.2, "name": "server02", "timestamp": "2022-03-26"},
            {"cpu_usage": 0.15, "name": "server03", "timestamp": "2022-03-26"},
        ],
    },
    "ram": {
        "2022-03-25": [
            {"ram_usage": 0.4, "name": "server01", "timestamp": "2022-03-25"},
            {"ram_usage": 0.5, "name": "server02", "timestamp": "2022-03-25"},
            {"ram_usage": 0.5, "name": "server03", "timestamp": "2022-03-25"},
        ],
        "2022-03-26": [
            {"ram_usage": 0.7, "name": "server01", "timestamp": "2022-03-26"},
            {"ram_usage": 0.6, "name": "server02", "timestamp": "2022-03-26"},
            {"ram_usage": 0.5, "name": "server03", "timestamp": "2022-03-26"},
        ],
    }
}
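I loop over this data with a for loop and compare each date_time under each metric_name. Each date_time holds a list that contains the data from all servers.
I want to merge and average the data for each date_time.
Example: for date_time "2022-03-25" under metric_name cpu, I average the cpu_usage values of the 3 servers and merge the data into a single entry. I also remove the "name" key and its value.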
cpu_usage = (0.2+0.3+0.25)/3 = 0.25
It looks like this:
"cpu": {
    "2022-03-25": [
        {
            "cpu_usage": 0.25, "timestamp": "2022-03-25"
        }
    ],
Desired result:
output = {
    "cpu": {
        "2022-03-25": [
            {"cpu_usage": 0.25, "timestamp": "2022-03-25"}
        ],
        "2022-03-26": [
            {"cpu_usage": 0.166, "timestamp": "2022-03-26"}
        ],
    },
    "ram": {
        "2022-03-25": [
            {"ram_usage": 0.46, "timestamp": "2022-03-25"}
        ],
        "2022-03-26": [
            {"ram_usage": 0.6, "timestamp": "2022-03-26"}
        ],
    }
}
Note: I am using Python 3.9.
I hope someone can help me. Thank you very much.
Note: the average usage is formatted to 3 decimal places and is a string. If you replace f'{stat:0.3f}' with round(stat, 3), it will be a float instead.
def prittify(data):
    new_data = {}
    for memory_type, d in data.items():
        new_data[memory_type] = {}
        for date, d_list in d.items():
            # Average the '<metric>_usage' values of all servers for this date.
            stat = sum(i[f'{memory_type}_usage'] for i in d_list) / len(d_list)
            # Keep only the averaged value and the date; the 'name' key is dropped.
            new_data[memory_type][date] = {f'{memory_type}_usage': f'{stat:0.3f}', 'timestamp': date}
    return new_data

d = prittify(data)
print(d)
Output:
{'cpu': {'2022-03-25': {'cpu_usage': '0.250', 'timestamp': '2022-03-25'}, '2022-03-26': {'cpu_usage': '0.167', 'timestamp': '2022-03-26'}}, 'ram': {'2022-03-25': {'ram_usage': '0.467', 'timestamp': '2022-03-25'}, '2022-03-26': {'ram_usage': '0.600', 'timestamp': '2022-03-26'}}}
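The output above stores one dict per date with the average as a string, while the question asks for a one-element list per date with a numeric value. A minimal sketch of that variant follows, assuming every usage key follows the '<metric>_usage' pattern seen in the sample data. The names average_per_date, digits and sample are hypothetical and not part of the original answer, and statistics.mean is swapped in for the manual sum / len.

```python
from statistics import mean


def average_per_date(data, digits=3):
    """Average '<metric>_usage' over all servers per date and drop 'name'."""
    result = {}
    for metric, by_date in data.items():
        key = f"{metric}_usage"  # assumption: keys are named '<metric>_usage'
        result[metric] = {
            # Wrap the merged entry in a list to match the desired output shape.
            date: [{key: round(mean(row[key] for row in rows), digits),
                    "timestamp": date}]
            for date, rows in by_date.items()
            if rows  # skip empty lists, since mean() raises on empty input
        }
    return result


# Hypothetical sample: a small slice of the data from the question.
sample = {
    "cpu": {
        "2022-03-25": [
            {"cpu_usage": 0.2, "name": "server01", "timestamp": "2022-03-25"},
            {"cpu_usage": 0.3, "name": "server02", "timestamp": "2022-03-25"},
            {"cpu_usage": 0.25, "name": "server03", "timestamp": "2022-03-25"},
        ],
    },
}

print(average_per_date(sample))
# {'cpu': {'2022-03-25': [{'cpu_usage': 0.25, 'timestamp': '2022-03-25'}]}}
```

Because round() rounds rather than truncates, the full data set would give 0.167 and 0.467 where the desired result in the question shows 0.166 and 0.46.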