Add summary row to Pandas
I'm trying to add a summary row that shows the total rainfall, but I also need it to show the average temperature:
data = [{'year':2020,'area': 'new-hills', 'rainfall': 100, 'temperature': 20},
{'year':2021,'area': 'new-hills', 'rainfall': 110, 'temperature': 20},
{'year':2019,'area': 'new-hills', 'rainfall': 111, 'temperature': 19},
{'year':2020, 'area': 'cape-town', 'rainfall': 70, 'temperature': 25},
{'year':2021,'area': 'cape-town', 'rainfall': 80, 'temperature': 23},
{'year':2019,'area': 'cape-town', 'rainfall': 75, 'temperature': 24},
{'year':2019, 'area': 'mumbai', 'rainfall': 200, 'temperature': 37 },
{'year':2020, 'area': 'mumbai', 'rainfall': 170, 'temperature': 39 },
{'year':2021, 'area': 'mumbai', 'rainfall': 180, 'temperature': 38 },
]
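The code below works for the rainfall total, but I also need to show the average temperature, and I don't know how to add it while keeping it in the same summary row. This is just an example, but I need the same arrangement in a real-world project.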
import pandas as pd

df = pd.DataFrame.from_dict(data)
container = []
for label, _df in df.groupby(['area']):
    # Append a 'summary' row to each group holding the rainfall total
    _df.loc['summary'] = _df[['rainfall']].sum()  # <- How do I add a 2nd column that's not another 'sum'?
    container.append(_df)
df_summary = pd.concat(container)
df = df_summary.fillna('')
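Sample image of what I need (I have filled in the green values to show what I need the code to do).
Thanks.
My code is on GitHub as a Jupyter notebook, if you'd like to use it.
Pandas Summary Jupyter Notebook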
You can try this:
import pandas as pd
data = [{'year':2020,'area': 'new-hills', 'rainfall': 100, 'temperature': 20},
{'year':2021,'area': 'new-hills', 'rainfall': 110, 'temperature': 20},
{'year':2019,'area': 'new-hills', 'rainfall': 111, 'temperature': 19},
{'year':2020, 'area': 'cape-town', 'rainfall': 70, 'temperature': 25},
{'year':2021,'area': 'cape-town', 'rainfall': 80, 'temperature': 23},
{'year':2019,'area': 'cape-town', 'rainfall': 75, 'temperature': 24},
{'year':2019, 'area': 'mumbai', 'rainfall': 200, 'temperature': 37},
{'year':2020, 'area': 'mumbai', 'rainfall': 170, 'temperature': 39},
{'year':2021, 'area': 'mumbai', 'rainfall': 180, 'temperature': 38 }]
df = pd.DataFrame.from_dict(data)
container = []
for label, _df in df.groupby(['area']):
    # One summary row per area: rainfall is summed, temperature is averaged
    _df.loc['summary'] = _df.agg({'rainfall': 'sum', 'temperature': 'mean'})
    container.append(_df)
df_summary = pd.concat(container)
df = df_summary.fillna('')
df
Output:
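As a variation on the loop above, here is a minimal sketch that builds each summary row as its own one-row DataFrame and concatenates it onto the group instead of assigning into the group with `.loc`. The names `rows`, `parts` and `summary` are only illustrative, and the data is the same as in the question, written more compactly.

```python
import pandas as pd

# Same sample data as in the question, written as tuples for brevity
rows = [
    (2020, "new-hills", 100, 20), (2021, "new-hills", 110, 20), (2019, "new-hills", 111, 19),
    (2020, "cape-town", 70, 25), (2021, "cape-town", 80, 23), (2019, "cape-town", 75, 24),
    (2019, "mumbai", 200, 37), (2020, "mumbai", 170, 39), (2021, "mumbai", 180, 38),
]
df = pd.DataFrame(rows, columns=["year", "area", "rainfall", "temperature"])

parts = []
for area, group in df.groupby("area"):
    # One-row frame labelled 'summary': total rainfall and mean temperature
    summary = pd.DataFrame(
        {"rainfall": [group["rainfall"].sum()],
         "temperature": [group["temperature"].mean()]},
        index=["summary"],
    )
    # Columns missing from the summary row (year, area) become NaN here
    parts.append(pd.concat([group, summary]))

df_summary = pd.concat(parts)
result = df_summary.fillna("")
```

Because the group itself is never modified, this version should avoid any copy-versus-view warnings, at the cost of a few more lines.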
Edit
Following the follow-up request to replace the temperature average with a constant, here is the modified code:
import pandas as pd
data = [{'year': 2020, 'area': 'new-hills', 'rainfall': 100, 'temperature': 20},
{'year': 2021, 'area': 'new-hills', 'rainfall': 110, 'temperature': 20},
{'year': 2019, 'area': 'new-hills', 'rainfall': 111, 'temperature': 19},
{'year': 2020, 'area': 'cape-town', 'rainfall': 70, 'temperature': 25},
{'year': 2021, 'area': 'cape-town', 'rainfall': 80, 'temperature': 23},
{'year': 2019, 'area': 'cape-town', 'rainfall': 75, 'temperature': 24},
{'year': 2019, 'area': 'mumbai', 'rainfall': 200, 'temperature': 37},
{'year': 2020, 'area': 'mumbai', 'rainfall': 170, 'temperature': 39},
{'year': 2021, 'area': 'mumbai', 'rainfall': 180, 'temperature': 38}]
my_constants = [10, 20, 30]
def map_constant(x, v):
    # x.mean() is evaluated but its result is discarded; the constant v is returned instead
    x.mean()
    return v
df = pd.DataFrame.from_dict(data)
container = []
for i, group in enumerate(df.groupby(['area'])):
    label, _df = group
    # The i-th group (in sorted area order) gets the i-th constant as its temperature
    _df.loc['summary'] = _df.agg({'rainfall': 'sum', 'temperature': (lambda x: map_constant(x, my_constants[i]))})
    container.append(_df)
df_summary = pd.concat(container)
df = df_summary.fillna('')
df
Output:
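If the constants belong to specific areas rather than to positions in the loop, one possible sketch is to key them by area name. The dictionary `constants_by_area` is a hypothetical name introduced here; its values mirror `my_constants` in the order the groups are visited (areas sorted alphabetically).

```python
import pandas as pd

# Same sample data as in the question, written as tuples for brevity
rows = [
    (2020, "new-hills", 100, 20), (2021, "new-hills", 110, 20), (2019, "new-hills", 111, 19),
    (2020, "cape-town", 70, 25), (2021, "cape-town", 80, 23), (2019, "cape-town", 75, 24),
    (2019, "mumbai", 200, 37), (2020, "mumbai", 170, 39), (2021, "mumbai", 180, 38),
]
df = pd.DataFrame(rows, columns=["year", "area", "rainfall", "temperature"])

# Hypothetical lookup: the constant to show in each area's summary row
constants_by_area = {"cape-town": 10, "mumbai": 20, "new-hills": 30}

parts = []
for area, group in df.groupby("area"):
    # Total rainfall for the group, plus the constant looked up by area name
    summary = pd.DataFrame(
        {"rainfall": [group["rainfall"].sum()],
         "temperature": [constants_by_area[area]]},
        index=["summary"],
    )
    parts.append(pd.concat([group, summary]))

df_summary = pd.concat(parts)
result = df_summary.fillna("")
```

Looking the constant up by name means the result does not depend on the order in which `groupby` yields the groups, and a missing area raises a `KeyError` instead of silently picking the wrong value.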