Pandas 枢轴 table 与来自两个独立数据帧的聚合
Pandas pivot table with aggregates from two separate dataframes
我正在尝试构建一个 pivot_table
,其中包含两个数据帧(df1
和 df2
)的值差异以及聚合平均值和标准差。
包含df1 =
gender year statistics s_values
male 1999 cigarette use 100
male 1999 cellphone use 310
male 1999 internet use 101
male 1999 alcohol use 100
包含 df2 =
gender year statistics s_values
female 1999 cigarette use 156
female 1999 cellphone use 198
female 1999 internet use 232
female 1999 alcohol use 243
期望的输出是my_pivot_table =
year statistics male female difference mean std
1999 cigarette use 100.0 156.0 56.0 128.0 28.0
1999 cellphone use 310.0 198.0 112.0 254.0 56.0
1999 internet use 101.0 232.0 78.0 166.5 65.5
1999 alcohol use 100.0 243.0 143.0 171.5 50.0
我将 df1
和 df2
合并到一个名为 merged_df
的数据框中,我不确定这是否是构建所需枢轴 table 的正确步骤。 . 我不知道从这里去哪里 male
列包含 df1
中的 s_values
女性列包含 df2
中的 s_values
包含 merged_df =
gender year statistics s_values
male 1999 cigarette use 100
male 1999 cellphone use 310
male 1999 internet use 101
male 1999 alcohol use 100
female 1999 cigarette use 156
female 1999 cellphone use 198
female 1999 internet use 232
female 1999 alcohol use 243
尝试:
x = pd.concat([df1, df2]).pivot(
index=["year", "statistics"], columns="gender", values="s_values"
)
x["difference"] = (x["male"] - x["female"]).abs()
x["mean"] = x[["male", "female"]].mean(axis=1)
x["std"] = x[["female", "male"]].std(axis=1, ddof=0)
print(x.reset_index().rename_axis("", axis=1))
打印:
year statistics female male difference mean std
0 1999 alcohol use 243 100 143 171.5 71.5
1 1999 cellphone use 198 310 112 254.0 56.0
2 1999 cigarette use 156 100 56 128.0 28.0
3 1999 internet use 232 101 131 166.5 65.5
我正在尝试构建一个 pivot_table
,其中包含两个数据帧(df1
和 df2
)的值差异以及聚合平均值和标准差。
包含df1 =
gender year statistics s_values
male 1999 cigarette use 100
male 1999 cellphone use 310
male 1999 internet use 101
male 1999 alcohol use 100
包含 df2 =
gender year statistics s_values
female 1999 cigarette use 156
female 1999 cellphone use 198
female 1999 internet use 232
female 1999 alcohol use 243
期望的输出是my_pivot_table =
year statistics male female difference mean std
1999 cigarette use 100.0 156.0 56.0 128.0 28.0
1999 cellphone use 310.0 198.0 112.0 254.0 56.0
1999 internet use 101.0 232.0 78.0 166.5 65.5
1999 alcohol use 100.0 243.0 143.0 171.5 50.0
我将 df1
和 df2
合并到一个名为 merged_df
的数据框中,我不确定这是否是构建所需枢轴 table 的正确步骤。 . 我不知道从这里去哪里 male
列包含 df1
中的 s_values
女性列包含 df2
中的 s_values
包含 merged_df =
gender year statistics s_values
male 1999 cigarette use 100
male 1999 cellphone use 310
male 1999 internet use 101
male 1999 alcohol use 100
female 1999 cigarette use 156
female 1999 cellphone use 198
female 1999 internet use 232
female 1999 alcohol use 243
尝试:
x = pd.concat([df1, df2]).pivot(
index=["year", "statistics"], columns="gender", values="s_values"
)
x["difference"] = (x["male"] - x["female"]).abs()
x["mean"] = x[["male", "female"]].mean(axis=1)
x["std"] = x[["female", "male"]].std(axis=1, ddof=0)
print(x.reset_index().rename_axis("", axis=1))
打印:
year statistics female male difference mean std
0 1999 alcohol use 243 100 143 171.5 71.5
1 1999 cellphone use 198 310 112 254.0 56.0
2 1999 cigarette use 156 100 56 128.0 28.0
3 1999 internet use 232 101 131 166.5 65.5