How to concatenate a pandas multi-index column DataFrame with a single-index DataFrame
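I have the following code, where I am trying to run a groupby with an aggregation on a pivot table and join the resulting aggregate back onto the pivot table DataFrame. But I'm having trouble joining the two tables because their columns have different numbers of levels.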
import pandas as pd

data = [
    ["alice", "school 1", "math", 95],
    ["alice", "school 1", "science", 87],
    ["charlie", "school 1", "math", 72],
    ["charlie", "school 1", "science", 63],
    ["bob", "school 2", "math", 92],
    ["bob", "school 2", "science", 68],
    ["dale", "school 2", "math", 56],
    ["dale", "school 2", "science", 78],
]
df = pd.DataFrame(data, columns=["student_name", "school", "class", "class score"])
# One row per class; columns keyed by (school, student_name)
pvt = pd.pivot_table(df, index=["class"], columns=["school", "student_name"])
print(pvt)
print()
# Sum across students within each school (column level 1).
# Same result as pvt.groupby(level=1, axis=1).sum(); the axis=1 form is deprecated since pandas 2.1.
aggregate_sum = pvt.T.groupby(level=1).sum().T
print(aggregate_sum)
Pivot table output:
             class score
school        school 1            school 2
student_name     alice   charlie       bob      dale
class
math                95        72        92        56
science             87        63        68        78
Aggregate output:
school    school 1  school 2
class
math           167       148
science        150       146
How can I join the aggregate output onto the pivot table at the same level as the student names?
Expected output:
             class score
school        school 1                                school 2
student_name     alice   charlie       sum       bob      dale       sum
class
math                95        72       167        92        56       148
science             87        63       150        68        78       146
Join with merge and update the multi-column names, then build a MultiIndex with pd.MultiIndex.from_tuples() and assign it as the columns of the merged frame.
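# Join the per-school sums onto the pivot table by the shared "class" index
final = pvt.merge(aggregate_sum, on='class', how='inner')
# Rename the two sum columns to 3-tuples so they sit under the matching school
final = final.rename(columns={'school 1': ('class score', 'school 1', 'sum'),
                              'school 2': ('class score', 'school 2', 'sum')})
# Rebuild the column MultiIndex from the tuple labels
final.columns = pd.MultiIndex.from_tuples(final.columns)
# Reorder so each school's sum follows its students
final = final[[
    ('class score', 'school 1', 'alice'), ('class score', 'school 1', 'charlie'),
    ('class score', 'school 1', 'sum'), ('class score', 'school 2', 'bob'),
    ('class score', 'school 2', 'dale'), ('class score', 'school 2', 'sum'),
]]
final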
             class score
              school 1                                school 2
                 alice   charlie       sum       bob      dale       sum
class
math                95        72       167        92        56       148
science             87        63       150        68        78       146
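One caveat, offered as a hedged alternative rather than a correction: as far as I know, recent pandas releases (2.0 and later) refuse to merge frames whose columns have different numbers of levels and raise a MergeError, so the merge step may not run there. A minimal sketch that avoids merge entirely is to give the sums a third column level first and then use pd.concat. The names sums, combined and order are my own and not part of the original answer.

```python
import pandas as pd

data = [
    ["alice", "school 1", "math", 95],
    ["alice", "school 1", "science", 87],
    ["charlie", "school 1", "math", 72],
    ["charlie", "school 1", "science", 63],
    ["bob", "school 2", "math", 92],
    ["bob", "school 2", "science", 68],
    ["dale", "school 2", "math", 56],
    ["dale", "school 2", "science", 78],
]
df = pd.DataFrame(data, columns=["student_name", "school", "class", "class score"])
pvt = pd.pivot_table(df, index=["class"], columns=["school", "student_name"])

# Per-school totals, keeping the top ("class score") and "school" column levels.
# Transposing first avoids groupby(axis=1), which newer pandas deprecates.
sums = pvt.T.groupby(level=[0, 1]).sum().T

# Add a third column level labelled "sum" so the totals line up with student_name.
sums.columns = pd.MultiIndex.from_tuples(
    [(*col, "sum") for col in sums.columns], names=pvt.columns.names
)

# Both frames now have three column levels, so a plain concat works.
combined = pd.concat([pvt, sums], axis=1)

# Place each school's "sum" column right after that school's students.
order = []
for sum_col in sums.columns:
    order += [c for c in pvt.columns if c[:2] == sum_col[:2]]
    order.append(sum_col)
final = combined[order]
print(final)
```

Building the column order explicitly, instead of calling sort_index(axis=1), keeps "sum" last within each school even when a student name would sort after the word "sum".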