pandas 0.20:如何对具有多级索引的列进行自定义排序?
pandas 0.20: How do I do a custom sort of the columns with multi-level indexes?
pandas 的新手。使用 pandas 0.20,因此没有 CategoricalDtype。我想在将几个 df 与 concat 合并后对列进行自定义排序。合并后的 df 将具有多级索引列。
使用分类,它不适用于自定义排序。
dfs=[table2, table3,table4]
L_list=['A', 'B', 'C']
test=pd.Categorical(L_list, categories=['B', 'C','A'], ordered=True)
merged_df = pd.concat(
dfs, axis=1,
keys=pd.MultiIndex.from_arrays([
test, # Top Level Keys
['Cat1', 'Cat2', 'Cat1'] # Second Level Keys
], names=['Importance', 'Category'])
)
output=merged_df.sort_index(axis=1, level=[0])
当前状态
**Merged_df**
Importance| A | B | C |
Category | Cat1 | Cat2 | Cat1 |
|Total Assets| AUMs | Revenue |
Firm 1 | 100 | 300 | 300 |
Firm 2 | 200 | 3400 | 200 |
Firm 3 | 300 | 800 | 400 |
Firm 4 | NaN | 800 | 350 |
期望状态
**Merged_df**
Importance| B | C | A |
Category | Cat2 | Cat1 | Cat1 |
|AUMs | Revenue | Total Assets |
Firm 1 | 300 | 300 | 100 |
Firm 2 | 3400 | 200 | 200 |
Firm 3 | 800 | 400 | 300 |
Firm 4 | 800 | 350 | NaN |
不确定 0.20 的所有可能性,但一个想法是将多索引列转换为框架,将每个级别更改为分类数据(就像您在问题中对测试所做的那样),然后 sort_values数据框,保持列的索引按照您想要重新排列 merged_df 列的顺序排序。看这个例子:
# simple example
dfs=[
pd.DataFrame({'a':[0,1]}, index=[0,1]),
pd.DataFrame({'b':[0,1]}, index=[0,1]),
pd.DataFrame({'c':[0,1]}, index=[0,1]),
]
L_list=['A', 'B', 'B'] # changed C to have 2 B with 2 Cat
merged_df = pd.concat(
dfs, axis=1,
keys=pd.MultiIndex.from_arrays([
test, # Top Level Keys
['Cat1', 'Cat1', 'Cat2'] # Second Level Keys
], names=['Importance', 'Category'])
)
print(merged_df)
# A B
# Cat1 Cat1 Cat2
# a b c
# 0 0 0 0
# 1 1 1 1
所以你可以做到
# create dataframes of columns names
col_df = merged_df.columns.to_frame()
# change to catagorical each level you want
col_df[0] = pd.Categorical(col_df[0], categories=['B', 'C','A'], ordered=True)
col_df[1] = pd.Categorical(col_df[1], categories=['Cat2', 'Cat1'], ordered=True)
# sort values and get the index
print(col_df.sort_values(by=col_df.columns.tolist()).index)
# MultiIndex([('B', 'Cat2', 'c'), # B is before A and Cat2 before Cat1
# ('B', 'Cat1', 'b'),
# ('A', 'Cat1', 'a')],
# )
output = merged_df[col_df.sort_values(by=col_df.columns.tolist()).index]
print(output)
# B A
# Cat2 Cat1 Cat1
# c b a
# 0 0 0 0
# 1 1 1 1
pandas 的新手。使用 pandas 0.20,因此没有 CategoricalDtype。我想在将几个 df 与 concat 合并后对列进行自定义排序。合并后的 df 将具有多级索引列。
使用分类,它不适用于自定义排序。
dfs=[table2, table3,table4]
L_list=['A', 'B', 'C']
test=pd.Categorical(L_list, categories=['B', 'C','A'], ordered=True)
merged_df = pd.concat(
dfs, axis=1,
keys=pd.MultiIndex.from_arrays([
test, # Top Level Keys
['Cat1', 'Cat2', 'Cat1'] # Second Level Keys
], names=['Importance', 'Category'])
)
output=merged_df.sort_index(axis=1, level=[0])
当前状态
**Merged_df**
Importance| A | B | C |
Category | Cat1 | Cat2 | Cat1 |
|Total Assets| AUMs | Revenue |
Firm 1 | 100 | 300 | 300 |
Firm 2 | 200 | 3400 | 200 |
Firm 3 | 300 | 800 | 400 |
Firm 4 | NaN | 800 | 350 |
期望状态
**Merged_df**
Importance| B | C | A |
Category | Cat2 | Cat1 | Cat1 |
|AUMs | Revenue | Total Assets |
Firm 1 | 300 | 300 | 100 |
Firm 2 | 3400 | 200 | 200 |
Firm 3 | 800 | 400 | 300 |
Firm 4 | 800 | 350 | NaN |
不确定 0.20 的所有可能性,但一个想法是将多索引列转换为框架,将每个级别更改为分类数据(就像您在问题中对测试所做的那样),然后 sort_values数据框,保持列的索引按照您想要重新排列 merged_df 列的顺序排序。看这个例子:
# simple example
dfs=[
pd.DataFrame({'a':[0,1]}, index=[0,1]),
pd.DataFrame({'b':[0,1]}, index=[0,1]),
pd.DataFrame({'c':[0,1]}, index=[0,1]),
]
L_list=['A', 'B', 'B'] # changed C to have 2 B with 2 Cat
merged_df = pd.concat(
dfs, axis=1,
keys=pd.MultiIndex.from_arrays([
test, # Top Level Keys
['Cat1', 'Cat1', 'Cat2'] # Second Level Keys
], names=['Importance', 'Category'])
)
print(merged_df)
# A B
# Cat1 Cat1 Cat2
# a b c
# 0 0 0 0
# 1 1 1 1
所以你可以做到
# create dataframes of columns names
col_df = merged_df.columns.to_frame()
# change to catagorical each level you want
col_df[0] = pd.Categorical(col_df[0], categories=['B', 'C','A'], ordered=True)
col_df[1] = pd.Categorical(col_df[1], categories=['Cat2', 'Cat1'], ordered=True)
# sort values and get the index
print(col_df.sort_values(by=col_df.columns.tolist()).index)
# MultiIndex([('B', 'Cat2', 'c'), # B is before A and Cat2 before Cat1
# ('B', 'Cat1', 'b'),
# ('A', 'Cat1', 'a')],
# )
output = merged_df[col_df.sort_values(by=col_df.columns.tolist()).index]
print(output)
# B A
# Cat2 Cat1 Cat1
# c b a
# 0 0 0 0
# 1 1 1 1