Pivoting MultiIndex Data
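I have a pandas DataFrame with MultiIndex columns that looks like this:
I want the different quarters as rows instead of hierarchical columns, i.e. long format rather than this wide format. Something like this (the output does not have to be a MultiIndex):
How can I do this in pandas?
Edit:
Sample input, as requested:
Sample data (pandas):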
import pandas as pd

# `codes` replaces the older `labels` keyword (renamed in pandas 0.24)
cols = pd.MultiIndex(levels=[['Center_Details', '2017-18:Q2', '2017-18:Q1'],
                             ['State', 'District', 'Center', 'Offices', 'Deposit', 'Credit']],
                     codes=[[0, 0, 0, 1, 1, 1, 2, 2, 2],
                            [0, 1, 2, 3, 4, 5, 3, 4, 5]])
data = [['JAMMU & KASHMIR', 'KUPWARA', 'Drug Mulla (CT)', '3', '500', '600', '4', '500', '600'],
        ['JAMMU & KASHMIR', 'LEH LADAKH', 'Chuglamsar (CT)', '3', '500', '600', '4', '500', '600'],
        ['PUNJAB', 'PATHANKOT', 'Mamun (CT)', '3', '500', '600', '4', '500', '600'],
        ['PUNJAB', 'GURDASPUR', 'TIBRI', '3', '500', '600', '4', '500', '600']]
df = pd.DataFrame(data=data, columns=cols)
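For readers who find the levels/codes form hard to read, here is a minimal sketch of what should be the same column layout written out as explicit tuples with `pd.MultiIndex.from_tuples`. The names `top_level` and `quarter_block` are only illustrative and are not part of the question.

```python
import pandas as pd

# Each tuple is (top level, second level), in the same order as the codes above.
cols = pd.MultiIndex.from_tuples([
    ("Center_Details", "State"), ("Center_Details", "District"), ("Center_Details", "Center"),
    ("2017-18:Q2", "Offices"), ("2017-18:Q2", "Deposit"), ("2017-18:Q2", "Credit"),
    ("2017-18:Q1", "Offices"), ("2017-18:Q1", "Deposit"), ("2017-18:Q1", "Credit"),
])
data = [
    ["JAMMU & KASHMIR", "KUPWARA", "Drug Mulla (CT)", "3", "500", "600", "4", "500", "600"],
    ["JAMMU & KASHMIR", "LEH LADAKH", "Chuglamsar (CT)", "3", "500", "600", "4", "500", "600"],
    ["PUNJAB", "PATHANKOT", "Mamun (CT)", "3", "500", "600", "4", "500", "600"],
    ["PUNJAB", "GURDASPUR", "TIBRI", "3", "500", "600", "4", "500", "600"],
]
df = pd.DataFrame(data, columns=cols)

# The top level holds the id block plus one entry per quarter.
top_level = df.columns.get_level_values(0).unique().tolist()

# Selecting a quarter returns a plain frame with the item names as columns.
quarter_block = df["2017-18:Q1"]
```

Selecting by the top-level label is a quick way to confirm that each quarter carries the same three items before reshaping.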
One approach is to flatten the MultiIndex and then use melt and pivot_table, as follows:
# Flatten the MultiIndex columns
df.columns = [' '.join(col).strip() for col in df.columns.values]
# Save some typing
idx = ['Center_Details State', 'Center_Details District', 'Center_Details Center']
# Create a long dataframe
long = pd.melt(df, id_vars = idx)
# Split the "variable" column at the space created when flattening the MultiIndex
long['QTR'], long['item'] = zip(*long['variable'].map(lambda x: x.split(' ')))
# Reshape to wide format, keeping "QTR" as a column
out = pd.pivot_table(long, index = idx + ["QTR"], columns = 'item',
                     values = 'value', aggfunc = 'first').reset_index()
print(out)
item Center_Details State Center_Details District Center_Details Center \
0 JAMMU & KASHMIR KUPWARA Drug Mulla (CT)
1 JAMMU & KASHMIR KUPWARA Drug Mulla (CT)
2 JAMMU & KASHMIR LEH LADAKH Chuglamsar (CT)
3 JAMMU & KASHMIR LEH LADAKH Chuglamsar (CT)
4 PUNJAB GURDASPUR TIBRI
5 PUNJAB GURDASPUR TIBRI
6 PUNJAB PATHANKOT Mamun (CT)
7 PUNJAB PATHANKOT Mamun (CT)
item QTR Credit Deposit Offices
0 2017-18:Q1 600 500 4
1 2017-18:Q2 600 500 3
2 2017-18:Q1 600 500 4
3 2017-18:Q2 600 500 3
4 2017-18:Q1 600 500 4
5 2017-18:Q2 600 500 3
6 2017-18:Q1 600 500 4
7 2017-18:Q2 600 500 3
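The steps above can also be put together as one self-contained script. This is only a sketch under a few assumptions: it uses a two-row subset of the sample data, works on a copy named `flat` so the original `df` is left untouched, and swaps the `zip`/`lambda` split for `str.rsplit(..., expand=True)`, which is my substitution rather than part of the answer.

```python
import pandas as pd

cols = pd.MultiIndex.from_tuples(
    [("Center_Details", c) for c in ("State", "District", "Center")]
    + [(q, item) for q in ("2017-18:Q2", "2017-18:Q1")
       for item in ("Offices", "Deposit", "Credit")]
)
data = [
    ["JAMMU & KASHMIR", "KUPWARA", "Drug Mulla (CT)", "3", "500", "600", "4", "500", "600"],
    ["PUNJAB", "GURDASPUR", "TIBRI", "3", "500", "600", "4", "500", "600"],
]
df = pd.DataFrame(data, columns=cols)

# Flatten the two column levels into single strings such as "2017-18:Q2 Offices".
flat = df.copy()
flat.columns = [" ".join(col) for col in flat.columns]
idx = ["Center_Details State", "Center_Details District", "Center_Details Center"]

# Wide to long: one row per (center, flattened column name).
long_df = flat.melt(id_vars=idx)

# Split once from the right, so the item name is always the last token.
long_df[["QTR", "item"]] = long_df["variable"].str.rsplit(" ", n=1, expand=True)

# Back to one column per item, keeping the quarter as a row key.
out = long_df.pivot_table(index=idx + ["QTR"], columns="item",
                          values="value", aggfunc="first").reset_index()
out.columns.name = None  # drop the leftover "item" label on the columns
```

Clearing `out.columns.name` removes the stray `item` header seen in the printed output above.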
Another option could be something like this:
long = df.set_index(['Center_Details']).stack().T.unstack()
long = pd.concat([pd.DataFrame(long.reset_index()['Center_Details'].tolist()),
                  long.reset_index()], axis=1)
long.columns = ['State', 'District', 'Center', 'Center_Details',
                'Items', 'QTR', 'Value']
out = pd.pivot_table(long, index=['State', 'District', 'Center', 'QTR'],
                     columns='Items', values='Value',
                     aggfunc='first').reset_index()
print(out)
Items State District Center QTR Credit \
0 JAMMU & KASHMIR KUPWARA Drug Mulla (CT) 2017-18:Q1 600
1 JAMMU & KASHMIR KUPWARA Drug Mulla (CT) 2017-18:Q2 600
2 JAMMU & KASHMIR LEH LADAKH Chuglamsar (CT) 2017-18:Q1 600
3 JAMMU & KASHMIR LEH LADAKH Chuglamsar (CT) 2017-18:Q2 600
4 PUNJAB GURDASPUR TIBRI 2017-18:Q1 600
5 PUNJAB GURDASPUR TIBRI 2017-18:Q2 600
6 PUNJAB PATHANKOT Mamun (CT) 2017-18:Q1 600
7 PUNJAB PATHANKOT Mamun (CT) 2017-18:Q2 600
Items Deposit Offices
0 500 4
1 500 3
2 500 4
3 500 3
4 500 4
5 500 3
6 500 4
7 500 3
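A related variant is to move the id columns into the row index and stack only the quarter level. This is not the code from the answer above, just a sketch of the same stack idea, assuming a two-row subset of the sample data; the names `ids`, `values`, and `long_df` are illustrative.

```python
import pandas as pd

cols = pd.MultiIndex.from_tuples(
    [("Center_Details", c) for c in ("State", "District", "Center")]
    + [(q, item) for q in ("2017-18:Q2", "2017-18:Q1")
       for item in ("Offices", "Deposit", "Credit")]
)
data = [
    ["JAMMU & KASHMIR", "KUPWARA", "Drug Mulla (CT)", "3", "500", "600", "4", "500", "600"],
    ["PUNJAB", "GURDASPUR", "TIBRI", "3", "500", "600", "4", "500", "600"],
]
df = pd.DataFrame(data, columns=cols)

# Split the frame into the id block and the quarter/item block.
ids = df["Center_Details"]
values = df.drop(columns="Center_Details", level=0)
values.columns = values.columns.remove_unused_levels()

# Use State/District/Center as the row index of the value block.
values.index = pd.MultiIndex.from_frame(ids)

# Move the quarter level (level 0 of the columns) into the rows.
long_df = values.stack(level=0)
long_df.index.names = ["State", "District", "Center", "QTR"]

# Row and column order after stack can differ between pandas versions,
# so sort the rows and select the columns explicitly.
out = long_df.reset_index().sort_values(
    ["State", "District", "Center", "QTR"], ignore_index=True
)
out = out[["State", "District", "Center", "QTR", "Offices", "Deposit", "Credit"]]
```

Because only one column level is stacked, the items stay as columns and no pivot_table step is needed.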
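A third option is to use wide_to_long, but wide_to_long expects the wide-format column names to start with the stub name. The approach is similar to the first one but involves fewer steps.
It would look like: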
# Flatten the column names, but reverse the order of the tuples
# before flattening, and add a character to split on
df.columns = ['~'.join(col[::-1]).strip() for col in df.columns.values]
# Reshape the data, Stata-style
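# The default suffix pattern matches digits only, so pass a regex
# that also accepts quarter labels such as '2017-18:Q2'
pd.wide_to_long(df, ['Offices', 'Deposit', 'Credit'],
                i=['State~Center_Details', 'District~Center_Details', 'Center~Center_Details'],
                j='Quarter', sep='~', suffix='.+').reset_index()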
You would still need to do some cleanup on the "Center_Details" columns.
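One way to avoid that cleanup is to give the id columns their plain names while flattening, so only the quarter columns carry the `~` separator. The following is a sketch under that assumption, on a two-row subset of the sample data; the renaming rule and the final sort are my additions, not part of the answer.

```python
import pandas as pd

cols = pd.MultiIndex.from_tuples(
    [("Center_Details", c) for c in ("State", "District", "Center")]
    + [(q, item) for q in ("2017-18:Q2", "2017-18:Q1")
       for item in ("Offices", "Deposit", "Credit")]
)
data = [
    ["JAMMU & KASHMIR", "KUPWARA", "Drug Mulla (CT)", "3", "500", "600", "4", "500", "600"],
    ["PUNJAB", "GURDASPUR", "TIBRI", "3", "500", "600", "4", "500", "600"],
]
df = pd.DataFrame(data, columns=cols)

# Id columns keep just their item name; value columns become "<stub>~<quarter>".
flat = df.copy()
flat.columns = [
    item if top == "Center_Details" else f"{item}~{top}"
    for top, item in flat.columns
]

# suffix=".+" lets the quarter labels (which are not purely numeric) match.
out = pd.wide_to_long(
    flat,
    stubnames=["Offices", "Deposit", "Credit"],
    i=["State", "District", "Center"],
    j="Quarter",
    sep="~",
    suffix=".+",
).reset_index()

# Sort so the row order does not depend on wide_to_long internals.
out = out.sort_values(["State", "District", "Center", "Quarter"], ignore_index=True)
```

The result has one row per center and quarter, with `Offices`, `Deposit`, and `Credit` as ordinary columns.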
Modifying @A5C1D2H2I1M1N2O1R2T1's answer slightly, I found that the MultiIndex structure can still be retained:
idx = df[['Center_Details']].columns.values.tolist()
long = pd.melt(df, id_vars = idx)
# Renaming variable created by melt to a multiindex friendly name
long.rename(columns={'variable_0': ('Values', 'QTR')}, inplace=True)
# Reshape to wide format, keeping Values, QTR as a hierarchical column
out = pd.pivot_table(long, index = idx + [('Values', 'QTR')], columns = 'variable_1',
                     values = 'value', aggfunc = 'first')
# Creating tuples for new column names
out.columns = [('Values', col) for col in out.columns]
out = out.reset_index()
# Converting columns to multiindex
out.columns = pd.MultiIndex.from_tuples(out.columns.values)
print(out)
+---+-----------------+------------+-----------------+------------+--------+---------+---------+
| | Center_Details | | | Values | | | |
+---+-----------------+------------+-----------------+------------+--------+---------+---------+
| | State | District | Center | QTR | Credit | Deposit | Offices |
+---+-----------------+------------+-----------------+------------+--------+---------+---------+
| 0 | JAMMU & KASHMIR | KUPWARA | Drug Mulla (CT) | 2017-18:Q1 | 600 | 500 | 4 |
+---+-----------------+------------+-----------------+------------+--------+---------+---------+
| 1 | JAMMU & KASHMIR | KUPWARA | Drug Mulla (CT) | 2017-18:Q2 | 600 | 500 | 3 |
+---+-----------------+------------+-----------------+------------+--------+---------+---------+
| 2 | JAMMU & KASHMIR | LEH LADAKH | Chuglamsar (CT) | 2017-18:Q1 | 600 | 500 | 4 |
+---+-----------------+------------+-----------------+------------+--------+---------+---------+
| 3 | JAMMU & KASHMIR | LEH LADAKH | Chuglamsar (CT) | 2017-18:Q2 | 600 | 500 | 3 |
+---+-----------------+------------+-----------------+------------+--------+---------+---------+
| 4 | PUNJAB | GURDASPUR | TIBRI | 2017-18:Q1 | 600 | 500 | 4 |
+---+-----------------+------------+-----------------+------------+--------+---------+---------+
| 5 | PUNJAB | GURDASPUR | TIBRI | 2017-18:Q2 | 600 | 500 | 3 |
+---+-----------------+------------+-----------------+------------+--------+---------+---------+
| 6 | PUNJAB | PATHANKOT | Mamun (CT) | 2017-18:Q1 | 600 | 500 | 4 |
+---+-----------------+------------+-----------------+------------+--------+---------+---------+
| 7 | PUNJAB | PATHANKOT | Mamun (CT) | 2017-18:Q2 | 600 | 500 | 3 |
+---+-----------------+------------+-----------------+------------+--------+---------+---------+
PS: Sorry for the ugly table formatting; I still don't know how to create tables on SO.
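If the long result has already been produced with flat column names by one of the other approaches, the two-level header can also be rebuilt afterwards. This is a minimal sketch, assuming a small hand-written frame named `flat` that stands in for such a result; the `Center_Details` and `Values` labels follow the table above.

```python
import pandas as pd

# Stand-in for a flat long-format result (two rows only, for illustration).
flat = pd.DataFrame({
    "State": ["JAMMU & KASHMIR", "JAMMU & KASHMIR"],
    "District": ["KUPWARA", "KUPWARA"],
    "Center": ["Drug Mulla (CT)", "Drug Mulla (CT)"],
    "QTR": ["2017-18:Q1", "2017-18:Q2"],
    "Credit": ["600", "600"],
    "Deposit": ["500", "500"],
    "Offices": ["4", "3"],
})

id_cols = ["State", "District", "Center"]

# Put id columns under "Center_Details" and everything else under "Values".
out = flat.copy()
out.columns = pd.MultiIndex.from_tuples(
    [("Center_Details", c) if c in id_cols else ("Values", c) for c in flat.columns]
)

# Selecting a top-level label returns that block with single-level columns.
values_block = out["Values"]
```

This keeps the reshaping itself on flat names and adds the hierarchy only as a final labelling step.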