Setting columns in a MultiIndex in pandas
I have this pandas df, imported from a CSV:
df
0 0 apple banana orange dates apple banana orange
1 1 1 1 1 Friday, January 01, 2021 1 1 1
2 2 1 1 1 Saturday, January 02, 2021 2 2 2
3 3 1 1 1 Sunday, January 03, 2021 3 3 3
4 4 1 1 1 Monday, January 04, 2021 4 4 4
5 5 1 1 1 Tuesday, January 05, 2021 5 5 5
6 6 1 1 1 Wednesday, January 06, 2021 6 6 6
7 7 1 1 1 Thursday, January 07, 2021 7 7 7
8 8 1 4 1 Friday, January 08, 2021 8 8 8
9 9 1 1 1 Saturday, January 09, 2021 9 9 9
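Is it possible, using a MultiIndex, to group everything to the left of dates under fresh and everything to the right of dates under spoil, so that each group contains the columns [apple, banana, orange]? I want this because later, when I set dates as the index, the identical column names on the two sides will not be confused.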
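To make the ambiguity concrete, here is a minimal sketch on a hypothetical one-row frame (the data is made up, not the CSV above). With duplicate names, selecting "apple" after setting dates as the index returns both columns at once:

```python
import pandas as pd

# Hypothetical data: the same name appears on both sides of `dates`.
df = pd.DataFrame(
    [[1, "Friday, January 01, 2021", 5]],
    columns=["apple", "dates", "apple"],
)

# Selecting by the duplicated name gives a two-column DataFrame,
# so there is no way to tell the left "apple" from the right one by name.
both = df.set_index("dates")["apple"]
```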
This may help:
df.columns.values[1] = "apple 1"
df.columns.values[2] = "banana 1"
df.columns = pd.MultiIndex.from_arrays([['', '', 'fresh', 'fresh', 'fresh', '', 'spoil', 'spoil', 'spoil'],
df.columns])
Output:
fresh spoil
0 0 apple banana orange dates apple banana orange
0 1 1 1 1 1 Friday, January 01, 2021 1 1 1
1 2 2 1 1 1 Saturday, January 02, 2021 2 2 2
2 3 3 1 1 1 Sunday, January 03, 2021 3 3 3
3 4 4 1 1 1 Monday, January 04, 2021 4 4 4
4 5 5 1 1 1 Tuesday, January 05, 2021 5 5 5
5 6 6 1 1 1 Wednesday, January 06, 2021 6 6 6
6 7 7 1 1 1 Thursday, January 07, 2021 7 7 7
7 8 8 1 4 1 Friday, January 08, 2021 8 8 8
8 9 9 1 1 1 Saturday, January 09, 2021 9 9 9
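A self-contained sketch of the same idea, for anyone who wants to run it. The two-row frame is a hypothetical stand-in for the imported CSV (it omits the two leading columns), so the top-level list is shorter than the one above:

```python
import pandas as pd

# Hypothetical stand-in for the CSV: duplicate names on both sides of `dates`.
df = pd.DataFrame(
    [
        [1, 1, 1, "Friday, January 01, 2021", 1, 1, 1],
        [1, 4, 1, "Friday, January 08, 2021", 8, 8, 8],
    ],
    columns=["apple", "banana", "orange", "dates", "apple", "banana", "orange"],
)

# Top level: one group label per column, in column order.
top = ["fresh"] * 3 + [""] + ["spoil"] * 3
df.columns = pd.MultiIndex.from_arrays([top, df.columns])

fresh = df["fresh"]                   # only the left-hand apple/banana/orange
spoil_apple = df[("spoil", "apple")]  # a single, unambiguous column
```

Selecting with a tuple such as ("spoil", "apple") returns a Series, while selecting a top-level label such as "fresh" returns the whole group as a DataFrame.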
Note: if you want to call set_index('dates'), it is easier to do so before this operation.
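A sketch of that ordering, again on hypothetical data. It assumes the fresh and spoil groups have the same number of columns, which is what the `half` split relies on. Once dates is the index, the top level needs no blank entry for it:

```python
import pandas as pd

# Hypothetical data shaped like the question's frame, without the leading columns.
df = pd.DataFrame(
    [
        [1, 1, 1, "Friday, January 01, 2021", 1, 1, 1],
        [1, 1, 1, "Saturday, January 02, 2021", 2, 2, 2],
    ],
    columns=["apple", "banana", "orange", "dates", "apple", "banana", "orange"],
)

# Move `dates` out of the columns first ...
df = df.set_index("dates")

# ... then label the remaining columns; assumes both groups are the same size.
half = len(df.columns) // 2
top = ["fresh"] * half + ["spoil"] * (len(df.columns) - half)
df.columns = pd.MultiIndex.from_arrays([top, df.columns])
```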
You can try:
# Get the column number of column `dates`
dates_loc = df.columns.get_loc('dates')
arrays = [['fresh'] * dates_loc + [''] + ['spoil'] * (len(df.columns) - dates_loc - 1), df.columns.tolist()]
df.columns = pd.MultiIndex.from_arrays(arrays)
fresh spoil
0 0 apple banana orange dates apple banana orange
0 1 1 1 1 1 Friday, January 01, 2021 1 1 1
1 2 2 1 1 1 Saturday, January 02, 2021 2 2 2
2 3 3 1 1 1 Sunday, January 03, 2021 3 3 3
3 4 4 1 1 1 Monday, January 04, 2021 4 4 4
4 5 5 1 1 1 Tuesday, January 05, 2021 5 5 5
5 6 6 1 1 1 Wednesday, January 06, 2021 6 6 6
6 7 7 1 1 1 Thursday, January 07, 2021 7 7 7
7 8 8 1 4 1 Friday, January 08, 2021 8 8 8
8 9 9 1 1 1 Saturday, January 09, 2021 9 9 9
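The same approach can be wrapped in a small function. This is only a sketch: `group_around` and its parameter names are hypothetical, and the uniqueness check is an added assumption, included because `get_loc` only gives a single position when the label occurs exactly once:

```python
import pandas as pd

def group_around(df, pivot, left="fresh", right="spoil"):
    """Hypothetical helper: add a top column level splitting df around `pivot`."""
    if list(df.columns).count(pivot) != 1:
        raise ValueError(f"expected exactly one {pivot!r} column")
    loc = int(df.columns.get_loc(pivot))
    # Everything before the pivot gets `left`, everything after it gets `right`.
    top = [left] * loc + [""] + [right] * (len(df.columns) - loc - 1)
    out = df.copy()
    out.columns = pd.MultiIndex.from_arrays([top, df.columns.tolist()])
    return out

# Hypothetical one-row frame with duplicate names around `dates`.
df = pd.DataFrame(
    [[1, 4, "Friday, January 08, 2021", 8, 8]],
    columns=["apple", "banana", "dates", "apple", "banana"],
)
grouped = group_around(df, "dates")
```

Note that every column to the left of the pivot is labelled with `left`, including any leading columns such as an exported index, so drop or rename those first if they should not belong to the group.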