MultiIndex DataFrame sorting order after stacking columns into a Series

Question

I have a problem with sort_index after applying stack to a DataFrame.

I have a single-sheet MS Excel file that looks like this: link to table screenshot

I read it into a DataFrame and get the expected result:

df_test = pd.read_excel(io=path_test_xlsx, header=0, index_col=0)
            US  AR   QA
Date                   
2015-01-01   1  10  100
2016-01-01   2  20  200
2017-01-01   3  30  300

Then I stack it into a Series like this:

ser_test = df_test.stack()
ser_test.index.names = ['Date', 'Country']
Date        Country
2015-01-01  US           1
            AR          10
            QA         100
2016-01-01  US           2
            AR          20
            QA         200
2017-01-01  US           3
            AR          30
            QA         300

The next step is to sort the Series by Country and then by Date, so I tried this:

print(ser_test.sort_index(level = ['Country', 'Date']))
Date        Country
2015-01-01  US           1
2016-01-01  US           2
2017-01-01  US           3
2015-01-01  AR          10
2016-01-01  AR          20
2017-01-01  AR          30
2015-01-01  QA         100
2016-01-01  QA         200
2017-01-01  QA         300

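The result is not in lexicographic order: US comes before AR and QA. It seems to me that the problem lies in the stacking step, because the following sequence of operations gives the result I want: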

df_test_reseted = ser_test.reset_index(level = 'Country')
ser_test_reseted = df_test_reseted.set_index('Country', append = True).squeeze()
print(ser_test_reseted.sort_index(level = ['Country', 'Date']))
Date        Country
2015-01-01  AR          10
2016-01-01  AR          20
2017-01-01  AR          30
2015-01-01  QA         100
2016-01-01  QA         200
2017-01-01  QA         300
2015-01-01  US           1
2016-01-01  US           2
2017-01-01  US           3

Does stacking really cause the lexicographic order to be ignored, or am I doing something wrong?

Answer 1

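This looks like a bug: the sort is done on the labels (the integer positions into each level), not on the level values themselves. After stack the Country level keeps the original column order (US, AR, QA), so sorting by labels reproduces that order. If you recreate the MultiIndex, it works correctly: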

print(ser_test.index.labels)  # this attribute is named `codes` in pandas 0.24 and later
[[0, 0, 0, 1, 1, 1, 2, 2, 2], [0, 1, 2, 0, 1, 2, 0, 1, 2]]

print(ser_test.index.levels)
[['2015-01-01', '2016-01-01', '2017-01-01'], ['US', 'AR', 'QA']]
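To make "sorting by labels" concrete, the index that stack produced can be rebuilt by hand from the levels and labels printed above. This is only a minimal sketch: the direct `pd.MultiIndex(...)` construction and the names `dates`, `idx` and `country_level` are illustrative and not part of the question, and it reads the labels through `codes`, the name used in pandas 0.24 and later.

```python
import pandas as pd

# Dates from the question's table (assumed here to be real timestamps).
dates = pd.to_datetime(["2015-01-01", "2016-01-01", "2017-01-01"])

# Hand-built copy of the stacked index: the Country level is stored in the
# original column order, and the integer codes point into that order.
idx = pd.MultiIndex(
    levels=[dates, ["US", "AR", "QA"]],
    codes=[[0, 0, 0, 1, 1, 1, 2, 2, 2], [0, 1, 2, 0, 1, 2, 0, 1, 2]],
    names=["Date", "Country"],
)

country_level = idx.levels[1]
print(list(country_level))                      # level values as stored
print(country_level.is_monotonic_increasing)    # whether the level is sorted
print(idx.codes[1].tolist())                    # integer codes for Country
```

The Country level is not sorted, so a sort that compares the integer codes instead of the level values treats US as smaller than AR and QA, which matches the unexpected output in the question.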

ser_test.index = pd.MultiIndex.from_tuples(ser_test.index.tolist(), names=['Date', 'Country'])
print(ser_test.sort_index(level = ['Country', 'Date']))
Date        Country
2015-01-01  AR          10
2016-01-01  AR          20
2017-01-01  AR          30
2015-01-01  QA         100
2016-01-01  QA         200
2017-01-01  QA         300
2015-01-01  US           1
2016-01-01  US           2
2017-01-01  US           3
dtype: int64
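Why rebuilding helps can be checked on a hand-built copy of the stacked Series. The sketch below assumes the same data as in the question; the variable names `unsorted_idx`, `ser`, `rebuilt` and `result` are illustrative. It shows that `MultiIndex.from_tuples` stores the Country level in sorted order and remaps the codes to match.

```python
import pandas as pd

dates = pd.to_datetime(["2015-01-01", "2016-01-01", "2017-01-01"])

# Hand-built stand-in for the index produced by stack(): the Country level
# is stored in column order (US, AR, QA) rather than sorted.
unsorted_idx = pd.MultiIndex(
    levels=[dates, ["US", "AR", "QA"]],
    codes=[[0, 0, 0, 1, 1, 1, 2, 2, 2], [0, 1, 2, 0, 1, 2, 0, 1, 2]],
    names=["Date", "Country"],
)
ser = pd.Series([1, 10, 100, 2, 20, 200, 3, 30, 300], index=unsorted_idx)

# Rebuild the index from its tuples; the levels are derived afresh
# from the values instead of being copied from the old index.
rebuilt = pd.MultiIndex.from_tuples(ser.index.tolist(), names=ser.index.names)
print(list(rebuilt.levels[1]))     # Country level after the rebuild
print(rebuilt.codes[1].tolist())   # codes now refer to the sorted level

ser.index = rebuilt
result = ser.sort_index(level=["Country", "Date"])
print(result)
```

After the rebuild the codes and the level values agree on the ordering, so it no longer matters which of the two the sort compares.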

Another idea is this trick: convert the Series to a one-column DataFrame, sort that, and in the last step select column 0 to get a Series back:

print(ser_test.to_frame().sort_index(level=['Country', 'Date'])[0])
Date        Country
2015-01-01  AR          10
2016-01-01  AR          20
2017-01-01  AR          30
2015-01-01  QA         100
2016-01-01  QA         200
2017-01-01  QA         300
2015-01-01  US           1
2016-01-01  US           2
2017-01-01  US           3
Name: 0, dtype: int64
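The same trick can be tried without the Excel file. This is a minimal sketch on a hand-built copy of the stacked Series; the names `idx`, `ser`, `frame` and `sorted_ser` are illustrative. It relies on `to_frame()` labelling the single column 0 when the Series has no name, which is why the final selection uses `[0]`.

```python
import pandas as pd

dates = pd.to_datetime(["2015-01-01", "2016-01-01", "2017-01-01"])

# Hand-built stand-in for the stacked Series, with the Country level
# stored in column order (US, AR, QA).
idx = pd.MultiIndex(
    levels=[dates, ["US", "AR", "QA"]],
    codes=[[0, 0, 0, 1, 1, 1, 2, 2, 2], [0, 1, 2, 0, 1, 2, 0, 1, 2]],
    names=["Date", "Country"],
)
ser = pd.Series([1, 10, 100, 2, 20, 200, 3, 30, 300], index=idx)

frame = ser.to_frame()  # an unnamed Series becomes a column labelled 0
sorted_ser = frame.sort_index(level=["Country", "Date"])[0]
print(sorted_ser)
```

The result is a Series named 0, as in the output above; if that name is unwanted, `sorted_ser.rename(None)` drops it.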


Tags: sorting, lexicographic, multi-index, python-3.x, pandas