Sorting by name part of the columns in a data frame without changing the position of the other columns in python pandas?

Question

I have a data frame with 906 columns. 160 of those columns are names of world languages. So the data frame columns look more or less like this:

[c1,c2,c3,c4,c....,Italian, English, German, French, Albanian, Spanish,... c903, c904, c905, c906]

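I know how to sort columns, but only when all columns of the data frame are taken into account, not just some of them. How can I sort only the columns with language names alphabetically in Python, without changing the order of the other columns?
My desired output should look like this: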

[c1,c2,c3,c4,c....,Albanian, English, French, German, Italian, Spanish,... c903, c904, c905, c906]

Thank you very much for your help!

Answer 1

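Assuming all your non-language columns can be identified and converted to a boolean mask (here a regular expression matching c\d+ is used, but this can be anything), you can use numpy.lexsort: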

import numpy as np

# identify non-target columns
# the regex here is just an example
# any other method can be used
# (raw string so that \d is not treated as a string escape)
a = df.columns.str.fullmatch(r'c\d+')
# array([ True,  True,  True,  True, False, False, False,
#        False, False, False,  True,  True,  True,  True])

# compute a first sorter
pos = a.cumsum()
# array([1, 2, 3, 4, 4, 4, 4, 4, 4, 4, 5, 6, 7, 8])

# sort by first sorter then column names
df2 = df.iloc[:, np.lexsort((df.columns, pos))]
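The call above relies on how np.lexsort orders its keys: the last key in the tuple is the primary sort key, and earlier keys only break ties. A minimal sketch on toy arrays (the names `names` and `group` are made up for illustration and are not part of the answer):

```python
import numpy as np

# toy data: two groups, names unsorted inside each group
names = np.array(['b', 'a', 'c', 'a'])
group = np.array([1, 1, 0, 0])

# the LAST key (group) is the primary key; names only break ties
order = np.lexsort((names, group))

print(order)         # indices that sort by group, then by name
print(names[order])  # names rearranged in that order
```

Here group 0 comes first with its names in alphabetical order, followed by group 1, which is the same mechanism that keeps the columns in their blocks while sorting names inside one block.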

To get the language columns from a list of valid languages, you can use langcodes:

from langcodes import language_lists

lang = language_lists.WIKT_LANGUAGE_NAMES['en']

a = ~df.columns.isin(lang)
# array([ True,  True,  True,  True, False, False, False,
#        False, False, False,  True,  True,  True,  True])

pos = a.cumsum()
df2 = df.iloc[:, np.lexsort((df.columns, pos))]

Output:

c1, c2, c3, Albanian, English, French, German, Italian, Spanish, c4, c903, c904, c905, c906

Input used:

import pandas as pd

cols = ['c1', 'c2', 'c3', 'c4', 'Italian', 'English', 'German',
        'French', 'Albanian', 'Spanish', 'c903', 'c904', 'c905', 'c906']
df = pd.DataFrame(columns=cols)
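Note that in the output shown above, c4 ends up after the language block: it shares the cumulative key 4 with the language columns, and uppercase letters sort before lowercase ones. If c4 has to stay where it was, one possible variant is sketched below. It is not the answerer's method: it avoids lexsort and simply hands out the sorted language names, one per language slot, while every other column keeps its position. The regex is the same illustrative assumption as above, and `is_lang`, `sorted_langs` and `new_order` are names introduced only for this sketch.

```python
import pandas as pd

cols = ['c1', 'c2', 'c3', 'c4', 'Italian', 'English', 'German',
        'French', 'Albanian', 'Spanish', 'c903', 'c904', 'c905', 'c906']
df = pd.DataFrame(columns=cols)

# True for language columns (everything NOT matching the example regex)
is_lang = ~df.columns.str.fullmatch(r'c\d+')

# language names in alphabetical order, consumed one slot at a time
sorted_langs = iter(sorted(df.columns[is_lang]))

# keep non-language columns in place; fill each language slot
# with the next name from the sorted list
new_order = [next(sorted_langs) if flag else col
             for col, flag in zip(df.columns, is_lang)]

df2 = df[new_order]
print(list(df2.columns))
```

Because each language slot is filled independently, this should also work when the language columns are not contiguous.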

python

sorting

multiple-columns

pandas