Appending two DataFrames and sorting columns with exception of first two

Question

I want to concatenate two DataFrames created from two lists:

import pandas as pd
import numpy as np

header_1 = ['A', 'B', -1, 3, 5, 7]
data_1 = ['X', 'Y', 1, 2, 3, 4]
d = pd.DataFrame(np.array([data_1]), columns=header_1)

header_2 = ['A', 'B', -2, 4, 5, 6]
data_2 = ['X', 'Z', 1, 2, 3, 4]
e = pd.DataFrame(np.array([data_2]), columns=header_2)

f = pd.concat([d, e])

> f
   A  B   -1    3  5    7   -2    4    6
0  X  Y    1    2  3    4  NaN  NaN  NaN
0  X  Z  NaN  NaN  3  NaN    1    2    4

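However, I would like my numeric columns to appear in sorted order, and I was wondering whether there is a simpler way than splitting off the first two columns, sorting the remaining DataFrame, and concatenating the two again: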

ab_cols = f[['A', 'B']]               # Copy of first two columns
g = f.drop(['A', 'B'], axis=1)        # Removing cols from dataframe
h = g.sort_index(axis=1)              # Sort remaining by column header
i = pd.concat([ab_cols, h], axis=1)   # Putting everything together again

> i
   A  B   -2   -1    3    4  5    6    7
0  X  Y  NaN    1    2  NaN  3  NaN    4
0  X  Z    1  NaN  NaN    2  3    4  NaN

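I considered a MultiIndex, but I am already using the index for something else (the source of each data row, not shown here), and I am afraid that a three-level MultiIndex would make slicing the DataFrame more complicated later on.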

Answer 1

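As you have probably found, the problem is that the concatenated columns cannot currently be sorted directly, because they mix str and int labels. You can instead split the columns into str and numerical labels, sort the numerical ones, and then reindex with a new column order in which the str labels come first, followed by the sorted numerical labels: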

In [30]:
numerical_cols = f.columns[f.columns.to_series().apply(lambda x: type(x) != str)]
str_cols = f.columns[f.columns.to_series().apply(lambda x: type(x) == str)]
f.reindex(columns=str_cols.union(numerical_cols.sort_values()))

Out[30]:
   A  B   -2   -1    3    4  5    6    7
0  X  Y  NaN    1    2  NaN  3  NaN    4
0  X  Z    1  NaN  NaN    2  3    4  NaN
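Depending on the pandas version, Index.union may try to sort its result, so the order of a mixed str/int union is not something to rely on. A minimal sketch of the same idea that builds the column order with plain Python lists instead is shown here; the names str_cols, num_cols and result are only illustrative, and the frames are rebuilt from the question so the snippet runs on its own:

```python
import numpy as np
import pandas as pd

# Rebuild the two frames from the question.
d = pd.DataFrame(np.array([['X', 'Y', 1, 2, 3, 4]]), columns=['A', 'B', -1, 3, 5, 7])
e = pd.DataFrame(np.array([['X', 'Z', 1, 2, 3, 4]]), columns=['A', 'B', -2, 4, 5, 6])
f = pd.concat([d, e])

# Split the labels by type; the string labels keep their original order.
str_cols = [c for c in f.columns if isinstance(c, str)]
num_cols = sorted(c for c in f.columns if not isinstance(c, str))

# List concatenation puts the string labels first, then the sorted numbers.
result = f[str_cols + num_cols]
print(list(result.columns))
```

This assumes that every non-string label is a number that can be compared with the others; labels of other types would need their own handling.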

Answer 2

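Steps:

Represent the columns as a Series whose index and values are both the column labels.

Use pd.to_numeric with errors='coerce' to parse the numeric labels and turn the string labels into NaN.

Sort these values, pushing the NaNs (formerly the string labels) to the top.

Take the resulting index and rearrange the DataFrame according to the new column order.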

c = pd.to_numeric(f.columns.to_series(), errors='coerce').sort_values(na_position='first')
f[c.index]
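To make the intermediate results easier to inspect, the one-liner above can be unrolled into one statement per step. This is only a sketch: the variable names labels, as_numbers, ordered and result are illustrative, and the frames are rebuilt from the question so the snippet is self-contained:

```python
import numpy as np
import pandas as pd

# Rebuild the two frames from the question.
d = pd.DataFrame(np.array([['X', 'Y', 1, 2, 3, 4]]), columns=['A', 'B', -1, 3, 5, 7])
e = pd.DataFrame(np.array([['X', 'Z', 1, 2, 3, 4]]), columns=['A', 'B', -2, 4, 5, 6])
f = pd.concat([d, e])

# Step 1: a Series whose index and values are both the column labels.
labels = f.columns.to_series()

# Step 2: numeric labels are parsed; string labels ('A', 'B') become NaN.
as_numbers = pd.to_numeric(labels, errors='coerce')

# Step 3: sort the values, placing the NaN entries first.
ordered = as_numbers.sort_values(na_position='first')

# Step 4: the index of the sorted Series gives the new column order.
result = f[ordered.index]
print(list(result.columns))
```

The numeric labels come out sorted, with the string labels in front. If the relative order of the string labels among themselves matters, it is worth checking on your pandas version, since they all compare as NaN during the sort.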
