How to reorder a pandas dataframe based on a list containing the column order
Say I have a dataframe 'df' that contains a list of files and their contents:
File Field Folder
Users.csv Age UserFolder
Users.csv Name UserFolder
Cars.csv Color CarFolder
Cars.csv Model CarFolder
If I already have lists giving the order in which the 'Field' values should appear for each file, how do I reorder this df?
users_col_order = ['Name', 'Age']
cars_col_order = ['Model', 'Color']
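So that the resulting df is reordered like this (I am not trying to sort 'Field' in reverse alphabetical order; that is just a coincidence of this example):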
File Field Folder
Users.csv Name UserFolder
Users.csv Age UserFolder
Cars.csv Model CarFolder
Cars.csv Color CarFolder
First, put your new orderings in a dictionary:
mapping = {
    'Users': ['Name', 'Age'],
    'Cars': ['Model', 'Color'],
}
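Then create a new column that places those values in the right positions based on the File value, make Field the index, and index it with the new column: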
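original_cols = df.columns
for k, v in mapping.items():
    # Write the desired Field order into the rows that belong to this file
    df.loc[df['File'] == k + '.csv', 'tmp'] = v
# Select rows by Field label in the order stored in 'tmp', then restore the original columns
df = df.set_index('Field').loc[df['tmp']].reset_index().drop('tmp', axis=1)[original_cols]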
Output:
>>> df
File Field Folder
0 Users.csv Name UserFolder
1 Users.csv Age UserFolder
2 Cars.csv Model CarFolder
3 Cars.csv Color CarFolder
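The snippet above selects rows by Field label, so it relies on each Field value appearing only once in the whole frame. As a rough, self-contained sketch of the same idea that does not depend on that, the rows can be ranked by their (File, Field) pair instead. The names `rank`, `keys`, `order` and `result` are illustrative and not part of the answer above, and the sketch assumes every row's pair is listed in `mapping`:

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "File": ["Users.csv", "Users.csv", "Cars.csv", "Cars.csv"],
    "Field": ["Age", "Name", "Color", "Model"],
    "Folder": ["UserFolder", "UserFolder", "CarFolder", "CarFolder"],
})

mapping = {
    "Users": ["Name", "Age"],
    "Cars": ["Model", "Color"],
}

# Rank of every (file, field) pair: files in dict order, fields in list order
rank = {
    (name + ".csv", field): (file_pos, field_pos)
    for file_pos, (name, fields) in enumerate(mapping.items())
    for field_pos, field in enumerate(fields)
}

# Look up each row's rank (raises KeyError for a pair missing from mapping),
# then reorder the rows by that rank
keys = [rank[pair] for pair in zip(df["File"], df["Field"])]
order = sorted(range(len(df)), key=keys.__getitem__)
result = df.iloc[order].reset_index(drop=True)

print(result)
```

Because the rank is keyed on both columns, the same field name could appear under several files without the rows being duplicated.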
Use pd.Categorical with ordered=True!
categories = users_col_order + cars_col_order
df['Field'] = pd.Categorical(values=df['Field'],
                             categories=categories,
                             ordered=True)
df.sort_values(by='Field')
File Field Folder
Users.csv Name UserFolder
Users.csv Age UserFolder
Cars.csv Model CarFolder
Cars.csv Color CarFolder
If needed, you can always create a new column, Field_categorical, to keep the original values in Field.
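For reference, a minimal runnable sketch of that helper-column variant, using the sample data from the question. The `result` name and the final `drop` / `reset_index` steps are assumptions added here for illustration, not part of the answer itself:

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "File": ["Users.csv", "Users.csv", "Cars.csv", "Cars.csv"],
    "Field": ["Age", "Name", "Color", "Model"],
    "Folder": ["UserFolder", "UserFolder", "CarFolder", "CarFolder"],
})

users_col_order = ["Name", "Age"]
cars_col_order = ["Model", "Color"]
categories = users_col_order + cars_col_order

# Leave Field untouched and store the ordered categorical in a helper column
df["Field_categorical"] = pd.Categorical(
    df["Field"], categories=categories, ordered=True
)

# sort_values returns a new frame, so the result has to be assigned
result = (
    df.sort_values(by="Field_categorical")
      .drop(columns="Field_categorical")
      .reset_index(drop=True)
)

print(result)
```

Two things to keep in mind with this approach: any Field value missing from `categories` becomes NaN in the categorical column, and the concatenated list must not contain the same field name twice, since categories have to be unique.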