Sort list using other list with different length
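I am trying to sort a list according to the order in which its items appear in a dataframe column, even though the dataframe column is longer than the list.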
   enrolNo    Surname
0        1      Jones
1        2      Smith
2        3  Henderson
3        4       Kilm
4        5      Henry
5        6     Joseph
late = ['Kilm', 'Henry', 'Smith']
Desired output:
sorted_late = ['Smith', 'Kilm', 'Henry']
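My initial attempt was to add a new column to the existing dataframe and then extract it as a list, but that seems like a long way round. I also found that this attempt will not work: the error message I got after starting as follows points out that the lengths differ: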
df_register['late_arrivals'] = np.where((df_register['Surname'] == late),
late , '')
Should I use a 'for' loop instead?
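Why not use the .isin() method?
df['Surname'].isin(late)
This gives a boolean mask over the rows; filtering the dataframe with it should give you the desired output.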
Extract the matching values from the dataframe itself. There is no need to sort the list itself:
sorted_late = df[df.Surname.isin(late)].Surname.to_list()
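For anyone who wants to run this end to end, here is a minimal sketch. It rebuilds the dataframe from the question (the construction below is my assumption about how it was created) and splits the one-liner into the mask and the filter step:

```python
import pandas as pd

# Rebuild the example dataframe from the question (assumed construction).
df = pd.DataFrame({
    "enrolNo": [1, 2, 3, 4, 5, 6],
    "Surname": ["Jones", "Smith", "Henderson", "Kilm", "Henry", "Joseph"],
})
late = ["Kilm", "Henry", "Smith"]

# Boolean mask: True for every row whose Surname appears in `late`.
mask = df["Surname"].isin(late)

# Filtering keeps the dataframe's row order, so no explicit sort is needed.
sorted_late = df.loc[mask, "Surname"].tolist()
print(sorted_late)  # ['Smith', 'Kilm', 'Henry']
```

Note that a name in late that is missing from the dataframe is silently dropped, and a surname that occurs in several rows shows up once per row.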
If the full set of names is a plain list (master_list here), you can also do this neatly with a list comprehension:
sorted_late = [master_late for master_late in master_list if master_late in late]
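A runnable version of the same idea, as a sketch: master_list is assumed here to hold the surnames in dataframe order, and the set is an optional tweak of mine so that each membership test does not scan the whole of late.

```python
# Assumed: the full list of surnames, in the order they appear in the dataframe.
master_list = ["Jones", "Smith", "Henderson", "Kilm", "Henry", "Joseph"]
late = ["Kilm", "Henry", "Smith"]

# A set makes each `in` check constant time on average.
late_set = set(late)

# Walk the master list in order and keep only the names that are late.
sorted_late = [name for name in master_list if name in late_set]
print(sorted_late)  # ['Smith', 'Kilm', 'Henry']
```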
You can specify a custom key for the sort function:
import pandas
df = pandas.DataFrame([
{"enrolNo": 1, "Surname": "Jones"},
{"enrolNo": 2, "Surname": "Smith"},
{"enrolNo": 3, "Surname": "Henderson"},
{"enrolNo": 4, "Surname": "Kilm"},
{"enrolNo": 5, "Surname": "Henry"},
{"enrolNo": 6, "Surname": "Joseph"},
])
# set Surname as index so we can access enrolNo by it
df = df.set_index('Surname')
# now you can access enrolNo by Surname
assert df.loc['Kilm']['enrolNo'] == 4
# define the list to be sorted
late = ['Kilm', 'Henry', 'Smith']
# Sort late by enrolNo as listed in the dataframe
late_sorted = sorted(late, key=lambda n: df.loc[n]['enrolNo'])
# ['Smith', 'Kilm', 'Henry']
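A possible variation on the custom key, sketched under the assumption that surnames are unique: build a plain dict from surname to row position once, then sort with it. This avoids changing the dataframe's index, and the fallback value (my choice, not part of the answer above) pushes names that are not in the dataframe to the end instead of raising a KeyError. The name 'Nobody' is made up for the demonstration.

```python
import pandas as pd

# Rebuild the example dataframe from the question (assumed construction).
df = pd.DataFrame({
    "enrolNo": [1, 2, 3, 4, 5, 6],
    "Surname": ["Jones", "Smith", "Henderson", "Kilm", "Henry", "Joseph"],
})

# 'Nobody' is a hypothetical name that is not in the dataframe.
late = ["Kilm", "Nobody", "Henry", "Smith"]

# Map each surname to its row position (assumes surnames are unique;
# with duplicates the last occurrence would win).
position = {name: i for i, name in enumerate(df["Surname"])}

# Unknown names get a position past the end, so they sort last.
late_sorted = sorted(late, key=lambda name: position.get(name, len(position)))
print(late_sorted)  # ['Smith', 'Kilm', 'Henry', 'Nobody']
```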