Sorting (or partially sorting) a list of objects based on a specific attribute
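Question
I have a list of objects. Each object has two attributes: "score" and "coordinates". I need to find the N largest objects in the list according to the score attribute. The main problem I'm having is sorting the objects using only the score attribute. The sort can be partial; I'm only interested in the N largest objects.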
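Current solution
My current approach is neither the most elegant nor the most efficient. The idea is to create a dictionary of object indices and their scores, then sort the list of scores and use the dictionary to index the objects that produced the largest scores.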
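These are the steps:
1. Create a list of scores. Each element of the list corresponds to an object; that is, the first entry is the score of the first object, the second entry is the score of the second object, and so on.
2. Create a dictionary using the objects' scores as keys and the objects' indices as values.
3. Sort the list of scores using heapq to get the N largest objects.
4. Use the dictionary to get the objects that produced those largest scores.
5. Create a new list containing only the N highest-scoring objects.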
Code snippet
This is my sorting function:
import random
import heapq

# Gets the N objects with the largest score:
def getLargest(N, objects):
    # Set output objects:
    outobjects = objects
    # Get the total of objects in list:
    totalobjects = len(objects)
    # Check if the total number of objects is bigger than the N requested
    # largest objects:
    if totalobjects > N:
        # Get the "score" attributes from all the objects:
        objectScores = [o.score for o in objects]
        # Create a dictionary with the index of the objects and their score.
        # I'm using a dictionary to keep track of the largest scores and
        # the objects that produced them:
        objectIndices = range(totalobjects)
        objectDictionary = dict(zip(objectIndices, objectScores))
        # Get the N largest objects based on score:
        largestObjects = heapq.nlargest(N, objectScores)
        print(largestObjects)
        # Prepare the output list of objects:
        outobjects = [None] * N
        # Look for those objects that produced the
        # largest score:
        for k in range(N):
            # Get current largest object:
            currentLargest = largestObjects[k]
            # Get its original position on the keypoint list:
            position = objectScores.index(currentLargest)
            # Index the corresponding keypoint and store it
            # in the output list:
            outobjects[k] = objects[position]
    # Done:
    return outobjects
This snippet generates 100 random objects to test my approach. The final loop prints the N = 3 randomly generated objects with the largest score:
# Create a list with random objects:
totalObjects = 100
randomObjects = []

# Test object class:
class Object(object):
    pass

# Generate a list of random objects
for i in range(totalObjects):
    # Instance of objects:
    tempObject = Object()
    # Set the object's random score
    random.seed()
    tempObject.score = random.random()
    # Set the object's random coordinates:
    tempObject.coordinates = (random.randint(0, 5), random.randint(0, 5))
    # Store object into list:
    randomObjects.append(tempObject)

# Get the 3 largest objects sorted by score:
totalLargestObjects = 3
largestObjects = getLargest(totalLargestObjects, randomObjects)

# Print the filtered objects:
for i in range(len(largestObjects)):
    # Get the current object in the list:
    currentObject = largestObjects[i]
    # Get its score:
    currentScore = currentObject.score
    # Get its coordinates as a tuple (x,y)
    currentCoordinates = currentObject.coordinates
    # Print the info:
    print("object: " + str(i) + " score: " + str(currentScore) + " x: " + str(
        currentCoordinates[0]) + " y: " + str(currentCoordinates[1]))
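My current approach gets the job done, but there must be a more pythonic (more vectorized) way of achieving the same thing. My background is mostly C++, and I'm still learning Python. Any feedback is welcome.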
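Additional info
Initially, I was looking for something similar to C++'s std::nth_element. It appears this functionality is somewhat provided in Python by NumPy's partition. Unfortunately, while std::nth_element supports a predicate for custom ordering, NumPy's partition does not. I ended up using heapq, which does the job fine and sorts in the desired order, but I don't know the best way to sort based on one attribute.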
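Tuples are what you need. Instead of storing the scores in the heap, store tuples of (score, object) in the heap. It will compare the tuples by score and return a list of tuples, which you can use to retrieve the original objects. This saves you the extra steps of retrieving objects by their scores: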
heapq.nlargest(3, ((obj.score, obj) for obj in randomObjects))
# [(0.9996643881256989, <__main__.Object object at 0x155f730>), (0.9991398955041872, <__main__.Object object at 0x119e928>), (0.9858047551444177, <__main__.Object object at 0x15e38c0>)]
See a live example: https://akuiper.com/console/g6YuNa_1WClp
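One caveat with the (score, object) tuples: if two objects share the same score, Python falls back to comparing the objects themselves, which raises a TypeError in Python 3 for a class that defines no ordering. A minimal sketch of one common workaround follows. The class Item and the helper top_n_by_tuple are hypothetical names used only for illustration, not part of the original post. The idea is to put a unique index between the score and the object so the comparison never reaches the object:

```python
import heapq


class Item:
    """Hypothetical stand-in for the question's objects."""

    def __init__(self, score, coordinates):
        self.score = score
        self.coordinates = coordinates


def top_n_by_tuple(n, objects):
    """Return the n highest-scoring objects via (score, -index, object) tuples."""
    # The index is unique, so tuple comparison is always decided before it
    # reaches the object. Negating it makes earlier objects win ties.
    decorated = ((obj.score, -i, obj) for i, obj in enumerate(objects))
    return [obj for _, _, obj in heapq.nlargest(n, decorated)]


items = [Item(0.5, (0, 0)), Item(0.9, (1, 2)), Item(0.9, (3, 4)), Item(0.1, (5, 5))]
top = top_n_by_tuple(2, items)
print([(o.score, o.coordinates) for o in top])
```

Here the two objects tied at 0.9 are both returned without any comparison between Item instances.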
Or, as @shriakhilc commented, use the key parameter of heapq.nlargest to specify that you want to compare by score:
heapq.nlargest(3, randomObjects, lambda o: o.score)
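As a rough sketch of how the question's getLargest could shrink with the key parameter, the version below uses operator.attrgetter in place of the lambda. The name get_largest and the SimpleNamespace test objects are assumptions made for this illustration; they stand in for the question's objects and are not from the original post:

```python
import heapq
from operator import attrgetter
from types import SimpleNamespace


def get_largest(n, objects):
    """Return up to n objects with the highest score, largest first."""
    # key extracts the value to compare, so the objects themselves
    # never need to support ordering.
    return heapq.nlargest(n, objects, key=attrgetter("score"))


# Hypothetical test data standing in for the question's objects.
objects = [
    SimpleNamespace(score=0.3, coordinates=(0, 1)),
    SimpleNamespace(score=0.8, coordinates=(2, 3)),
    SimpleNamespace(score=0.1, coordinates=(4, 5)),
    SimpleNamespace(score=0.5, coordinates=(1, 1)),
]

for rank, obj in enumerate(get_largest(2, objects)):
    print(f"object: {rank} score: {obj.score} x: {obj.coordinates[0]} y: {obj.coordinates[1]}")
```

If n is larger than the list, heapq.nlargest simply returns every object in descending score order, so the explicit length check from the question is not needed in this sketch.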
I suggest you use Python's built-in sorted function with a lambda. See here: https://docs.python.org/3/howto/sorting.html#sortinghowto
Basically, this is what you can have:
myList = [
    {'score': 32, 'coordinates': [...]},
    {'score': 12, 'coordinates': [...]},
    {'score': 20, 'coordinates': [...]},
    {'score': 8, 'coordinates': [...]},
    {'score': 40, 'coordinates': [...]},
]
# Sort by score DESCENDING
mySortedList = sorted(myList, key=lambda element: element['score'], reverse=True)
# Retrieve top 3 results
myTopResults = mySortedList[0:3]
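The snippet above uses dictionaries, while the question's objects expose score as an attribute. A small sketch of the same idea for attribute-based objects is shown here, with SimpleNamespace instances and made-up coordinates as assumed stand-ins for the real objects. It also checks that a full sort followed by a slice agrees with heapq.nlargest:

```python
import heapq
from operator import attrgetter
from types import SimpleNamespace

# Hypothetical objects; scores mirror the list above, coordinates are made up.
objects = [
    SimpleNamespace(score=s, coordinates=c)
    for s, c in [(32, (0, 1)), (12, (2, 3)), (20, (4, 5)), (8, (1, 1)), (40, (5, 0))]
]

# Full sort by score, descending, then keep the first three.
by_sort = sorted(objects, key=attrgetter("score"), reverse=True)[:3]

# Partial selection of the three largest by score.
by_heap = heapq.nlargest(3, objects, key=attrgetter("score"))

print([o.score for o in by_sort])  # [40, 32, 20]
print(by_sort == by_heap)
```

Sorting the whole list is the simplest option for short lists. When N is much smaller than the list, heapq.nlargest avoids ordering the elements that will be discarded anyway.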