Rearranging list items based on a score to fit a function curve
Given that I have:
- a list of words
- points/scores that indicate the "simplicity" of each word
- the difficulty level (bin) of each word:
E.g.
>>> words = ['apple', 'pear', 'car', 'man', 'average', 'older', 'values', 'coefficient', 'exponential']
>>> points = ['9999', '9231', '8231', '5123', '4712', '3242', '500', '10', '5']
>>> bins = [0, 0, 0, 0, 1, 1, 1, 2, 2]
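Currently, the word list is sorted by the simplicity points.
What if I want to model the simplicity as a "quadratic curve", i.e. going from the highest points down to the lowest and then back up to the highest, producing a word list whose corresponding points look like this: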
['apple', 'pear', 'average', 'coefficient', 'exponential', 'older', 'values', 'apple', 'pear']
I have tried this, but it is painfully crazy:
>>> from collections import Counter
>>> Counter(bins)[0]
4
>>> num_easy, num_mid, num_hard = Counter(bins)[0], Counter(bins)[1], Counter(bins)[2]
>>> num_easy
4
>>> easy_words = words[:num_easy]
>>> mid_words = words[num_easy:num_easy+num_mid]
>>> hard_words = words[-num_hard:]
>>> easy_words, mid_words, hard_words
(['apple', 'pear', 'car', 'man'], ['average', 'older', 'values'], ['coefficient', 'exponential'])
>>> easy_1 = easy_words[:int(num_easy/2)]
>>> easy_2 = easy_words[len(easy_1):]
>>> mid_1 = mid_words[:int(num_mid/2)]
>>> mid_2 = mid_words[len(mid_1):]
>>> new_words = easy_1 + mid_1 + hard_words + mid_2 + easy_1
>>> new_words
['apple', 'pear', 'average', 'coefficient', 'exponential', 'older', 'values', 'apple', 'pear']
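Imagine that the number of bins is >3, or that I want the words' "points" to fit a sine curve.
Note that this is not exactly an NLP question, nor does it have anything to do with the 'zipf' distribution or with creating something that matches or reorders word rankings.
Imagine you have a list of integers, with an object (in this case a word) mapped to each integer, and you want to reorder the list of objects to fit a quadratic curve.
Sort it into a list according to your custom criteria, check whether its length is even or odd, then take alternating elements to form two chunks and reverse the second chunk: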
>>> def peak(s):
...     return s[::2]+s[-1-(len(s)%2)::-2]
...
>>> peak('112233445566778')
'123456787654321'
>>> peak('1122334455667788')
'1234567887654321'
Note that unevenly distributed data can produce asymmetric results:
>>> peak('11111123')
'11123111'
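Since `peak` relies only on slicing, it should work on lists as well as strings. Here is a minimal sketch applying it to the word list from the question, assuming the words are first ranked from simplest to hardest so that the result starts high, dips in the middle, and climbs back up. The names `ranked` and `valley` are illustrative and not part of the original answer.

```python
words = ['apple', 'pear', 'car', 'man', 'average', 'older',
         'values', 'coefficient', 'exponential']
points = ['9999', '9231', '8231', '5123', '4712', '3242', '500', '10', '5']


def peak(s):
    # Every second element going forward, then the remaining
    # elements going backward.
    return s[::2] + s[-1 - (len(s) % 2)::-2]


# Rank words by descending points (assumption: higher points = simpler),
# so the hardest words end up in the middle of the result.
ranked = [w for _, w in sorted(zip(map(int, points), words), reverse=True)]
valley = peak(ranked)
print(valley)
```

With the sample data this places 'apple' and 'pear' at the two ends and 'exponential' in the middle.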
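I would do something along these lines. Sort the words by their points, take out every second one, reverse that half, and concatenate the two: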
>>> s = sorted(zip(map(int, points), words))
>>> new_words = [word for p, word in list(reversed(s[::2])) + s[1::2]]
# If you have lots of words you'll be better off using some
# itertools like islice and chain, but the principle becomes evident
>>> new_words
['apple', 'car', 'average', 'values', 'exponential', 'coefficient', 'older', 'man', 'pear']
The resulting order is:
[(9999, 'apple'), (8231, 'car'), (4712, 'average'), (500, 'values'), (5, 'exponential'), (10, 'coefficient'), (3242, 'older'), (5123, 'man'), (9231, 'pear')]
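For the more general case raised in the question (more bins, or a sine curve), one possible approach is rank matching: evaluate the target curve at each list position, then give the highest-scoring item to the position where the curve is highest, the next item to the next-highest position, and so on. This is only a sketch and not part of either answer above. The function `arrange_to_curve` and its `curve` parameter (a function of a position scaled to the range 0 to 1) are hypothetical names.

```python
import math


def arrange_to_curve(items, scores, curve):
    """Reorder items so their scores rise and fall like curve(x), x in [0, 1]."""
    n = len(items)
    # Items from highest to lowest score.
    by_score = [item for _, item in
                sorted(zip(scores, items), key=lambda t: t[0], reverse=True)]
    # Curve height at each position, with positions scaled to [0, 1].
    heights = [curve(i / (n - 1)) if n > 1 else curve(0.0) for i in range(n)]
    # Positions from highest to lowest curve value (ties keep left-to-right order).
    by_height = sorted(range(n), key=lambda i: heights[i], reverse=True)
    result = [None] * n
    for item, pos in zip(by_score, by_height):
        result[pos] = item
    return result


words = ['apple', 'pear', 'car', 'man', 'average', 'older',
         'values', 'coefficient', 'exponential']
points = [9999, 9231, 8231, 5123, 4712, 3242, 500, 10, 5]

# Upward-opening parabola: high at both ends, lowest in the middle.
quadratic = arrange_to_curve(words, points, lambda x: (2 * x - 1) ** 2)
# One full sine period: rises, falls below the start, then returns.
sine = arrange_to_curve(words, points, lambda x: math.sin(2 * math.pi * x))
print(quadratic)
print(sine)
```

Only the ordering of the curve values matters here, not their magnitudes, so the reordered points follow the shape of the curve rather than its exact values.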