Replacing Python list elements with keys
I have a list of non-unique strings:
list = ["a", "b", "c", "a", "a", "d", "b"]
I want to replace each element with an integer key that uniquely identifies each string:
list = [0, 1, 2, 0, 0, 3, 1]
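The numbers themselves don't matter, as long as they are unique identifiers.
So far, all I can think of is copying the list into a set and using the set's indices to reference the list. I'm sure there is a better way, though.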
>>> lst = ["a", "b", "c", "a", "a", "d", "b"]
>>> nums = [ord(x) for x in lst]
>>> print(nums)
[97, 98, 99, 97, 97, 100, 98]
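Note that ord() only accepts a string of length one, so the approach above fits the single-character example but not arbitrary strings. A minimal sketch of that limitation, where ord_ids is a hypothetical helper name and not part of the original answer:

```python
def ord_ids(strings):
    """Map each single-character string to its code point (hypothetical helper)."""
    return [ord(s) for s in strings]


print(ord_ids(["a", "b", "a"]))  # [97, 98, 97]

try:
    ord_ids(["abc"])
except TypeError as exc:
    # ord() raises TypeError for strings longer than one character.
    print("ord() rejects multi-character strings:", exc)
```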
If you're not picky, just use the hash function: it returns an integer, and for the same string it returns the same hash:
li = ["a", "b", "c", "a", "a", "d", "b"]
ids = list(map(hash, li))          # Turn list of strings into list of ints
ids = [hash(item) for item in li]  # Same as above, as a list comprehension
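Two caveats apply to hash-based IDs: string hashes are randomized per interpreter process unless PYTHONHASHSEED is set, so the numbers are only stable within one run, and distinct strings can in principle collide. A small sketch of the within-run guarantee, using the illustrative names words and same_string_same_id:

```python
words = ["a", "b", "c", "a", "a", "d", "b"]
ids = [hash(w) for w in words]

# Equal strings always hash equally within a single interpreter run.
same_string_same_id = ids[0] == ids[3] == ids[4]
print(same_string_same_id)  # True

# The actual integers differ from run to run, so they should not be
# stored and compared across processes.
print(ids)
```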
This guarantees uniqueness, and the IDs are consecutive starting from 0:
id_s = {c: i for i, c in enumerate(set(list))}
li = [id_s[c] for c in list]
As an aside, you should not use 'list' as a variable name, because it shadows the built-in type list.
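One thing to be aware of with the set-based mapping is that a set iterates in arbitrary order, so which string gets which ID is not predictable. If IDs in order of first appearance are wanted, a possible variant (an assumption on my part, not from the answer above) is dict.fromkeys, which preserves insertion order in Python 3.7+. The names lst, ids_by_string and encoded are illustrative:

```python
lst = ["a", "b", "c", "a", "a", "d", "b"]

# dict.fromkeys keeps one entry per distinct string, in first-seen order.
ids_by_string = {s: i for i, s in enumerate(dict.fromkeys(lst))}
encoded = [ids_by_string[s] for s in lst]
print(encoded)  # [0, 1, 2, 0, 0, 3, 1]
```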
Here is a single-pass solution with defaultdict:
from collections import defaultdict
seen = defaultdict()
seen.default_factory = lambda: len(seen) # you could instead bind to seen.__len__
In [11]: [seen[c] for c in list]
Out[11]: [0, 1, 2, 0, 0, 3, 1]
It's a bit of a trick, but worth mentioning!
Another option, suggested by @user2357112 in a related question/answer, is to increment with itertools.count. This lets you do everything in the constructor:
from itertools import count
seen = defaultdict(count().__next__) # .next in python 2
这可能更可取,因为 default_factory 方法不会在全局范围内查找 seen
。
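Putting the count-based variant together, a self-contained sketch might look like the following. The function name encode and its return shape are assumptions made for illustration:

```python
from collections import defaultdict
from itertools import count


def encode(items):
    """Return (ids, mapping) for a sequence of hashable items (hypothetical helper)."""
    # Each missing key calls count().__next__, which yields 0, 1, 2, ...
    seen = defaultdict(count().__next__)
    ids = [seen[item] for item in items]
    return ids, dict(seen)


ids, mapping = encode(["a", "b", "c", "a", "a", "d", "b"])
print(ids)      # [0, 1, 2, 0, 0, 3, 1]
print(mapping)  # {'a': 0, 'b': 1, 'c': 2, 'd': 3}
```

Because the counter lives inside the defaultdict, no global name is needed, and each call to encode starts again from 0.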
A functional approach:
l = ["a", "b", "c", "a", "a", "d", "b", "abc", "def", "abc"]
from itertools import count
from operator import itemgetter
mapped = itemgetter(*l)(dict(zip(l, count())))
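It may help to see what this one-liner actually produces. Since dict(zip(l, count())) lets later pairs overwrite earlier ones, each string ends up with the index of its last occurrence, so the IDs are unique but not consecutive. A sketch with the intermediate dictionary named last_index for illustration:

```python
from itertools import count
from operator import itemgetter

l = ["a", "b", "c", "a", "a", "d", "b", "abc", "def", "abc"]

# Later pairs overwrite earlier ones, so each string keeps its last index.
last_index = dict(zip(l, count()))
mapped = itemgetter(*l)(last_index)
print(mapped)  # (4, 6, 2, 4, 4, 5, 6, 9, 8, 9)
```

Two edge cases are worth keeping in mind: with a single-element list, itemgetter returns the bare value rather than a tuple, and with an empty list, itemgetter() raises TypeError.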
You can also use a simple generator function:
from itertools import count
def uniq_ident(l):
    cn, d = count(), {}
    for ele in l:
        if ele not in d:
            c = next(cn)
            d[ele] = c
            yield c
        else:
            yield d[ele]
In [35]: l = ["a", "b", "c", "a", "a", "d", "b"]
In [36]: list(uniq_ident(l))
Out[36]: [0, 1, 2, 0, 0, 3, 1]