Replace identical elements in a list without a loop
I'm trying to replace all identical elements in a list with a new string, and I'm also trying to get away from using loops for everything.
# My aim is to turn:
list = ["A", "", "", "D"]
# into:
list = ["A", "???", "???", "D"]
# but without using a for-loop
I started with variations on a comprehension:
# e.g. 1
['' = "???"(i) for i in list]
# e.g. 2
list = [list[i] .replace '???' if ''(i) for i in range(len(lst))]
Then I experimented with Python's map function:
list[:] = map(lambda i: "???", list)
# I couldn't work out where to add the '""' to be replaced.
Finally I worked out a third attempt:
list[:] = ["???" if ''(i) else i for i in list]
I feel like I'm getting further away from a sensible line of attack, and I just want a tidy way to do a simple task.
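You can use a list comprehension, but what you need to do is compare each element: if it matches, substitute the new string, otherwise keep the original element.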
>>> data = ["A", "", "", "D"]
>>> ['???' if i == '' else i for i in data]
['A', '???', '???', 'D']
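The question assigns through `list[:]`, which suggests the list should be changed in place. A minimal sketch of that variant follows; the helper name `replace_empty_in_place` and its `placeholder` parameter are made up for illustration and are not part of the original answer.

```python
def replace_empty_in_place(items, placeholder="???"):
    """Replace empty strings in `items` in place (hypothetical helper)."""
    # Slice assignment mutates the existing list object instead of
    # rebinding the name to a new list.
    items[:] = [placeholder if item == "" else item for item in items]


data = ["A", "", "", "D"]
alias = data  # a second reference to the same list object
replace_empty_in_place(data)
print(data)
print(alias)  # the alias sees the change because the list was mutated
```

Plain assignment (`data = [...]`) would instead create a new list and leave any other references pointing at the old one.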
You could try this:
list1 = ["A", "", "", "D"]
list2=list(map(lambda x: "???" if not x else x,list1))
print(list2)
Here is a longer version of the one above:
list1 = ["A", "", "", "D"]

def check_string(string):
    if not string:
        return "???"
    return string

list2 = list(map(check_string, list1))
print(list2)
Because the empty string "" is falsy, both versions can rely on an implicit boolean test and simply return the appropriate value.
Output:
['A', '???', '???', 'D']
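One caveat may be worth spelling out: a truthiness test matches every falsy value, not only the empty string. The sketch below uses a made-up list, `mixed`, that is not from the question, to contrast the truthiness test with an explicit comparison against "".

```python
# Hypothetical input containing falsy values other than "".
mixed = ["A", "", 0, None, "D"]

# Truthiness test: every falsy element is replaced.
by_truthiness = ["???" if not x else x for x in mixed]

# Explicit comparison: only the empty string is replaced.
by_equality = ["???" if x == "" else x for x in mixed]

print(by_truthiness)
print(by_equality)
```

For a list that only ever holds strings, as in the question, the two forms behave the same.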
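For brevity, if we allow a list comprehension (which is itself a form of loop), the expression below is enough. As @ComteHerappait rightly points out, this replaces empty strings with '???', consistent with the example in the question.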
>>> l = ["A", "", "", "D"]
>>> [e or '???' for e in l]
['A', '???', '???', 'D']
If we focus instead on replacing repeated elements (keeping the first occurrence), then:
seen = set()
newl = ['???' if e in seen or seen.add(e) else e for e in l]
>>> newl
['A', '', '???', 'D']
Finally, the following replaces every element that occurs more than once in the list:
from collections import Counter
c = Counter(l)
newl = [e if c[e] < 2 else '???' for e in l]
>>> newl
['A', '???', '???', 'D']
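To make the difference between the two duplicate-handling variants easier to see, here is a small sketch that wraps each in a function and runs both on a list with more than one repeated value. The function names and the sample list are assumptions made for illustration.

```python
from collections import Counter


def replace_later_duplicates(items, placeholder="???"):
    """Keep the first occurrence of each value, replace later repeats."""
    seen = set()
    # seen.add() returns None (falsy), so a first occurrence is recorded
    # and kept, while anything already in `seen` is replaced.
    return [placeholder if e in seen or seen.add(e) else e for e in items]


def replace_all_duplicates(items, placeholder="???"):
    """Replace every value that occurs more than once."""
    counts = Counter(items)
    return [e if counts[e] < 2 else placeholder for e in items]


sample = ["A", "B", "A", "C", "B"]
print(replace_later_duplicates(sample))
print(replace_all_duplicates(sample))
```

Both functions return a new list and leave `sample` unchanged.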
How about this:
myList = ['A', '', '', 'D']
myMap = map(lambda i: '???' if i == '' else i, myList)
print(list(myMap))
...which will result in:
['A', '???', '???', 'D']
If you want to avoid a loop, as the title suggests, you can use np.where instead of a list comprehension; it is also faster for large arrays:
import numpy as np

data = np.array(["A", "", "", "D"], dtype='object')
index = np.where(data == '')[0]  # positions of the empty strings
data[index] = "???"
data.tolist()
Result:
['A', '???', '???', 'D']
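As a variation on the same idea, np.where also accepts three arguments and then selects between two values element by element, which avoids the separate index step. This is a sketch assuming NumPy is installed; it is not the form the answer above uses.

```python
import numpy as np

data = np.array(["A", "", "", "D"], dtype=object)

# Three-argument form: take "???" where the condition holds,
# otherwise keep the element from `data`. A new array is returned.
replaced = np.where(data == "", "???", data)

result = replaced.tolist()
print(result)
```

Unlike the indexed assignment, this leaves the original array untouched.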
Speed test
# Run in IPython/Jupyter: %timeit is an IPython magic, not plain Python.
for rep in [1, 10, 100, 1000, 10000]:
    data = ["A", "", "", "D"] * rep
    print(f'array of length {4 * rep}')
    print('np.where:')
    %timeit data2 = np.array(data, dtype='object'); index = np.where(data2 == '')[0]; data2[index] = "???"; data2.tolist()
    print('list-comprehension:')
    %timeit ['???' if i == '' else i for i in data]
Results:
array of length 4
np.where:
The slowest run took 11.79 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 5: 10.7 µs per loop
list-comprehension:
The slowest run took 5.75 times longer than the fastest. This could mean that an intermediate result is being cached.
1000000 loops, best of 5: 487 ns per loop
array of length 40
np.where:
The slowest run took 7.08 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 5: 13 µs per loop
list-comprehension:
100000 loops, best of 5: 2.99 µs per loop
array of length 400
np.where:
The slowest run took 4.83 times longer than the fastest. This could mean that an intermediate result is being cached.
10000 loops, best of 5: 31 µs per loop
list-comprehension:
10000 loops, best of 5: 26 µs per loop
array of length 4000
np.where:
1000 loops, best of 5: 225 µs per loop
list-comprehension:
1000 loops, best of 5: 244 µs per loop
array of length 40000
np.where:
100 loops, best of 5: 2.27 ms per loop
list-comprehension:
100 loops, best of 5: 2.63 ms per loop
In this test, np.where is faster for lists of about 4000 elements or more; for shorter lists the list comprehension wins.