在二维列表中查找最常见的字符串
Find most common string in a 2D list
我有一个二维列表:
arr = [['Mohit', 'shini','Manoj','Mot'],
['Mohit', 'shini','Manoj'],
['Mohit', 'Vis', 'Nusrath']]
我想找到二维列表中出现次数最多的元素。在上面的例子中,最常见的字符串是 'Mohit'
.
我知道我可以使用两个 for 循环和一个字典来使用蛮力来做到这一点,但是有没有使用 numpy 或任何其他库的更有效的方法?
The nested lists could be of different lengths
谁也可以加上他们方法的时间?寻找禁食的方法。还有它可能不是很有效的注意事项。
编辑
这些是我系统上不同方法的时间:
#timegb
%%timeit
collections.Counter(chain.from_iterable(arr)).most_common(1)[0][0]
5.91 µs ± 115 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
#Kevin Fang and Curious Mind
%%timeit
flat_list = [item for sublist in arr for item in sublist]
collections.Counter(flat_list).most_common(1)[0]
6.42 µs ± 501 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
%%timeit
c = collections.Counter(item for sublist in arr for item in sublist).most_common(1)c[0][0]
6.79 µs ± 449 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
#Mayank Porwal
def most_common(lst):
return max(set(lst), key=lst.count)
%%timeit
ls = list(chain.from_iterable(arr))
most_common(ls)
2.33 µs ± 42.8 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
#U9-Forward
%%timeit
l=[x for i in arr for x in i]
max(l,key=l.count)
2.6 µs ± 68.8 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
Mayank Porwal 的方法在我的系统上运行最快。
我建议展平二维数组,然后使用计数器找出出现频率最高的元素。
flat_list = [item for sublist in arr for item in sublist]
from collections import Counter
Counter(flat_list).most_common(1)[0]
# ('Mohit', 3)
Counter(flat_list).most_common(1)[0][0]
# 'Mohit'
但不确定这是否是最快的方法。
编辑:
@timgeb 的回答有一种使用 itertools.chain
来展平列表的更快方法
@schwobaseggl 建议的更 space 有效的方法:
from collections import Counter
c = Counter(item for sublist in arr for item in sublist).most_common(1)
# [('Mohit', 3)]
c[0][0]
# 'Mohit'
像这样:
In [920]: from itertools import chain
In [923]: arr = list(chain.from_iterable(arr)) ## flatten into 1-D array
In [922]: def most_common(lst):
...: return max(set(lst), key=lst.count)
In [924]: most_common(arr)
Out[924]: 'Mohit'
时间安排:
from itertools import chain
import time
start_time = time.time()
arr = [['Mohit', 'shini','Manoj','Mot'],
['Mohit', 'shini','Manoj'],
['Mohit', 'Vis', 'Nusrath']]
arr = list(chain.from_iterable(arr))
arr = arr*100
def most_common(lst):
return max(set(lst), key=lst.count)
print(most_common(arr))
print("--- %s seconds ---" % (time.time() - start_time))
mayankp@mayank:~$ python t1.py
Mohit
--- 0.000154972076416 seconds ---
- 用
itertools.chain.from_iterable
展平列表
- 应用
Counter
.
演示:
>>> from itertools import chain
>>> from collections import Counter
>>>
>>> lst = [['Mohit', 'shini','Manoj','Mot'],
...: ['Mohit', 'shini','Manoj'],
...: ['Mohit', 'Vis', 'Nusrath']]
...:
>>> Counter(chain.from_iterable(lst)).most_common(1)[0][0]
'Mohit'
详情:
>>> list(chain.from_iterable(lst))
['Mohit',
'shini',
'Manoj',
'Mot',
'Mohit',
'shini',
'Manoj',
'Mohit',
'Vis',
'Nusrath']
>>> Counter(chain.from_iterable(lst))
Counter({'Manoj': 2, 'Mohit': 3, 'Mot': 1, 'Nusrath': 1, 'Vis': 1, 'shini': 2})
>>> Counter(chain.from_iterable(lst)).most_common(1)
[('Mohit', 3)]
一些时间:
>>> lst = lst*100
>>> %timeit Counter(chain.from_iterable(lst)).most_common(1)[0][0] # timgeb
53.7 µs ± 411 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
>>> %timeit max([x for i in lst for x in i], key=l.count) # U9-Forward
207 µs ± 389 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
>>> %timeit Counter([x for sublist in lst for x in sublist]).most_common(1)[0][0] # Curious_Mind/Kevin Fang #1
75.2 µs ± 2.6 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
>>> %timeit Counter(item for sublist in lst for item in sublist).most_common(1)[0][0] # Kevin Fang #2
95.2 µs ± 2.07 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
>>> %timeit flat = list(chain.from_iterable(lst)); max(set(flat), key=flat.count) # Mayank Porwal
98.4 µs ± 178 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
(注意 Kevin Fang 的第二个解决方案比第一个要慢一点,但内存效率更高。)
这样做的一种方法,
import collections
import time
start_time = time.time()
arr = [['Mohit', 'shini','Manoj','Mot'],
['Mohit', 'shini','Manoj'],
['Mohit', 'Vis', 'Nusrath']]
c = collections.Counter([x for sublist in arr for x in sublist])
print(c.most_common(1) )
print("--- %s seconds ---" % (time.time() - start_time))
耗时: 0.00016713142395 秒
或者为什么不呢:
l=[x for i in arr for x in i]
max(l,key=l.count)
代码示例:
>>> arr = [['Mohit', 'shini','Manoj','Mot'],
['Mohit', 'shini','Manoj'],
['Mohit', 'Vis', 'Nusrath']]
>>> l=[x for i in arr for x in i]
>>> max(l,key=l.count)
'Mohit'
>>>
我有一个二维列表:
arr = [['Mohit', 'shini','Manoj','Mot'],
['Mohit', 'shini','Manoj'],
['Mohit', 'Vis', 'Nusrath']]
我想找到二维列表中出现次数最多的元素。在上面的例子中,最常见的字符串是 'Mohit'
.
我知道我可以使用两个 for 循环和一个字典来使用蛮力来做到这一点,但是有没有使用 numpy 或任何其他库的更有效的方法?
The nested lists could be of different lengths
谁也可以加上他们方法的时间?寻找禁食的方法。还有它可能不是很有效的注意事项。
编辑
这些是我系统上不同方法的时间:
#timegb
%%timeit
collections.Counter(chain.from_iterable(arr)).most_common(1)[0][0]
5.91 µs ± 115 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
#Kevin Fang and Curious Mind
%%timeit
flat_list = [item for sublist in arr for item in sublist]
collections.Counter(flat_list).most_common(1)[0]
6.42 µs ± 501 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
%%timeit
c = collections.Counter(item for sublist in arr for item in sublist).most_common(1)c[0][0]
6.79 µs ± 449 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
#Mayank Porwal
def most_common(lst):
return max(set(lst), key=lst.count)
%%timeit
ls = list(chain.from_iterable(arr))
most_common(ls)
2.33 µs ± 42.8 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
#U9-Forward
%%timeit
l=[x for i in arr for x in i]
max(l,key=l.count)
2.6 µs ± 68.8 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
Mayank Porwal 的方法在我的系统上运行最快。
我建议展平二维数组,然后使用计数器找出出现频率最高的元素。
flat_list = [item for sublist in arr for item in sublist]
from collections import Counter
Counter(flat_list).most_common(1)[0]
# ('Mohit', 3)
Counter(flat_list).most_common(1)[0][0]
# 'Mohit'
但不确定这是否是最快的方法。
编辑:
@timgeb 的回答有一种使用 itertools.chain
@schwobaseggl 建议的更 space 有效的方法:
from collections import Counter
c = Counter(item for sublist in arr for item in sublist).most_common(1)
# [('Mohit', 3)]
c[0][0]
# 'Mohit'
像这样:
In [920]: from itertools import chain
In [923]: arr = list(chain.from_iterable(arr)) ## flatten into 1-D array
In [922]: def most_common(lst):
...: return max(set(lst), key=lst.count)
In [924]: most_common(arr)
Out[924]: 'Mohit'
时间安排:
from itertools import chain
import time
start_time = time.time()
arr = [['Mohit', 'shini','Manoj','Mot'],
['Mohit', 'shini','Manoj'],
['Mohit', 'Vis', 'Nusrath']]
arr = list(chain.from_iterable(arr))
arr = arr*100
def most_common(lst):
return max(set(lst), key=lst.count)
print(most_common(arr))
print("--- %s seconds ---" % (time.time() - start_time))
mayankp@mayank:~$ python t1.py
Mohit
--- 0.000154972076416 seconds ---
- 用
itertools.chain.from_iterable
展平列表
- 应用
Counter
.
演示:
>>> from itertools import chain
>>> from collections import Counter
>>>
>>> lst = [['Mohit', 'shini','Manoj','Mot'],
...: ['Mohit', 'shini','Manoj'],
...: ['Mohit', 'Vis', 'Nusrath']]
...:
>>> Counter(chain.from_iterable(lst)).most_common(1)[0][0]
'Mohit'
详情:
>>> list(chain.from_iterable(lst))
['Mohit',
'shini',
'Manoj',
'Mot',
'Mohit',
'shini',
'Manoj',
'Mohit',
'Vis',
'Nusrath']
>>> Counter(chain.from_iterable(lst))
Counter({'Manoj': 2, 'Mohit': 3, 'Mot': 1, 'Nusrath': 1, 'Vis': 1, 'shini': 2})
>>> Counter(chain.from_iterable(lst)).most_common(1)
[('Mohit', 3)]
一些时间:
>>> lst = lst*100
>>> %timeit Counter(chain.from_iterable(lst)).most_common(1)[0][0] # timgeb
53.7 µs ± 411 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
>>> %timeit max([x for i in lst for x in i], key=l.count) # U9-Forward
207 µs ± 389 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
>>> %timeit Counter([x for sublist in lst for x in sublist]).most_common(1)[0][0] # Curious_Mind/Kevin Fang #1
75.2 µs ± 2.6 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
>>> %timeit Counter(item for sublist in lst for item in sublist).most_common(1)[0][0] # Kevin Fang #2
95.2 µs ± 2.07 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
>>> %timeit flat = list(chain.from_iterable(lst)); max(set(flat), key=flat.count) # Mayank Porwal
98.4 µs ± 178 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
(注意 Kevin Fang 的第二个解决方案比第一个要慢一点,但内存效率更高。)
这样做的一种方法,
import collections
import time
start_time = time.time()
arr = [['Mohit', 'shini','Manoj','Mot'],
['Mohit', 'shini','Manoj'],
['Mohit', 'Vis', 'Nusrath']]
c = collections.Counter([x for sublist in arr for x in sublist])
print(c.most_common(1) )
print("--- %s seconds ---" % (time.time() - start_time))
耗时: 0.00016713142395 秒
或者为什么不呢:
l=[x for i in arr for x in i]
max(l,key=l.count)
代码示例:
>>> arr = [['Mohit', 'shini','Manoj','Mot'],
['Mohit', 'shini','Manoj'],
['Mohit', 'Vis', 'Nusrath']]
>>> l=[x for i in arr for x in i]
>>> max(l,key=l.count)
'Mohit'
>>>