Counting co-occurrence in a list of lists
Using Python, I have defined the following list of lists of strings:
some_list = [['abc', 'aa', 'xdf'], ['def', 'asd'], ['abc', 'xyz'], ['ghi', 'edd'], ['abc', 'xyz'], ['abc', ]]
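What is the best way to select and count the strings that occur in the same sublist as 'abc'? In this example, I am looking for an output (dictionary or list) such as: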
('aa', 1), ('xdf', 1), ('xyz',2), (None, 1)
(None, 1) would capture the case where 'abc' is the only string in a sublist.
Here is a longhand approach that uses defaultdict():
from collections import defaultdict
some_list = [['abc', 'aa', 'xdf'], ['def', 'asd'], ['abc', 'xyz'], ['ghi', 'edd'], ['abc', 'xyz'], ['abc', ]]
d = defaultdict(int)
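for i in some_list:
    if 'abc' in i:
        # A sublist holding only 'abc' is counted under the key None.
        if len(i)==1:
            d[None] += 1
        for j in i:
            # Skip the search string itself; count every other member.
            if j=='abc': continue
            d[j] += 1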
Which yields:
defaultdict(<class 'int'>, {'aa': 1, 'xdf': 1, 'xyz': 2, None: 1})
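Since the question asks for pairs such as ('aa', 1), here is a minimal sketch of turning a counting dict like the one above into a list of tuples. It assumes the same some_list, and it relies on dicts preserving insertion order (Python 3.7+); the names sub, s and pairs are only illustrative.

```python
from collections import defaultdict

some_list = [['abc', 'aa', 'xdf'], ['def', 'asd'], ['abc', 'xyz'],
             ['ghi', 'edd'], ['abc', 'xyz'], ['abc', ]]

d = defaultdict(int)
for sub in some_list:
    if 'abc' in sub:
        # A sublist holding only 'abc' is counted under the key None.
        if len(sub) == 1:
            d[None] += 1
        for s in sub:
            if s != 'abc':
                d[s] += 1

# items() yields (key, count) pairs in insertion order.
pairs = list(d.items())
print(pairs)  # [('aa', 1), ('xdf', 1), ('xyz', 2), (None, 1)]
```

Sorting the pairs by key would need a custom key function, because None cannot be compared with strings.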
I would count everything first and then handle the edge case separately:
from collections import Counter
from itertools import chain
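A = [['abc', 'aa', 'xdf'], ['def', 'asd'], ['abc', 'xyz'], ['ghi', 'edd'], ['abc', 'xyz'], ['abc', ]]
c = Counter(chain.from_iterable(x for x in A if 'abc' in x))
c[None] = A.count(['abc'])  # sublists in which 'abc' stands alone
del c['abc']  # the first step also counted 'abc' itself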
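You can filter your list by the search criterion, then filter out the search value itself to get the co-occurrences. After that you can use any method you like to count their frequencies.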
from collections import Counter
from itertools import chain
some_list = [['abc', 'aa', 'xdf'], ['def', 'asd'], ['abc', 'xyz'], ['ghi', 'edd'], ['abc', 'xyz'], ['abc', ]]
search = 'abc'
co_occurrences = [[v for v in lst if v != search] if len(lst) > 1 else [None] for lst in some_list if search in lst]
print(co_occurrences)
c = Counter(chain.from_iterable(co_occurrences))
print(c)
Output:
[['aa', 'xdf'], ['xyz'], ['xyz'], [None]]
Counter({'xyz': 2, 'aa': 1, 'xdf': 1, None: 1})
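The filter-then-count idea can also be wrapped in a small function so the search string becomes a parameter. This is only a sketch: count_cooccurrences is a hypothetical name, and unlike the comprehension above it records None whenever nothing but the search string is left in a sublist, which also covers a sublist such as ['abc', 'abc'].

```python
from collections import Counter

def count_cooccurrences(lists, search):
    """Count the strings that share a sublist with `search`."""
    counts = Counter()
    for lst in lists:
        if search not in lst:
            continue
        # Everything in this sublist except the search string itself.
        others = [v for v in lst if v != search]
        # Nothing else present: record the occurrence under None.
        counts.update(others if others else [None])
    return counts

some_list = [['abc', 'aa', 'xdf'], ['def', 'asd'], ['abc', 'xyz'],
             ['ghi', 'edd'], ['abc', 'xyz'], ['abc', ]]

result = count_cooccurrences(some_list, 'abc')
print(result)
```

Calling it with another search string, for example count_cooccurrences(some_list, 'def'), reuses the same logic without editing the loop.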
My approach does not import any packages:
some_list = [['abc', 'aa', 'xdf'], ['def', 'asd'], ['abc', 'xyz'],
             ['ghi', 'edd'], ['abc', 'xyz'], ['abc', ]]
# Note: the key used here is the string 'None', not the None object.
d = {'None':0}
for e in some_list:
    if 'abc' in e and len(e) != 1:
        for f in e:
            if f != 'abc' and f not in d:
                d[f] = 1
            elif f != 'abc' and f in d:
                d[f] +=1
    elif 'abc' in e and len(e) == 1:
        d['None'] += 1
print(d)
This code prints:
{'None': 1, 'aa': 1, 'xdf': 1, 'xyz': 2}
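For comparison, a possible variant of the import-free approach uses dict.get to merge the "new key" and "existing key" branches, and uses the None object as the key, as the question asks. It is a sketch under the same some_list, not a replacement for the code above.

```python
some_list = [['abc', 'aa', 'xdf'], ['def', 'asd'], ['abc', 'xyz'],
             ['ghi', 'edd'], ['abc', 'xyz'], ['abc', ]]

d = {}
for e in some_list:
    if 'abc' not in e:
        continue
    if len(e) == 1:
        # 'abc' is alone in this sublist: count it under the None object.
        d[None] = d.get(None, 0) + 1
        continue
    for f in e:
        if f != 'abc':
            # get() returns 0 for a key that has not been seen yet.
            d[f] = d.get(f, 0) + 1
print(d)  # {'aa': 1, 'xdf': 1, 'xyz': 2, None: 1}
```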