Counting co-occurrences in a list of lists

Using Python, I have defined the following list of lists of strings:

some_list = [['abc', 'aa', 'xdf'], ['def', 'asd'], ['abc', 'xyz'], ['ghi', 'edd'], ['abc', 'xyz'], ['abc', ]]

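What is the best way to select and count the strings that occur in the same sublist as 'abc'? In this example, I am looking for an output (a dictionary or a list) such as: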

('aa', 1), ('xdf', 1), ('xyz',2), (None, 1)

(None, 1) captures the case where 'abc' is the only string in a sublist.

Here is a longhand approach that uses defaultdict():

from collections import defaultdict

some_list = [['abc', 'aa', 'xdf'], ['def', 'asd'], ['abc', 'xyz'], ['ghi', 'edd'], ['abc', 'xyz'], ['abc', ]]

d = defaultdict(int)

for i in some_list:
    if 'abc' in i:
        if len(i)==1:
            d[None] += 1
        for j in i:
            if j=='abc': continue
            d[j] += 1

This yields:

defaultdict(<class 'int'>, {'aa': 1, 'xdf': 1, 'xyz': 2, None: 1})

I would count them all first and then handle the edge case separately:

from collections import Counter
from itertools import chain

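# some_list is the list of lists from the question
c = Counter(chain.from_iterable(x for x in some_list if 'abc' in x))
del c['abc']  # the search term itself is counted too, so drop it
c[None] = some_list.count(['abc'])  # sublists that contain only 'abc'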

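You can filter your list by the search term, then filter the search value out of each matching sublist to get the co-occurrences. After that, you can use any method you like to count their frequencies.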

from collections import Counter
from itertools import chain

some_list = [['abc', 'aa', 'xdf'], ['def', 'asd'], ['abc', 'xyz'], ['ghi', 'edd'], ['abc', 'xyz'], ['abc', ]]
search = 'abc'
co_occurrences = [[v for v in lst if v != search] if len(lst) > 1 else [None] for lst in some_list if search in lst]
print(co_occurrences)
c = Counter(chain.from_iterable(co_occurrences))
print(c)

Output:

[['aa', 'xdf'], ['xyz'], ['xyz'], [None]]
Counter({'xyz': 2, 'aa': 1, 'xdf': 1, None: 1})
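
The filter-then-count idea above could also be wrapped in a small reusable function. This is only a sketch: the name count_cooccurrences and its parameters are made up for illustration. It also assumes that a sublist holding nothing but the search term, even if repeated, should be counted under None, which differs slightly from the len(lst) > 1 check above.

```python
from collections import Counter


def count_cooccurrences(lists, search):
    """Count the strings that share a sublist with `search`.

    Hypothetical helper: a sublist with no other values is counted under None.
    """
    counts = Counter()
    for sub in lists:
        if search not in sub:
            continue  # skip sublists that do not contain the search term
        others = [v for v in sub if v != search]
        # If nothing else is in the sublist, record the "alone" case as None
        counts.update(others if others else [None])
    return counts


some_list = [['abc', 'aa', 'xdf'], ['def', 'asd'], ['abc', 'xyz'],
             ['ghi', 'edd'], ['abc', 'xyz'], ['abc']]
result = count_cooccurrences(some_list, 'abc')
print(list(result.items()))
# [('aa', 1), ('xdf', 1), ('xyz', 2), (None, 1)]
```

Calling list(result.items()) gives the list-of-tuples form asked for in the question; the Counter itself can be used directly when a dictionary is preferred.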

My approach does not import any packages:

some_list = [['abc', 'aa', 'xdf'], ['def', 'asd'], ['abc', 'xyz'], 
             ['ghi', 'edd'], ['abc', 'xyz'], ['abc', ]]
d = {'None':0}
for e in some_list:
    if 'abc' in e and len(e) != 1:
        for f in e:
            if f != 'abc' and f not in d:
                d[f] = 1
            elif f != 'abc' and f in d:
                d[f] +=1
    elif 'abc' in e and len(e) == 1:
        d['None'] += 1
print(d)

This code prints:

{'None': 1, 'aa': 1, 'xdf': 1, 'xyz': 2}
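
Note that the code above uses the string 'None' as a key, which is not the same as the None object in the question and would be indistinguishable from a sublist entry that happens to be the string 'None'. A possible import-free variant that keys on the real None, sketched here with dict.get (the variable names are only illustrative):

```python
some_list = [['abc', 'aa', 'xdf'], ['def', 'asd'], ['abc', 'xyz'],
             ['ghi', 'edd'], ['abc', 'xyz'], ['abc']]

d = {}
for sub in some_list:
    if 'abc' not in sub:
        continue  # only sublists containing 'abc' matter
    # Everything except 'abc'; fall back to [None] when nothing else is there
    others = [s for s in sub if s != 'abc'] or [None]
    for s in others:
        d[s] = d.get(s, 0) + 1  # dict.get supplies 0 for unseen keys

print(d)
# {'aa': 1, 'xdf': 1, 'xyz': 2, None: 1}
```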