How do I count unique words in a list/set in Python that is kind of nested/complicated

Question

I'm trying to count the unique words in a list/set (or whatever it's called) that looks like this:

names = [[], [], [], [], [], [['John ', 'John '], ['Peter ']], [], [], [], [['Morgan']], [], [], []]

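(In case you need to know, this list was produced by a matching function that searches Word documents in a directory on my computer for a list of names. The empty lists you see are documents with no matches.)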

So far I have tried

names1 = set(names)
len(names1)

and

Counter(names).keys() 
Counter(names).values()

but neither worked. Any help is appreciated.

Answer 1

I came up with this:

from collections import defaultdict

d = defaultdict(int) # default int is 0
names = [[], [], [], [], [], [['John ', 'John '], ['Peter ']], [], [], [], [['Morgan']], [], [], []]

def find(ele):
    if isinstance(ele, str):
        d[ele] += 1
    
    if isinstance(ele, list):
        for e in ele:
            find(e)
    
find(names)
print(dict(d))  # {'John ': 2, 'Peter ': 1, 'Morgan': 1}
print(len(d))   # 3 unique names

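This is a recursive function. If the element is a list, it loops over the items and calls itself on each one; an empty list has nothing to loop over, so the call just returns. If the element is a string, it increments that string's count in the dictionary. The number of unique names is then the number of keys in the dictionary.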
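The same recursion can also be written without a module-level dictionary. The following is only a sketch: `iter_strings` is a hypothetical helper name, and stripping the trailing spaces (so that 'John ' and 'John' count as the same name) is an assumption about what you want, not something the question states.

```python
from collections import Counter

def iter_strings(obj):
    """Yield every string found in an arbitrarily nested list."""
    if isinstance(obj, str):
        yield obj
    elif isinstance(obj, list):
        for item in obj:
            # Recurse into sublists; empty lists yield nothing.
            yield from iter_strings(item)

names = [[], [], [], [], [], [['John ', 'John '], ['Peter ']], [], [], [], [['Morgan']], [], [], []]

# Assumption: trailing spaces such as in 'John ' are not meaningful, so strip them.
counts = Counter(s.strip() for s in iter_strings(names))

print(counts)       # per-name counts
print(len(counts))  # 3 unique names
```

Because the generator only yields strings, the counting step stays separate from the traversal, and you can drop the `strip()` call if the whitespace matters.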

Answer 2

I tried to make @Jaideep Shekhar's comment more explicit, and also to include your original use of a Counter object:

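from collections import Counter

# names is the nested list from the question
wordcount_dict = Counter()

for elem in names:
    for namelist in elem:
        # add the names from each innermost list to the running count
        wordcount_dict.update(namelist)

print(len(wordcount_dict))  # number of unique names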
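As for why the original attempts failed: the elements of names are lists, and lists are unhashable, so neither set(names) nor Counter(names) can use them as keys. Here is a rough sketch of the same two-level loop written with itertools.chain, assuming the nesting is always exactly as deep as in the question (a list of lists of lists of strings):

```python
from collections import Counter
from itertools import chain

names = [[], [], [], [], [], [['John ', 'John '], ['Peter ']], [], [], [], [['Morgan']], [], [], []]

# The original attempt fails because the inner lists cannot be hashed.
try:
    set(names)
except TypeError as exc:
    print(f"set(names) failed: {exc}")

# Flatten two levels of nesting to get the plain strings.
flat = list(chain.from_iterable(chain.from_iterable(names)))
counts = Counter(flat)

print(len(counts))  # 3 unique names
```

If the nesting depth can vary, this will not work as written and a recursive approach is the safer choice.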

python

counting