Calculate the frequency of all combinations of 2 words in Python

I have a piece of text. I want to count every possible combination of 2 words (the 2 words must be adjacent). For example:

"I have 2 laptops, I have 2 chargers"

The result should be:

"I have": 2
"have 2": 2
"2 laptops": 1
"laptops, I": (don't count)
"2 chargers": 1

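I tried a regular expression, but the problem is that it never counts the same word twice.

I used: \b[a-z]{1,20}\b \b[a-z]{1,20}\b

Text: cold chain, energy storage device, industrial cooling system

It almost works, but it misses pairs such as "storage device" and "cooling system", because "storage" and "cooling" have already been taken by the matches "energy storage" and "industrial cooling".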
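One standard workaround for non-overlapping regex matches, offered here as a sketch rather than as the asker's code, is to wrap the pattern in a lookahead (?=...) containing a capturing group. The lookahead consumes no characters, so re.findall can start the next match on the second word of the previous pair, and it returns the captured group for each position. The sketch keeps the question's [a-z]{1,20} definition of a word, so uppercase letters and digits (as in "I have 2 laptops") are not matched; that is an assumption you may need to adjust.

```python
import re
from collections import Counter

text = "cold chain, energy storage device, industrial cooling system"

# The lookahead (?=...) matches without consuming characters, so matches can
# overlap: "storage" can end one pair and start the next.
# re.findall returns the contents of the capturing group at each match.
PAIR_PATTERN = re.compile(r"(?=\b([a-z]{1,20} [a-z]{1,20})\b)")

pairs = PAIR_PATTERN.findall(text)
counts = Counter(pairs)

print(pairs)
print(counts)
```

A pair separated by a comma, such as "chain, energy", is not matched because the pattern requires exactly one space between the two words.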

Thanks for any suggestions.

You can use zip to get every pair of adjacent words, and then use Counter to get their frequencies:

>>> from collections import Counter
>>> text = "I have 2 laptops, I have 2 chargers"
>>> words = text.split()

>>> d = {' '.join(pair): n
...      for pair, n in Counter(zip(words, words[1:])).items()  # count each adjacent pair
...      if not pair[0].endswith(',')}  # drop pairs that straddle a comma, e.g. ('laptops,', 'I')
>>> print(d)
{'I have': 2, 'have 2': 2, '2 laptops,': 1, '2 chargers': 1}

>>> import json
>>> print(json.dumps(d, indent=4))
{
    "I have": 2,
    "have 2": 2,
    "2 laptops,": 1,
    "2 chargers": 1
}
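The output above keeps the trailing comma in "2 laptops," because str.split() splits only on whitespace. If the pair should read "2 laptops", as in the question's expected result, one possible variant of the same zip + Counter approach is to split the text on commas first and pair words only within each chunk. This is a sketch that assumes commas are the only punctuation that should break a pair; the function name adjacent_pair_counts is hypothetical.

```python
from collections import Counter


def adjacent_pair_counts(text: str) -> Counter:
    """Count adjacent word pairs, never pairing words across a comma.

    Hypothetical helper: assumes commas are the only separators that
    should prevent two words from forming a pair.
    """
    counts: Counter = Counter()
    # Splitting on commas first means a pair like "laptops, I" is never formed.
    for chunk in text.split(","):
        words = chunk.split()
        # zip(words, words[1:]) yields each word together with its right neighbour.
        counts.update(" ".join(pair) for pair in zip(words, words[1:]))
    return counts


print(adjacent_pair_counts("I have 2 laptops, I have 2 chargers"))
print(adjacent_pair_counts("cold chain, energy storage device, industrial cooling system"))
```

Because the comma-separated chunks are paired independently, no filtering step is needed afterwards; other punctuation such as periods or semicolons would need the same treatment.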