Counting word frequency in a sentence

Question

I have two columns: one holds sentences and the other holds single words.

Sentence	word
"Such a day! It's a beautiful day out there"	"beautiful"
"Such a day! It's a beautiful day out there"	"day"
"I am sad by the sad weather"	"weather"
"I am sad by the sad weather"	"sad"

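For each row, I want to count how many times the value in the "word" column occurs in the "Sentence" column, and get this output: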

Sentence	word	n
"Such a day! It's a beautiful day out there"	"beautiful"	1
"Such a day! It's a beautiful day out there"	"day"	2
"I am sad by the sad weather"	"weather"	1
"I am sad by the sad weather"	"sad"	2

I tried:

ok = []
for l in [x.split() for x in df['Sentence']]:
    for y in df['word']:
        ok.append(l.count(y))

But it never finishes and takes far too long, so it is not feasible for my actual dataset, which has 50k rows.

Can anyone help me achieve this?

Answer 1

You can use zip to pair each sentence with the word from the same row:

df['new'] = [x.count(y) for x, y in zip(df.Sentence,df.word)]
df
Out[419]: 
                                     Sentence       word  new
0  Such a day! It's a beautiful day out there  beautiful    1
1  Such a day! It's a beautiful day out there        day    2
2                 I am sad by the sad weather    weather    1
3                 I am sad by the sad weather        sad    2
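One caveat with `str.count`: it counts substrings, so a word embedded in a longer word is counted too. If whole-word matches are what you need, a regex with word boundaries is one option. The following is a minimal sketch under that assumption; `count_whole_word` is a hypothetical helper name, and plain lists stand in for the two DataFrame columns.

```python
import re

def count_whole_word(sentence: str, word: str) -> int:
    """Count occurrences of `word` in `sentence` as a standalone word."""
    # re.escape guards against regex metacharacters in the word;
    # \b anchors the match at word boundaries.
    pattern = r"\b" + re.escape(word) + r"\b"
    return len(re.findall(pattern, sentence))

# Plain lists standing in for df.Sentence and df.word
sentences = [
    "Such a day! It's a beautiful day out there",
    "Such a day! It's a beautiful day out there",
    "I am sad by the sad weather",
    "I am sad by the sad weather",
]
words = ["beautiful", "day", "weather", "sad"]

counts = [count_whole_word(s, w) for s, w in zip(sentences, words)]
print(counts)  # [1, 2, 1, 2]

# Substring counting also matches the "day" inside "Today"
print("Today is a day".count("day"))              # 2
print(count_whole_word("Today is a day", "day"))  # 1
```

With a DataFrame, the same list comprehension can be assigned to a column by zipping `df.Sentence` and `df.word`. Note that `\b` only behaves as expected when the word starts and ends with a letter, digit, or underscore.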

Answer 2

Try DataFrame.apply with axis=1, which calls the function once per row:

df['n'] = df.apply(lambda r: r['Sentence'].count(r['word']), axis=1)

Result:

                                     Sentence       word  n
0  Such a day! It's a beautiful day out there  beautiful  1
1  Such a day! It's a beautiful day out there        day  2
2                 I am sad by the sad weather    weather  1
3                 I am sad by the sad weather        sad  2
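In the sample data the same sentence appears on several rows, so another option is to tokenize each distinct sentence once and look the word up in the cached counts. This is only a sketch, not a benchmark: `token_counts` is a hypothetical helper, the `\w+` tokenizer is an assumption (it splits "It's" into "It" and "s"), and plain lists again stand in for the DataFrame columns.

```python
import re
from collections import Counter
from functools import lru_cache

@lru_cache(maxsize=None)
def token_counts(sentence: str) -> Counter:
    """Tokenize a sentence once and cache its per-word counts."""
    # Assumed tokenizer: runs of word characters, case-sensitive.
    return Counter(re.findall(r"\w+", sentence))

sentences = [
    "Such a day! It's a beautiful day out there",
    "Such a day! It's a beautiful day out there",
    "I am sad by the sad weather",
    "I am sad by the sad weather",
]
words = ["beautiful", "day", "weather", "sad"]

# A Counter returns 0 for words that never occur in the sentence
counts = [token_counts(s)[w] for s, w in zip(sentences, words)]
print(counts)  # [1, 2, 1, 2]
```

Unlike `str.count`, this counts whole tokens only, so it matches single words but not multi-word phrases.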

Answer 3

You can count how many times one string occurs inside another with str.count:

# define string
string = "This is how you count same word of your defined string to another string using python"
substring = "string"

count = string.count(substring)

# print count
print(f"The count of the word {substring} is:", count)

Output: The count of the word string is: 2
