Counting word frequency in a sentence

I have two columns: one contains sentences and the other contains single words.

Sentence word
"Such a day! It's a beautiful day out there" "beautiful"
"Such a day! It's a beautiful day out there" "day"
"I am sad by the sad weather" "weather"
"I am sad by the sad weather" "sad"

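I want to count how often the word in the "word" column occurs in the sentence in the "Sentence" column, and produce this output: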

Sentence word n
"Such a day! It's a beautiful day out there" "beautiful" 1
"Such a day! It's a beautiful day out there" "day" 2
"I am sad by the sad weather" "weather" 1
"I am sad by the sad weather" "sad" 2

I tried:

ok = []
for l in [x.split() for x in df['Sentence']]:
    for y in df['word']:
        ok.append(l.count(y))

But it never finishes running and takes a very long time, so it is not feasible for my real dataset, which has 50k rows.

Can anyone help me achieve this?

You can use zip to pair each sentence with the word on the same row:

df['new'] = [x.count(y) for x, y in zip(df.Sentence,df.word)]
df
Out[419]: 
                                     Sentence       word  new
0  Such a day! It's a beautiful day out there  beautiful    1
1  Such a day! It's a beautiful day out there        day    2
2                 I am sad by the sad weather    weather    1
3                 I am sad by the sad weather        sad    2
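Note that `str.count` counts substrings, so "day" would also be counted inside "today". If only whole words should count, a regular expression with word boundaries is one option. The sketch below is a minimal illustration using plain lists in place of the DataFrame columns; the helper name `count_whole_word` is hypothetical and not part of pandas or of the answer above.

```python
import re

def count_whole_word(sentence, word):
    """Count occurrences of `word` in `sentence` as a whole word (hypothetical helper)."""
    # re.escape protects against regex metacharacters in the word;
    # \b anchors the match at word boundaries, so "day" does not match inside "today".
    # Assumption: the word starts and ends with a letter, digit, or underscore.
    pattern = r"\b" + re.escape(word) + r"\b"
    return len(re.findall(pattern, sentence))

# Sample data copied from the question, as plain lists.
sentences = [
    "Such a day! It's a beautiful day out there",
    "Such a day! It's a beautiful day out there",
    "I am sad by the sad weather",
    "I am sad by the sad weather",
]
words = ["beautiful", "day", "weather", "sad"]

# Pair each sentence with the word on the same row, as in the zip answer.
counts = [count_whole_word(s, w) for s, w in zip(sentences, words)]
print(counts)
```

With the DataFrame from the question, the same helper should drop into the list comprehension: `df['n'] = [count_whole_word(s, w) for s, w in zip(df['Sentence'], df['word'])]`.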

Try using pandas `DataFrame.apply` with `axis=1`:

df['n'] = df.apply(lambda r: r['Sentence'].count(r['word']), axis=1)

Result:

                                     Sentence       word  n
0  Such a day! It's a beautiful day out there  beautiful  1
1  Such a day! It's a beautiful day out there        day  2
2                 I am sad by the sad weather    weather  1
3                 I am sad by the sad weather        sad  2

You can use the following code to count how many times one string occurs inside another:

# define string
string = "This is how you count same word of your defined string to another string using python"
substring = "string"

count = string.count(substring)

# print count
print(f"The count of the word {substring} is:", count)

Output: The count of the word string is: 2
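When the same sentence appears on many rows, as in the question's data, another possible approach is to tokenize each distinct sentence once and look the word up in a `collections.Counter`. This is only a sketch: the function name `token_counts` and the token pattern `[\w']+` are assumptions chosen for illustration, and a different tokenizer may suit other data better. Like `str.count`, it is case-sensitive.

```python
import re
from collections import Counter
from functools import lru_cache

@lru_cache(maxsize=None)
def token_counts(sentence):
    """Return a Counter of the tokens in `sentence` (hypothetical helper).

    lru_cache means each distinct sentence is tokenized only once.
    """
    # Assumed token pattern: runs of word characters and apostrophes,
    # so "day!" yields "day" and "It's" stays one token.
    return Counter(re.findall(r"[\w']+", sentence))

# Sample rows copied from the question: (sentence, word) pairs.
rows = [
    ("Such a day! It's a beautiful day out there", "beautiful"),
    ("Such a day! It's a beautiful day out there", "day"),
    ("I am sad by the sad weather", "weather"),
    ("I am sad by the sad weather", "sad"),
]

# A Counter returns 0 for tokens that are absent, so no KeyError handling is needed.
counts = [token_counts(sentence)[word] for sentence, word in rows]
print(counts)
```

This counts whole tokens rather than substrings, so its results can differ from `string.count(substring)` when the word also occurs inside a longer word.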