Counting word frequency in a sentence
I have two columns: one is a sentence and the other is a single word.
Sentence | word |
---|---|
"Such a day! It's a beautiful day out there" | "beautiful" |
"Such a day! It's a beautiful day out there" | "day" |
"I am sad by the sad weather" | "weather" |
"I am sad by the sad weather" | "sad" |
I want to count how often the word in the "word" column appears in the "Sentence" column on the same row, and get this output:
Sentence | word | n |
---|---|---|
"Such a day! It's a beautiful day out there" | "beautiful" | 1 |
"Such a day! It's a beautiful day out there" | "day" | 2 |
"I am sad by the sad weather" | "weather" | 1 |
"I am sad by the sad weather" | "sad" | 2 |
I tried:
ok = []
# Nested loop: every sentence is compared with every word, so n rows yield n * n counts
for l in [x.split() for x in df['Sentence']]:
    for y in df['word']:
        ok.append(l.count(y))
But it never seems to finish and takes a very long time, so it isn't feasible for my real dataset, which has 50k rows.
Can anyone help me achieve this?
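The attempt above is slow because the nested loop compares every sentence with every word, so n rows produce n * n counts instead of n (for 50k rows that is 2.5 billion counts), and most of them pair a sentence with a word from a different row. A minimal pure-Python sketch on the four example rows, with the DataFrame replaced by plain lists for illustration, shows the difference. It also shows that splitting on whitespace leaves punctuation attached, so "day!" is not counted as "day":

```python
# Example rows from the question, as plain lists instead of DataFrame columns.
sentences = [
    "Such a day! It's a beautiful day out there",
    "Such a day! It's a beautiful day out there",
    "I am sad by the sad weather",
    "I am sad by the sad weather",
]
words = ["beautiful", "day", "weather", "sad"]

# Original approach: every tokenized sentence is checked against every word.
nested = []
for tokens in [s.split() for s in sentences]:
    for w in words:
        nested.append(tokens.count(w))

# Row-wise approach: sentence i is only checked against word i.
paired = [s.split().count(w) for s, w in zip(sentences, words)]

print(len(nested), len(paired))  # 16 4
# "day!" keeps its "!" after split(), so only one "day" token matches in row 2.
print(paired)  # [1, 1, 1, 2]
```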
You can use zip:
df['new'] = [x.count(y) for x, y in zip(df.Sentence,df.word)]
df
Out[419]:
Sentence word new
0 Such a day! It's a beautiful day out there beautiful 1
1 Such a day! It's a beautiful day out there day 2
2 I am sad by the sad weather weather 1
3 I am sad by the sad weather sad 2
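For reference, a self-contained version of the zip approach, assuming pandas is installed and rebuilding the question's DataFrame by hand (the new column is named n here to match the desired output rather than new). Because zip walks both columns in lockstep, it makes a single pass over the rows:

```python
import pandas as pd

# Rebuild the example DataFrame from the question.
df = pd.DataFrame({
    "Sentence": [
        "Such a day! It's a beautiful day out there",
        "Such a day! It's a beautiful day out there",
        "I am sad by the sad weather",
        "I am sad by the sad weather",
    ],
    "word": ["beautiful", "day", "weather", "sad"],
})

# Pair each sentence with the word on the same row; str.count counts
# non-overlapping substring occurrences.
df["n"] = [s.count(w) for s, w in zip(df["Sentence"], df["word"])]
print(df)
```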
Try using pandas.apply:
df['n'] = df.apply(lambda r: r['Sentence'].count(r['word']), axis=1)
Result:
Sentence word n
0 Such a day! It's a beautiful day out there beautiful 1
1 Such a day! It's a beautiful day out there day 2
2 I am sad by the sad weather weather 1
3 I am sad by the sad weather sad 2
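Note that both answers above use str.count, which counts substrings rather than whole words, so "sad" would also be found inside a word like "saddened". If only whole words should count, one option (not part of the answers above) is a regular expression with word boundaries; count_whole_word is a hypothetical helper name:

```python
import re


def count_whole_word(sentence: str, word: str) -> int:
    """Count occurrences of `word` as a whole word, using regex word boundaries."""
    # re.escape keeps characters such as '.' or '+' in `word` literal.
    pattern = r"\b" + re.escape(word) + r"\b"
    return len(re.findall(pattern, sentence))


sentence = "I am saddened by the sad weather"
print(sentence.count("sad"))              # 2: also matches inside "saddened"
print(count_whole_word(sentence, "sad"))  # 1: only the standalone word
```

It can be used row-wise in the same way, e.g. df['n'] = [count_whole_word(x, y) for x, y in zip(df.Sentence, df.word)]. The \b boundaries assume each word starts and ends with a letter, digit, or underscore; pass flags=re.IGNORECASE to re.findall if case should not matter.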
You can use the following code to count how many times one string occurs in another:
# define string
string = "This is how you count same word of your defined string to another string using python"
substring = "string"
count = string.count(substring)
# print count
print(f"The count of the word {substring} is:", count)
Output:
The count of the word string is: 2
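str.count is case-sensitive and counts non-overlapping occurrences of a substring. When you need counts for several words in the same sentence, another option is to tokenize the sentence once and use collections.Counter. The sketch below (word_counts is a hypothetical helper) lower-cases tokens and strips leading and trailing punctuation, which is an assumption about what should count as the same word:

```python
import string
from collections import Counter


def word_counts(sentence: str) -> Counter:
    # Split on whitespace, strip leading/trailing punctuation, and lower-case,
    # so "day!" and "Day" are both counted as "day".
    tokens = (tok.strip(string.punctuation).lower() for tok in sentence.split())
    return Counter(tok for tok in tokens if tok)


counts = word_counts("Such a day! It's a beautiful day out there")
print(counts["day"], counts["a"], counts["beautiful"])  # 2 2 1

# str.count, by contrast, is case-sensitive and non-overlapping.
print("Day day".count("day"))  # 1
print("aaaa".count("aa"))      # 2
```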