Counting co-occurrences between nouns and verbs/adjectives
I have a DataFrame containing reviews, plus two lists: one holds nouns and the other holds verbs/adjectives.
Sample code:
import pandas as pd
data = {'reviews':['Very professional operation. Room is very clean and comfortable',
                   'Daniel is the most amazing host! His place is extremely clean, and he provides everything you could possibly want (comfy bed, guidebooks & maps, mini-fridge, towels, even toiletries). He is extremely friendly and helpful.',
                   'The room is very quiet, and well decorated, very clean.',
                   'He provides the room with towels, tea, coffee and a wardrobe.',
                   'Daniel is a great host. Always recomendable.',
                   'My friend and I were very satisfied with our stay in his apartment.']}
df = pd.DataFrame(data)
nouns = ['place','Amsterdam','apartment','location','host','stay','city','room','everything','time','house',
         'area','home','’','center','restaurants','centre','Great','tram','très','minutes','walk','space','neighborhood',
         'à','station','bed','experience','hosts','Thank','bien']
verbs_adj = ['was','is','great','nice','had','clean','were','recommend','stay','are','good','perfect','comfortable',
             'have','easy','be','quiet','helpful','get','beautiful',"'s",'has','est','located','un','amazing','wonderful',]
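Using the DataFrame and the two lists, how can I write a function that returns, for each review, a dictionary of dictionaries mapping each noun to the counts of the verbs and adjectives it co-occurs with? My ideal output would be:
Example review: 'A big restaurant served delicious food in big dishes'
>>> {'restaurant': {'big': 2, 'served': 1, 'delicious': 1}}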
You can try this:
from collections import Counter
from copy import deepcopy
from pprint import pprint
data = ...
nouns = ...
verbs_adj = ...
def count_co_occurences(reviews):
    # Iterate on each review and count
    occurences_per_review = {
        f"review_{i+1}": {
            noun: dict(Counter(review.lower().split(" ")))
            for noun in nouns
            if noun in review.lower()
        }
        for i, review in enumerate(reviews)
    }
    # Remove verb_adj not found in main list
    opr = deepcopy(occurences_per_review)
    for review, occurences in opr.items():
        for noun, counts in occurences.items():
            for verb_adj in counts.keys():
                if verb_adj not in verbs_adj:
                    del occurences_per_review[review][noun][verb_adj]
    return occurences_per_review
pprint(count_co_occurences(data["reviews"]))
# Outputs
{'review_1': {'room': {'clean': 1, 'comfortable': 1, 'is': 1}},
 'review_2': {'bed': {'amazing': 1, 'is': 3},
              'everything': {'amazing': 1, 'is': 3},
              'host': {'amazing': 1, 'is': 3},
              'place': {'amazing': 1, 'is': 3}},
 'review_3': {'room': {'is': 1}},
 'review_4': {'room': {}},
 'review_5': {'host': {'great': 1, 'is': 1}},
 'review_6': {'apartment': {'stay': 1, 'were': 1},
              'stay': {'stay': 1, 'were': 1}}}
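Note that in the output above, review_3 is missing 'quiet' and 'clean'. Splitting on spaces leaves punctuation attached to the words ('quiet,' and 'clean.'), so they never match the verbs_adj list. Nouns are also matched as substrings of the review rather than as whole words. The following is a minimal sketch of one possible variant that tokenizes with a regular expression instead. The function name count_cooccurrences_tokenized is hypothetical, and lowercasing both lists and using \w+ as the token pattern are assumptions (with that pattern, an entry such as "'s" can never match). Like the code above, it counts co-occurrence at the level of the whole review, not by sentence or by distance from the noun.

```python
import re
from collections import Counter


def count_cooccurrences_tokenized(reviews, nouns, verbs_adj):
    """Hypothetical variant: whole-word matching that ignores punctuation."""
    # Assumption: matching is case-insensitive, so both lists are lowercased.
    noun_set = {n.lower() for n in nouns}
    verb_adj_set = {v.lower() for v in verbs_adj}

    result = {}
    for i, review in enumerate(reviews, start=1):
        # \w+ keeps runs of letters/digits and drops punctuation,
        # so 'clean.' and 'quiet,' become 'clean' and 'quiet'.
        tokens = re.findall(r"\w+", review.lower())
        # Count only the tokens that are in the verbs/adjectives list.
        counts = Counter(t for t in tokens if t in verb_adj_set)
        # Every noun present as a whole word gets the review-level counts.
        result[f"review_{i}"] = {
            noun: dict(counts) for noun in sorted(noun_set.intersection(tokens))
        }
    return result


example = ['A big restaurant served delicious food in big dishes']
print(count_cooccurrences_tokenized(example, ['restaurant'], ['big', 'served', 'delicious']))
# {'review_1': {'restaurant': {'big': 2, 'served': 1, 'delicious': 1}}}
```

To run it on the DataFrame from the question, pass df['reviews'] together with the nouns and verbs_adj lists.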