How can I make a cloud of words occurring together?
Edit: "please focus the answer only on the example below, no broad scenarios"
OK. I have read about word clouds, but I would like to know how to represent the words that most often occur together in a string variable, as in the example below:
Var_x
wireless problems, migration to competitor
dissatisfied customers, technicians visits scheduled
call waiting, technicians visits
bad customer experience, wireless problems
So what I want is a representation of ("wireless problems" and "technicians visits") in the cloud. How can this be done?
This code produces a frequency distribution of adjacent words, which can serve as the underlying word cloud data:
from nltk import bigrams, FreqDist
from nltk.tokenize import RegexpTokenizer
from operator import itemgetter

sent = 'wireless problems, migration to competitor\n\
dissatisfied customers, technicians visits scheduled\n\
call waiting, technicians visits\n\
bad customer experience, wireless problems'

# Keep only word characters, dropping commas and newlines
tokenizer = RegexpTokenizer(r'\w+')
sent_words = tokenizer.tokenize(sent)

# Count every pair of adjacent tokens
freq_dist = FreqDist(bigrams(sent_words))

# Print the bigrams, most frequent first
for k, v in sorted(freq_dist.items(), key=itemgetter(1), reverse=True):
    print(k, v)
Output
('technicians', 'visits') 2
('wireless', 'problems') 2
('dissatisfied', 'customers') 1
('bad', 'customer') 1
('scheduled', 'call') 1
('competitor', 'dissatisfied') 1
('migration', 'to') 1
('to', 'competitor') 1
('visits', 'scheduled') 1
('call', 'waiting') 1
('problems', 'migration') 1
('waiting', 'technicians') 1
('customers', 'technicians') 1
('customer', 'experience') 1
('experience', 'wireless') 1
('visits', 'bad') 1
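To actually render these counts as a cloud, here is a minimal sketch. It assumes the third-party wordcloud and matplotlib packages are installed (pip install wordcloud matplotlib), which the snippet above does not use, and it reuses freq_dist from that snippet; bigram_freq is just an illustrative name. Each bigram tuple is joined into a single phrase so that "wireless problems" and "technicians visits" appear as units rather than as separate words:

import matplotlib.pyplot as plt
from wordcloud import WordCloud

# Join each bigram tuple into one phrase; generate_from_frequencies
# renders each dictionary key as-is, spaces included.
bigram_freq = {' '.join(pair): count for pair, count in freq_dist.items()}

wc = WordCloud(width=800, height=400, background_color='white')
wc.generate_from_frequencies(bigram_freq)

plt.imshow(wc, interpolation='bilinear')
plt.axis('off')
plt.show()

Note that bigrams runs over the whole token stream, so pairs that straddle a comma or a line break, such as ('scheduled', 'call'), are counted as well; splitting sent on commas and newlines first and taking bigrams within each phrase would filter those out.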