NLTK: TypeError: unhashable type: 'list'
I am following this original code for computing a BLEU score:
from nltk.translate.bleu_score import sentence_bleu
reference = [['this', 'is', 'a', 'test'], ['this', 'is', 'test']]
candidate = ['this', 'is', 'a', 'test']
score = sentence_bleu(reference, candidate)
print(score)
This code works fine. However, I am trying to change reference and candidate by loading them from CSV files, with the following code:
import nltk
import csv
import itertools
from nltk.translate.bleu_score import sentence_bleu

print("Opening references file...")
with open('bleu-ref.csv', newline='') as csvfile:
    spamreader = csv.reader(csvfile, delimiter=' ', quotechar='|')
    sentences = []
    for row in spamreader:
        # print(', '.join(row))
        sentences.append(' '.join(row))
    sent = [[i] for i in sentences]
    reference = []
    for i in range(len(sent)):
        sent[i]
        chink = []
        for j in sent[i]:
            chink = chink + nltk.word_tokenize(j)
        reference.append(chink)

print("Opening candidates file...")
with open('bleu-can.csv', newline='') as csvfile:
    spamreader = csv.reader(csvfile, delimiter=' ', quotechar='|')
    sentences_can = []
    for row in spamreader:
        # print(', '.join(row))
        sentences_can.append(' '.join(row))
    sent_can = [[i] for i in sentences_can]
    candidate = []
    for i in range(len(sent_can)):
        sent_can[i]
        chink_can = []
        for j in sent_can[i]:
            chink_can = chink_can + nltk.word_tokenize(j)
        candidate.append(chink_can)

score = sentence_bleu(reference, candidate)
But I get this error:
Traceback (most recent call last):
  File "nltk-bleu-score.py", line 56, in <module>
    score = sentence_bleu(reference, candidate)
  File "C:\Users\Fachri\Anaconda3\lib\site-packages\nltk\translate\bleu_score.py", line 89, in sentence_bleu
    emulate_multibleu)
  File "C:\Users\Fachri\Anaconda3\lib\site-packages\nltk\translate\bleu_score.py", line 162, in corpus_bleu
    p_i = modified_precision(references, hypothesis, i)
  File "C:\Users\Fachri\Anaconda3\lib\site-packages\nltk\translate\bleu_score.py", line 292, in modified_precision
    counts = Counter(ngrams(hypothesis, n)) if len(hypothesis) >= n else Counter()
  File "C:\Users\Fachri\Anaconda3\lib\collections\__init__.py", line 535, in __init__
    self.update(*args, **kwds)
  File "C:\Users\Fachri\Anaconda3\lib\collections\__init__.py", line 622, in update
    _count_elements(self, iterable)
TypeError: unhashable type: 'list'
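I then checked the types of reference and candidate: in both the original and the modified code they have the same type, list. I don't understand what makes these lists different.
The reference and candidate lists printed by the modified code are as follows: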
Opening references file...
[['two', 'airplanes', 'are', 'waiting', 'on', 'the', 'tarmac'], ['Two', 'airplanes', 'parked', 'at', 'the', 'airport', '.']]
Opening candidates file...
[['An', 'airplane', 'sitting', 'on', 'the', 'tarmac', 'at', 'an', 'airport', 'with', 'another', 'plane', 'in', 'the', 'background', '.']]
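The expected type of hypothesis is list(str), according to the documentation:
:type hypothesis: list(str)
Your candidate is a list(list(str)), that is, a list of tokenized sentences rather than a single tokenized sentence. Score each inner list separately. You can compute the BLEU score like this: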
from nltk.translate.bleu_score import sentence_bleu

references = [
    ['two', 'passenger', 'planes', 'on', 'a', 'grassy', 'plain'],
    ['An', 'airplane', 'sitting', 'on', 'the', 'tarmac', 'at', 'an', 'airport', 'with', 'another', 'plane',
     'in', 'the', 'background', '.'],
    ['A', 'white', 'an', 'blue', 'airplane', 'parked', 'at', 'the', 'airport', 'near', 'another', 'small',
     'plane', '.'],
    ['Blue', 'and', 'white', 'airplane', 'parked', '.'],
    ['two', 'airplanes', 'are', 'waiting', 'on', 'the', 'tarmac'],
    ['Two', 'airplanes', 'parked', 'at', 'the', 'airport', '.'],
    ['A', 'passenger', 'aircraft', 'with', 'landing', 'gear', 'down', '.'],
    ['A', 'passenger', 'jet', 'flies', 'through', 'the', 'air', '.'],
    ['A', 'passenger', 'plane', 'fly', 'through', 'the', 'sky', '.'],
    ['The', 'Austrian', 'plane', 'soars', 'in', 'the', 'sky', '.'],
]
candidates = [
    ['An', 'airplane', 'sitting', 'on', 'the', 'tarmac', 'at', 'an', 'airport', 'with', 'another', 'plane',
     'in', 'the', 'background', '.'],
    ['A', 'passenger', 'jet', 'flies', 'through', 'the', 'air', '.'],
]

# Score one tokenized candidate sentence at a time against all references.
for candidate in candidates:
    print(sentence_bleu(references, candidate))
Output:
1.0
1.0
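To see why the nested list fails even though type() reports list in both cases, here is a minimal sketch that uses only the standard library. The ngrams helper below is a simplified stand-in written for this illustration, not NLTK's implementation; the assumption, based on the traceback, is that the real function likewise yields tuples of tokens that are then counted with Counter.

from collections import Counter


def ngrams(tokens, n):
    """Simplified stand-in for an n-gram helper: yields n-grams as tuples."""
    return zip(*(tokens[i:] for i in range(n)))


flat = ['An', 'airplane', 'sitting', 'on', 'the', 'tarmac']
nested = [flat]  # the shape built by the CSV-loading code: list(list(str))

# Tuples of strings are hashable, so counting unigrams works.
flat_counts = Counter(ngrams(flat, 1))

# With the nested list, each "token" is a whole list. A tuple that
# contains a list cannot be hashed, so Counter raises TypeError.
try:
    Counter(ngrams(nested, 1))
    error = None
except TypeError as exc:
    error = str(exc)

print(error)  # the message mentions: unhashable type: 'list'

Both variables are lists, but only the flat one holds hashable items. With the CSV-loading code from the question, this means looping over candidate and passing each inner list to sentence_bleu, as in the loop above.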