ValueError: inconsistent shapes after using MultiLabelBinarizer
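I am trying to evaluate the performance of my CRF model, which predicts the part of speech of each word. I wrote a function that transforms the data into a more 'datasetish' format. It returns two lists: one of feature dictionaries and one of labels.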
def transform_to_dataset(tagged_sentences):
    X, y = [], []
    for sentence, tags in tagged_sentences:
        sent_word_features, sent_tags = [], []
        for index in range(len(sentence)):
            sent_word_features.append(extract_features(sentence, index))
            sent_tags.append(tags[index])
        X.append(sent_word_features)
        y.append(sent_tags)
    return X, y
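To make the shape of the output concrete, here is a minimal sketch of the same function run on two toy sentences. The `extract_features` below is a hypothetical stand-in (the real one is not shown in the question), and the input is assumed to be a list of `(words, tags)` pairs.

```python
def extract_features(sentence, index):
    # Hypothetical stand-in for the real feature extractor, which is not shown.
    word = sentence[index]
    return {
        "word": word.lower(),
        "is_first": index == 0,
        "is_capitalized": word[:1].isupper(),
    }


def transform_to_dataset(tagged_sentences):
    # One list of feature dicts and one list of tags per sentence.
    X, y = [], []
    for sentence, tags in tagged_sentences:
        X.append([extract_features(sentence, i) for i in range(len(sentence))])
        y.append(list(tags))
    return X, y


# Toy data, assumed format: (list of words, list of tags).
toy = [
    (["The", "dog", "barks"], ["DT", "NN", "VBZ"]),
    (["Cats", "sleep"], ["NNS", "VBP"]),
]
X, y = transform_to_dataset(toy)
print([len(sent) for sent in X])  # number of tokens per sentence
print(y)                          # nested list: one tag list per sentence
```

Note that `y` is a list of lists whose inner lengths vary with sentence length, which matters later when the labels are handed to scikit-learn metrics.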
I then split the data into training and testing sets of complete sentences before encoding.
penn_train_size = int(0.8*len(penn_treebank))
penn_training = penn_treebank[:penn_train_size]
penn_testing = penn_treebank[penn_train_size:]
X_penn_train, y_penn_train = transform_to_dataset(penn_training)
X_penn_test, y_penn_test = transform_to_dataset(penn_testing)
Then I create the model and train it on my data:
penn_crf = CRF(
    algorithm='lbfgs',
    c1=0.01,
    c2=0.1,
    max_iterations=100,
    all_possible_transitions=True
)
#The fit method is the default name used by Machine Learning algorithms to start training.
print("Started training on Penn Treebank corpus!")
penn_crf.fit(X_penn_train, y_penn_train)
print("Finished training on Penn Treebank corpus!")
Then I test it with:
y_penn_pred=penn_crf.predict(X_penn_test)
But when I try
from sklearn.metrics import accuracy_score
print("Accuracy: ", accuracy_score(y_penn_test, y_penn_pred))
it raises an error:
ValueError: You appear to be using a legacy multi-label data
representation. Sequence of sequences are no longer supported; use a
binary array or sparse matrix instead - the MultiLabelBinarizer
transformer can convert to this format.
But when I try to use MultiLabelBinarizer:
from sklearn.preprocessing import MultiLabelBinarizer
bin_y_penn_test = MultiLabelBinarizer().fit_transform(y_penn_test)
bin_y_penn_pred = MultiLabelBinarizer().fit_transform(y_penn_pred)
it gives me another error:
ValueError: inconsistent shapes
Here is the full traceback:
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
/tmp/ipykernel_5694/856179584.py in <module>
      1 from sklearn.metrics import accuracy_score
      2
----> 3 print("Accuracy: ", accuracy_score(bin_y_penn_test, bin_y_penn_pred))

~/.local/lib/python3.8/site-packages/sklearn/utils/validation.py in inner_f(*args, **kwargs)
     61             extra_args = len(args) - len(all_args)
     62             if extra_args <= 0:
---> 63                 return f(*args, **kwargs)
     64
     65             # extra_args > 0

~/.local/lib/python3.8/site-packages/sklearn/metrics/_classification.py in accuracy_score(y_true, y_pred, normalize, sample_weight)
    203     check_consistent_length(y_true, y_pred, sample_weight)
    204     if y_type.startswith('multilabel'):
--> 205         differing_labels = count_nonzero(y_true - y_pred, axis=1)
    206         score = differing_labels == 0
    207     else:

~/.local/lib/python3.8/site-packages/scipy/sparse/base.py in __sub__(self, other)
    431         elif isspmatrix(other):
    432             if other.shape != self.shape:
--> 433                 raise ValueError("inconsistent shapes")
    434             return self._sub_sparse(other)
    435         elif isdense(other):

ValueError: inconsistent shapes
What should I do to generate a confusion matrix for my model?
Try a different split, for example:
penn_train_size = int(0.7*len(penn_treebank))
Then check whether the shapes are still inconsistent:
bin_y_penn_test.shape
bin_y_penn_pred.shape
if bin_y_penn_test.shape == bin_y_penn_pred.shape:
    print('Consistent Shape')
else:
    print('Inconsistent Shape')
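One likely reason the shapes differ, regardless of the split: a `MultiLabelBinarizer` learns its columns from the distinct labels it sees during `fit`, so two separately fitted binarizers produce different widths whenever the true and predicted tag sets differ. The pure-Python sketch below only mimics that column selection on made-up tags; `binarizer_columns` is a hypothetical helper, not part of scikit-learn.

```python
def binarizer_columns(label_sequences):
    # Mimics how a freshly fitted binarizer picks its columns:
    # one column per distinct label seen, in sorted order.
    return sorted({label for seq in label_sequences for label in seq})


# Made-up example data: the model never predicts the tag "NNS".
y_true = [["DT", "NN", "VBZ"], ["NNS", "VBP"]]
y_pred = [["DT", "NN", "VBZ"], ["NN", "VBP"]]

true_cols = binarizer_columns(y_true)
pred_cols = binarizer_columns(y_pred)

# Rows = sentences, columns = distinct labels seen by each separate fit.
true_shape = (len(y_true), len(true_cols))
pred_shape = (len(y_pred), len(pred_cols))
print(true_shape, pred_shape)
```

Fitting a single binarizer on the true labels and calling `transform` on both arrays would align the columns, but each row would still describe the set of tags in a whole sentence, so token position and repeated tags are lost.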
If the shapes are consistent, run:
from sklearn.metrics import multilabel_confusion_matrix
multilabel_confusion_matrix(bin_y_penn_test, bin_y_penn_pred)
If they are still inconsistent, try modifying your data.
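An alternative worth considering: a POS tagger assigns exactly one tag per token, so this is arguably a multiclass problem over tokens rather than a multilabel one. In that case the nested lists can be flattened into two 1-D lists of equal length before scoring. The following is a minimal sketch on made-up tags, computing accuracy and confusion counts by hand so that it runs without any third-party package.

```python
from collections import Counter
from itertools import chain

# Made-up example data in the same nested format as y_penn_test / y_penn_pred.
y_true = [["DT", "NN", "VBZ"], ["NNS", "VBP"]]
y_pred = [["DT", "NN", "VBZ"], ["NN", "VBP"]]

# Flatten sentence-level lists into one token-level list each.
flat_true = list(chain.from_iterable(y_true))
flat_pred = list(chain.from_iterable(y_pred))
assert len(flat_true) == len(flat_pred)

# Token-level accuracy: fraction of positions where the tags agree.
accuracy = sum(t == p for t, p in zip(flat_true, flat_pred)) / len(flat_true)

# Confusion counts: rows are true tags, columns are predicted tags.
labels = sorted(set(flat_true) | set(flat_pred))
pair_counts = Counter(zip(flat_true, flat_pred))
matrix = [[pair_counts[(t, p)] for p in labels] for t in labels]

print("Accuracy:", accuracy)
print(labels)
for label, row in zip(labels, matrix):
    print(label, row)
```

The same flat lists can be passed to `sklearn.metrics.accuracy_score` and `sklearn.metrics.confusion_matrix`, which accept 1-D label sequences. If `CRF` comes from `sklearn_crfsuite` (an assumption, since the import is not shown), its `metrics.flat_accuracy_score` performs this flattening for you.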