How to sort scipy.sparse.csr.csr_matrix according to result of another function?

Question

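I am learning machine learning, more precisely logistic regression/classification. In my code I have a <class 'scipy.sparse.csr.csr_matrix'> object. I need to sort this sparse matrix, or the SFrame it was generated from (using ...), by the result of LogisticRegression.predict_proba. More precisely, I need to sort by the second column of the array that predict_proba returns.

How I generate the sparse matrix: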

import sframe
from sklearn.feature_extraction.text import CountVectorizer

products = sframe.SFrame('...')

train_data, test_data = products.random_split(.8, seed=1)

vectorizer = CountVectorizer(token_pattern=r'\b\w+\b')
# assumes the vectorizer was already fitted on the training data (not shown)
test_matrix = vectorizer.transform(test_data['review_clean'])

How I compute the probabilities:

sentiment_model.predict_proba(test_matrix)

(Here sentiment_model is the trained classifier, a logistic regression.) This gives me a <class 'numpy.ndarray'> that looks like this:

[[  4.65761066e-03   9.95342389e-01]
 [  9.75851270e-01   2.41487300e-02]
 [  9.99983374e-01   1.66258341e-05]]

Here is a sample of the SFrame data as shown by the print function:

+-------------------------------+-------------------------------+--------+
|              name             |             review            | rating |
+-------------------------------+-------------------------------+--------+
|   Our Baby Girl Memory Book   | Absolutely love it and all... |  5.0   |
| Wall Decor Removable Decal... | Would not purchase again o... |  2.0   |
| New Style Trailing Cherry ... | Was so excited to get this... |  1.0   |
+-------------------------------+-------------------------------+--------+
+-------------------------------+-----------+
|          review_clean         | sentiment |
+-------------------------------+-----------+
| Absolutely love it and all... |     1     |
| Would not purchase again o... |     -1    |
| Was so excited to get this... |     -1    |
+-------------------------------+-----------+

So I need some function that sorts the matrix according to the result of predict_proba.

Question: how can I sort it this way?

What I have already tried

sorted(test_matrix)

Result:

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all().

sorted(test_matrix_complete, key=lambda x: sentiment_model.predict_proba(x))

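This also results in:

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

I think the main problem is that I don't know how to efficiently establish the connection between the SFrame data and the sparse matrix.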

Answer 1

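You can simply index the matrix with the indices that sort result. Here result has to be a one-dimensional array, i.e. the second column of the predict_proba output (np is numpy):

result = sentiment_model.predict_proba(test_matrix)[:, 1]
sorted_matrix = test_matrix[np.argsort(result)]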
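To make the idea concrete, here is a minimal sketch that runs without the trained model. The small count matrix and the reviews array are made-up stand-ins (assumptions, not the asker's data), and the probabilities are copied from the predict_proba output shown in the question. It also shows one way to keep the source rows connected to the matrix: apply the same permutation to both.

```python
import numpy as np
from scipy.sparse import csr_matrix

# Hypothetical stand-ins for the real test_matrix and the review texts.
test_matrix = csr_matrix(np.array([[1, 0, 2],
                                   [0, 3, 0],
                                   [4, 0, 0]]))
reviews = np.array(["Absolutely love it",
                    "Would not purchase again",
                    "Was so excited"])

# Probabilities copied from the question's predict_proba output.
proba = np.array([[4.65761066e-03, 9.95342389e-01],
                  [9.75851270e-01, 2.41487300e-02],
                  [9.99983374e-01, 1.66258341e-05]])

positive = proba[:, 1]        # second column, one value per row
order = np.argsort(positive)  # row indices, lowest probability first
# order = order[::-1]         # uncomment for highest probability first

sorted_matrix = test_matrix[order]  # reorder the sparse rows
sorted_reviews = reviews[order]     # same permutation keeps rows aligned
sorted_proba = positive[order]
```

Because order is just an array of row positions, it can be reused on anything that is row-aligned with the matrix, which is what ties the sparse rows back to the records they came from. Calling sorted() on the matrix fails because it tries to compare whole rows with <, and that comparison does not yield a single True or False.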
