How to convert a DataFrame with strings in its columns into a csr_matrix
I am working on a PMI problem, and so far I have a DataFrame like this:
import pandas as pd
w = ['by', 'step', 'by', 'the', 'is', 'step', 'is', 'by', 'is']
c = ['step', 'what', 'is', 'what', 'the', 'the', 'step', 'the', 'what']
ppmi = [1, 3, 12, 3, 123, 1, 321, 1, 23]
df = pd.DataFrame({'w': w, 'c': c, 'ppmi': ppmi})
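I want to convert this DataFrame into a sparse matrix. Since w and c are lists of strings, calling csr_matrix((ppmi, (w, c))) gives me the error TypeError: cannot perform reduce with flexible type. What other way is there to convert this DataFrame?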
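The error appears to come from SciPy expecting integer row and column indices: with string arrays, the NumPy reduction it runs over them (to infer the matrix shape) fails. A minimal sketch, assuming it is acceptable to number the words alphabetically, maps each string column to integer codes with pandas' `pd.factorize` and then builds the `csr_matrix` directly. The names `row_labels` and `col_labels` are illustrative, not part of the original question.

```python
import pandas as pd
from scipy.sparse import csr_matrix

w = ['by', 'step', 'by', 'the', 'is', 'step', 'is', 'by', 'is']
c = ['step', 'what', 'is', 'what', 'the', 'the', 'step', 'the', 'what']
ppmi = [1, 3, 12, 3, 123, 1, 321, 1, 23]
df = pd.DataFrame({'w': w, 'c': c, 'ppmi': ppmi})

# Map each string column to integer codes; sort=True orders the labels alphabetically,
# so row i of the matrix corresponds to row_labels[i] (and likewise for columns).
row_codes, row_labels = pd.factorize(df['w'], sort=True)
col_codes, col_labels = pd.factorize(df['c'], sort=True)

# The indices are now integers, so csr_matrix accepts the (data, (row, col)) form.
mat = csr_matrix(
    (df['ppmi'].to_numpy(), (row_codes, col_codes)),
    shape=(len(row_labels), len(col_labels)),
)

print(list(row_labels))  # rows:    by, is, step, the
print(list(col_labels))  # columns: is, step, the, what
print(mat.toarray())
```

Keeping `row_labels` and `col_labels` around is what lets you translate matrix positions back into words later. If the same (w, c) pair occurs more than once, SciPy sums the duplicate entries when it builds the matrix.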
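SciPy's sparse constructors need integer row and column indices, so the strings have to be turned into integer codes first. Perhaps you can try coo_matrix with the integer codes of a MultiIndex: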
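import pandas as pd
import scipy.sparse as sps
w = ['by', 'step', 'by', 'the', 'is', 'step', 'is', 'by', 'is']
c = ['step', 'what', 'is', 'what', 'the', 'the', 'step', 'the', 'what']
ppmi = [1, 3, 12, 3, 123, 1, 321, 1, 23]
df = pd.DataFrame({'w': w, 'c': c, 'ppmi': ppmi})
# Move both string columns into a MultiIndex: each level holds the sorted unique
# strings, and each level's codes give every row's integer position in that level.
df.set_index(['w', 'c'], inplace=True)
# MultiIndex.codes was called .labels before pandas 0.24 (.labels was removed in 1.0).
mat = sps.coo_matrix((df['ppmi'], (df.index.codes[0], df.index.codes[1])))
print(mat.todense())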
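Output (rows follow the sorted w labels by, is, step, the; columns follow the sorted c labels is, step, the, what):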
[[ 12   1   1   0]
 [  0 321 123  23]
 [  0   0   1   3]
 [  0   0   0   3]]
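Because the matrix rows come from the first MultiIndex level and the columns from the second, `df.index.levels` can translate between strings and matrix positions. A sketch, assuming pandas 0.24 or later (where `MultiIndex.codes` replaces the older `.labels`), that converts the result to CSR, the format originally asked for, and reads back the PPMI value of a (w, c) pair; the helper `lookup` is hypothetical, written only for illustration.

```python
import pandas as pd
import scipy.sparse as sps

w = ['by', 'step', 'by', 'the', 'is', 'step', 'is', 'by', 'is']
c = ['step', 'what', 'is', 'what', 'the', 'the', 'step', 'the', 'what']
ppmi = [1, 3, 12, 3, 123, 1, 321, 1, 23]
df = pd.DataFrame({'w': w, 'c': c, 'ppmi': ppmi}).set_index(['w', 'c'])

# Level 0 labels the rows (w), level 1 labels the columns (c).
row_labels, col_labels = df.index.levels

# Pass the shape explicitly so it always matches the number of labels per level.
mat = sps.coo_matrix(
    (df['ppmi'].to_numpy(), (df.index.codes[0], df.index.codes[1])),
    shape=(len(row_labels), len(col_labels)),
).tocsr()

def lookup(word, context):
    """Hypothetical helper: PPMI value for (word, context), 0 if the pair was never
    observed. Raises KeyError if either string is not one of the level labels."""
    i = row_labels.get_loc(word)
    j = col_labels.get_loc(context)
    return mat[i, j]

print(lookup('is', 'step'))
print(lookup('by', 'what'))
```

Here `lookup('is', 'step')` returns the stored 321, while `lookup('by', 'what')` returns 0 because that pair never occurs in the data.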