Scipy sparse matrix from a JSON file
I'm trying to build a matrix with scipy.sparse from a JSON file.
The file contains records like this:
{"reviewerID": "A10000012B7CGYKOMPQ4L", "asin": "000100039X", "reviewerName": "Adam", "helpful": [0, 0], "reviewText": "Spiritually and mentally inspiring! A book that allows you to question your morals and will help you discover who you really are!", "overall": 5.0, "summary": "Wonderful!", "unixReviewTime": 1355616000, "reviewTime": "12 16, 2012"}
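That's my JSON format; the file has many more records like it (it's based on the Amazon reviews dataset).
I'd like to build a scipy sparse matrix that looks like this:
       count
object     a    b    c    d
id
him      NaN    1  NaN    1
me         1  NaN  NaN    1
you        1  NaN    1  NaN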
This is what I'm trying:
import numpy as np
import pandas as pd
from scipy.sparse import csr_matrix
df= pd.read_json('C:\Users\anto-\Desktop\university\Big Data computing\Ex. Resource\test2.json',lines=True)
a= df['reviewerID']
b= df['asin']
data= df.groupby(["reviewerID"]).size()
row = df.reviewerID.astype('category', categories=a).cat.codes
col = df.asin.astype('category', categories=b).cat.codes
sparse_matrix = csr_matrix((data, (row, col)), shape=(len(a), len(b)))
I'm following an old example.
My code raises errors about deprecated elements, and I don't understand how to build this matrix.
This is the error log:
FutureWarning: specifying 'categories' or 'ordered' in .astype() is deprecated; pass a CategoricalDtype instead
from ipykernel import kernelapp as app
I'm a bit confused.
Can anyone give me some advice or a similar example?
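To generate a sparse matrix that looks like
       count
object     a    b    c    d
id
him      NaN    1  NaN    1
me         1  NaN  NaN    1
you        1  NaN    1  NaN
you need to generate 3 arrays (the values, their row indices, and their column indices), for example: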
In [215]: from scipy import sparse
In [216]: data = np.array([1,1,1,1,1,1])
In [217]: row = np.array([1,2,0,2,0,1])
In [218]: col = np.array([0,0,1,2,3,3])
In [219]: M = sparse.csr_matrix((data, (row, col)), shape=(3,4))
In [220]: M
Out[220]:
<3x4 sparse matrix of type '<class 'numpy.int64'>'
	with 6 stored elements in Compressed Sparse Row format>
In [221]: M.A
Out[221]:
array([[0, 1, 0, 1],
       [1, 0, 0, 1],
       [1, 0, 1, 0]], dtype=int64)
Categories such as 'him', 'me', 'you' must be mapped to unique indices like 0, 1, 2. The same goes for 'a', 'b', 'c', 'd'.
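One way to do that mapping in pandas is sketched below. It passes a CategoricalDtype to .astype(), which is what the FutureWarning in the question asks for. The toy DataFrame and its column names (id, object) are my own stand-ins for the him/me/you table above; with the real file they would presumably be reviewerID and asin. Treat it as a sketch under those assumptions, not a tested solution for the Amazon data.

```python
import pandas as pd
from pandas.api.types import CategoricalDtype
from scipy.sparse import csr_matrix

# Hypothetical toy data mirroring the him/me/you x a/b/c/d table;
# one row per observed (id, object) pair.
df = pd.DataFrame({
    "id":     ["him", "him", "me", "me", "you", "you"],
    "object": ["b",   "d",   "a",  "d",  "a",   "c"],
})

# Count how often each (id, object) pair occurs.
counts = df.groupby(["id", "object"]).size().reset_index(name="count")

# Explicit, sorted, unique categories give each label a stable integer code.
id_type = CategoricalDtype(categories=sorted(counts["id"].unique()))
obj_type = CategoricalDtype(categories=sorted(counts["object"].unique()))

# The three arrays csr_matrix needs: values, row indices, column indices.
row = counts["id"].astype(id_type).cat.codes.to_numpy()
col = counts["object"].astype(obj_type).cat.codes.to_numpy()
data = counts["count"].to_numpy()

M = csr_matrix(
    (data, (row, col)),
    shape=(len(id_type.categories), len(obj_type.categories)),
)
dense = M.toarray()
print(dense)
```

Note that the shape comes from the number of unique labels, not from the number of rows in the DataFrame, and that the counts are grouped by both columns so that data, row and col all have the same length. id_type.categories and obj_type.categories keep the labels, so an index can be translated back to its reviewer or product.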