np.asarray() gives me a one-column array where the data was multi-column
import numpy as np

print(X_train_bow.shape) #Output: (897, 2794)
print(type(X_train_bow)) #Output: <class 'scipy.sparse.csr.csr_matrix'>
x_train_groups = [X_train_bow[i::3] for i in range(3)]
print(x_train_groups[0].shape) #Output: (299, 2794)
print(type(X_train_bow[0])) #Output: <class 'scipy.sparse.csr.csr_matrix'>
K = 2
train_data = []
test_data = []
for j in range(0, 3):
    if j != K:
        train_data.extend(x_train_groups[j])
test_data.extend(x_train_groups[K])
print(np.asarray(train_data).shape) #Output: (598,)
print(np.asarray(test_data).shape) #Output: (299,)
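I'm trying to do k-fold cross-validation, so I wrote a way to combine the training and test data. The problem is that when I call np.asarray, it returns an array whose shape differs from the original data.
You can see the code above; I've included the printed outputs to help.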
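You are calling .extend() and passing in a 2D array. I suspect each element of your train_data is a single row with 2794 "columns", and similarly for test_data.
Just build these directly as 2D arrays (here, sparse matrices, since X_train_bow is a csr_matrix) instead of extending a list.
Something like: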
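from scipy import sparse

K = 1
train_data = None
for j in range(0, 3):
    if j != K:
        if train_data is None:
            train_data = x_train_groups[j]
        else:
            # X_train_bow is sparse, so use sparse.vstack (np.vstack does not
            # stack sparse rows); vstack returns a new matrix, so assign it back
            train_data = sparse.vstack((train_data, x_train_groups[j]))
test_data = x_train_groups[K]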
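The single-K version can be wrapped in a loop to produce every fold. A minimal sketch, assuming the interleaved `X[i::n_folds]` grouping from the question; `kfold_sparse` is a hypothetical helper name, and a random 897 x 20 sparse matrix stands in for `X_train_bow`.

```python
from scipy import sparse


def kfold_sparse(X, n_folds):
    """Yield (train, test) pairs built from the interleaved X[i::n_folds] groups.

    Hypothetical helper for illustration; X is assumed to be a scipy.sparse csr_matrix.
    """
    groups = [X[i::n_folds] for i in range(n_folds)]
    for k in range(n_folds):
        # Stack every group except k row-wise; the result stays a 2D sparse matrix
        train = sparse.vstack([g for j, g in enumerate(groups) if j != k], format="csr")
        test = groups[k]
        yield train, test


# Stand-in for X_train_bow: 897 rows x 20 columns (assumed shape, not the real data)
X = sparse.random(897, 20, density=0.1, format="csr")
folds = list(kfold_sparse(X, 3))
for train, test in folds:
    print(train.shape, test.shape)  # (598, 20) (299, 20) for every fold
```

Passing `format="csr"` asks `vstack` for a CSR result directly, so each training block can be sliced or passed on without another conversion.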
If you are trying to implement sklearn's train_test_split() yourself, what you can do with your current data is slice the rows directly:
# X_train_bow is a csr_matrix, which supports row slicing directly
train_data = X_train_bow[299:]
# shape: (598, 2794) by selecting row 299 onwards
test_data = X_train_bow[:299]
# shape: (299, 2794) by selecting the first 299 rows
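The row-slicing idea extends to k contiguous folds. A minimal sketch, assuming contiguous blocks of rows are acceptable (a different split from the interleaved `i::3` grouping in the question); `contiguous_folds` is a hypothetical helper name, and it relies only on `np.array_split` and integer-array row indexing of a `csr_matrix`.

```python
import numpy as np
from scipy import sparse


def contiguous_folds(X, n_folds):
    """Yield (train, test) pairs where each test fold is one contiguous block of rows.

    Hypothetical helper for illustration; X is assumed to be a scipy.sparse csr_matrix.
    """
    blocks = np.array_split(np.arange(X.shape[0]), n_folds)
    for k in range(n_folds):
        test_idx = blocks[k]
        train_idx = np.concatenate([b for j, b in enumerate(blocks) if j != k])
        # Integer-array row indexing returns a 2D csr_matrix, not an object array
        yield X[train_idx], X[test_idx]


# Stand-in for X_train_bow (assumed shape, not the real data)
X = sparse.random(897, 20, density=0.1, format="csr")
splits = list(contiguous_folds(X, 3))
```

With 897 rows and 3 folds, the first fold matches the rows-299-onward / first-299-rows split described above.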
Let's make a small demo csr matrix:
In [212]: M = (sparse.random(12,3,.5, 'csr')*10).astype(int)
In [213]: M
Out[213]:
<12x3 sparse matrix of type '<class 'numpy.int64'>'
with 18 stored elements in Compressed Sparse Row format>
In [214]: M.A
Out[214]:
array([[3, 1, 3],
[0, 0, 1],
[1, 0, 9],
[0, 6, 0],
[5, 4, 0],
[4, 5, 6],
[3, 0, 0],
[0, 0, 5],
[0, 0, 2],
[0, 1, 0],
[0, 0, 0],
[0, 9, 0]])
Your grouping produces a list of small csr matrices:
In [216]: alist = [M[i::3] for i in range(3)]
In [217]: alist
Out[217]:
[<4x3 sparse matrix of type '<class 'numpy.int64'>'
with 7 stored elements in Compressed Sparse Row format>,
<4x3 sparse matrix of type '<class 'numpy.int64'>'
with 4 stored elements in Compressed Sparse Row format>,
<4x3 sparse matrix of type '<class 'numpy.int64'>'
with 7 stored elements in Compressed Sparse Row format>]
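The `sparse.random` values change on every run, so here is a sketch that rebuilds the same 12x3 matrix from the printed `M.A` output and repeats the grouping. It uses `.toarray()` in place of the `.A` shorthand from the session (recent SciPy releases have deprecated and then removed `.A` on sparse matrices, while `.toarray()` works across versions). Stored-element counts can differ slightly from the session: the original M reports 18 stored elements but `M.A` shows only 17 nonzeros, so it apparently held an explicitly stored zero that this rebuild does not recreate.

```python
import numpy as np
from scipy import sparse

# Dense values copied from the M.A output above; csr_matrix stores only the nonzeros
M = sparse.csr_matrix(np.array([[3, 1, 3], [0, 0, 1], [1, 0, 9], [0, 6, 0],
                                [5, 4, 0], [4, 5, 6], [3, 0, 0], [0, 0, 5],
                                [0, 0, 2], [0, 1, 0], [0, 0, 0], [0, 9, 0]]))

# Interleaved grouping, as in the question: group i takes rows i, i+3, i+6, ...
alist = [M[i::3] for i in range(3)]
for g in alist:
    print(g.shape)        # (4, 3) for each group
print(alist[2].toarray())
```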
Look at the K case:
In [218]: data = []
In [219]: data.extend(alist[2])
In [220]: data
Out[220]:
[<1x3 sparse matrix of type '<class 'numpy.int64'>'
with 2 stored elements in Compressed Sparse Row format>,
<1x3 sparse matrix of type '<class 'numpy.int64'>'
with 3 stored elements in Compressed Sparse Row format>,
<1x3 sparse matrix of type '<class 'numpy.int64'>'
with 1 stored elements in Compressed Sparse Row format>,
<1x3 sparse matrix of type '<class 'numpy.int64'>'
with 1 stored elements in Compressed Sparse Row format>]
List extend adds the elements of an iterable to the list (in a 'flat' sense). Iterating over a sparse matrix (alist[2]) yields a series of 1-row sparse matrices (still 2d).
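A minimal sketch of that iteration behavior. The values are hard-coded to match `alist[2]` as printed later in this answer (an assumption made so the result does not depend on `sparse.random`).

```python
import numpy as np
from scipy import sparse

sub = sparse.csr_matrix(np.array([[1, 0, 9], [4, 5, 6], [0, 0, 2], [0, 9, 0]]))

data = []
data.extend(sub)                 # list.extend iterates sub one row at a time
print(len(data))                 # 4: one list entry per row
print(data[0].shape)             # (1, 3): each entry is still a 2D sparse matrix
print(sparse.issparse(data[0]))  # True
```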
We can join them with sparse.vstack:
In [221]: sparse.vstack(data)
Out[221]:
<4x3 sparse matrix of type '<class 'numpy.int64'>'
with 7 stored elements in Compressed Sparse Row format>
In [222]: sparse.vstack(data).A
Out[222]:
array([[1, 0, 9],
[4, 5, 6],
[0, 0, 2],
[0, 9, 0]])
This is the same as the submatrix they came from.
In [223]: alist[2]
Out[223]:
<4x3 sparse matrix of type '<class 'numpy.int64'>'
with 7 stored elements in Compressed Sparse Row format>
In [224]: alist[2].A
Out[224]:
array([[1, 0, 9],
[4, 5, 6],
[0, 0, 2],
[0, 9, 0]])
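The round trip can be checked in code. A minimal sketch with hard-coded values standing in for `alist[2]` (assumed, not the random session data); passing `format="csr"` to `sparse.vstack` requests a CSR result explicitly rather than relying on the default output format.

```python
import numpy as np
from scipy import sparse

sub = sparse.csr_matrix(np.array([[1, 0, 9], [4, 5, 6], [0, 0, 2], [0, 9, 0]]))

rows = []
rows.extend(sub)                             # four 1x3 sparse matrices
rebuilt = sparse.vstack(rows, format="csr")  # stack them back into one 4x3 matrix

print(rebuilt.shape)                                     # (4, 3)
print(np.array_equal(rebuilt.toarray(), sub.toarray()))  # True
```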
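Putting the data list into np.array just produces a 1d object-dtype array of 1-row sparse matrices. To np.array, these matrices are just foreign objects. As a general rule, don't expect numpy functions to do the 'right' thing with sparse matrices.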
In [225]: np.array(data)
Out[225]:
array([<1x3 sparse matrix of type '<class 'numpy.int64'>'
with 2 stored elements in Compressed Sparse Row format>,
<1x3 sparse matrix of type '<class 'numpy.int64'>'
with 3 stored elements in Compressed Sparse Row format>,
<1x3 sparse matrix of type '<class 'numpy.int64'>'
with 1 stored elements in Compressed Sparse Row format>,
<1x3 sparse matrix of type '<class 'numpy.int64'>'
with 1 stored elements in Compressed Sparse Row format>], dtype=object)
Don't just look at the shape. Check the dtype, and inspect a few elements!
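Following that advice, a small sketch that checks `dtype` and one element instead of only the shape. The 4x3 matrix is an assumed stand-in; the `(4,) object` result corresponds to the `np.array(data)` output above.

```python
import numpy as np
from scipy import sparse

sub = sparse.csr_matrix(np.arange(12).reshape(4, 3))
rows = []
rows.extend(sub)                 # list of four 1x3 sparse matrices

arr = np.asarray(rows)
print(arr.shape, arr.dtype)      # (4,) object: the shape alone hides the problem
print(sparse.issparse(arr[0]))   # True: each element is still a 1-row sparse matrix

# The fix: stack the rows into one 2D sparse matrix
fixed = sparse.vstack(rows, format="csr")
print(fixed.shape)               # (4, 3)
```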