Load data from file and normalize
How do I normalize data loaded from a file? Here's what I have. The data looks like this:
65535, 3670, 65535, 3885, -0.73, 1
65535, 3962, 65535, 3556, -0.72, 1
The last value in each row is the target. I want the data to keep the same structure, but with normalized values.
import numpy as np
from sklearn import preprocessing
dataset = np.loadtxt('infrared_data.txt', delimiter=',')
# select first 5 columns as the data
X = dataset[:, 0:5]
# is that correct? Should I normalize along 0 axis?
normalized_X = preprocessing.normalize(X, axis=0)
# the last column is the target
y = dataset[:, 5]
Now the question is: how do I correctly pack normalized_X and y back together so that they have this structure:
dataset = [[normalized_X[0], y[0]],[normalized_X[1], y[1]],...]
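On the side question in the code comment (which axis to normalize along): with its default norm='l2', sklearn's preprocessing.normalize scales each row (sample) to unit Euclidean length when axis=1 (the default) and each column (feature) when axis=0. The sketch below reproduces both cases with plain NumPy on the two sample rows, so it runs without scikit-learn. It only illustrates what the axis argument changes; it doesn't settle which axis suits this dataset.

```python
import numpy as np

# The two sample rows from the question, without the target column.
X = np.array([
    [65535, 3670, 65535, 3885, -0.73],
    [65535, 3962, 65535, 3556, -0.72],
])

# Like normalize(X, axis=0): divide each column by its L2 norm,
# so every feature vector has unit length.
# (Unlike sklearn, this does not guard against an all-zero column.)
X_cols = X / np.linalg.norm(X, axis=0)

# Like normalize(X, axis=1): divide each row by its L2 norm,
# so every sample vector has unit length.
X_rows = X / np.linalg.norm(X, axis=1, keepdims=True)

print(np.linalg.norm(X_cols, axis=0))  # all 1.0, up to floating-point rounding
print(np.linalg.norm(X_rows, axis=1))  # all 1.0, up to floating-point rounding
```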
It sounds like you're asking for np.column_stack. For example, let's set up some dummy data:
import numpy as np
x = np.arange(25).reshape(5, 5)
y = np.arange(5) + 1000
This gives us:
x:
array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14],
       [15, 16, 17, 18, 19],
       [20, 21, 22, 23, 24]])
y:
array([1000, 1001, 1002, 1003, 1004])
We want:
new = np.column_stack([x, y])
This gives us:
new:
array([[   0,    1,    2,    3,    4, 1000],
       [   5,    6,    7,    8,    9, 1001],
       [  10,   11,   12,   13,   14, 1002],
       [  15,   16,   17,   18,   19, 1003],
       [  20,   21,   22,   23,   24, 1004]])
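np.column_stack treats a 1-D array as a single column, which is why y can be passed as-is. With np.hstack you would first have to reshape y into a (5, 1) column yourself. A small check of that equivalence, reusing the dummy x and y from above:

```python
import numpy as np

x = np.arange(25).reshape(5, 5)
y = np.arange(5) + 1000

new = np.column_stack([x, y])

# hstack needs y as an explicit (n, 1) column; y[:, np.newaxis] adds that axis.
same = np.hstack([x, y[:, np.newaxis]])

print(new.shape)                  # (5, 6)
print(np.array_equal(new, same))  # True
```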
If you'd rather type less, you can also use:
In [4]: np.c_[x, y]
Out[4]:
array([[   0,    1,    2,    3,    4, 1000],
       [   5,    6,    7,    8,    9, 1001],
       [  10,   11,   12,   13,   14, 1002],
       [  15,   16,   17,   18,   19, 1003],
       [  20,   21,   22,   23,   24, 1004]])
However, for readability, I'd discourage using np.c_ for anything other than interactive use.
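Putting the pieces together for the original problem, here is a minimal sketch that assumes the file has the six comma-separated columns shown in the question, with the target last. To stay self-contained it reads the two sample rows from an in-memory io.StringIO instead of infrared_data.txt, and it does the column-wise (axis=0) L2 normalization with plain NumPy in place of sklearn.preprocessing.normalize; swap either back in for real data.

```python
import io
import numpy as np

# Stand-in for 'infrared_data.txt': the two sample rows from the question.
raw = io.StringIO(
    "65535, 3670, 65535, 3885, -0.73, 1\n"
    "65535, 3962, 65535, 3556, -0.72, 1\n"
)
dataset = np.loadtxt(raw, delimiter=',')

X = dataset[:, :5]  # features: first five columns
y = dataset[:, 5]   # target: last column

# Column-wise L2 normalization (what normalize(X, axis=0) computes by default).
normalized_X = X / np.linalg.norm(X, axis=0)

# Put the untouched target back as the last column.
normalized_dataset = np.column_stack([normalized_X, y])

print(normalized_dataset.shape)  # (2, 6)
```

Only the feature columns are rescaled; keeping y out of the normalization and re-attaching it afterwards is what preserves the original row structure.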