numpy array from csv file for Lasagne
I'm beginning to learn how to use Theano with Lasagne, and I started with the MNIST example. Now I want to try my own example: I have a train.csv file in which every row starts with a 0 or 1 that represents the correct answer, followed by 773 zeros and ones that represent the input. I don't understand how to turn this file into the needed numpy arrays in the load_database() function. This is part of the original function for the MNIST database:
...
with gzip.open(filename, 'rb') as f:
    data = pickle_load(f, encoding='latin-1')
# The MNIST dataset we have here consists of six numpy arrays:
# Inputs and targets for the training set, validation set and test set.
X_train, y_train = data[0]
X_val, y_val = data[1]
X_test, y_test = data[2]
...
# We just return all the arrays in order, as expected in main().
# (It doesn't matter how we do this as long as we can read them again.)
return X_train, y_train, X_val, y_val, X_test, y_test
I need to get X_train (the inputs) and y_train (the first value of each row) from my csv file.
Thank you!
You can use numpy.genfromtxt() or numpy.loadtxt() as follows:
import numpy
from sklearn.model_selection import KFold

Xy = numpy.genfromtxt('yourfile.csv', delimiter=",")

# the next section provides the required
# training-validation set splitting, but
# you can do it manually too, if you want
kf = KFold(n_splits=3)                   # ~2/3 training, ~1/3 validation
for train_index, valid_index in kf.split(Xy):
    ind_train, ind_valid = train_index, valid_index
    break

Xy_train, Xy_val = Xy[ind_train], Xy[ind_valid]
X_train = Xy_train[:, 1:]   # everything after the first column is input
y_train = Xy_train[:, 0]    # the first column is the target
X_val = Xy_val[:, 1:]
y_val = Xy_val[:, 0]
...
# you can simply ignore the test sets in your case
return X_train, y_train, X_val, y_val  #, X_test, y_test
In the snippet above we skip passing the test sets.
Now you can import the dataset in your main module or script or whatever, but be aware that you also have to remove every reference to the test set from it.
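For illustration, here is a minimal sketch of a complete loader built from the snippet above; the function name load_dataset(), the file name yourfile.csv and the float32/int32 casts are my assumptions (Theano/Lasagne code usually works with float32 inputs and integer class targets):

import numpy
from sklearn.model_selection import KFold

def load_dataset(filename='yourfile.csv'):
    # hypothetical wrapper around the splitting snippet above
    Xy = numpy.genfromtxt(filename, delimiter=",")
    kf = KFold(n_splits=3)                         # ~2/3 train, ~1/3 validation
    train_index, valid_index = next(kf.split(Xy))  # keep the first fold only
    Xy_train, Xy_val = Xy[train_index], Xy[valid_index]
    # first column is the target, the remaining 773 columns are the inputs
    X_train = Xy_train[:, 1:].astype(numpy.float32)
    y_train = Xy_train[:, 0].astype(numpy.int32)
    X_val = Xy_val[:, 1:].astype(numpy.float32)
    y_val = Xy_val[:, 0].astype(numpy.int32)
    return X_train, y_train, X_val, y_val

X_train, y_train, X_val, y_val = load_dataset()
print(X_train.shape, y_train.shape)  # e.g. (N, 773) and (N,)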
Or you can simply pass the valid sets as the test set:
# you can simply pass the valid sets as `test` set
return X_train, y_train, X_val, y_val, X_val, y_val
In the latter case we don't have to care about the parts of the main module that refer to the test set, but as scores (if any) you will get the validation scores twice, i.e. also reported as the test scores.
Note: I don't know which MNIST example that is, but probably, after you have prepared your data as above, you will also have to make further modifications in your trainer module to fit your data, e.g. the input shape of the data and the output shape, i.e. the number of classes; in your case the former is 773 and the latter is 2.
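For example, a minimal sketch of how those two shapes could look in a Lasagne model, assuming a plain dense network (the hidden-layer size and variable names are illustrative, not taken from the MNIST example):

import theano.tensor as T
import lasagne

input_var = T.matrix('inputs')      # flat rows of 773 features, not images
target_var = T.ivector('targets')   # class labels: 0 or 1

# input shape (None, 773) instead of MNIST's (None, 1, 28, 28)
network = lasagne.layers.InputLayer(shape=(None, 773), input_var=input_var)
network = lasagne.layers.DenseLayer(
    network, num_units=100, nonlinearity=lasagne.nonlinearities.rectify)
# output layer with 2 units (classes) instead of MNIST's 10
network = lasagne.layers.DenseLayer(
    network, num_units=2, nonlinearity=lasagne.nonlinearities.softmax)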