How to train a PyTorch model with numpy data and a batch size?

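I am learning the basics of PyTorch and want to build a simple 4-layer neural network with dropout to train a classifier on the IRIS dataset. After going through many tutorials, I wrote this code: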

import pandas as pd
from sklearn.datasets import load_iris
import torch
from torch.autograd import Variable

epochs=300
batch_size=20
lr=0.01

#loading data as numpy array
data = load_iris()
X=data.data
y=pd.get_dummies(data.target).values

#convert to tensor
X= Variable(torch.from_numpy(X), requires_grad=False)
y=Variable(torch.from_numpy(y), requires_grad=False)
print(X.size(),y.size())

#neural net model
model = torch.nn.Sequential(
    torch.nn.Linear(4, 10),
    torch.nn.ReLU(),
    torch.nn.Dropout(),
    torch.nn.Linear(10, 5),
    torch.nn.ReLU(),
    torch.nn.Dropout(),
    torch.nn.Linear(5, 3),
    torch.nn.Softmax()
)

print(model)

# Loss and Optimizer
optimizer = torch.optim.Adam(model.parameters(), lr=lr)  
loss_func = torch.nn.CrossEntropyLoss()  

for i in range(epochs):
    # Forward pass
    y_pred = model(X)

    # Compute and print loss.
    loss = loss_func(y_pred, y)
    print(i, loss.data[0])

    # Before the backward pass, use the optimizer object to zero all of the
    # gradients for the variables it will update (which are the learnable weights
    # of the model)
    optimizer.zero_grad()

    # Backward pass
    loss.backward()

    # Calling the step function on an Optimizer makes an update to its parameters
    optimizer.step()

I am currently facing two problems.

  1. I want to set a batch size of 20. How should I do this?
  2. At this step y_pred = model(X) it shows this error

Error

 TypeError: addmm_ received an invalid combination of arguments - got (int, int, torch.DoubleTensor, torch.FloatTensor), but expected one of:
 * (torch.DoubleTensor mat1, torch.DoubleTensor mat2)
 * (torch.SparseDoubleTensor mat1, torch.DoubleTensor mat2)
 * (float beta, torch.DoubleTensor mat1, torch.DoubleTensor mat2)
 * (float alpha, torch.DoubleTensor mat1, torch.DoubleTensor mat2)
 * (float beta, torch.SparseDoubleTensor mat1, torch.DoubleTensor mat2)
 * (float alpha, torch.SparseDoubleTensor mat1, torch.DoubleTensor mat2)
 * (float beta, float alpha, torch.DoubleTensor mat1, torch.DoubleTensor mat2)
      didn't match because some of the arguments have invalid types: (int, int, torch.DoubleTensor, !torch.FloatTensor!)
 * (float beta, float alpha, torch.SparseDoubleTensor mat1, torch.DoubleTensor mat2)
      didn't match because some of the arguments have invalid types: (int, int, !torch.DoubleTensor!, !torch.FloatTensor!)

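Possibly the same issue:

In short: when converted from numpy, the values are stored in a DoubleTensor, while the model's parameters are FloatTensor. You have to change one of them.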

I want to set a batch size of 20. How should I do this?

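For data processing and loading, PyTorch provides two classes. The first is Dataset, which represents your dataset. Specifically, Dataset provides the interface for fetching a single sample from the whole dataset by its index.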
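To make the Dataset interface concrete, here is a minimal sketch of a custom Dataset over numpy arrays. The class name NumpyDataset is hypothetical (it is not part of PyTorch), and the random arrays are only a stand-in with the same shape as Iris (150 samples, 4 features, 3 classes), not the real data.

```python
import numpy as np
import torch
from torch.utils.data import Dataset


class NumpyDataset(Dataset):
    """Hypothetical helper: wraps feature and label arrays as a Dataset."""

    def __init__(self, features, labels):
        # float32 features match the default dtype of nn.Linear weights
        self.features = torch.from_numpy(features).float()
        # int64 class indices
        self.labels = torch.from_numpy(labels).long()

    def __len__(self):
        # Number of samples in the dataset
        return len(self.features)

    def __getitem__(self, index):
        # Return one (features, label) pair for the given sample index
        return self.features[index], self.labels[index]


# Stand-in data shaped like Iris (assumption, not the real dataset)
rng = np.random.default_rng(0)
features = rng.random((150, 4))
labels = rng.integers(0, 3, size=150)

dataset = NumpyDataset(features, labels)
sample_x, sample_y = dataset[0]
print(len(dataset), sample_x.shape, sample_y.dtype)
```

Only __len__ and __getitem__ are needed here; batching is deliberately left out of the Dataset itself.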

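But Dataset alone is not enough: for large datasets we need to process the data in batches. So PyTorch provides a second class, DataLoader, which generates batches from a Dataset given a batch size and other parameters.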

For your specific case, I think you should try TensorDataset. Then use a DataLoader to set the batch size to 20. Just look through the official PyTorch examples to see how to do it.
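A rough sketch of what that could look like, under a few assumptions that go beyond the question's code: the random arrays stand in for the Iris data, the labels are integer class indices rather than one-hot vectors (this is what CrossEntropyLoss expects), and the final Softmax layer is left out because CrossEntropyLoss already applies log-softmax internally. Only two epochs are run to keep the example short.

```python
import numpy as np
import torch
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Stand-in data shaped like Iris (assumption, not the real dataset)
rng = np.random.default_rng(0)
X_np = rng.random((150, 4))
y_np = rng.integers(0, 3, size=150)

# float32 inputs and int64 class indices
X = torch.from_numpy(X_np).float()
y = torch.from_numpy(y_np).long()

# TensorDataset pairs up rows of X and y; DataLoader cuts them into batches
dataset = TensorDataset(X, y)
loader = DataLoader(dataset, batch_size=20, shuffle=True)

model = torch.nn.Sequential(
    torch.nn.Linear(4, 10),
    torch.nn.ReLU(),
    torch.nn.Dropout(),
    torch.nn.Linear(10, 5),
    torch.nn.ReLU(),
    torch.nn.Dropout(),
    torch.nn.Linear(5, 3),  # raw logits, no Softmax
)
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)
loss_func = torch.nn.CrossEntropyLoss()

batch_sizes = []
for epoch in range(2):
    for X_batch, y_batch in loader:
        optimizer.zero_grad()                      # reset gradients
        loss = loss_func(model(X_batch), y_batch)  # forward pass and loss
        loss.backward()                            # backward pass
        optimizer.step()                           # parameter update
        if epoch == 0:
            batch_sizes.append(X_batch.shape[0])

print(batch_sizes)
```

With 150 samples and a batch size of 20, each epoch yields seven batches of 20 and a final batch of 10. Passing drop_last=True to DataLoader would skip that smaller last batch.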

At this step y_pred = model(X) its showing this error

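The error message is very informative. Your model's input X is of type DoubleTensor, but your model's parameters are of type FloatTensor. In PyTorch, you cannot perform operations between tensors of different types. What you should do is replace the line

X= Variable(torch.from_numpy(X), requires_grad=False)

with

X= Variable(torch.from_numpy(X).float(), requires_grad=False)

Now X is of type FloatTensor, and the error message should go away.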
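A quick way to check the dtypes involved, as a small sketch: the zero-filled array is just a placeholder for the Iris features, and the exact exception type and message depend on the PyTorch version (recent versions raise a RuntimeError rather than the TypeError shown in the question).

```python
import numpy as np
import torch

X_np = np.zeros((150, 4))          # NumPy defaults to float64
X_double = torch.from_numpy(X_np)  # dtype is preserved: torch.float64
X_float = X_double.float()         # converted copy: torch.float32

layer = torch.nn.Linear(4, 10)     # weights default to torch.float32
print(X_double.dtype, X_float.dtype, layer.weight.dtype)

out = layer(X_float)               # dtypes match, so this works

try:
    layer(X_double)                # float64 input against float32 weights
    mismatch_raised = False
except (RuntimeError, TypeError):
    mismatch_raised = True
print(mismatch_raised)
```

The other direction also works: calling model.double() converts the parameters to float64 instead, at the cost of more memory and usually slower computation.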

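Also, as a friendly reminder: there is plenty of material online that covers your problem in full. You should make an effort to work it out yourself.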