How to use torchvision.transforms for data augmentation of a segmentation task in PyTorch?
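I am a little confused about how data augmentation is performed in PyTorch.
Because we are dealing with a segmentation task, the data and the mask need the same augmentation, but some transforms are random, such as random rotation.
Keras provides a random seed to guarantee that the data and the mask go through the same operation, as shown in the following code: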
data_gen_args = dict(featurewise_center=True,
                     featurewise_std_normalization=True,
                     rotation_range=25,
                     horizontal_flip=True,
                     vertical_flip=True)
image_datagen = ImageDataGenerator(**data_gen_args)
mask_datagen = ImageDataGenerator(**data_gen_args)
seed = 1
image_generator = image_datagen.flow(train_data, seed=seed, batch_size=1)
mask_generator = mask_datagen.flow(train_label, seed=seed, batch_size=1)
train_generator = zip(image_generator, mask_generator)
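I did not find a similar description in the official PyTorch documentation, so I do not know how to ensure that the data and the mask are processed in sync.
PyTorch does provide such functionality, but I want to apply it in a custom Dataloader.
For example: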
def __getitem__(self, index):
    img = np.zeros((self.im_ht, self.im_wd, channel_size))
    mask = np.zeros((self.im_ht, self.im_wd, channel_size))

    temp_img = np.load(Image_path + '{:0>4}'.format(self.patient_index[index]) + '.npy')
    temp_label = np.load(Label_path + '{:0>4}'.format(self.patient_index[index]) + '.npy')

    for i in range(channel_size):
        img[:,:,i] = temp_img[self.count[index] + i]
        mask[:,:,i] = temp_label[self.count[index] + i]

    if self.transforms:
        img = np.uint8(img)
        mask = np.uint8(mask)
        img = self.transforms(img)
        mask = self.transforms(mask)

    return img, mask
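In this case, img and mask are transformed separately. Because operations such as random rotation draw their parameters at random, the correspondence between the mask and the image can be lost. In other words, the image may be rotated while the mask is not.
Edit 1
I used the method in augmentations.py, but I got an error: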
Traceback (most recent call last):
  File "test_transform.py", line 87, in <module>
    for batch_idx, image, mask in enumerate(train_loader):
  File "/home/dirk/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 314, in __next__
    batch = self.collate_fn([self.dataset[i] for i in indices])
  File "/home/dirk/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 314, in <listcomp>
    batch = self.collate_fn([self.dataset[i] for i in indices])
  File "/home/dirk/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/utils/data/dataset.py", line 103, in __getitem__
    return self.dataset[self.indices[idx]]
  File "/home/dirk/home/data/dirk/segmentation_unet_pytorch/data.py", line 164, in __getitem__
    img, mask = self.transforms(img, mask)
  File "/home/dirk/home/data/dirk/segmentation_unet_pytorch/augmentations.py", line 17, in __call__
    img, mask = a(img, mask)
TypeError: __call__() takes 2 positional arguments but 3 were given
This is my code for __getitem__():
data_transforms = {
    'train': Compose([
        RandomHorizontallyFlip(),
        RandomRotate(degree=25),
        transforms.ToTensor()
    ]),
}

train_set = DatasetUnetForTestTransform(fold=args.fold, random_index=args.random_index,transforms=data_transforms['train'])

# __getitem__ in class DatasetUnetForTestTransform
def __getitem__(self, index):
    img = np.zeros((self.im_ht, self.im_wd, channel_size))
    mask = np.zeros((self.im_ht, self.im_wd, channel_size))
    temp_img = np.load(Label_path + '{:0>4}'.format(self.patient_index[index]) + '.npy')
    temp_label = np.load(Label_path + '{:0>4}'.format(self.patient_index[index]) + '.npy')
    temp_img, temp_label = crop_data_label_from_0(temp_img, temp_label)
    for i in range(channel_size):
        img[:,:,i] = temp_img[self.count[index] + i]
        mask[:,:,i] = temp_label[self.count[index] + i]

    if self.transforms:
        img = T.ToPILImage()(np.uint8(img))
        mask = T.ToPILImage()(np.uint8(mask))
        img, mask = self.transforms(img, mask)

    img = T.ToTensor()(img).copy()
    mask = T.ToTensor()(mask).copy()
    return img, mask
Edit 2
I found that after ToTensor, the Dice score between two identical labels becomes 255 instead of 1. How can I fix this?
# Dice computation
def DSC_computation(label, pred):
    pred_sum = pred.sum()
    label_sum = label.sum()
    inter_sum = np.logical_and(pred, label).sum()
    return 2 * float(inter_sum) / (pred_sum + label_sum)
Feel free to ask if more code is needed to explain the problem.
torchvision also provides similar functions [document].
Here is a simple example:
import torchvision
from torchvision import transforms

trans = transforms.Compose([transforms.CenterCrop((178, 178)),
                            transforms.Resize(128),
                            transforms.RandomRotation(20),
                            transforms.ToTensor(),
                            # MNIST images have a single channel, so one mean/std value each
                            transforms.Normalize((0.5,), (0.5,))])
# The keyword argument of MNIST is `transform`, not `transforms`
dset = torchvision.datasets.MNIST(data_root, transform=trans)
Edit
A short example of customizing your own CelebA dataset. Note that, to apply the transforms, you need to call the transform list in __getitem__.
import os

import numpy as np
import pandas as pd
from PIL import Image
from torch.utils.data import Dataset


class CelebADataset(Dataset):
    def __init__(self, root, transforms=None, num=None):
        super(CelebADataset, self).__init__()
        self.img_root = os.path.join(root, 'img_align_celeba')
        self.attr_root = os.path.join(root, 'Anno/list_attr_celeba.txt')
        self.transforms = transforms
        # Raw string avoids the invalid-escape warning for '\s'
        df = pd.read_csv(self.attr_root, sep=r'\s+', header=1, index_col=0)
        #print(df.columns.tolist())
        if num is None:
            self.labels = df.values
            self.img_name = df.index.values
        else:
            self.labels = df.values[:num]
            self.img_name = df.index.values[:num]

    def __getitem__(self, index):
        img = Image.open(os.path.join(self.img_root, self.img_name[index]))
        # only use blond_hair, eyeglass, male, smile
        indices = [9, 15, 20, 31]
        label = np.take(self.labels[index], indices)
        label[label==-1] = 0
        # The transforms are applied here, once per sample
        if self.transforms is not None:
            img = self.transforms(img)
        return np.asarray(img), label

    def __len__(self):
        return len(self.labels)
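Edit 2
I probably missed something at first glance. The point of your question is how to apply "the same" data preprocessing to the img and the label. As far as I know, there is no built-in PyTorch function for this, so what I did before was implement the augmentation myself.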
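import random

from PIL import Image


class RandomRotate(object):
    def __init__(self, degree):
        self.degree = degree

    def __call__(self, img, mask):
        # One angle is drawn and applied to both the image and the mask
        rotate_degree = random.random() * 2 * self.degree - self.degree
        # Parentheses keep both rotated outputs in the returned tuple
        return (img.rotate(rotate_degree, Image.BILINEAR),
                mask.rotate(rotate_degree, Image.NEAREST))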
Note that the inputs should be in PIL format. See this for details.
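To make the idea above concrete, here is a minimal sketch of how paired transforms could be chained, assuming PIL inputs. The class names PairCompose, PairRandomHorizontalFlip and PairRandomRotate are hypothetical and are not taken from the augmentations.py file mentioned in the question. Every transform in the chain takes and returns an (img, mask) pair, which is also why a single-argument transform such as ToTensor cannot be placed in such a chain.

```python
import random

import numpy as np
from PIL import Image


class PairCompose:
    """Chain transforms that each take and return an (img, mask) pair."""

    def __init__(self, transforms):
        self.transforms = transforms

    def __call__(self, img, mask):
        for t in self.transforms:
            img, mask = t(img, mask)
        return img, mask


class PairRandomHorizontalFlip:
    """Flip both inputs together with probability p."""

    def __init__(self, p=0.5):
        self.p = p

    def __call__(self, img, mask):
        if random.random() < self.p:
            return (img.transpose(Image.FLIP_LEFT_RIGHT),
                    mask.transpose(Image.FLIP_LEFT_RIGHT))
        return img, mask


class PairRandomRotate:
    """Rotate both inputs by one shared random angle."""

    def __init__(self, degree):
        self.degree = degree

    def __call__(self, img, mask):
        angle = random.uniform(-self.degree, self.degree)
        # Bilinear for the image, nearest for the mask so labels stay discrete
        return (img.rotate(angle, Image.BILINEAR),
                mask.rotate(angle, Image.NEAREST))


# Toy data: the left half of the mask is foreground (label 1)
arr = np.zeros((8, 8), dtype=np.uint8)
arr[:, :4] = 1
img = Image.fromarray(arr * 255)
mask = Image.fromarray(arr)

pair = PairCompose([PairRandomHorizontalFlip(p=1.0), PairRandomRotate(degree=25)])
out_img, out_mask = pair(img, mask)
print(out_img.size, np.unique(np.array(out_mask)))
```

Using nearest-neighbour resampling for the mask is a deliberate choice here: interpolating label values would create classes that do not exist.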
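Transforms that require input parameters, such as RandomCrop, have a get_params method that returns the parameters for that particular transform. These parameters can then be applied to both the image and the mask using the functional interface of the transforms: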
from torchvision import transforms
import torchvision.transforms.functional as F
i, j, h, w = transforms.RandomCrop.get_params(input, (100, 100))
input = F.crop(input, i, j, h, w)
target = F.crop(target, i, j, h, w)
An example is available here:
https://github.com/pytorch/vision/releases/tag/v0.2.0
Complete examples for VOC and COCO are available here:
https://github.com/pytorch/vision/blob/master/references/segmentation/transforms.py
https://github.com/pytorch/vision/blob/master/references/segmentation/train.py
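Regarding the error: ToTensor() has not been overridden to handle the additional mask argument, so it cannot be in data_transforms. Moreover, __getitem__ already applies ToTensor to img and mask before returning.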
data_transforms = {
    'train': Compose([
        RandomHorizontallyFlip(),
        RandomRotate(degree=25),
        #transforms.ToTensor() => remove this line
    ]),
}
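Another idea is to stack the image and the mask along the channel dimension and then transform them together. Obviously this only works for geometric transforms, and you need to use the same dtype for both. I use something like this: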
# Apply these to image and mask
affine_transforms = transforms.Compose([
    transforms.RandomAffine(degrees=180),
    ...
])

# Apply these to image only
image_transforms = transforms.Compose([
    # kernel_size is a required argument; 3 is an arbitrary choice
    transforms.GaussianBlur(kernel_size=3),
    ...
])

# Loader...
def __getitem__(self, index: int):
    # Get the image and mask; each needs a leading channel dimension, shape=(1xHxW)
    image = self.images[index]
    mask = self.masks[index]

    # Stack the image and mask together so they get the same geometric transformations
    stacked = torch.cat([image, mask], dim=0)  # shape=(2xHxW)
    stacked = self.affine_transforms(stacked)

    # Split them back up again
    image, mask = torch.chunk(stacked, chunks=2, dim=0)

    # Image transforms are only applied to the image
    image = self.image_transforms(image)
    return image, mask