How to add an image resizing transform to CNTK

Question

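I want to add an image transform (I call it ResizeTransformer) that resizes the smaller dimension of an image to a given size while preserving the original aspect ratio.

To achieve this without implementing a separate ResizeTransformer, I considered modifying class ScaleTransformer : public ImageTransformerBase in this file. However, this class implements StreamInformation ScaleTransformer::Transform(const StreamInformation& inputStream), whose purpose is to transform the stream so that all samples have the same size. My questions are: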

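1) Why does this function need to be implemented? Does it bring a performance benefit, or does it serve a more fundamental purpose?
2) Do I have to implement ResizeTransformer as a separate class?
3) If so, do I have to implement StreamInformation ResizeTransformer::Transform(const StreamInformation& inputStream)?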

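Need for this transform

This transform is needed because the images in a dataset can all have different sizes, and one may want to extract multiple patches from each image. In that case, the best approach is to resize the smaller dimension of the image to some size S that is larger than the crop size C, and then extract multiple patches of size C from it. This kind of data augmentation is used in several papers I know of.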
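The patch extraction described above can be sketched in plain Python. This is only an illustration: the question does not say which patches are taken, so the four-corners-plus-center layout, the function name five_crop_boxes, and the example sizes (S = 256, C = 224) are all assumptions.

```python
def five_crop_boxes(height, width, crop):
    """Return (top, left, bottom, right) boxes: four corners plus the center.

    `height` and `width` are the image size AFTER the smaller side has been
    resized to S; `crop` is the patch size C. The five-crop layout is an
    assumption made for illustration, not something the post prescribes.
    """
    if crop > height or crop > width:
        raise ValueError("crop size must not exceed the resized image")
    top_max, left_max = height - crop, width - crop
    offsets = [
        (0, 0),                         # top-left corner
        (0, left_max),                  # top-right corner
        (top_max, 0),                   # bottom-left corner
        (top_max, left_max),            # bottom-right corner
        (top_max // 2, left_max // 2),  # center
    ]
    return [(t, l, t + crop, l + crop) for t, l in offsets]


# Example: a landscape image whose smaller side was resized to S = 256.
boxes = five_crop_boxes(256, 341, 224)
```

Because S is larger than C, every box lies inside the resized image, and every patch has the same C x C shape regardless of the original image size.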

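P.S. I made the following additions to add ResizeTransformer.

I am unsure how to test it. The C++ code compiles successfully, so it is at least well-formed, but I want to use it from Python.

Addition to the header file on my system: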

class ResizeTransformer : public ImageTransformerBase
{
public:
    explicit ResizeTransformer(const Microsoft::MSR::CNTK::ConfigParameters& config);

private:
    enum class ResizeMode
    {
        ResizeMin = 0,
        ResizeMax = 1  // must differ from ResizeMin, otherwise the two modes compare equal
    };

    ResizeMode resize_mode;
    size_t resized_length;
    void Apply(uint8_t copyId, cv::Mat &mat) override;
};

Then the source file:

ResizeTransformer::ResizeTransformer(const ConfigParameters& config) : ImageTransformerBase(config)
{
  resized_length = config(L"resized_length");
  if (resized_length <= 0)
    RuntimeError("Cannot resize any dimension of an image to zero or negative number.");

  string resize_type = config(L"resize_type", "ResizeMin");
  if (resize_type == "ResizeMin")
    resize_mode = ResizeMode::ResizeMin;
  else if (resize_type == "ResizeMax")
    resize_mode = ResizeMode::ResizeMax;
  else RuntimeError("Invalid resize_type. Must be one of ResizeMin and ResizeMax");
}

void ResizeTransformer::Apply(uint8_t, cv::Mat &mat)
{
    float height = mat.rows;
    float width = mat.cols;
    float aspectratio = height / width;
    float newheight{};
    float newwidth{};
    if (resize_mode == ResizeMode::ResizeMin)
    {
        // Pin the smaller side to resized_length.
        if (height <= width)
        {
            newheight = resized_length;
            newwidth = newheight / aspectratio;
        }
        else
        {
            newheight = aspectratio * resized_length;
            newwidth = resized_length;
        }
    }
    else
    {
        // ResizeMax: pin the larger side to resized_length.
        if (height <= width)
        {
            newheight = aspectratio * resized_length;
            newwidth = resized_length;
        }
        else
        {
            newheight = resized_length;
            newwidth = newheight / aspectratio;
        }
    }
    resize(mat, mat, cv::Size2f(newwidth, newheight));
}
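
One way to sanity-check the size arithmetic in Apply without a CNTK build is to mirror it in a few lines of Python. This is a sketch, not the CNTK implementation: the function name resized_size is hypothetical, and rounding to the nearest whole pixel is an assumption about how the fractional size ends up as integer pixel dimensions.

```python
def resized_size(height, width, resized_length, resize_type="ResizeMin"):
    """Return (new_height, new_width) with one side pinned to resized_length.

    'ResizeMin' pins the smaller side, 'ResizeMax' pins the larger side;
    the other side is scaled by the same factor to keep the aspect ratio.
    """
    if height <= 0 or width <= 0 or resized_length <= 0:
        raise ValueError("all sizes must be positive")
    if resize_type not in ("ResizeMin", "ResizeMax"):
        raise ValueError("resize_type must be 'ResizeMin' or 'ResizeMax'")
    short_side, long_side = min(height, width), max(height, width)
    # Pick the side that is pinned to resized_length.
    reference = short_side if resize_type == "ResizeMin" else long_side
    scale = resized_length / reference
    # Assumption: round to whole pixels and never let a side collapse to zero.
    new_height = max(1, round(height * scale))
    new_width = max(1, round(width * scale))
    return new_height, new_width


# A 300 x 400 (height x width) image with resized_length = 256:
min_mode = resized_size(300, 400, 256)               # smaller side becomes 256
max_mode = resized_size(300, 400, 256, "ResizeMax")  # larger side becomes 256
```

Using a single scale factor avoids the four separate branches and makes it easy to compare expected sizes against what the C++ transform produces.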

I added the following line to this file:

transformations.push_back(Transformation{ std::make_shared<ResizeTransformer>(featureStream), featureName });

Then I added the following to this file:

CNTK_API ImageTransform ReaderResize(int resized_length,
                                         const wchar_t* resize_type = L"ResizeMin");

Finally, I added the following function to this file:

def resize(resized_length, resize_type='ResizeMin'):
    '''
    Resize transform that can be used to pass to `map_features`
    Given an input image, it will resize a given dimension to
    a fixed size (resized_length), while preserving the aspect ratio.


    Args:
        resized_length (int): A positive integer. It is the resized value of the
           dimension which has to be resized. The other dimension is resized while
           maintaining the aspect ratio.
        resize_type (str, default 'ResizeMin'): 'ResizeMin' or 'ResizeMax'.
           When 'ResizeMin', the smaller dimension of the image is resized to a fixed size
           given by resized_length, with the larger dimension resized in a way to preserve
           the original aspect ratio. When 'ResizeMax', the same operation is performed
           but now the larger dimension of the image is resized to a fixed size.
    Returns:
        A dictionary like object describing the ResizeTransform.
    '''
    return cntk_py.reader_resize(resized_length, resize_type)

Answer 1

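1) This allows the upper layers to define buffers as far ahead of time as possible. So if you know that you will resize to (x, y), you can define the output stream shape there (similar to ScaleTransform). Otherwise, you can set the image layout in the Transform(SequenceDataPtr) method or, if you use the ImageTransformerBase class, in the method that class has you override (Apply in the code above).

2) You can, or you can change ScaleTransformer to do what you need (just use another parameter in the config).

3) If you implement your own ResizeTransformer, you can simply put NDShape::Unknown into the transform, for example: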

StreamInformation ResizeTransformer::Transform(
    const StreamInformation& inputStream)
{
    TransformBase::Transform(inputStream);
    m_outputStream.m_sampleLayout = NDShape::Unknown();
    return m_outputStream;
}

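P.S. The code looks fine, but you may still need to add a Transform on the inputStream as described above. Also note that by the time the images reach the core network, they should all have the same dimensions. The deserializers do not support images of different shapes.

If you want to expose ResizeTransformer, you need to do the following:

1) Implement ResizeTransformer (as discussed above; you did this).

2) Add name resolution to the CreateTransformer function in ImageReader/Exports.cpp, i.e.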

else if (type == L"Resize")
        *transformer = new ResizeTransformer(config);

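(This seems to be missing on your side.)

3) Add a factory method to the C++ API in CNTKLibrary.h/MinibatchSource.cpp; see the scale transform (ReaderScale) for an example (you did this): ImageTransform ReaderResize(...) {...}

4) Implement a Python wrapper in bindings/python/cntk/io/transforms.py that checks arguments, etc. (you did this): def resize(...):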
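
The argument checking mentioned in step 4 could look roughly like the sketch below. It is an assumption about what such checks might cover, not the actual CNTK wrapper: check_resize_args and the backend parameter are hypothetical, and backend merely stands in for cntk_py.reader_resize so that the snippet runs without a CNTK build.

```python
VALID_RESIZE_TYPES = ("ResizeMin", "ResizeMax")


def check_resize_args(resized_length, resize_type="ResizeMin"):
    """Validate arguments before they are handed to the native layer."""
    # bool is a subclass of int, so reject it explicitly.
    if isinstance(resized_length, bool) or not isinstance(resized_length, int):
        raise TypeError("resized_length must be an int")
    if resized_length <= 0:
        raise ValueError("resized_length must be positive")
    if resize_type not in VALID_RESIZE_TYPES:
        raise ValueError("resize_type must be one of %s" % (VALID_RESIZE_TYPES,))
    return resized_length, resize_type


def resize(resized_length, resize_type="ResizeMin", backend=None):
    """Hypothetical wrapper; `backend` stands in for cntk_py.reader_resize."""
    length, mode = check_resize_args(resized_length, resize_type)
    if backend is None:
        # No CNTK build available: return a plain description of the transform.
        return {"type": "Resize", "resized_length": length, "resize_type": mode}
    return backend(length, mode)
```

Validating in Python means a bad resize_type fails immediately with a readable message instead of surfacing later as a RuntimeError from the reader.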

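Then, if you recompile and set PATH to your local build of CNTK (/x64/Release) and PYTHONPATH to /binding/python, you should be able to use your new transform. You can add tests to io/tests, then go to /binding/python/cntk and simply run "pytest".

I may have forgotten something, so if you run into any problems, please ask the CNTK team; they should be able to help.

Thanks!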
