TensorFlow migration problem regardless of how the operations are defined (tensorflow.python.framework.ops)

Question

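I am running a model from GitHub and have already hit several path errors and the like. After fixing those, I think the main remaining error comes from TensorFlow. The repo was probably written in the TF 1.x days, and with the changes in TF 2 I may need to migrate everything.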

Mainly, I get the following error:

    @ops.RegisterShape('ApproxMatch')
AttributeError: module 'tensorflow.python.framework.ops' has no attribute 'RegisterShape'

in this file:

import tensorflow as tf
from tensorflow.python.framework import ops
import os.path as osp

base_dir = osp.dirname(osp.abspath(__file__))

approxmatch_module = tf.load_op_library(osp.join(base_dir, 'tf_approxmatch_so.so'))


def approx_match(xyz1,xyz2):
    '''
input:
    xyz1 : batch_size * #dataset_points * 3
    xyz2 : batch_size * #query_points * 3
returns:
    match : batch_size * #query_points * #dataset_points
    '''
    return approxmatch_module.approx_match(xyz1,xyz2)
ops.NoGradient('ApproxMatch')
#@tf.RegisterShape('ApproxMatch')
@ops.RegisterShape('ApproxMatch')
def _approx_match_shape(op):
    shape1=op.inputs[0].get_shape().with_rank(3)
    shape2=op.inputs[1].get_shape().with_rank(3)
    return [tf.TensorShape([shape1.dims[0],shape2.dims[1],shape1.dims[1]])]

There are two main things I don't understand:

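I have read that this might force me to create the ops as C++ routines with REGISTER_OP(...).SetShapeFn(...), but at the same time I can see that this is already done here: @ops.RegisterShape('ApproxMatch'). Like this and this. I don't think I understand the process, and I have seen other questions about the same problem, but none with a real implementation or answer.
If I go to the location of the tf_approxmatch shared library ( approxmatch_module = tf.load_op_library(osp.join(base_dir, 'tf_approxmatch_so.so')) ), I cannot open or edit it with gedit, so I assume I should not change anything in it (?).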

The folder contains py, cpp, and cu files (I already ran make yesterday and everything went smoothly).

__init__.py     tf_approxmatch.cu.o   tf_nndistance.cu.o
makefile        tf_approxmatch.py     tf_nndistance.py
__pycache__     tf_approxmatch_so.so  tf_nndistance_so.so
tf_approxmatch.cpp  tf_nndistance.cpp
tf_approxmatch.cu   tf_nndistance.cu

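My main guess is that I should somehow register the shape function in the cpp file, since it already has some registered ops, but I am a bit lost because I am not even sure I understand the problem I am having. I will only show the first lines of the file: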

#include "tensorflow/core/framework/op.h"
#include "tensorflow/core/framework/op_kernel.h"
#include <algorithm>
#include <vector>
#include <math.h>
using namespace tensorflow;
REGISTER_OP("ApproxMatch")
    .Input("xyz1: float32")
    .Input("xyz2: float32")
    .Output("match: float32");
REGISTER_OP("MatchCost")
    .Input("xyz1: float32")
    .Input("xyz2: float32")
    .Input("match: float32")
    .Output("cost: float32");
REGISTER_OP("MatchCostGrad")
    .Input("xyz1: float32")
    .Input("xyz2: float32")
    .Input("match: float32")
    .Output("grad1: float32")
    .Output("grad2: float32");

Answer 1

According to TensorFlow's release notes, RegisterShape is deprecated; shapes should be defined with SetShapeFn when the op is registered in the C++ source file.

Answer 2

Disclaimer: unless you have no other choice, I strongly recommend sticking with TensorFlow 1.x. Migrating code from TF 1.x to 2.x can be very time-consuming.

Since TF 1.0, shapes are registered in C++ with SetShapeFn, not in Python. The private Python API was kept in TF 1.x (for backward compatibility, I assume), but it was removed completely in TF 2.0.

In this situation, the Create an Op guide is very useful for migrating the code, and I strongly recommend reading it.

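First, why register a shape at all? Shape inference needs it. This feature lets TensorFlow know the shapes of the inputs and outputs in the computation graph without running the actual code. For example, shape inference makes it possible to raise an error when you try to apply an op to tensors whose shapes are incompatible.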
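To make the idea concrete, here is a toy model of shape inference written in plain Python, with no TensorFlow involved. It is only a sketch of the concept, not how TensorFlow implements it: the names `merge_dims` and `matmul_shape` are made up for this illustration, and `None` is assumed to stand for a dimension that is not yet known.

```python
from typing import Optional, Tuple

Dim = Optional[int]  # None stands for a dimension that is not known yet
Shape = Tuple[Dim, ...]


def merge_dims(a: Dim, b: Dim) -> Dim:
    """Combine two dimensions that must agree; None is compatible with anything."""
    if a is None:
        return b
    if b is None or a == b:
        return a
    raise ValueError(f"Dimensions must be equal, but are {a} and {b}")


def matmul_shape(lhs: Shape, rhs: Shape) -> Shape:
    """Output shape of a 2-D matrix product, computed from the shapes alone."""
    if len(lhs) != 2 or len(rhs) != 2:
        raise ValueError("Both inputs must have rank 2")
    # The inner dimensions must be compatible; no tensor data is needed to check this.
    merge_dims(lhs[1], rhs[0])
    return (lhs[0], rhs[1])
```

With this sketch, `matmul_shape((None, 128), (128, 10))` yields `(None, 10)` even though the batch size is unknown, while `matmul_shape((32, 128), (64, 10))` raises a `ValueError` before any computation would run. A shape function registered for a custom op plays the same role for that op.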

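In your specific case, you need to convert the Python code that uses ops.RegisterShape into C++ code that uses SetShapeFn. Thankfully, the GitHub repository you are working with provides helpful comments.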

Let's start with the approx_match function. The Python code is as follows:

def approx_match(xyz1,xyz2):
    '''
input:
    xyz1 : batch_size * #dataset_points * 3
    xyz2 : batch_size * #query_points * 3
returns:
    match : batch_size * #query_points * #dataset_points
    '''
    return approxmatch_module.approx_match(xyz1,xyz2)
@ops.RegisterShape('ApproxMatch')
def _approx_match_shape(op):
    shape1=op.inputs[0].get_shape().with_rank(3)
    shape2=op.inputs[1].get_shape().with_rank(3)
    return [tf.TensorShape([shape1.dims[0],shape2.dims[1],shape1.dims[1]])]

Reading the code and the comments, we learn the following:

There are 2 inputs: xyz1 and xyz2
xyz1 has shape (batch_size, dataset_points, 3)
xyz2 has shape (batch_size, query_points, 3)
There is 1 output: match
match has shape (batch_size, query_points, dataset_points)

This translates to the following C++ code:

#include "tensorflow/core/framework/shape_inference.h"
using namespace tensorflow;
REGISTER_OP("ApproxMatch")
    .Input("xyz1: float32")
    .Input("xyz2: float32")
    .Output("match: float32")
    .SetShapeFn([](shape_inference::InferenceContext* c) {
        shape_inference::ShapeHandle xyz1_shape = c->input(0);
        shape_inference::ShapeHandle xyz2_shape = c->input(1);
        // batch_size is the first dimension 
        shape_inference::DimensionHandle batch_size = c->Dim(xyz1_shape, 0);
        // dataset_points is the 2nd dimension of the first input
        shape_inference::DimensionHandle dataset_points = c->Dim(xyz1_shape, 1);
        // query_points is the 2nd dimension of the second input
        shape_inference::DimensionHandle query_points = c->Dim(xyz2_shape, 1);
        // Creating the new shape (batch_size, query_points, dataset_points)
        // and setting it to the output
        c->set_output(0, c->MakeShape({batch_size, query_points, dataset_points})); 
        // Returning a status telling that everything went well
        return Status::OK();    
    });

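Warning: this code does no error handling (for example, checking that the first dimension of both inputs is the same, or that the last dimension of both inputs is 3). I leave that as an exercise for the reader. You can look at the guide mentioned above, or directly at the source code of some ops, to see how error handling is done, for example with the macro TF_RETURN_IF_ERROR.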
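As a rough guide to which checks are missing, the validation logic can be sketched in plain Python. This is not TensorFlow code: `approx_match_shape` and `_merge` are hypothetical names, and `None` is assumed to mark a dimension that is unknown at graph-build time. In the C++ shape function, InferenceContext helpers such as WithRank, Merge, and WithValue, each wrapped in TF_RETURN_IF_ERROR, would likely cover the same three checks.

```python
from typing import Optional, Sequence, Tuple

Dim = Optional[int]  # None marks a dimension that is unknown at graph-build time


def _merge(a: Dim, b: Dim, what: str) -> Dim:
    """Return the known value of two dimensions that must agree, or raise."""
    if a is None:
        return b
    if b is None or a == b:
        return a
    raise ValueError(f"{what} mismatch: {a} vs {b}")


def approx_match_shape(
    xyz1: Sequence[Dim], xyz2: Sequence[Dim]
) -> Tuple[Dim, Dim, Dim]:
    """Shape rule for ApproxMatch, including the checks left as an exercise."""
    # Check 1: both inputs must have rank 3.
    if len(xyz1) != 3 or len(xyz2) != 3:
        raise ValueError("xyz1 and xyz2 must both have rank 3")
    # Check 2: both inputs must share the same batch size.
    batch_size = _merge(xyz1[0], xyz2[0], "batch_size")
    # Check 3: the last dimension of each input must be 3 (x, y, z).
    _merge(xyz1[2], 3, "last dimension of xyz1")
    _merge(xyz2[2], 3, "last dimension of xyz2")
    # Output: (batch_size, query_points, dataset_points)
    return (batch_size, xyz2[1], xyz1[1])
```

For example, `approx_match_shape((10, 20, 3), (10, 50, 3))` returns `(10, 50, 20)`, whereas inputs with different batch sizes or a last dimension other than 3 raise a `ValueError`.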

The same steps can be applied to the match_cost function, which would look like this:

REGISTER_OP("MatchCost")
    .Input("xyz1: float32")
    .Input("xyz2: float32")
    .Input("match: float32")
    .Output("cost: float32")
    .SetShapeFn([](shape_inference::InferenceContext* c) {
        shape_inference::DimensionHandle batch_size = c->Dim(c->input(0), 0);
        c->set_output(0, c->Vector(batch_size));
        return Status::OK();        
    });

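Then you need to recompile the .so library using the makefile included in the project. You may need to change some flags. For example, TF 2.8 uses the C++ standard c++14 rather than c++11, so the flag -std=c++14 is required. Once the library is compiled, you can test importing it in Python: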

>>> import tensorflow as tf
>>> approxmatch_module = tf.load_op_library('./tf_approxmatch_so.so')
>>> a = tf.random.uniform((10,20,3))
>>> b = tf.random.uniform((10,50,3))
>>> c = approxmatch_module.approx_match(a,b)
>>> c.shape
TensorShape([10, 50, 20])
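For the recompilation step, it can help to see which pieces a compile command for a custom op usually contains. The helper below is a hypothetical sketch, not the project's makefile: `build_op_library_command` is a made-up name, the flag values in the usage note are placeholders, and the CUDA object files that this repository also links (the .cu.o files) are left out on purpose.

```python
import shlex
from typing import Sequence


def build_op_library_command(
    sources: Sequence[str],
    output: str,
    compile_flags: Sequence[str],
    link_flags: Sequence[str],
    cxx_standard: str = "c++14",
) -> str:
    """Assemble a g++ command line for a TensorFlow custom-op shared library."""
    args = [
        "g++",
        f"-std={cxx_standard}",  # must match the standard your TensorFlow build expects
        "-shared",
        *sources,
        "-o",
        output,
        "-fPIC",
        *compile_flags,  # include paths and ABI defines
        *link_flags,     # library path and the TensorFlow framework library
        "-O2",
    ]
    # Quote each argument so the result can be pasted into a shell or a makefile rule.
    return " ".join(shlex.quote(arg) for arg in args)
```

In a real build, the two flag lists would normally come from the installed TensorFlow itself, via `tf.sysconfig.get_compile_flags()` and `tf.sysconfig.get_link_flags()`, so that the include paths and ABI settings match the version you import in Python. Comparing the command this produces with the flags hard-coded in the makefile is a quick way to spot settings left over from TF 1.x.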
