How to visualize actual underlying computation code?

Question

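I am studying TensorFlow.

In some tests, I want to see the actual code that TensorFlow uses for its computations. In practice, after the graph is defined, a session is created to run the computation.

However, I cannot find where the actual computation (e.g. a*b, a+b, e^a, ...) takes place. I expect that code to be implemented in C++. For example, I would like to see the code for the tanh operation, but I cannot find it even after looking at the cwise_op_tanh.cc file.

Could I get some advice?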

Answer 1

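The op architecture is explained in the documentation on adding a new op (and related pages). It is flexible and relies on automatic code generation, which makes it confusing at first.

When looking for an implementation, I usually search the source code for the string REGISTER_OP. This macro registers an operation with the TensorFlow engine (see the link above for details; it is fairly involved). The macro should give you all the "pointers" you need to know where to look. It can get complicated when there is one implementation per device type (CPU, GPU, special GPU) or when there are ramifications/compositions.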

For example, for tanh, I search the source code for tanh, or rather Tanh, which seems to be the convention:

> cd /path/to/tensorflow/source
> grep -R REGISTER_OP tensorflow | grep -i tanh
tensorflow/core/ops/math_grad.cc:REGISTER_OP_GRADIENT("Tanh", TanhGrad);
tensorflow/core/ops/math_grad.cc:REGISTER_OP_GRADIENT("Atanh", AtanhGrad);
tensorflow/core/ops/math_ops.cc:REGISTER_OP("Tanh").UNARY_COMPLEX();
tensorflow/core/ops/math_ops.cc:REGISTER_OP("Atanh").UNARY_COMPLEX();
tensorflow/core/ops/math_ops.cc:REGISTER_OP("TanhGrad").UNARY_GRADIENT_COMPLEX();
tensorflow/core/ops/nn_ops.cc:REGISTER_OP("_MklTanh")
tensorflow/core/ops/nn_ops.cc:REGISTER_OP("_MklTanhGrad")

The result tensorflow/core/ops/math_ops.cc:REGISTER_OP("Tanh").UNARY_COMPLEX(); looks like what we are after. It leads to the UNARY_COMPLEX macro definition:

#define UNARY_COMPLEX()                                                  \
  Input("x: T")                                                          \
      .Output("y: T")                                                    \
      .Attr("T: {half, bfloat16, float, double, complex64, complex128}") \
      .SetShapeFn(shape_inference::UnchangedShape)

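Note that the SetShapeFn call only attaches the shape-inference function (UnchangedShape here, meaning the output has the same shape as the input); the computation itself is not in this definition. It is delegated to kernels that are registered separately. We know tanh is a coefficient-wise operation, and with a bit more searching, tensorflow/core/kernels/cwise_op_tanh.cc and tensorflow/core/kernels/cwise_op_gpu_tanh.cu.cc turn out to be the only kernels that register tanh. These files, however, use "utilities" such as REGISTER5 for more compact boilerplate registration. For example, we can find: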

REGISTER5(UnaryOp, CPU, "Tanh", functor::tanh, float, Eigen::half, double,
      complex64, complex128);

This eventually calls REGISTER_KERNEL_BUILDER, the partner of REGISTER_OP, to register the actual implementation for the registered operation. Looking through the code, the implementation is functor::tanh, which is a wrapper for the actual implementation, usually delegated to the Eigen linear algebra library. Eigen may in turn delegate some operations even further.
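To make the split between REGISTER_OP (the interface) and REGISTER_KERNEL_BUILDER (the per-device, per-type implementation) more concrete, here is a toy model in Python. It is only a sketch of the pattern, not TensorFlow's code: `register_op`, `register_kernel`, `unary_op` and `run` are hypothetical names, and `math.tanh` stands in for the Eigen routine that the real functor delegates to.

```python
import math

# Toy registries; these only mimic the pattern and are not TensorFlow structures.
OPS = {}      # op name -> interface description (the role of REGISTER_OP)
KERNELS = {}  # (op name, device, dtype) -> callable (the role of REGISTER_KERNEL_BUILDER)


def register_op(name, inputs, outputs, allowed_types):
    """Record the interface of an op: no computation is attached here."""
    OPS[name] = {"inputs": inputs, "outputs": outputs, "types": set(allowed_types)}


def register_kernel(name, device, dtype, fn):
    """Attach an implementation to an already registered op for one device and type."""
    if name not in OPS:
        raise KeyError(f"op {name!r} has not been registered")
    if dtype not in OPS[name]["types"]:
        raise TypeError(f"op {name!r} does not accept dtype {dtype!r}")
    KERNELS[(name, device, dtype)] = fn


def unary_op(functor):
    """Turn a scalar functor into a kernel that applies it to every coefficient."""
    def kernel(values):
        return [functor(v) for v in values]
    return kernel


def run(name, device, dtype, values):
    """Look up the kernel matching (op, device, dtype) and execute it."""
    try:
        kernel = KERNELS[(name, device, dtype)]
    except KeyError:
        raise LookupError(f"no kernel for {name!r} on {device} with dtype {dtype}") from None
    return kernel(values)


# Interface first, then one kernel per supported type (loosely what REGISTER5 expands to).
register_op("Tanh", inputs=["x: T"], outputs=["y: T"], allowed_types=["float", "double"])
for dtype in ("float", "double"):
    register_kernel("Tanh", "CPU", dtype, unary_op(math.tanh))
```

Calling `run("Tanh", "CPU", "float", [0.0, 1.0])` dispatches to the registered kernel, while asking for a "GPU" kernel raises `LookupError` because none was registered in this sketch. This is why searching for the op name alone is not enough: the interface and each implementation are registered in different places.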

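Notes:

One way to possibly speed up the search is to use REGISTER rather than REGISTER_OP as the search pattern. That will return more results, though.
All code excerpts are from the current TensorFlow version (commit ac8e67399d75edce6a9f94afaa2adb577035966e). Note that the code changes often.
TensorFlow is migrating to a more abstract architecture called XLA, to some extent or for some reason, so the explanation above will become outdated.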
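The grep search used in this answer, including the broader REGISTER pattern from the first note, can also be scripted. The following is a minimal sketch in Python; `find_registrations` and `build_sample_tree` are hypothetical helpers written for illustration, and the sample tree only contains lines quoted in the answer rather than the real TensorFlow sources.

```python
import os
import re
import tempfile


def find_registrations(root, name, pattern=r"REGISTER_OP"):
    """Return (path, line_number, line) for lines matching `pattern` that mention `name`.

    Roughly equivalent to `grep -R PATTERN root | grep -i NAME`.
    """
    macro = re.compile(pattern)
    wanted = re.compile(re.escape(name), re.IGNORECASE)
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in sorted(filenames):
            path = os.path.join(dirpath, filename)
            try:
                with open(path, encoding="utf-8", errors="ignore") as handle:
                    for number, line in enumerate(handle, start=1):
                        if macro.search(line) and wanted.search(line):
                            hits.append((path, number, line.strip()))
            except OSError:
                continue  # skip files that cannot be read
    return hits


def build_sample_tree(root):
    """Create a tiny stand-in source tree using lines quoted in the answer."""
    files = {
        os.path.join("ops", "math_ops.cc"): (
            'REGISTER_OP("Tanh").UNARY_COMPLEX();\n'
            'REGISTER_OP("Atanh").UNARY_COMPLEX();\n'
            "// unrelated line\n"
        ),
        os.path.join("kernels", "cwise_op_tanh.cc"): (
            'REGISTER5(UnaryOp, CPU, "Tanh", functor::tanh, float, Eigen::half, double,\n'
            "      complex64, complex128);\n"
        ),
    }
    for relative, text in files.items():
        path = os.path.join(root, relative)
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "w", encoding="utf-8") as handle:
            handle.write(text)


# Demonstration on the stand-in tree; point `root` at a real checkout instead.
with tempfile.TemporaryDirectory() as sample_root:
    build_sample_tree(sample_root)
    for path, number, line in find_registrations(sample_root, "tanh", pattern=r"REGISTER"):
        print(f"{os.path.relpath(path, sample_root)}:{number}: {line}")
```

With the default pattern only the op registrations are reported; passing `pattern=r"REGISTER"` also picks up kernel registration helpers such as REGISTER5, at the cost of more results to sift through.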

