当 A 是下三角矩阵时实现 ATA 的更好方法

Question

在 Eigen 库中实现 A^T*A 可以这样写：

X.template triangularView<Lower>().setZero(); 
X.template selfadjointView<Lower>().rankUpdate(A.transpose());

如果A是下三角矩阵，有没有更好（更有效）的写法？我试过以下，但它给出了编译错误：

X.template selfadjointView<Lower>().rankUpdate(A.template triangularView<Lower>().transpose());

它给出错误：

 error: no matching member function for call to 'rankUpdate'

Answer 1

Eigen 通常只包括那些在 BLAS 中具有相应功能的矩阵例程。这种特殊情况没有 BLAS 例程。另一个问题是，Eigen 的 rankUpdate 实现和三角矩阵乘法不会并行化，除非您针对执行此操作的 BLAS 库定义 -DEIGEN_USE_BLAS 和 link，例如OpenBLAS.

我对各种实现进行了一些测试。

排名更新

  void trtr(Eigen::MatrixXf& out, const Eigen::MatrixXf& in)
  {
    out = Eigen::MatrixXf::Zero(in.rows(), in.cols());
    out.selfadjointView<Eigen::Lower>().rankUpdate(in.transpose());
    out.triangularView<Eigen::StrictlyUpper>() = out.transpose();
  }

仅与 OpenBLAS 并行化。然后就是挺快的，不然就有点慢了。

三角矩阵乘积

  void trtr(Eigen::MatrixXf& out, const Eigen::MatrixXf& in)
  { out.noalias() = in.transpose() * in.triangularView<Eigen::Lower>(); }

仅与 OpenBLAS 并行化。即使这样还是有点慢。

一般矩阵乘积

  void trtr(Eigen::MatrixXf& out, const Eigen::MatrixXf& in)
  { out.noalias() = in.transpose() * in; }

始终并行化。始终如一的快速。

自定义块乘法

我尝试实现我自己的版本，将矩阵分成块以删除冗余操作。这对于大型矩阵来说要快一些，尤其是在使用 OpenBLAS 时。

它不适用于小矩阵。

  void trtr_recursive
  (Eigen::Ref<Eigen::MatrixXf> out,
   const Eigen::Ref<const Eigen::MatrixXf>& in) noexcept
  {
    Eigen::Index size = out.rows();
    if(size <= 16) {
      /*
       * Two notes:
       * 1. The cutoff point is a possible tuning parameter
       * 2. This is the only place where we assume that the upper triangle
       *    is initialized to zero. We can remove this assumption by making
       *    a copy of the input into a fixed size matrix. Use Eigen::Matrix
       *    with the MaxRows and MaxCols template argument for this
       */
      out.selfadjointView<Eigen::Lower>().rankUpdate(in.transpose());
      return;
    }
    Eigen::Index full = (size + 1) >> 1;
    Eigen::Index part = size >> 1;
    const auto& fullin = in.bottomLeftCorner(full, full);
    const auto& bottomright = in.bottomRightCorner(part, part);
    const auto& bottomleft = in.bottomLeftCorner(part, full);
    out.topLeftCorner(full, full).selfadjointView<Eigen::Lower>()
      .rankUpdate(fullin.transpose());
    out.bottomLeftCorner(part, full).noalias() +=
      bottomright.transpose().triangularView<Eigen::Upper>() *
      bottomleft;
    trtr_recursive(out.topLeftCorner(part, part),
                   in.topLeftCorner(part, part));
    trtr_recursive(out.bottomRightCorner(part, part),
                   bottomright);
  }
  void trtr(Eigen::MatrixXf& out, const Eigen::MatrixXf& in)
  {
    out = Eigen::MatrixXf::Zero(in.rows(), in.cols());
    trtr4_recursive(out, in);
    out.triangularView<Eigen::StrictlyUpper>() = out.transpose();
  }

当 A 是下三角矩阵时实现 ATA 的更好方法

Better way to implement ATA when A is lower triangular matrix

eigen3

排名更新

三角矩阵乘积

一般矩阵乘积

自定义块乘法