是否可以使用预先计算的因式分解来加速 backslash\mldivide 与稀疏矩阵

Question

我执行了多次求解线性方程组的迭代：Mx=b 具有大而稀疏的 M。 M 在迭代之间不会改变，但 b 会改变。我已经尝试了几种方法，到目前为止发现 backslash\mldivide 是最有效和最准确的。

下面的代码与我正在做的非常相似：

for ii=1:num_iter
  x = M\x;
  x = x+dx;
end

现在我想利用 M 是固定的这一事实来进一步加速计算。

设置标志spparms('spumoni',2)允许求解器算法的详细信息。

我运行了以下代码：

spparms('spumoni',2);
x = M\B;

输出（监控）：

sp\: bandwidth = 2452+1+2452.
sp\: is A diagonal? no.
sp\: is band density (0.01) > bandden (0.50) to try banded solver? no.
sp\: is A triangular? no.
sp\: is A morally triangular? no.
sp\: is A a candidate for Cholesky (symmetric, real positive diagonal)? no.
sp\: use Unsymmetric MultiFrontal PACKage with Control parameters:
UMFPACK V5.4.0 (May 20, 2009), Control:
    Matrix entry defined as: double
    Int (generic integer) defined as: UF_long

    0: print level: 2
    1: dense row parameter:    0.2
        "dense" rows have    > max (16, (0.2)*16*sqrt(n_col) entries)
    2: dense column parameter: 0.2
        "dense" columns have > max (16, (0.2)*16*sqrt(n_row) entries)
    3: pivot tolerance: 0.1
    4: block size for dense matrix kernels: 32
    5: strategy: 0 (auto)
    6: initial allocation ratio: 0.7
    7: max iterative refinement steps: 2
    12: 2-by-2 pivot tolerance: 0.01
    13: Q fixed during numerical factorization: 0 (auto)
    14: AMD dense row/col parameter:    10
       "dense" rows/columns have > max (16, (10)*sqrt(n)) entries
        Only used if the AMD ordering is used.
    15: diagonal pivot tolerance: 0.001
        Only used if diagonal pivoting is attempted.
    16: scaling: 1 (divide each row by sum of abs. values in each row)
    17: frontal matrix allocation ratio: 0.5
    18: drop tolerance: 0
    19: AMD and COLAMD aggressive absorption: 1 (yes)

    The following options can only be changed at compile-time:
    8: BLAS library used:  Fortran BLAS.  size of BLAS integer: 8
    9: compiled for MATLAB
    10: CPU timer is ANSI C clock (may wrap around).
    11: compiled for normal operation (debugging disabled)
    computer/operating system: Microsoft Windows
    size of int: 4 UF_long: 8 Int: 8 pointer: 8 double: 8 Entry: 8 (in bytes)

sp\: is UMFPACK's symbolic LU factorization (with automatic reordering) successful? yes.
sp\: is UMFPACK's numeric LU factorization successful? yes.
sp\: is UMFPACK's triangular solve successful? yes.
sp\: UMFPACK Statistics:
UMFPACK V5.4.0 (May 20, 2009), Info:
    matrix entry defined as:          double
    Int (generic integer) defined as: UF_long
    BLAS library used: Fortran BLAS.  size of BLAS integer: 8
    MATLAB:                           yes.
    CPU timer:                        ANSI clock ( ) routine.
    number of rows in matrix A:       3468
    number of columns in matrix A:    3468
    entries in matrix A:              60252
    memory usage reported in:         16-byte Units
    size of int:                      4 bytes
    size of UF_long:                  8 bytes
    size of pointer:                  8 bytes
    size of numerical entry:          8 bytes

    strategy used:                    symmetric
    ordering used:                    amd on A+A'
    modify Q during factorization:    no
    prefer diagonal pivoting:         yes
    pivots with zero Markowitz cost:               1284
    submatrix S after removing zero-cost pivots:
        number of "dense" rows:                    0
        number of "dense" columns:                 0
        number of empty rows:                      0
        number of empty columns                    0
        submatrix S square and diagonal preserved
    pattern of square submatrix S:
        number rows and columns                    2184
        symmetry of nonzero pattern:               0.904903
        nz in S+S' (excl. diagonal):               62184
        nz on diagonal of matrix S:                2184
        fraction of nz on diagonal:                1.000000
    AMD statistics, for strict diagonal pivoting:
        est. flops for LU factorization:           2.76434e+007
        est. nz in L+U (incl. diagonal):           306216
        est. largest front (# entries):            31329
        est. max nz in any column of L:            177
        number of "dense" rows/columns in S+S':    0
    symbolic factorization defragmentations:       0
    symbolic memory usage (Units):                 174698
    symbolic memory usage (MBytes):                2.7
    Symbolic size (Units):                         9196
    Symbolic size (MBytes):                        0
    symbolic factorization CPU time (sec):         0.00
    symbolic factorization wallclock time(sec):    0.00

    matrix scaled: yes (divided each row by sum of abs values in each row)
    minimum sum (abs (rows of A)):              1.00000e+000
    maximum sum (abs (rows of A)):              9.75375e+003

    symbolic/numeric factorization:      upper bound               actual      %
    variable-sized part of Numeric object:
        initial size (Units)                  149803               146332    98%
        peak size (Units)                    1037500               202715    20%
        final size (Units)                    787803               154127    20%
    Numeric final size (Units)                806913               171503    21%
    Numeric final size (MBytes)                 12.3                  2.6    21%
    peak memory usage (Units)                1083860               249075    23%
    peak memory usage (MBytes)                  16.5                  3.8    23%
    numeric factorization flops         5.22115e+008         2.59546e+007     5%
    nz in L (incl diagonal)                   593172               145107    24%
    nz in U (incl diagonal)                   835128               154044    18%
    nz in L+U (incl diagonal)                1424832               295683    21%
    largest front (# entries)                 348768                30798     9%
    largest # rows in front                      519                  175    34%
    largest # columns in front                   672                  177    26%

    initial allocation ratio used:                 0.309
    # of forced updates due to frontal growth:     1
    number of off-diagonal pivots:                 0
    nz in L (incl diagonal), if none dropped       145107
    nz in U (incl diagonal), if none dropped       154044
    number of small entries dropped                0
    nonzeros on diagonal of U:                     3468
    min abs. value on diagonal of U:               4.80e-002
    max abs. value on diagonal of U:               1.00e+000
    estimate of reciprocal of condition number:    4.80e-002
    indices in compressed pattern:                 13651
    numerical values stored in Numeric object:     295806
    numeric factorization defragmentations:        0
    numeric factorization reallocations:           0
    costly numeric factorization reallocations:    0
    numeric factorization CPU time (sec):          0.05
    numeric factorization wallclock time (sec):    0.00
    numeric factorization mflops (CPU time):       552.22

    solve flops:                                   1.78396e+006
    iterative refinement steps taken:              1
    iterative refinement steps attempted:          1
    sparse backward error omega1:                  1.80e-016
    sparse backward error omega2:                  0.00e+000
    solve CPU time (sec):                          0.00
    solve wall clock time (sec):                   0.00

    total symbolic + numeric + solve flops:        2.77385e+007

观察线条：

numeric factorization flops         5.22115e+008         2.59546e+007     5%
solve flops:                                   1.78396e+006
total symbolic + numeric + solve flops:        2.77385e+007

表示M的因式分解用了2.59546e+007/2.77385e+007 = 93.6%的求解时间

我想在我的迭代之外提前计算因式分解，然后只运行最后一个阶段，这大约需要 6.5% CPU 时间。

我知道如何计算因式分解 ([L,U,P,Q,R] = lu(M);)，但我不知道如何将其输出用作求解器的输入。

我想本着以下精神经营一些东西：

[L,U,P,Q,R] = lu(M);
for ii=1:num_iter
  dx = solve_pre_factored(M,P,Q,R,x);
  x = x+dx;
end

有没有办法在 Matlab 中做到这一点？

Answer 1

您必须问问自己，LU 分解中的所有这些矩阵都做了什么。

如 the documentation 所述：

[L,U,P,Q,R] = lu(A) returns unit lower triangular matrix L, upper triangular matrix U, permutation matrices P and Q, and a diagonal scaling matrix R so that P*(R\A)Q = LU for sparse non-empty A. Typically, but not always, the row-scaling leads to a sparser and more stable factorization. The statement lu(A,'matrix') returns identical output values.

因此，在更多的数学术语中，我们有 PR^-1AQ = LU，因此 A = RP^-1LUQ^-1

那么x = M\x可以按以下步骤重写：

y = R^-1x
z = P y
u = L^-1z
v = U^-1u
w = Q v
x = w

要反转 U、L 和 R，您可以使用 \，它将识别它们是三角形（R 是对角线）矩阵 -正如监控应该确认的那样，并为它们使用适当的简单求解器。

因此以更密集和 matlab 编写的方式：x = Q*(U\(L\(P*(R\x))));

正如您所问，这样做将完全是求解器 \ 内部发生的事情，只有一个因式分解。

然而，正如评论中所述，大量的反演计算 N = M^-1 一次会变得更快，然后只计算一次一个简单的矩阵向量乘法，比上面解释的过程简单得多。初始计算 inv(M) 更长且 has some limitations，因此这种权衡还取决于矩阵的属性。

是否可以使用预先计算的因式分解来加速 backslash\mldivide 与稀疏矩阵

Is it possible to use pre-calculated factorization to accelerate backslash\mldivide with sparse matrix

math

matlab

linear-algebra

umfpack