是否可以使用预先计算的因式分解来加速 backslash\mldivide 与稀疏矩阵
Is it possible to use pre-calculated factorization to accelerate backslash\mldivide with sparse matrix
我执行了多次求解线性方程组的迭代:Mx=b
具有大而稀疏的 M。
M 在迭代之间不会改变,但 b 会改变。我已经尝试了几种方法,到目前为止发现 backslash\mldivide 是最有效和最准确的。
下面的代码与我正在做的非常相似:
for ii=1:num_iter
x = M\x;
x = x+dx;
end
现在我想利用 M 是固定的这一事实来进一步加速计算。
设置标志spparms('spumoni',2)
允许求解器算法的详细信息。
我运行了以下代码:
spparms('spumoni',2);
x = M\B;
输出(监控):
sp\: bandwidth = 2452+1+2452.
sp\: is A diagonal? no.
sp\: is band density (0.01) > bandden (0.50) to try banded solver? no.
sp\: is A triangular? no.
sp\: is A morally triangular? no.
sp\: is A a candidate for Cholesky (symmetric, real positive diagonal)? no.
sp\: use Unsymmetric MultiFrontal PACKage with Control parameters:
UMFPACK V5.4.0 (May 20, 2009), Control:
Matrix entry defined as: double
Int (generic integer) defined as: UF_long
0: print level: 2
1: dense row parameter: 0.2
"dense" rows have > max (16, (0.2)*16*sqrt(n_col) entries)
2: dense column parameter: 0.2
"dense" columns have > max (16, (0.2)*16*sqrt(n_row) entries)
3: pivot tolerance: 0.1
4: block size for dense matrix kernels: 32
5: strategy: 0 (auto)
6: initial allocation ratio: 0.7
7: max iterative refinement steps: 2
12: 2-by-2 pivot tolerance: 0.01
13: Q fixed during numerical factorization: 0 (auto)
14: AMD dense row/col parameter: 10
"dense" rows/columns have > max (16, (10)*sqrt(n)) entries
Only used if the AMD ordering is used.
15: diagonal pivot tolerance: 0.001
Only used if diagonal pivoting is attempted.
16: scaling: 1 (divide each row by sum of abs. values in each row)
17: frontal matrix allocation ratio: 0.5
18: drop tolerance: 0
19: AMD and COLAMD aggressive absorption: 1 (yes)
The following options can only be changed at compile-time:
8: BLAS library used: Fortran BLAS. size of BLAS integer: 8
9: compiled for MATLAB
10: CPU timer is ANSI C clock (may wrap around).
11: compiled for normal operation (debugging disabled)
computer/operating system: Microsoft Windows
size of int: 4 UF_long: 8 Int: 8 pointer: 8 double: 8 Entry: 8 (in bytes)
sp\: is UMFPACK's symbolic LU factorization (with automatic reordering) successful? yes.
sp\: is UMFPACK's numeric LU factorization successful? yes.
sp\: is UMFPACK's triangular solve successful? yes.
sp\: UMFPACK Statistics:
UMFPACK V5.4.0 (May 20, 2009), Info:
matrix entry defined as: double
Int (generic integer) defined as: UF_long
BLAS library used: Fortran BLAS. size of BLAS integer: 8
MATLAB: yes.
CPU timer: ANSI clock ( ) routine.
number of rows in matrix A: 3468
number of columns in matrix A: 3468
entries in matrix A: 60252
memory usage reported in: 16-byte Units
size of int: 4 bytes
size of UF_long: 8 bytes
size of pointer: 8 bytes
size of numerical entry: 8 bytes
strategy used: symmetric
ordering used: amd on A+A'
modify Q during factorization: no
prefer diagonal pivoting: yes
pivots with zero Markowitz cost: 1284
submatrix S after removing zero-cost pivots:
number of "dense" rows: 0
number of "dense" columns: 0
number of empty rows: 0
number of empty columns 0
submatrix S square and diagonal preserved
pattern of square submatrix S:
number rows and columns 2184
symmetry of nonzero pattern: 0.904903
nz in S+S' (excl. diagonal): 62184
nz on diagonal of matrix S: 2184
fraction of nz on diagonal: 1.000000
AMD statistics, for strict diagonal pivoting:
est. flops for LU factorization: 2.76434e+007
est. nz in L+U (incl. diagonal): 306216
est. largest front (# entries): 31329
est. max nz in any column of L: 177
number of "dense" rows/columns in S+S': 0
symbolic factorization defragmentations: 0
symbolic memory usage (Units): 174698
symbolic memory usage (MBytes): 2.7
Symbolic size (Units): 9196
Symbolic size (MBytes): 0
symbolic factorization CPU time (sec): 0.00
symbolic factorization wallclock time(sec): 0.00
matrix scaled: yes (divided each row by sum of abs values in each row)
minimum sum (abs (rows of A)): 1.00000e+000
maximum sum (abs (rows of A)): 9.75375e+003
symbolic/numeric factorization: upper bound actual %
variable-sized part of Numeric object:
initial size (Units) 149803 146332 98%
peak size (Units) 1037500 202715 20%
final size (Units) 787803 154127 20%
Numeric final size (Units) 806913 171503 21%
Numeric final size (MBytes) 12.3 2.6 21%
peak memory usage (Units) 1083860 249075 23%
peak memory usage (MBytes) 16.5 3.8 23%
numeric factorization flops 5.22115e+008 2.59546e+007 5%
nz in L (incl diagonal) 593172 145107 24%
nz in U (incl diagonal) 835128 154044 18%
nz in L+U (incl diagonal) 1424832 295683 21%
largest front (# entries) 348768 30798 9%
largest # rows in front 519 175 34%
largest # columns in front 672 177 26%
initial allocation ratio used: 0.309
# of forced updates due to frontal growth: 1
number of off-diagonal pivots: 0
nz in L (incl diagonal), if none dropped 145107
nz in U (incl diagonal), if none dropped 154044
number of small entries dropped 0
nonzeros on diagonal of U: 3468
min abs. value on diagonal of U: 4.80e-002
max abs. value on diagonal of U: 1.00e+000
estimate of reciprocal of condition number: 4.80e-002
indices in compressed pattern: 13651
numerical values stored in Numeric object: 295806
numeric factorization defragmentations: 0
numeric factorization reallocations: 0
costly numeric factorization reallocations: 0
numeric factorization CPU time (sec): 0.05
numeric factorization wallclock time (sec): 0.00
numeric factorization mflops (CPU time): 552.22
solve flops: 1.78396e+006
iterative refinement steps taken: 1
iterative refinement steps attempted: 1
sparse backward error omega1: 1.80e-016
sparse backward error omega2: 0.00e+000
solve CPU time (sec): 0.00
solve wall clock time (sec): 0.00
total symbolic + numeric + solve flops: 2.77385e+007
观察线条:
numeric factorization flops 5.22115e+008 2.59546e+007 5%
solve flops: 1.78396e+006
total symbolic + numeric + solve flops: 2.77385e+007
表示M的因式分解用了2.59546e+007/2.77385e+007 = 93.6%的求解时间
我想在我的迭代之外提前计算因式分解,然后只运行最后一个阶段,这大约需要 6.5% CPU 时间。
我知道如何计算因式分解 ([L,U,P,Q,R] = lu(M);
),但我不知道如何将其输出用作求解器的输入。
我想本着以下精神经营一些东西:
[L,U,P,Q,R] = lu(M);
for ii=1:num_iter
dx = solve_pre_factored(M,P,Q,R,x);
x = x+dx;
end
有没有办法在 Matlab 中做到这一点?
您必须问问自己,LU 分解中的所有这些矩阵都做了什么。
如 the documentation 所述:
[L,U,P,Q,R] = lu(A) returns unit lower triangular matrix L, upper triangular matrix U, permutation matrices P and Q, and a diagonal scaling matrix R so that P*(R\A)Q = LU for sparse non-empty A. Typically, but not always, the row-scaling leads to a sparser and more stable factorization. The statement lu(A,'matrix') returns identical output values.
因此,在更多的数学术语中,我们有 PR
-1
AQ = LU
,因此 A = RP
-1
LUQ
-1
那么x = M\x
可以按以下步骤重写:
y = R
-1
x
z = P y
u = L
-1
z
v = U
-1
u
w = Q v
x = w
要反转 U
、L
和 R
,您可以使用 \
,它将识别它们是三角形(R
是对角线)矩阵 -正如监控应该确认的那样,并为它们使用适当的简单求解器。
因此以更密集和 matlab 编写的方式:x = Q*(U\(L\(P*(R\x))));
正如您所问,这样做将完全是求解器 \
内部发生的事情,只有一个因式分解。
然而,正如评论中所述,大量的反演计算 N = M
-1
一次会变得更快,然后只计算一次一个简单的矩阵向量乘法,比上面解释的过程简单得多。初始计算 inv(M)
更长且 has some limitations,因此这种权衡还取决于矩阵的属性。
我执行了多次求解线性方程组的迭代:Mx=b
具有大而稀疏的 M。
M 在迭代之间不会改变,但 b 会改变。我已经尝试了几种方法,到目前为止发现 backslash\mldivide 是最有效和最准确的。
下面的代码与我正在做的非常相似:
for ii=1:num_iter
x = M\x;
x = x+dx;
end
现在我想利用 M 是固定的这一事实来进一步加速计算。
设置标志spparms('spumoni',2)
允许求解器算法的详细信息。
我运行了以下代码:
spparms('spumoni',2);
x = M\B;
输出(监控):
sp\: bandwidth = 2452+1+2452.
sp\: is A diagonal? no.
sp\: is band density (0.01) > bandden (0.50) to try banded solver? no.
sp\: is A triangular? no.
sp\: is A morally triangular? no.
sp\: is A a candidate for Cholesky (symmetric, real positive diagonal)? no.
sp\: use Unsymmetric MultiFrontal PACKage with Control parameters:
UMFPACK V5.4.0 (May 20, 2009), Control:
Matrix entry defined as: double
Int (generic integer) defined as: UF_long
0: print level: 2
1: dense row parameter: 0.2
"dense" rows have > max (16, (0.2)*16*sqrt(n_col) entries)
2: dense column parameter: 0.2
"dense" columns have > max (16, (0.2)*16*sqrt(n_row) entries)
3: pivot tolerance: 0.1
4: block size for dense matrix kernels: 32
5: strategy: 0 (auto)
6: initial allocation ratio: 0.7
7: max iterative refinement steps: 2
12: 2-by-2 pivot tolerance: 0.01
13: Q fixed during numerical factorization: 0 (auto)
14: AMD dense row/col parameter: 10
"dense" rows/columns have > max (16, (10)*sqrt(n)) entries
Only used if the AMD ordering is used.
15: diagonal pivot tolerance: 0.001
Only used if diagonal pivoting is attempted.
16: scaling: 1 (divide each row by sum of abs. values in each row)
17: frontal matrix allocation ratio: 0.5
18: drop tolerance: 0
19: AMD and COLAMD aggressive absorption: 1 (yes)
The following options can only be changed at compile-time:
8: BLAS library used: Fortran BLAS. size of BLAS integer: 8
9: compiled for MATLAB
10: CPU timer is ANSI C clock (may wrap around).
11: compiled for normal operation (debugging disabled)
computer/operating system: Microsoft Windows
size of int: 4 UF_long: 8 Int: 8 pointer: 8 double: 8 Entry: 8 (in bytes)
sp\: is UMFPACK's symbolic LU factorization (with automatic reordering) successful? yes.
sp\: is UMFPACK's numeric LU factorization successful? yes.
sp\: is UMFPACK's triangular solve successful? yes.
sp\: UMFPACK Statistics:
UMFPACK V5.4.0 (May 20, 2009), Info:
matrix entry defined as: double
Int (generic integer) defined as: UF_long
BLAS library used: Fortran BLAS. size of BLAS integer: 8
MATLAB: yes.
CPU timer: ANSI clock ( ) routine.
number of rows in matrix A: 3468
number of columns in matrix A: 3468
entries in matrix A: 60252
memory usage reported in: 16-byte Units
size of int: 4 bytes
size of UF_long: 8 bytes
size of pointer: 8 bytes
size of numerical entry: 8 bytes
strategy used: symmetric
ordering used: amd on A+A'
modify Q during factorization: no
prefer diagonal pivoting: yes
pivots with zero Markowitz cost: 1284
submatrix S after removing zero-cost pivots:
number of "dense" rows: 0
number of "dense" columns: 0
number of empty rows: 0
number of empty columns 0
submatrix S square and diagonal preserved
pattern of square submatrix S:
number rows and columns 2184
symmetry of nonzero pattern: 0.904903
nz in S+S' (excl. diagonal): 62184
nz on diagonal of matrix S: 2184
fraction of nz on diagonal: 1.000000
AMD statistics, for strict diagonal pivoting:
est. flops for LU factorization: 2.76434e+007
est. nz in L+U (incl. diagonal): 306216
est. largest front (# entries): 31329
est. max nz in any column of L: 177
number of "dense" rows/columns in S+S': 0
symbolic factorization defragmentations: 0
symbolic memory usage (Units): 174698
symbolic memory usage (MBytes): 2.7
Symbolic size (Units): 9196
Symbolic size (MBytes): 0
symbolic factorization CPU time (sec): 0.00
symbolic factorization wallclock time(sec): 0.00
matrix scaled: yes (divided each row by sum of abs values in each row)
minimum sum (abs (rows of A)): 1.00000e+000
maximum sum (abs (rows of A)): 9.75375e+003
symbolic/numeric factorization: upper bound actual %
variable-sized part of Numeric object:
initial size (Units) 149803 146332 98%
peak size (Units) 1037500 202715 20%
final size (Units) 787803 154127 20%
Numeric final size (Units) 806913 171503 21%
Numeric final size (MBytes) 12.3 2.6 21%
peak memory usage (Units) 1083860 249075 23%
peak memory usage (MBytes) 16.5 3.8 23%
numeric factorization flops 5.22115e+008 2.59546e+007 5%
nz in L (incl diagonal) 593172 145107 24%
nz in U (incl diagonal) 835128 154044 18%
nz in L+U (incl diagonal) 1424832 295683 21%
largest front (# entries) 348768 30798 9%
largest # rows in front 519 175 34%
largest # columns in front 672 177 26%
initial allocation ratio used: 0.309
# of forced updates due to frontal growth: 1
number of off-diagonal pivots: 0
nz in L (incl diagonal), if none dropped 145107
nz in U (incl diagonal), if none dropped 154044
number of small entries dropped 0
nonzeros on diagonal of U: 3468
min abs. value on diagonal of U: 4.80e-002
max abs. value on diagonal of U: 1.00e+000
estimate of reciprocal of condition number: 4.80e-002
indices in compressed pattern: 13651
numerical values stored in Numeric object: 295806
numeric factorization defragmentations: 0
numeric factorization reallocations: 0
costly numeric factorization reallocations: 0
numeric factorization CPU time (sec): 0.05
numeric factorization wallclock time (sec): 0.00
numeric factorization mflops (CPU time): 552.22
solve flops: 1.78396e+006
iterative refinement steps taken: 1
iterative refinement steps attempted: 1
sparse backward error omega1: 1.80e-016
sparse backward error omega2: 0.00e+000
solve CPU time (sec): 0.00
solve wall clock time (sec): 0.00
total symbolic + numeric + solve flops: 2.77385e+007
观察线条:
numeric factorization flops 5.22115e+008 2.59546e+007 5%
solve flops: 1.78396e+006
total symbolic + numeric + solve flops: 2.77385e+007
表示M的因式分解用了2.59546e+007/2.77385e+007 = 93.6%的求解时间
我想在我的迭代之外提前计算因式分解,然后只运行最后一个阶段,这大约需要 6.5% CPU 时间。
我知道如何计算因式分解 ([L,U,P,Q,R] = lu(M);
),但我不知道如何将其输出用作求解器的输入。
我想本着以下精神经营一些东西:
[L,U,P,Q,R] = lu(M);
for ii=1:num_iter
dx = solve_pre_factored(M,P,Q,R,x);
x = x+dx;
end
有没有办法在 Matlab 中做到这一点?
您必须问问自己,LU 分解中的所有这些矩阵都做了什么。
如 the documentation 所述:
[L,U,P,Q,R] = lu(A) returns unit lower triangular matrix L, upper triangular matrix U, permutation matrices P and Q, and a diagonal scaling matrix R so that P*(R\A)Q = LU for sparse non-empty A. Typically, but not always, the row-scaling leads to a sparser and more stable factorization. The statement lu(A,'matrix') returns identical output values.
因此,在更多的数学术语中,我们有 PR
-1
AQ = LU
,因此 A = RP
-1
LUQ
-1
那么x = M\x
可以按以下步骤重写:
y = R
-1
x
z = P y
u = L
-1
z
v = U
-1
u
w = Q v
x = w
要反转 U
、L
和 R
,您可以使用 \
,它将识别它们是三角形(R
是对角线)矩阵 -正如监控应该确认的那样,并为它们使用适当的简单求解器。
因此以更密集和 matlab 编写的方式:x = Q*(U\(L\(P*(R\x))));
正如您所问,这样做将完全是求解器 \
内部发生的事情,只有一个因式分解。
然而,正如评论中所述,大量的反演计算 N = M
-1
一次会变得更快,然后只计算一次一个简单的矩阵向量乘法,比上面解释的过程简单得多。初始计算 inv(M)
更长且 has some limitations,因此这种权衡还取决于矩阵的属性。