分配矩阵时出现 Fortran Seg Fault
Fortran Seg Fault when assigning Matrices
[更新] 更改了代码和几句话以反映我在第二条评论中解释的实现。代码应该用下面的行编译,但是,我有一个旧的 gfortran,可能没有看到你可能会看到的一些错误。
gfortran BLU_implementation_copy.f90 -o BLU_implementation_copy.x
我在 运行 使用 Fortran 90 时遇到难以置信的不一致段错误。
我的代码的总体 objective 是将一个随机生成的、复杂的、对称的、对角线占优的矩阵分解成可以轻松并行化的块。精简版(相关功能除外)见文末
我 运行在函数 mat2tiles 中将原始矩阵分解为块(块)时遇到错误。具体来说,这行代码一直失败:
o(i,j)=cell(x(a:min(i*Nblock,M), b:min(j*Nblock, N)))
这是在索引 i,j 处将输入矩阵 X 的一个块分配给矩阵 o。它实现了这一点,因为 o 的每个索引都是通过派生类型的矩阵 NblockxNblock:
type cell
complex*16 :: block(Nblock,Nblock)
end type cell
type(cell) :: o(M/Nblock+rem,N/Nblock+rem)
对于某些矩阵大小和图块大小,一切都完美无缺。对于较大的尺寸,错误开始出现在接近矩阵尺寸的图块尺寸中,并最终达到矩阵尺寸一半的尺寸。例如,我发现发生这种情况的最小实例是一个 20x20 矩阵,其图块大小为 18。但这不会一直发生。如果我尝试通常 运行 较小的矩阵并建立到那个大小,它 运行s。如果我做一个更大的尺寸来分割故障,然后 运行 20x20 和 18,它会分割故障。我发现的最小一致参数是 25x25,图块大小为 23。即使是这个大小,如果我在 o(i,j) 分配行之后打印出它失败的块:
print*, o(2,1)%block
它突然 运行 毫无问题地完成了整个过程。取出打印语句使其再次出现段错误。
最后,最烦人的一个[已修复 - 参考我的第二条评论],如果我做一个 2000x2000 的矩阵,瓦片大小为 1000 或更大,我在进入函数后立即遇到段错误(这意味着它发生在变量分配)。这使我相信问题可能源于矩阵的分配方式,尤其是因为我使用的是派生类型。但是当我尝试诊断矩阵的大小和内容时,一切似乎都很正常。
program BLU_implementation
implicit none
integer :: rem
integer, parameter :: M=20, N=20, Nblock=19
real*8 :: start, finish
complex*16 :: A(M,N)
type cell
complex*16 :: block(Nblock,Nblock)
end type cell
!determines if cell matrix doesn't need to be bigger
rem=1
if (modulo(M,Nblock)==0) then
rem=0
endif
call cpu_time(start)
call functionCalling(A, M, N, Nblock, rem)
call cpu_time(finish)
print*, 'overall time:', finish-start, 'seconds'
contains
!==================================================================================================================
subroutine functionCalling(A, M, N, Nblock, rem)
implicit none
integer :: IPIV, INFO, M, N, Nblock, rem
real*8 :: start, finish
complex*16 :: A(M,N)
type(cell) :: C(M/Nblock+rem,N/Nblock+rem)
call cpu_time(start)
A= CSPDmatrixFill(A,M,N)
call cpu_time(finish)
print*, 'matrix fill time:', finish-start, 'seconds'
call cpu_time(start)
C= mat2tiles(A,M,N,Nblock,rem)
call cpu_time(finish)
print*, 'tiling time:', finish-start, 'seconds'
end subroutine
!===================================================================================================================
! generates a complex, symmetric, positive-definite matrix (based off of another's code)
function CSPDmatrixFill(A, M, N) result (Matrix)
! initialization
implicit none
integer :: i, j
integer :: M, N
real*8 :: x, xi
complex*16 :: A(M, N), Matrix(M, N), EYE(M, N), MT(N, M)
EYE=0
forall(j=1:M) EYE(j,j)=1
! execution
call random_seed
do i=1, M
do j=1, N
call random_number (x)
call random_number(xi)
Matrix(i,j) = cmplx(x,xi)
end do
end do
! construct a symmetric matrix ( O(n^2) )
call Mtranspose(Matrix, M, N)
Matrix = Matrix+MT
! make positive definite (diagonally dominant)
Matrix = Matrix + N*EYE
end function CSPDmatrixFill
!======================================================================================================
subroutine Mtranspose(A, i, j)
! takes a matrix and the two parameters used to make the matrix: A(i,j)
! returns a matrix with switched indices: A(j,i)
implicit none
integer :: i, j
complex*16 :: A(i,j), MT(j,i)
MT=A(j,i)
return
end subroutine Mtranspose
!=======================================================================================================
!MAT2TILES - breaks up an array into a cell array of adjacent sub-arrays of equal sizes
!
! O=mat2tiles(X,M,N,Nblock)
!
!will produce a cell array o containing adjacent chunks of the array X(M,N)
!with each chunk of dimensions NblockxNblock. If Nblock does
!not divide evenly into size(X,i), then the chunks at the upper boundary of
!X along dimension i will have bogus values that do not affect the factorization
!in the places where the matrix doesn't occupy. (according to older versions. Might have changed with some edits)
!
function mat2tiles(X,M,N,Nblock,rem) result(o)
! initialization
implicit none
integer :: a, b, i, j, M, N, Nblock, rem
complex*16 :: X(M,N)
type(cell) :: o(M/Nblock+rem,N/Nblock+rem)
! diagnostic print statements
print*,size(o(1,1)%block), size(o(1,2)%block), size(o(2,1)%block), size(o(2,2)%block)
print*, 'got to start'
! turn matrix x into cell matrix o
do j=1, N/Nblock+rem
if (j==1) then
b=j
else
b=b+Nblock
endif
do i=1, M/Nblock+rem
if (i==1) then
a=i
else
a=a+Nblock
endif
! diagnostic print statement
print*, 'writing to o: i:', i, 'j:', j, 'i*Nblock:', i*Nblock, 'j*Nblock:', j*Nblock, 'min of i:', min(i*Nblock, M), &
'min of j:', min(j*Nblock, N), 'a:', a, 'b:', b
o(i,j)=cell(x(a:min(i*Nblock,M), b:min(j*Nblock, N)))
enddo
enddo
! diagnostic print statement
print*, 'got to end'
return
end function mat2tiles
!==================================================================================================
end program
在将问题缩小到使用数字与相同数字的变量之后,发现 Fort运行 不喜欢将矩阵分配给不同维度的矩阵,即使它适合内部。这很奇怪,因为 M
、N
和 Nblock
的较小值完美地解决了这个问题。
解决方案是简单地为 i=1
和 j=1
定义 o(i,j)%block(1:Nblock,1:Nblock)=x(dim1:dim2,etc)
而不是 o(i,j)=cell(x(dim1:dim2,etc)
并且对 [=18= 的所有情况下的每个实例稍作修改] 和 j
.
正确的代码(对于矩阵大小和分块大小的所有情况)如下所示:
if (i==M/Nblock+rem .AND. j==N/Nblock+rem .AND. rem==1) then
o(i,j)%block(1:M-(i-1)*Nblock,1:N-(j-1)*Nblock)=x(a:min(i*Nblock,M), b:min(j*Nblock, N))
else if (i==M/Nblock+rem .AND. j/=N/Nblock+rem .AND. rem==1) then
o(i,j)%block(1:M-(i-1)*Nblock,1:Nblock)=x(a:min(i*Nblock,M), b:min(j*Nblock, N))
else if (i/=M/Nblock+rem .AND. j==N/Nblock+rem .AND. rem==1) then
o(i,j)%block(1:Nblock,1:N-(j-1)*Nblock)=x(a:min(i*Nblock,M), b:min(j*Nblock, N))
else
o(i,j)%block(1:Nblock,1:Nblock)=x(a:min(i*Nblock,M), b:min(j*Nblock, N))
end if
通过此更正,我的代码可以在新旧版本的 gfort运行 上运行。然而,有趣的是,Mac OS X 版本的 gfort运行 从未 运行 进入此段错误,只有 Linux 版本。
[更新] 更改了代码和几句话以反映我在第二条评论中解释的实现。代码应该用下面的行编译,但是,我有一个旧的 gfortran,可能没有看到你可能会看到的一些错误。
gfortran BLU_implementation_copy.f90 -o BLU_implementation_copy.x
我在 运行 使用 Fortran 90 时遇到难以置信的不一致段错误。
我的代码的总体 objective 是将一个随机生成的、复杂的、对称的、对角线占优的矩阵分解成可以轻松并行化的块。精简版(相关功能除外)见文末
我 运行在函数 mat2tiles 中将原始矩阵分解为块(块)时遇到错误。具体来说,这行代码一直失败:
o(i,j)=cell(x(a:min(i*Nblock,M), b:min(j*Nblock, N)))
这是在索引 i,j 处将输入矩阵 X 的一个块分配给矩阵 o。它实现了这一点,因为 o 的每个索引都是通过派生类型的矩阵 NblockxNblock:
type cell
complex*16 :: block(Nblock,Nblock)
end type cell
type(cell) :: o(M/Nblock+rem,N/Nblock+rem)
对于某些矩阵大小和图块大小,一切都完美无缺。对于较大的尺寸,错误开始出现在接近矩阵尺寸的图块尺寸中,并最终达到矩阵尺寸一半的尺寸。例如,我发现发生这种情况的最小实例是一个 20x20 矩阵,其图块大小为 18。但这不会一直发生。如果我尝试通常 运行 较小的矩阵并建立到那个大小,它 运行s。如果我做一个更大的尺寸来分割故障,然后 运行 20x20 和 18,它会分割故障。我发现的最小一致参数是 25x25,图块大小为 23。即使是这个大小,如果我在 o(i,j) 分配行之后打印出它失败的块:
print*, o(2,1)%block
它突然 运行 毫无问题地完成了整个过程。取出打印语句使其再次出现段错误。 最后,最烦人的一个[已修复 - 参考我的第二条评论],如果我做一个 2000x2000 的矩阵,瓦片大小为 1000 或更大,我在进入函数后立即遇到段错误(这意味着它发生在变量分配)。这使我相信问题可能源于矩阵的分配方式,尤其是因为我使用的是派生类型。但是当我尝试诊断矩阵的大小和内容时,一切似乎都很正常。
program BLU_implementation
implicit none
integer :: rem
integer, parameter :: M=20, N=20, Nblock=19
real*8 :: start, finish
complex*16 :: A(M,N)
type cell
complex*16 :: block(Nblock,Nblock)
end type cell
!determines if cell matrix doesn't need to be bigger
rem=1
if (modulo(M,Nblock)==0) then
rem=0
endif
call cpu_time(start)
call functionCalling(A, M, N, Nblock, rem)
call cpu_time(finish)
print*, 'overall time:', finish-start, 'seconds'
contains
!==================================================================================================================
subroutine functionCalling(A, M, N, Nblock, rem)
implicit none
integer :: IPIV, INFO, M, N, Nblock, rem
real*8 :: start, finish
complex*16 :: A(M,N)
type(cell) :: C(M/Nblock+rem,N/Nblock+rem)
call cpu_time(start)
A= CSPDmatrixFill(A,M,N)
call cpu_time(finish)
print*, 'matrix fill time:', finish-start, 'seconds'
call cpu_time(start)
C= mat2tiles(A,M,N,Nblock,rem)
call cpu_time(finish)
print*, 'tiling time:', finish-start, 'seconds'
end subroutine
!===================================================================================================================
! generates a complex, symmetric, positive-definite matrix (based off of another's code)
function CSPDmatrixFill(A, M, N) result (Matrix)
! initialization
implicit none
integer :: i, j
integer :: M, N
real*8 :: x, xi
complex*16 :: A(M, N), Matrix(M, N), EYE(M, N), MT(N, M)
EYE=0
forall(j=1:M) EYE(j,j)=1
! execution
call random_seed
do i=1, M
do j=1, N
call random_number (x)
call random_number(xi)
Matrix(i,j) = cmplx(x,xi)
end do
end do
! construct a symmetric matrix ( O(n^2) )
call Mtranspose(Matrix, M, N)
Matrix = Matrix+MT
! make positive definite (diagonally dominant)
Matrix = Matrix + N*EYE
end function CSPDmatrixFill
!======================================================================================================
subroutine Mtranspose(A, i, j)
! takes a matrix and the two parameters used to make the matrix: A(i,j)
! returns a matrix with switched indices: A(j,i)
implicit none
integer :: i, j
complex*16 :: A(i,j), MT(j,i)
MT=A(j,i)
return
end subroutine Mtranspose
!=======================================================================================================
!MAT2TILES - breaks up an array into a cell array of adjacent sub-arrays of equal sizes
!
! O=mat2tiles(X,M,N,Nblock)
!
!will produce a cell array o containing adjacent chunks of the array X(M,N)
!with each chunk of dimensions NblockxNblock. If Nblock does
!not divide evenly into size(X,i), then the chunks at the upper boundary of
!X along dimension i will have bogus values that do not affect the factorization
!in the places where the matrix doesn't occupy. (according to older versions. Might have changed with some edits)
!
function mat2tiles(X,M,N,Nblock,rem) result(o)
! initialization
implicit none
integer :: a, b, i, j, M, N, Nblock, rem
complex*16 :: X(M,N)
type(cell) :: o(M/Nblock+rem,N/Nblock+rem)
! diagnostic print statements
print*,size(o(1,1)%block), size(o(1,2)%block), size(o(2,1)%block), size(o(2,2)%block)
print*, 'got to start'
! turn matrix x into cell matrix o
do j=1, N/Nblock+rem
if (j==1) then
b=j
else
b=b+Nblock
endif
do i=1, M/Nblock+rem
if (i==1) then
a=i
else
a=a+Nblock
endif
! diagnostic print statement
print*, 'writing to o: i:', i, 'j:', j, 'i*Nblock:', i*Nblock, 'j*Nblock:', j*Nblock, 'min of i:', min(i*Nblock, M), &
'min of j:', min(j*Nblock, N), 'a:', a, 'b:', b
o(i,j)=cell(x(a:min(i*Nblock,M), b:min(j*Nblock, N)))
enddo
enddo
! diagnostic print statement
print*, 'got to end'
return
end function mat2tiles
!==================================================================================================
end program
在将问题缩小到使用数字与相同数字的变量之后,发现 Fort运行 不喜欢将矩阵分配给不同维度的矩阵,即使它适合内部。这很奇怪,因为 M
、N
和 Nblock
的较小值完美地解决了这个问题。
解决方案是简单地为 i=1
和 j=1
定义 o(i,j)%block(1:Nblock,1:Nblock)=x(dim1:dim2,etc)
而不是 o(i,j)=cell(x(dim1:dim2,etc)
并且对 [=18= 的所有情况下的每个实例稍作修改] 和 j
.
正确的代码(对于矩阵大小和分块大小的所有情况)如下所示:
if (i==M/Nblock+rem .AND. j==N/Nblock+rem .AND. rem==1) then
o(i,j)%block(1:M-(i-1)*Nblock,1:N-(j-1)*Nblock)=x(a:min(i*Nblock,M), b:min(j*Nblock, N))
else if (i==M/Nblock+rem .AND. j/=N/Nblock+rem .AND. rem==1) then
o(i,j)%block(1:M-(i-1)*Nblock,1:Nblock)=x(a:min(i*Nblock,M), b:min(j*Nblock, N))
else if (i/=M/Nblock+rem .AND. j==N/Nblock+rem .AND. rem==1) then
o(i,j)%block(1:Nblock,1:N-(j-1)*Nblock)=x(a:min(i*Nblock,M), b:min(j*Nblock, N))
else
o(i,j)%block(1:Nblock,1:Nblock)=x(a:min(i*Nblock,M), b:min(j*Nblock, N))
end if
通过此更正,我的代码可以在新旧版本的 gfort运行 上运行。然而,有趣的是,Mac OS X 版本的 gfort运行 从未 运行 进入此段错误,只有 Linux 版本。