在 MATLAB R2018a 及更新版本中无需数据复制即可将复数转换为实数

Question

自MATLAB R2018a以来，复值矩阵在内部存储为单个数据块，每个矩阵元素的实部和虚部彼此相邻存储——他们称之为"interleaved complex" . （以前这样的矩阵有两个数据块，一个用于所有实部，一个用于所有虚部 -- "separate complex"。）

我想，既然存储现在允许它，那么应该可以将复数值数组转换为具有两倍元素的实数值数组，而无需复制数据。

MATLAB 有一个函数 typecast，它可以在不复制数据的情况下将数组转换为不同的类型。例如，它可用于将具有 16 个 8 位值的数组转换为具有 2 个双精度浮点数的数组。它在不复制数据的情况下执行此操作，位模式被重新解释为新类型。

遗憾的是，此函数对复值数组根本不起作用。

我想复制这段代码：

A = fftn(randn(40,60,20)); % some random complex-valued array
assert(~isreal(A))

sz = size(A);
B = reshape(A,1,[]);        % make into a vector
B = cat(1,real(B),imag(B)); % interleave real and imaginary values
B = reshape(B,[2,sz]);      % reshape back to original shape, with a new first dimension
assert(isreal(B))

矩阵 A 和 B 具有（在 R2018a 和更新版本中）完全相同的数据，顺序完全相同。但是，要达到 B，我们必须复制数据两次。

我尝试创建一个执行此操作的 MEX 文件，但我不知道如何创建一个引用输入数组中数据的新数组。这个 MEX 文件可以工作，但会导致 MATLAB 在清除变量时崩溃，因为有两个数组引用相同的数据而没有意识到它们共享数据（即引用计数没有递增）。

// Build with:
//    mex -R2018a typecast_complextoreal.cpp

#include <mex.h>

#if MX_HAS_INTERLEAVED_COMPLEX==0
#error "This MEX-file must be compiled with the -R2018a flag"
#endif

#include <vector>

void mexFunction(int /*nlhs*/, mxArray* plhs[], int nrhs, const mxArray* prhs[]) {

   // Validate input
   if(nrhs != 1) {
      mexErrMsgTxt("One input argument expected");
   }
   if(!mxIsDouble(prhs[0]) && !mxIsSingle(prhs[0])) {
    mexErrMsgTxt("Only floating-point arrays are supported");
   }

   // Get input array sizes
   mwSize nDims = mxGetNumberOfDimensions(prhs[0]);
   mwSize const* inSizes = mxGetDimensions(prhs[0]);

   // Create a 0x0 output matrix of the same type, but real-valued
   std::vector<mwSize> outSizes(nDims + 1, 0);
   plhs[0] = mxCreateNumericMatrix(0, 0, mxGetClassID(prhs[0]), mxREAL);

   // Set the output array data pointer to the input array's
   // NOTE! This is illegal, and causes MATLAB to crash when freeing both
   // input and output arrays, because it tries to free the same data
   // twice
   mxSetData(plhs[0], mxGetData(prhs[0]));

   // Set the output array sizes
   outSizes[0] = mxIsComplex(prhs[0]) ? 2 : 1;
   for(size_t ii = 0; ii < nDims; ++ii) {
      outSizes[ii + 1] = inSizes[ii];
   }
   mxSetDimensions(plhs[0], outSizes.data(), outSizes.size());
}

我很想听听关于如何从这里开始的任何想法。我不一定需要修复 MEX 文件，如果解决方案是纯 MATLAB 代码，那就更好了。

Answer 1

这个问题让我想起了一些关于通过 MEX 进行就地内存编辑的博客文章，其中提到了以下内容：

[M]odifying the original data directly is both discouraged and not officially supported. Doing it incorrectly can easily crash Matlab. ^ref

This is, in the best case, an unmaintainable mess. ^ref

话虽如此，我没有适合您的解决方案，但我或许可以建议一个解决方法。

看到 MATLAB 如何允许我们调用 python 库，我们可以在 python 中执行这些操作，并且仅当我们到达 python 中进行的阶段时才将数据带回 MATLAB =51=] 是不可能的或不需要的。根据这个阶段的时间，这个想法可能是一个有效的方法，也可能是一个完全无用的建议。

看看下面的例子，它应该是不言自明的：

np = py.importlib.import_module('numpy');
sp = py.importlib.import_module('scipy.fftpack');

% Create a double array in python:
arrC = sp.fftn(np.random.rand(uint8(4), uint8(3), uint8(2)));
%{
arrC = 
  Python ndarray with properties:

           T: [1×1 py.numpy.ndarray]
        base: [1×1 py.NoneType]
      ctypes: [1×1 py.numpy.core._internal._ctypes]
        data: [1×4 py.memoryview]
       dtype: [1×1 py.numpy.dtype]
       flags: [1×1 py.numpy.flagsobj]
        flat: [1×1 py.numpy.flatiter]
        imag: [1×1 py.numpy.ndarray]
    itemsize: [1×1 py.int]
      nbytes: [1×1 py.int]
        ndim: [1×1 py.int]
        real: [1×1 py.numpy.ndarray]
       shape: [1×3 py.tuple]
        size: [1×1 py.int]
     strides: [1×3 py.tuple]
    [[[ 13.99586491+0.j           0.70305071+0.j        ]
      [ -1.33719563-1.3820106j   -0.74083670+0.25893033j]
      [ -1.33719563+1.3820106j   -0.74083670-0.25893033j]]

     [[ -0.43914391+0.8336674j    0.08835445-0.50821244j]
      [  1.07089829-0.35245746j   0.44890850-0.9650458j ]
      [  2.09813180+1.34942678j  -1.20877832+0.71191772j]]

     [[ -2.93525342+0.j          -0.69644042+0.j        ]
      [  0.16165913-1.29739125j  -0.84443177+0.26884365j]
      [  0.16165913+1.29739125j  -0.84443177-0.26884365j]]

     [[ -0.43914391-0.8336674j    0.08835445+0.50821244j]
      [  2.09813180-1.34942678j  -1.20877832-0.71191772j]
      [  1.07089829+0.35245746j   0.44890850+0.9650458j ]]]
%}

% Make sure that python sees it as a "complex double" (aka complex128)
assert( isequal(arrC.dtype, np.dtype(np.complex128)) );

% Return a (real) double view:
arrR = arrC.view(np.float64);
%{
arrR = 
  Python ndarray with properties:

           T: [1×1 py.numpy.ndarray]
        base: [1×1 py.numpy.ndarray]
      ctypes: [1×1 py.numpy.core._internal._ctypes]
        data: [1×4 py.memoryview]
       dtype: [1×1 py.numpy.dtype]
       flags: [1×1 py.numpy.flagsobj]
        flat: [1×1 py.numpy.flatiter]
        imag: [1×1 py.numpy.ndarray]
    itemsize: [1×1 py.int]
      nbytes: [1×1 py.int]
        ndim: [1×1 py.int]
        real: [1×1 py.numpy.ndarray]
       shape: [1×3 py.tuple]
        size: [1×1 py.int]
     strides: [1×3 py.tuple]
    [[[ 13.99586491   0.           0.70305071   0.        ]
      [ -1.33719563  -1.3820106   -0.7408367    0.25893033]
      [ -1.33719563   1.3820106   -0.7408367   -0.25893033]]

     [[ -0.43914391   0.8336674    0.08835445  -0.50821244]
      [  1.07089829  -0.35245746   0.4489085   -0.9650458 ]
      [  2.0981318    1.34942678  -1.20877832   0.71191772]]

     [[ -2.93525342   0.          -0.69644042   0.        ]
      [  0.16165913  -1.29739125  -0.84443177   0.26884365]
      [  0.16165913   1.29739125  -0.84443177  -0.26884365]]

     [[ -0.43914391  -0.8336674    0.08835445   0.50821244]
      [  2.0981318   -1.34942678  -1.20877832  -0.71191772]
      [  1.07089829   0.35245746   0.4489085    0.9650458 ]]]
%}

% Do something else with it in python
...

% Bring data to MATLAB:
sz = cellfun(@int64, cell(arrR.shape));
B = permute(reshape(double(py.array.array('d', arrR.flatten('C').tolist())),sz),[2,1,3]);

最后一个阶段可能效率低到足以抵消之前不复制数据的 performance/memory 收益，因此将它保留到最后可能是个好主意，并尽可能减少数据尽可能提前。

在尝试这个的过程中，我开始意识到在 python 中传输非标量 complex data from MATLAB to Python and back (just compare the MATLAB output for np.asarray(1+1i) vs np.asarray([1+1i, 1+1i])). Possibly, the reason for this is the limited support for complex non-ndarray arrays 涉及很多困难¹。

如果你（与我相反）知道如何处理 memoryview 对象（即 data 字段的内容 ndarray objects) - 你可以获得一个指针，也许将它传递给 C 以获得一些有用的结果。

¹ 这是可能的，但您必须 num2cell 您的数据，以便它可以作为 python 列表传输（反之亦然）。 editing some MATLAB files.

也可以改善这种情况

Answer 2

这是来自 documentation 的措辞：

Since many mathematical libraries use an interleaved complex representation, using the same representation in your MEX functions eliminates the need to translate data. This simplifies your code and potentially speeds up the processing when large data sets are involved.

根据文档中的 example，每个复数类型似乎是一个具有两个实部和虚部的结构：

typedef struct {
    double real;
    double imag;
} mxComplexDouble ;

所以我们想 (re-interpret) 将 mxComplexDouble 的数组转换为数学库中使用的其他双精度复数类型。

例如，一个数学图书馆可能将其复数类型定义为：

typedef struct {
    double real;
    double imag;
} other_complex ;

并且它定义了一个函数来使用它自己的复杂类型的数组。

void do_something(const other_complex* math_array);

这里我们要将一个 MATLAB 复数数组发送到数学库：

mxComplexDouble matlab_array[5];
other_complex* math_array =  (other_complex*)matlab_array;
do_something(math_array);

因为只需要一个指针转换，可以认为是加快了处理速度。

但是关于 strict aliasing rule * math_array 的任何使用都会导致 未定义的行为 。甚至禁止转换为双打数组：

double* double_array = (double*)matlab_array; 
printf("%f",double_array[0]); // Unedfined behavior

因此我们需要分配一个新数组并使用memcpy逐字节复制数据。这是防止未定义行为的安全方法。

严格别名规则的唯一例外是在 c 和 c++ 中分别在 headers complex.h and complex 中定义的标准复杂数据类型，并且只有浮点数 [float，可以定义 double 和 long double] 复杂类型。所以我们可以安全地将 std::complex<double>* 转换为 double*.

结论：

严格的别名规则

typecast 可能已被禁止用于新的 MATLAB 复杂数据类型，但正如所解释的那样，新的复杂数据类型不能在其他图书馆中使用如此便宜。只使用 memcpy 可能被认为是复制整个数据的有效方法。
对其他数据类型使用typcast似乎值得怀疑。我可以（而且可能应该）假设 MATLAB 使用了一些编译器技巧来防止未定义的行为，否则当我们访问数据元素时，如果 MATLAB 没有逐字节复制它，那么它会导致未定义的行为。（请注意，无论如何在 int32 和 uint32 等兼容类型之间进行转换以及将任何类型转换为 C++ char 类型都具有明确定义的行为）。此外，它要求使用适当的选项编译所有 mex 文件以禁用严格的别名。但是我们目前有大量已编译的 mex 文件，如果我们将类型转换的结果发送给它，则会导致未定义的行为。因此应尽可能限制使用 typecast。

*我发现一些博客比 SO post 更好地解释了这个概念。例如 here or here or here.

Answer 3

查看此 FEX 提交，它可以在不复制数据的情况下进行复杂的 --> 2 实数重新解释（它甚至可以在不复制的情况下指向数据的内部连续子部分）：

https://www.mathworks.com/matlabcentral/fileexchange/65842-sharedchild-creates-a-shared-data-copy-of-a-contiguous-subsection-of-an-existing-variable

如果您只是在 R2018a 及更高版本中读写交错的复杂数据文件，请参阅此 FEX 提交：

https://www.mathworks.com/matlabcentral/fileexchange/77530-freadcomplex-and-fwritecomplex

在 MATLAB R2018a 及更新版本中无需数据复制即可将复数转换为实数

Casting complex to real without data copy in MATLAB R2018a and newer

arrays

matlab

types

casting

complex-numbers

结论：