Why can't two TBytes use overlapping data?

Question

Consider the following XE6 code. The intent is to write ThingData to the console for both Thing1 and Thing2, but that is not what happens. Why is that?

program BytesFiddle;

{$APPTYPE CONSOLE}

{$R *.res}

uses
  System.SysUtils;

type
  TThing = class
  private
    FBuf : TBytes;
    FData : TBytes;
    function GetThingData: TBytes;
    function GetThingType: Byte;
  public
    property ThingType : Byte read GetThingType;
    property ThingData : TBytes read GetThingData;

    constructor CreateThing(const AThingType : Byte; const AThingData: TBytes);
  end;

{ TThing1 }

constructor TThing.CreateThing(const AThingType : Byte; const AThingData: TBytes);
begin
  SetLength(FBuf, Length(AThingData) + 1);
  FBuf[0] := AThingType;
  Move(AThingData[0], FBuf[1], Length(AThingData));

  FData := @FBuf[1];
  SetLength(FData, Length(FBuf) - 1);
end;

function TThing.GetThingData: TBytes;
begin
  Result := FData;
end;

function TThing.GetThingType: Byte;
begin
  Result := FBuf[0];
end;

var
  Thing1, Thing2 : TThing;

begin
  try
    Thing1 := TThing.CreateThing(0, TEncoding.UTF8.GetBytes('Sneetch'));
    Thing2 := TThing.CreateThing(1, TEncoding.UTF8.GetBytes('Star Belly Sneetch'));

    Writeln(TEncoding.UTF8.GetString(Thing2.ThingData));
    Writeln(Format('Type %d', [Thing2.ThingType]));

    Writeln(TEncoding.UTF8.GetString(Thing1.ThingData));
    Writeln(Format('Type %d', [Thing1.ThingType]));

    ReadLn;
  except
    on E: Exception do
      Writeln(E.ClassName, ': ', E.Message);
  end;
end.

Answer 1

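Let me walk you through the ways this code fails, and how the compiler lets you shoot yourself in the foot.

If you step through the code in the debugger, you can see what happens.

After Thing1 is initialized, you can see that FData is all zeros.
Oddly enough, Thing2 is fine.
So the error is in CreateThing. Let's investigate further...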

In the oddly named constructor CreateThing you have the following line:

FData := @FBuf[1];

This looks like a simple assignment, but it is actually a call to DynArrayAsg:

Project97.dpr.32: FData := @FBuf[1];
0042373A 8B45FC           mov eax,[ebp-$04]
0042373D 83C008           add eax,$08
00423743 8B5204           mov edx,[edx+$04]
00423746 42               inc edx
00423747 8B0DE03C4000     mov ecx,[$00403ce0]
0042374D E8E66DFEFF       call @DynArrayAsg      <<-- lots of stuff happening here.

One of the checks DynArrayAsg performs is whether the source dynamic array is empty.
DynArrayAsg also does a few other things you need to be aware of.

First, let's look at the structure of a dynamic array: it is more than just a simple pointer to an array!

Offset 32/64  |   Contents     
--------------+--------------------------------------------------------------
-8/-12        | 32 bit reference count
-4/-8         | 32 or 64 bit length indicator 
 0/ 0         | data of the array.
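
To make the table concrete, here is a minimal sketch that reads the hidden header of an ordinary TBytes. It assumes the Delphi RTL layout shown in the table: a NativeInt length directly before element 0, with a 32-bit reference count just before that. That layout is an undocumented implementation detail. HeaderLength, HeaderRefCount and PLengthField are hypothetical helper names made up for this illustration.

```delphi
program PeekDynArrayHeader;

{$APPTYPE CONSOLE}

uses
  SysUtils;

type
  PLengthField = ^NativeInt; // hypothetical helper type for the length field

// Reads the length field stored just before element 0.
// ASSUMPTION: undocumented Delphi RTL layout (length at -4 on 32-bit, -8 on 64-bit).
function HeaderLength(const A: TBytes): NativeInt;
begin
  if A = nil then
    Exit(0); // an empty dynamic array is just nil: there is no header to read
  Result := PLengthField(PByte(Pointer(A)) - SizeOf(NativeInt))^;
end;

// Reads the 32-bit reference count stored just before the length field.
// ASSUMPTION: same undocumented layout (refcount at -8 on 32-bit, -12 on 64-bit).
function HeaderRefCount(const A: TBytes): Integer;
begin
  if A = nil then
    Exit(0);
  Result := PInteger(PByte(Pointer(A)) - SizeOf(NativeInt) - SizeOf(Integer))^;
end;

var
  A, B: TBytes;

begin
  SetLength(A, 3);
  Writeln('Length = ', HeaderLength(A), ', refcount = ', HeaderRefCount(A));
  B := A; // ordinary dynamic array assignment: shares the block, bumps the refcount
  Writeln('Length = ', HeaderLength(A), ', refcount = ', HeaderRefCount(A));
end.
```

On a Delphi target this should first report a length of 3 with a reference count of 1. After B := A it should report a count of 2, because the assignment shares the block rather than copying it. Note that these helpers will read whatever bytes happen to sit in front of any pointer you give them. That is exactly what goes wrong in the question.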

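By doing FData := @FBuf[1] you mess up the prefix fields of the dynamic array.
The 4 bytes in front of @FBuf[1] are interpreted as the length (the layout below is the 32-bit one).
For Thing1 these are: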

          -8 (refcnt)  -4 (len)     0 (data)
FBuf:     01 00 00 00  08 00 00 00  00  'S' 'n' .....
FData:    00 00 00 08  00 00 00 00  .............. //Hey that's a zero length.

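Oops: when DynArrayAsg starts looking, it finds that what it believes to be the source of the assignment has a length of zero. In other words, it thinks the source is empty and assigns nothing. FData is left unchanged!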

Does Thing2 work as intended?
It looks like it does, but it actually fails in a rather nasty way. Let me show you.

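You have successfully fooled the runtime into believing that @FBuf[1] is a valid reference to a dynamic array.
Because of this, three things happen. The FData pointer is updated to point at FBuf[1] (so far, so good). FData's reference count is incremented by 1 (not good). The runtime also grows the memory block holding the dynamic array to what it thinks is the correct size for FData (bad).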

          -8 (refcnt)  -4 (len)     0 (data)
FBuf:     01 01 00 00  13 00 00 00  01  'S' 'n' .....
FData:    01 00 00 13  00 00 00 01  'S' ..............

Oops: FData now has a reference count of 318,767,105 and a length of 16,777,216 bytes.
FBuf's own header is damaged too: its reference count is now 257.
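
These figures come from reading the overlapping header bytes as little-endian 32-bit integers, which is the byte order on x86 and x64. Here is a small sketch of that arithmetic. It uses a hypothetical LE32 helper that assembles four bytes, lowest address first.

```delphi
program DecodeHeaderBytes;

{$APPTYPE CONSOLE}

uses
  SysUtils;

// Combines four bytes, lowest address first, into a little-endian 32-bit value.
// LE32 is a hypothetical helper written only for this illustration.
function LE32(B0, B1, B2, B3: Byte): Cardinal;
begin
  Result := Cardinal(B0) or (Cardinal(B1) shl 8) or
            (Cardinal(B2) shl 16) or (Cardinal(B3) shl 24);
end;

begin
  // Bytes taken from the FData / FBuf rows of the Thing2 table above.
  Writeln('FData refcount: ', LE32($01, $00, $00, $13)); // 318767105 ($13000001)
  Writeln('FData length:   ', LE32($00, $00, $00, $01)); // 16777216  ($01000000)
  Writeln('FBuf refcount:  ', LE32($01, $01, $00, $00)); // 257       ($00000101)
  Writeln('FBuf length:    ', LE32($13, $00, $00, $00)); // 19 = 18 payload bytes + type byte
end.
```

The overlap also explains why the two objects behave differently:

- FData's "reference count" is made of bytes 1-3 of FBuf's reference count plus the low byte of FBuf's length.
- FData's "length" is made of bytes 1-3 of FBuf's length plus FBuf[0], the type byte.

With type 0 (Thing1) that length is 0. With type 1 (Thing2) it becomes $01000000.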

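This is why you need the SetLength call to undo the massive over-allocation of memory. That still does not fix the reference counts, though.
The over-allocation can cause out-of-memory errors (especially on 64-bit). The bogus reference counts cause a memory leak, because your arrays will never be freed.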

Solution
As per David's answer: enable type-checked pointers with {$TYPEDADDRESS ON}.

You can fix the code by declaring FData as a plain PAnsiChar or PByte.
If you make sure FBuf is always allocated with two trailing zero bytes, FData will work as expected.
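
Here is a minimal sketch of the PByte route. It assumes the question's single FBuf layout: type byte first, payload after. Instead of relying on zero termination, it keeps the payload length in a separate field, FDataLen, a name introduced here for illustration. Under {$TYPEDADDRESS ON}, @FBuf[1] has type ^Byte, which is compatible with PByte, so this assignment compiles. Assigning the same expression to a TBytes would not.

```delphi
program BytesFiddlePByte;

{$APPTYPE CONSOLE}
{$TYPEDADDRESS ON}

uses
  SysUtils;

type
  TThing = class
  private
    FBuf: TBytes;      // [type byte][payload...], the only dynamic array
    FData: PByte;      // plain pointer to FBuf[1]; no dynamic-array header involved
    FDataLen: Integer; // payload length, stored separately (a PByte has no length)
    function GetThingData: TBytes;
    function GetThingType: Byte;
  public
    constructor CreateThing(const AThingType: Byte; const AThingData: TBytes);
    property ThingType: Byte read GetThingType;
    property ThingData: TBytes read GetThingData;
  end;

constructor TThing.CreateThing(const AThingType: Byte; const AThingData: TBytes);
begin
  inherited Create;
  FDataLen := Length(AThingData);
  SetLength(FBuf, FDataLen + 1);
  FBuf[0] := AThingType;
  if FDataLen > 0 then
  begin
    Move(AThingData[0], FBuf[1], FDataLen);
    FData := @FBuf[1]; // ^Byte -> PByte: accepted under {$TYPEDADDRESS ON}
  end
  else
    FData := nil;
end;

function TThing.GetThingData: TBytes;
begin
  // Copy the payload into a fresh dynamic array with a correct header.
  SetLength(Result, FDataLen);
  if FDataLen > 0 then
    Move(FData^, Result[0], FDataLen);
end;

function TThing.GetThingType: Byte;
begin
  Result := FBuf[0];
end;

var
  Thing1, Thing2: TThing;

begin
  Thing1 := TThing.CreateThing(0, TEncoding.UTF8.GetBytes('Sneetch'));
  Thing2 := TThing.CreateThing(1, TEncoding.UTF8.GetBytes('Star Belly Sneetch'));
  try
    Writeln(TEncoding.UTF8.GetString(Thing2.ThingData));
    Writeln(Format('Type %d', [Thing2.ThingType]));
    Writeln(TEncoding.UTF8.GetString(Thing1.ThingData));
    Writeln(Format('Type %d', [Thing1.ThingType]));
  finally
    Thing2.Free;
    Thing1.Free;
  end;
end.
```

GetThingData hands out a copy, so callers still receive a proper TBytes. The price is one allocation per call. FData stays valid only as long as FBuf is neither resized nor freed.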

Make FData a TBuffer, like this:

TBuffer = record
private
  FData : PByte;
  function GetLength: cardinal;
  function GetType: byte;
public
  class operator implicit(const A: TBytes): TBuffer;
  class operator implicit(const A: TBuffer): PByte;
  property Length: cardinal read GetLength;
  property DataType: byte read GetType;
end;

Rewrite CreateThing like this:

constructor TThing.CreateThing(const AThingType : Byte; const AThingData: TBytes);
begin
  SetLength(FBuf, Length(AThingData) + Sizeof(AThingType) + 2);
  FBuf[0] := AThingType;
  Move(AThingData[0], FBuf[1], Length(AThingData));
  FBuf[Length(FBuf)-1]:= 0;
  FBuf[Length(FBuf)-2]:= 0;  //trailing zeros for compatibility with PAnsiChar

  FData := FBuf;  //will call the implicit class operator.
end;

class operator TBuffer.implicit(const A: TBytes): TBuffer;
begin
  // Points just past the type byte; only valid while A (i.e. FBuf) stays alive and is not resized
  Result.FData:= PByte(@A[1]);
end;

I don't get all this nonsense about trying to outsmart the compiler.
Why not declare FData like this:

type
  TMyData = record
    DataType: byte;
    Buffer: Ansistring;  
    ....

and use that?

Answer 2

The problem is easy to spot if you enable type-checked pointers. Add this to the top of your code:

{$TYPEDADDRESS ON}

The documentation says:

The $T directive controls the types of pointer values generated by the @ operator and the compatibility of pointer types.

In the {$T-} state, the result of the @ operator is always an untyped pointer (Pointer) that is compatible with all other pointer types. When @ is applied to a variable reference in the {$T+} state, the result is a typed pointer that is compatible only with Pointer and with other pointers to the type of the variable.

In the {$T-} state, distinct pointer types other than Pointer are incompatible (even if they are pointers to the same type). In the {$T+} state, pointers to the same type are compatible.

With that change, your program fails to compile. This line fails:

FData := @FBuf[1];

The error message is:

E2010 Incompatible types: 'System.TArray<System.Byte>' and 'Pointer'

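Now, FData is of type TArray<Byte>, but @FBuf[1] is not a dynamic array: it is a pointer to a byte in the middle of a dynamic array. The two are not compatible. In the default mode, pointers are not type checked, so the compiler lets you make this terrible mistake. Why that is the default mode is completely beyond me.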

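A dynamic array is more than a pointer to its first element: it also carries metadata such as the length and the reference count. That metadata is stored at a negative offset from the first element. So your entire design is flawed. Store the type code in a separate variable, not as part of the dynamic array.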
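
Following that advice, a minimal rework of the question's program might look like this. The field names FThingType and FThingData are chosen here for illustration. Each field is either a plain value or a genuine dynamic array, so there is no header to corrupt. {$TYPEDADDRESS ON} is enabled to catch any remaining pointer mix-ups at compile time.

```delphi
program BytesFiddleSeparate;

{$APPTYPE CONSOLE}
{$TYPEDADDRESS ON}

uses
  SysUtils;

type
  TThing = class
  private
    FThingType: Byte;   // type code kept in its own field
    FThingData: TBytes; // a genuine dynamic array with an intact header
  public
    constructor CreateThing(const AThingType: Byte; const AThingData: TBytes);
    property ThingType: Byte read FThingType;
    property ThingData: TBytes read FThingData;
  end;

constructor TThing.CreateThing(const AThingType: Byte; const AThingData: TBytes);
begin
  inherited Create;
  FThingType := AThingType;
  FThingData := Copy(AThingData); // private copy of the caller's bytes
end;

var
  Thing1, Thing2: TThing;

begin
  Thing1 := TThing.CreateThing(0, TEncoding.UTF8.GetBytes('Sneetch'));
  Thing2 := TThing.CreateThing(1, TEncoding.UTF8.GetBytes('Star Belly Sneetch'));
  try
    Writeln(TEncoding.UTF8.GetString(Thing2.ThingData));
    Writeln(Format('Type %d', [Thing2.ThingType]));
    Writeln(TEncoding.UTF8.GetString(Thing1.ThingData));
    Writeln(Format('Type %d', [Thing1.ThingType]));
  finally
    Thing2.Free;
    Thing1.Free;
  end;
end.
```

This should print "Star Belly Sneetch", "Type 1", "Sneetch", "Type 0". Copy(AThingData) gives each object its own array. A plain FThingData := AThingData would also be safe, but it would share the caller's block, since dynamic arrays are reference counted rather than copy-on-write.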

Answer 3

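Internally a dynamic array is a pointer, and it is assignment compatible with pointers. However, the only correct values for the right-hand side of such an assignment are nil or another dynamic array. FData := @FBuf[1]; is clearly wrong, but interestingly FData := @FBuf[0]; may be OK, even with $TYPEDADDRESS enabled.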

The following code compiles and works as expected in Delphi XE:

program Project19;

{$APPTYPE CONSOLE}
{$TYPEDADDRESS ON}

uses
  SysUtils;

procedure Test;
var
  A, B: TBytes;

begin
  A:= TBytes.Create(11,22,33);
  B:= @A[0];
  Writeln(B[1]);
end;

begin
  try
    Test;
    readln;
  except
    on E: Exception do
      Writeln(E.ClassName, ': ', E.Message);
  end;
end.

It seems the compiler "knows" that @A[0] is a dynamic array, and not just a pointer.
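
A small sketch of why @A[0] gets away with it: the address of element 0 is exactly the pointer A holds. The header the runtime finds in front of it is therefore A's real header. B := A is the idiomatic way to say the same thing. In both cases B and A end up sharing one block (Delphi dynamic arrays are reference counted, not copy-on-write), so a write through B shows up in A.

The program is left in the default {$TYPEDADDRESS OFF} state and relies on the same untyped-pointer assignment as the question. It is meant to explain the observation, not as a pattern to copy.

```delphi
program SharedViaAddress;

{$APPTYPE CONSOLE}

uses
  SysUtils;

var
  A, B, C: TBytes;

begin
  A := TBytes.Create(11, 22, 33);
  B := @A[0]; // same address as Pointer(A), so the header in front of it is A's real one
  C := A;     // the idiomatic way to share the same block

  Writeln(Pointer(B) = Pointer(A)); // both refer to element 0 of the same block
  Writeln(Pointer(C) = Pointer(A));

  B[1] := 99;         // no copy-on-write for dynamic arrays:
  Writeln(A[1]);      // the change is visible through A as well
  Writeln(Length(B)); // read from A's genuine header
end.
```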

Answer 4

constructor TThing.CreateThing(const AThingType : Byte; const AThingData: TBytes);
var
  Buffer : array of Byte;
begin
  // Build [type byte][payload] in a temporary buffer
  SetLength(Buffer, Length(AThingData) + Sizeof(AThingType));
  Buffer[0] := AThingType;
  Move(AThingData[0], Buffer[1], Length(AThingData));

  // FBuf gets its own copy of the whole record...
  SetLength(FBuf, Length(Buffer));
  Move(Buffer[0], FBuf[0], Length(Buffer));
  // ...and FData gets a separate copy of just the payload, so the two never overlap
  SetLength(FData, Length(AThingData));
  Move(Buffer[1], FData[0], Length(AThingData));
end;
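
The constructor above gives FBuf and FData separate copies of the bytes, so the two arrays no longer overlap. Assuming the same idea, the temporary Buffer can be dropped by letting Copy(Array, Index, Count) allocate the payload array directly. The sketch below uses a trimmed-down TThing for illustration; it is not a drop-in replacement for the question's class.

```delphi
program BytesFiddleCopy;

{$APPTYPE CONSOLE}

uses
  SysUtils;

type
  TThing = class
  private
    FBuf: TBytes;  // [type byte][payload...]
    FData: TBytes; // separate copy of the payload, with its own header
  public
    constructor CreateThing(const AThingType: Byte; const AThingData: TBytes);
    property ThingData: TBytes read FData;
  end;

constructor TThing.CreateThing(const AThingType: Byte; const AThingData: TBytes);
begin
  inherited Create;
  SetLength(FBuf, Length(AThingData) + SizeOf(AThingType));
  FBuf[0] := AThingType;
  if Length(AThingData) > 0 then
    Move(AThingData[0], FBuf[1], Length(AThingData));
  // Copy(Array, Index, Count) allocates a brand-new array holding FBuf[1..]
  FData := Copy(FBuf, 1, Length(FBuf) - 1);
end;

var
  Thing: TThing;

begin
  Thing := TThing.CreateThing(1, TEncoding.UTF8.GetBytes('Star Belly Sneetch'));
  try
    Writeln(TEncoding.UTF8.GetString(Thing.ThingData));
  finally
    Thing.Free;
  end;
end.
```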


delphi

dynamic-arrays