Pointer arithmetic using cast to "wrong" type

Question

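I have an array of structs, and I have a pointer to a member of one of those structs. I would like to know which element of the array contains that member. Here are two approaches: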

#include <array>
#include <cstddef>
#include <string>

struct xyz
{
    float x, y;
    std::string name;
};

typedef std::array<xyz, 3> triangle;

// return which vertex the given coordinate is part of
int vertex_a(const triangle& tri, const float* coord)
{
    return reinterpret_cast<const xyz*>(coord) - tri.data();
}

int vertex_b(const triangle& tri, const float* coord)
{
    std::ptrdiff_t offset = reinterpret_cast<const char*>(coord) - reinterpret_cast<const char*>(tri.data());
    return offset / sizeof(xyz);
}

Here's a test driver:

#include <iostream>

int main()
{
    triangle tri{{{12.3, 45.6}, {7.89, 0.12}, {34.5, 6.78}}};
    for (const xyz& coord : tri) {
        std::cout
            << vertex_a(tri, &coord.x) << ' '
            << vertex_b(tri, &coord.x) << ' '
            << vertex_a(tri, &coord.y) << ' '
            << vertex_b(tri, &coord.y) << '\n';
    }
}

Both approaches produce the expected results:

0 0 0 0
1 1 1 1
2 2 2 2

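But are they valid code?

In particular, I wonder whether vertex_a() might invoke undefined behaviour by casting a float* that points to y into an xyz*, since the result does not actually point to a struct xyz. That concern prompted me to write vertex_b(), which I think is safe (is it?).

Here is the code generated by GCC 6.3 with -O3: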

vertex_a(std::array<xyz, 3ul> const&, float const*):
    movq    %rsi, %rax
    movabsq $-3689348814741910323, %rsi ; 0xCCC...CD
    subq    %rdi, %rax
    sarq    $3, %rax
    imulq   %rsi, %rax

vertex_b(std::array<xyz, 3ul> const&, float const*):
    subq    %rdi, %rsi
    movabsq $-3689348814741910323, %rdx ; 0xCCC...CD
    movq    %rsi, %rax
    mulq    %rdx
    movq    %rdx, %rax
    shrq    $5, %rax

Answer 1

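vertex_a does indeed violate the strict aliasing rule (none of your floats is a valid xyz, and in 50% of your example they are not even at the start of an xyz, even if there is no padding).

vertex_b relies on, shall we say, a creative interpretation of the standard. Though your cast to const char* is sound, performing arithmetic with it around the rest of the array is a little more dodgy. Historically I have concluded that this kind of thing has undefined behaviour, because "the object" in this context is the xyz, not the array. However, I am nowadays leaning towards others' interpretation that this will always work, and I wouldn't expect anything else in practice.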

Answer 2

Neither is valid according to the standard.

In vertex_a, you are allowed to convert a pointer to xyz::x into a pointer to xyz, because they are pointer-interconvertible:

Two objects a and b are pointer-interconvertible if [...] one is a standard-layout class object and the other is the first non-static data member of that object [...]

If two objects are pointer-interconvertible, then they have the same address, and it is possible to obtain a pointer to one from a pointer to the other via a reinterpret_cast.

But you cannot convert a pointer to xyz::y into a pointer to xyz. That operation is undefined.
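
To illustrate the first-member rule quoted above, here is a minimal sketch. It uses a hypothetical two-float struct named point rather than the question's xyz, because whether xyz is standard-layout depends on how the library implements std::string. The helper names owner_of_x and check_first_member are likewise invented for illustration.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical stand-in for the coordinate part of xyz (not from the question).
struct point
{
    float x, y;
};

// The quoted rule applies only to standard-layout classes.
static_assert(std::is_standard_layout<point>::value,
              "point must be standard-layout");

// Valid only when px really points at the x member (the first member) of a point.
inline const point* owner_of_x(const float* px)
{
    return reinterpret_cast<const point*>(px);
}

// Self-check: the first member shares its address with the enclosing object.
inline void check_first_member()
{
    point p{1.0f, 2.0f};
    assert(owner_of_x(&p.x) == &p);
}
```

No equivalent helper is offered for y: as stated above, a pointer to the second member is not pointer-interconvertible with the enclosing object.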

In vertex_b, you are subtracting two pointers to const char. That operation is defined in [expr.add] as:

If the expressions P and Q point to, respectively, elements x[i] and x[j] of the same array object x, the expression P - Q has the value i − j; otherwise, the behavior is undefined

Your expressions do not point to elements of a char array, so the behaviour is undefined.

Answer 3

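Perhaps a more robust approach would be to change the type signature to xyz::T* (T being a template argument, so that you can take xyz::x or xyz::y as needed) instead of float*.

Then you can use offsetof(struct xyz, T) to confidently compute the location of the start of the struct, in a way that should be more resilient to future changes to its definition.

The rest then follows as you are currently doing: once you have a pointer to the start of the struct, finding its offset into the array is a valid pointer subtraction.

There is some pointer juggling involved, but this is an approach that is used in practice. For example, see the container_of() macro in the Linux kernel. https://www.linuxjournal.com/files/linuxjournal.com/linuxjournal/articles/067/6717/6717s2.html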
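
A rough sketch of this idea follows, under some assumptions: the helper name vertex_c is made up, and the member offset is passed as an ordinary argument (obtained from offsetof) rather than as a template parameter. offsetof is only conditionally supported for types that are not standard-layout, and the objection in Answer 2 to stepping through an object with char pointers would apply here as well, so treat this as an illustration of the container_of-style technique rather than as guaranteed-conforming code.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <string>

struct xyz
{
    float x, y;
    std::string name;
};

typedef std::array<xyz, 3> triangle;

// Hypothetical helper: member_offset is expected to be offsetof(xyz, x)
// or offsetof(xyz, y), matching the member that coord points to.
int vertex_c(const triangle& tri, const float* coord, std::size_t member_offset)
{
    // Step back from the member to the start of its enclosing struct.
    const char* start = reinterpret_cast<const char*>(coord) - member_offset;
    const xyz* owner = reinterpret_cast<const xyz*>(start);

    // Sanity check: the recovered struct should lie inside the array.
    assert(owner >= tri.data() && owner < tri.data() + tri.size());

    // Subtract two xyz pointers to get the element index.
    return static_cast<int>(owner - tri.data());
}
```

A call would look like vertex_c(tri, &tri[1].y, offsetof(xyz, y)), which is expected to yield 1.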

Answer 4

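vertex_b is completely fine. You might only want to refine return offset / sizeof(xyz);, since you are dividing a std::ptrdiff_t by a std::size_t and implicitly converting the result to int. By the book, this behaviour is implementation-defined: std::ptrdiff_t is signed, std::size_t is unsigned, and with a huge array on some platforms/compilers the result of the division could be larger than INT_MAX (very unlikely).

To throw off your worries, you can put in assert()s and/or #errors that check PTRDIFF_MIN, PTRDIFF_MAX, SIZE_MAX, INT_MIN and INT_MAX, but I personally would not bother that much.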
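
One way to make those conversions explicit is sketched below. The name vertex_b_checked is hypothetical, the run-time asserts stand in for the checks mentioned above, and the sketch assumes that coord points into tri.

```cpp
#include <array>
#include <cassert>
#include <climits>
#include <cstddef>
#include <string>

struct xyz
{
    float x, y;
    std::string name;
};

typedef std::array<xyz, 3> triangle;

// Hypothetical variant of vertex_b with explicit conversions and range checks.
int vertex_b_checked(const triangle& tri, const float* coord)
{
    const std::ptrdiff_t offset =
        reinterpret_cast<const char*>(coord) -
        reinterpret_cast<const char*>(tri.data());

    // coord is assumed to lie at or after the start of the array.
    assert(offset >= 0);

    // Divide in the signed type so no signed/unsigned mixing takes place.
    const std::ptrdiff_t index =
        offset / static_cast<std::ptrdiff_t>(sizeof(xyz));

    // Make sure the narrowing to int cannot lose information.
    assert(index <= INT_MAX);
    return static_cast<int>(index);
}
```

Because the checks use assert, they disappear when NDEBUG is defined, which leaves only the explicit casts.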

c++

pointer-arithmetic

undefined-behavior

language-lawyer