Does comparing a pointer that has been free'd invoke UB?

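This seems to be a fairly common pattern, for example in hexchat. (The code below may not compile; see also the plugin docs. Also note that hexchat_plugin_get_info hasn't been used in forever, so I've omitted it for simplicity.)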

#include "hexchat-plugin.h"

static hexchat_plugin *ph;
static int timer_cb(void *userdata) {
    if (hexchat_set_context(ph, userdata)) { /* <-- is this line UB? */
        /* omitted */
    }
    return 0;
}
static int do_ub(char *word[], char *word_eol[], void *userdata) {
    void *context = hexchat_get_context(ph);
    hexchat_hook_timer(ph, 1000, timer_cb, context);
    /* Free the context. In practice another plugin or the user would do
       this, not the plugin itself; for the purposes of this example it
       simulates the user closing the context. */
    hexchat_command(ph, "close");
    return HEXCHAT_EAT_ALL;
}
int hexchat_plugin_init(hexchat_plugin *plugin_handle, char **plugin_name, char **plugin_desc, char **plugin_version, char *arg) {
    *plugin_name = "do_ub";
    *plugin_desc = "does ub when you /do_ub";
    *plugin_version = "1.0.0";
    ph = plugin_handle;
    /* etc */
    hexchat_hook_command(ph, "do_ub", 0, do_ub, "does UB", NULL);
    return 1;
}

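The marked line in timer_cb makes hexchat compare a pointer (possibly freed in general, and definitely freed in this example; see the comment in do_ub) with other pointers. If you follow the call from here (plugin.c#L1089, hexchat_set_context) you'll end up in here (hexchat.c#L191, is_session). To invoke this code, run /do_ub in hexchat.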

Relevant code:

int
hexchat_set_context (hexchat_plugin *ph, hexchat_context *context)
{
    if (is_session (context))
    {
        ph->context = context;
        return 1;
    }
    return 0;
}

int
is_session (session * sess)
{
    return g_slist_find (sess_list, sess) ? 1 : 0;
}

Is this sort of thing UB?

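The value of a pointer becomes indeterminate once the object it points to has reached the end of its lifetime, as stated in the C11 Standard draft, 6.2.4p2 (Storage durations of objects) (emphasis mine):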

The lifetime of an object is the portion of program execution during which storage is guaranteed to be reserved for it. An object exists, has a constant address, and retains its last-stored value throughout its lifetime. If an object is referred to outside of its lifetime, the behavior is undefined. The value of a pointer becomes indeterminate when the object it points to (or just past) reaches the end of its lifetime.

And using that value (for anything at all) is explicitly undefined behavior, as stated in Annex J.2 (Undefined behavior):

The behavior is undefined in the following circumstances: [...] The value of a pointer to an object whose lifetime has ended is used (6.2.4).

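Yes, using a pointer value that has been freed for anything, even a seemingly innocuous comparison, is strictly speaking undefined behavior. It's unlikely to cause any actual problems in practice, but I consider it worth avoiding.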

See also the C FAQ list, question 7.21.
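One way to avoid the question altogether is to never hold on to the raw pointer across a point where the object might be freed, and to keep an integer ID instead. The following is only a minimal sketch of that idea: session_register, session_unregister, session_lookup and MAX_SESSIONS are hypothetical names made up for illustration and are not part of the HexChat plugin API.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical registry; names and sizes are illustrative, not HexChat API. */
#define MAX_SESSIONS 16

struct session_slot {
    uint32_t id;    /* 0 means the slot is empty */
    void *session;  /* only meaningful while id != 0 */
};

static struct session_slot slots[MAX_SESSIONS];
static uint32_t next_id = 1;

/* Register a live session; returns a nonzero ID, or 0 if the table is full. */
uint32_t session_register(void *session)
{
    for (size_t i = 0; i < MAX_SESSIONS; i++) {
        if (slots[i].id == 0) {
            slots[i].id = next_id++;
            if (next_id == 0)
                next_id = 1; /* skip the reserved value 0 on wraparound */
            slots[i].session = session;
            return slots[i].id;
        }
    }
    return 0;
}

/* Forget a session; call this before the session object is freed. */
void session_unregister(uint32_t id)
{
    if (id == 0)
        return;
    for (size_t i = 0; i < MAX_SESSIONS; i++) {
        if (slots[i].id == id) {
            slots[i].id = 0;
            slots[i].session = NULL;
            return;
        }
    }
}

/* Return the live session for id, or NULL if it has been closed. */
void *session_lookup(uint32_t id)
{
    if (id == 0)
        return NULL;
    for (size_t i = 0; i < MAX_SESSIONS; i++) {
        if (slots[i].id == id)
            return slots[i].session;
    }
    return NULL;
}
```

A timer callback would then carry the ID rather than the pointer and call session_lookup when it fires. Only integers are compared, and the stored pointer is read only while its slot is still registered, so no freed pointer value is ever used. This assumes whoever frees a session calls session_unregister first, which a plugin may not be in a position to guarantee.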

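tl;dr: The ability to perform certain operations (e.g. comparing pointers without regard for the lifetime of the objects they identify) is a popular extension which the vast majority of compilers can be configured to support by disabling optimizations. Support for it is not mandated by the Standard, however, and aggressive optimizers may break code that relies upon it.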

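When the Standard was written, there were some segmented-memory platforms where an attempt to load a pointer into a register would cause the system to retrieve information about the region of memory in which the pointer was located. If such information was no longer available, the attempt to retrieve it could have arbitrary consequences outside the Standard's jurisdiction. Had the Standard required that comparisons involving such pointers have no side effects beyond yielding 0 or 1, the language would have been impractical on such platforms.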

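While the authors of the Standard were no doubt aware that the ability to compare arbitrary pointers (with the caveat that the results might not be particularly meaningful) was a useful feature supported by every implementation targeting conventional hardware, they saw no need to treat it as anything other than a "popular extension" which quality implementations would support whenever doing so was useful and practical.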

From the C89 Rationale, page 11, line 23:

The terms unspecified behavior, undefined behavior, and implementation-defined behavior are used to categorize the result of writing programs whose properties the Standard does not, or cannot, completely describe. The goal of adopting this categorization is to allow a certain variety among implementations which permits quality of implementation to be an active force in the marketplace as well as to allow certain popular extensions, without removing the cachet of conformance to the Standard. Informative Annex J of the Standard catalogs those behaviors which fall into one of these three categories.

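Unfortunately, even though nearly every platform in use today could support such semantics at essentially zero cost (*), some compiler writers regard the ability to assume that code will never do anything with freed pointers as more important than any value programmers might receive from an extension that is essentially universally supported on conventional platforms. Unless one can guarantee that anyone using one's code will disable the phony "optimizations" imposed by authors of over-eager optimizers who seek to strip useful extensions from the language, one may have to write extra code to work around the lack of such an extension.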

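(*) In some cases, a function exposes to outside code multiple pointers to storage it has allocated and freed. A compiler that had to uphold a behavioral guarantee that such pointers compare equal would need to actually perform the stores that leak the pointers, whereas treating the pointers as indeterminate would allow those stores to be eliminated. Outside of contrived scenarios, however, the savings from eliminating such stores of pointers leaked to the outside world would seldom have any meaningful effect on performance.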
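As one illustration of the kind of extra code meant above, a program can convert the pointer to an integer while the object is still alive and later compare integers instead of pointers. This is a sketch under assumptions, not a recommendation from the Standard: address_token and token_is_live are hypothetical helper names, and uintptr_t is an optional type that a given implementation might not provide.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Snapshot a pointer's address as an integer while the object is alive.
   uintptr_t is optional in the Standard, though widely available. */
uintptr_t address_token(const void *live_ptr)
{
    return (uintptr_t)live_ptr;
}

/* Check whether any currently live pointer has the saved address.
   Only live pointers are converted here, so no indeterminate pointer
   value is ever read; the token itself is an ordinary integer. */
bool token_is_live(uintptr_t token, void *const live[], size_t count)
{
    for (size_t i = 0; i < count; i++) {
        if ((uintptr_t)live[i] == token)
            return true;
    }
    return false;
}
```

The saved token stays an ordinary integer after the object is freed, so comparing it is not a use of a freed pointer. The check has the same weakness as the pointer comparison in is_session, though: if the allocator later hands the same address to a new object, the token will match that new object.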