C++: accessing static members using a null pointer
I recently tried the following program. It compiles, runs fine, and produces the expected output instead of any runtime error.
#include <iostream>
class demo
{
public:
    static void fun()
    {
        std::cout<<"fun() is called\n";
    }
    static int a;
};
int demo::a=9;
int main()
{
    demo* d=nullptr;
    d->fun();
    std::cout<<d->a;
    return 0;
}
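If an uninitialized pointer is used to access class and/or struct members, the behavior is undefined. So why is it also allowed to access static members using a null pointer? Is there any harm in my program?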
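What you are seeing here is what I would consider an ill-conceived and unfortunate design choice in the specification of C++ and of many other languages in the same general family of programming languages.
These languages allow you to refer to static members of a class through a reference to an instance of the class. The actual value of the instance reference is of course ignored, since no instance is needed to access static members.
So, in d->fun(); the compiler uses the pointer d only at compile time, to work out that you are referring to a member of the demo class, and then ignores it. The compiler emits no code to dereference the pointer, so the fact that it will be NULL at run time does not matter.
So what you see happening is in perfect accordance with the language specification, and in my opinion the specification suffers in this respect, because it allows an illogical thing to happen: using an instance reference to refer to a static member.
P.S. Most compilers for most languages are able to issue warnings for this kind of thing. I do not know about your compiler, but you might want to check: the fact that you received no warning may mean that you do not have enough warnings enabled.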
From the C++ Draft Standard N3337:
9.4 Static members
2 A static member s of class X may be referred to using the qualified-id expression X::s; it is not necessary to use the class member access syntax (5.2.5) to refer to a static member. A static member may be referred to using the class member access syntax, in which case the object expression is evaluated.
And in the section about the object expression...
5.2.5 Class member access
4 If E2 is declared to have type “reference to T,” then E1.E2 is an lvalue; the type of E1.E2 is T. Otherwise, one of the following rules applies.
- If E2 is a static data member and the type of E2 is T, then E1.E2 is an lvalue; the expression designates the named member of the class. The type of E1.E2 is T.
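Based on the last paragraph of the standard quoted above, the expressions:
d->fun();
std::cout << d->a;
work because they both designate the named member of the class regardless of the value of d.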
TL;DR: Your example is well-defined. Merely dereferencing a null pointer does not invoke UB.
There is a lot of debate over this topic, which basically boils down to whether indirection through a null pointer is itself UB.
The only questionable thing that happens in your example is the evaluation of the object expression. In particular, d->a is equivalent to (*d).a according to [expr.ref]/2:
The expression E1->E2 is converted to the equivalent form (*(E1)).E2; the remainder of 5.2.5 will address only the first option (dot).
*d is just evaluated:
The postfix expression before the dot or arrow is evaluated;[65] the result of that evaluation, together with the id-expression, determines the result of the entire postfix expression.
[65] If the class member access expression is evaluated, the subexpression evaluation happens even if the result is unnecessary to determine the value of the entire postfix expression, for example if the id-expression denotes a static member.
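Let's extract the critical part of the code. Consider the expression statement
*d;
In this statement, *d is a discarded-value expression according to [stmt.expr]. So *d is solely evaluated [1], just as in d->a.
Hence, if *d; is valid, or in other words if the evaluation of the expression *d is valid, then so is your example.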
Does indirection through null pointers inherently result in undefined behavior?
There is the open CWG issue #232, created over fifteen years ago, which concerns this exact question. It raises a very important argument. The report starts with
At least a couple of places in the IS state that indirection through a null pointer produces undefined behavior: 1.9 [intro.execution] paragraph 4 gives "dereferencing the null pointer" as an example of undefined behavior, and 8.3.2 [dcl.ref] paragraph 4 (in a note) uses this supposedly undefined behavior as justification for the nonexistence of "null references."
Note that the example mentioned has since been changed to cover modification of const objects instead, and that the note in [dcl.ref], while it still exists, is not normative. The normative passage was removed to avoid commitment.
However, 5.3.1 [expr.unary.op] paragraph 1, which describes the unary "*" operator, does not say that the behavior is undefined if the operand is a null pointer, as one might expect. Furthermore, at least one passage gives dereferencing a null pointer well-defined behavior: 5.2.8 [expr.typeid] paragraph 2 says
If the lvalue expression is obtained by applying the unary * operator to a pointer and the pointer is a null pointer value (4.10 [conv.ptr]), the typeid expression throws the bad_typeid exception (18.7.3 [bad.typeid]).
This is inconsistent and should be cleaned up.
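The last point is especially important. The quote in [expr.typeid] still exists and applies to glvalues of polymorphic class type, which is the case in the following example: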
#include <iostream>
#include <typeinfo>

int main() try {
    // Polymorphic type
    class A
    {
        virtual ~A(){}
    };
    typeid( *((A*)0) );
}
catch (std::bad_typeid)
{
    std::cerr << "bad_exception\n";
}
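The behavior of this program is well-defined (an exception will be thrown and caught), and the expression *((A*)0) is evaluated, as it isn't part of an unevaluated operand. Now, if indirection through null pointers induced UB, then the expression written as
*((A*)0);
would be doing just that, inducing UB, which seems nonsensical when compared to the typeid scenario. If the above expression is merely evaluated, as every discarded-value expression is [1], where is the crucial difference that makes the evaluation in the second snippet UB? No existing implementation analyzes the typeid operand, finds the innermost corresponding dereference, and surrounds its operand with a check; that would cost performance, too.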
A note in that issue then ends the short discussion with:
We agreed that the approach in the standard seems okay: p = 0; *p; is not inherently an error. An lvalue-to-rvalue conversion would give it undefined behavior.
That is, the committee agreed on this. Although the proposed resolution of this report, which introduces so-called "empty lvalues", was never adopted…
However, “not modifiable” is a compile-time concept, while in fact this deals with runtime values and thus should produce undefined behavior instead. Also, there are other contexts in which lvalues can occur, such as the left operand of . or .*, which should also be restricted. Additional drafting is required.
…that does not affect the rationale. Then again, it should be noted that this issue even predates C++03, which makes it less convincing as we approach C++17.
CWG issue #315 seems to cover your case as well:
Another instance to consider is that of invoking a member function from a null pointer:
struct A { void f () { } };
int main ()
{
    A* ap = 0;
    ap->f ();
}
[…]
Rationale (October 2003):
We agreed the example should be allowed. p->f() is rewritten as (*p).f() according to 5.2.5 [expr.ref]. *p is not an error when p is null unless the lvalue is converted to an rvalue (4.1 [conv.lval]), which it isn't here.
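According to this rationale, indirection through a null pointer does not by itself invoke UB, as long as no lvalue-to-rvalue conversion (i.e. access to the stored value), reference binding, value computation, or the like follows. (Nota bene: calling a non-static member function through a null pointer should invoke UB, although [class.mfct.non-static]/2 disallows it only hazily. The rationale is outdated in this respect.)
That is, a mere evaluation of *d does not suffice to invoke UB. Neither the identity of the object nor its previously stored value is required. On the other hand, for example,
*p = 123;
is undefined, since there is a value computation of the left operand, [expr.ass]/1:
In all cases, the assignment is sequenced after the value computation of the right and left operands
Because the left operand is expected to be a glvalue, the identity of the object referred to by that glvalue must be determined, as stated by the definition of expression evaluation in [intro.execution]/12, and that is impossible (which leads to UB).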
[1] [expr]/11:
In some contexts, an expression only appears for its side effects. Such an expression is called a discarded-value expression. The expression is evaluated and its value is discarded. […]. The lvalue-to-rvalue conversion (4.1) is applied if and only if the expression is a glvalue of volatile-qualified type and […]
"runs fine and produces expected output instead of any runtime error."
That is a basic error of assumption. What you are doing is undefined behavior, which means that any claim about "expected output" is faulty.
Addendum: Note that, while there is a CWG defect report (#315) that is closed as "in agreement" on not making the above UB, it relies on the positive closing of another CWG defect (#232), which is still active, and hence none of it has been added to the standard.
Let me quote part of a comment by James McNellis on an answer to a similar Stack Overflow question:
I don't think CWG defect 315 is as "closed" as its presence on the "closed issues" page implies. The rationale says that it should be allowed because "*p is not an error when p is null unless the lvalue is converted to an rvalue." However, that relies on the concept of an "empty lvalue," which is part of the proposed resolution to CWG defect 232, but which has not been adopted.
The expressions d->fun() and d->a both cause evaluation of *d ([expr.ref]/2).
The complete definition of the unary * operator from [expr.unary.op]/1 is:
The unary * operator performs indirection: the expression to which it is applied shall be a pointer to an object type, or a pointer to a function type and the result is an lvalue referring to the object or function to which the expression points.
For the expression d there is no "object or function to which the expression points". Therefore this paragraph does not define the behaviour of *d.
Hence the code is undefined by omission, since the behaviour of evaluating *d is not defined anywhere in the Standard.