In the C++ standard, does well-formed mean that the code compiles?
The C++ standard defines well-formed programs as
C++ program constructed according to the syntax rules, diagnosable semantic rules, and the one-definition rule
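I am wondering if all well-formed programs compile or not (and if not, what kinds of errors make the difference between a well-formed program and a compilable one). For example, would a program containing ambiguity errors be considered well-formed?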
I am wondering if all well-formed programs compile or not
Of course not, in practice.
A typical example is when you ask for optimizations on a huge translation unit containing long C++ functions.
(But in theory, yes.)
See, of course, the n3337 C++11 standard, or the C++17 standard.
This happened to me in the (old) GCC MELT project. I was generating C++ code compiled by GCC, basically using transpiler (or source-to-source compilation) techniques on a Lispy DSL of my invention to generate the C++ code of GCC plugins. See also this and that.
In practice, if you generate a single C++ function of a hundred thousand statements, the compiler will have trouble optimizing it.
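To reproduce this, one can generate such a function mechanically. The following is only a minimal sketch assuming nothing beyond the standard library; `make_huge_function` and the emitted function `huge` are hypothetical names, and the statement shape is arbitrary (each statement depends on the previous one so the optimizer cannot simply discard them, and unsigned arithmetic keeps the generated code free of overflow UB).

```cpp
#include <cstddef>
#include <sstream>
#include <string>

// Hypothetical helper: returns the source text of ONE C++ function whose
// body contains n chained statements.
std::string make_huge_function(std::size_t n) {
    std::ostringstream out;
    out << "unsigned huge(unsigned x) {\n";
    for (std::size_t i = 0; i < n; ++i) {
        // Unsigned arithmetic wraps modulo 2^N, so the generated code has no UB.
        out << "    x = x * 3u + " << i << "u;\n";
    }
    out << "    return x;\n}\n";
    return out.str();
}
```

One could write `make_huge_function(100000)` to a file such as `huge.cpp` and time `g++ -O3 -c huge.cpp` for growing n to observe the compile-time behaviour on one's own machine.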
Large C++ functions can be produced by GUI code generators (e.g. FLUID), by some parser generators such as ANTLR (when the underlying input grammar is badly designed), by interface generators such as SWIG, or by preprocessors such as GPP or GNU m4 (like GNU autoconf does). C++ template expansion may also produce arbitrarily large functions (e.g. when you combine several C++ container templates and ask the GCC compiler to optimize at link time using g++ -flto -O2).
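As a small illustration of template expansion growing a single function, here is a sketch assuming C++17 (for the fold expression); `unrolled_sum` and `sum_below` are hypothetical names. Each element of the index pack becomes one addition in the body of the instantiated function, so `N` directly controls how large that one function gets.

```cpp
#include <cstddef>
#include <utility>

// The fold expression expands to N separate "acc += I" additions inside
// this single instantiated function body.
template <std::size_t... I>
unsigned long long unrolled_sum(std::index_sequence<I...>) {
    unsigned long long acc = 0;
    ((acc += I), ...);  // C++17 unary right fold over the comma operator
    return acc;
}

// Sum of 0 + 1 + ... + (N - 1), computed by a function with N additions.
template <std::size_t N>
unsigned long long sum_below() {
    return unrolled_sum(std::make_index_sequence<N>{});
}
```

Instantiating `sum_below<20000>()` and compiling with `g++ -O2` is one way to watch compile time grow with N; no particular timing is claimed here.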
I did benchmarks, and a decade ago I observed experimentally that compiling an n-statement C++ function can take O(n²) time (and IIRC O(n log n) space) with g++ -O3. Notice that a good optimizing C++ compiler has to do register allocation, loop unrolling, and inline expansion, and that some ABIs (including on Linux/x86-64) mandate passing or returning small structs (or instances of small classes) through registers. All these optimizations require trade-offs and hit a combinatorial explosion wall: in practice, compiler optimization is at least an intractable problem, and probably an undecidable one. See also the related Rice's theorem and read the Dragon Book.
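To check an O(n²) claim on your own timings, assume compile time behaves like T(n) ≈ c·n^k. Two measurements then give k = log(T(n2)/T(n1)) / log(n2/n1), so doubling n and seeing the time roughly quadruple points to k ≈ 2. A tiny sketch; `estimate_exponent` is a hypothetical name, and the numbers in the test are synthetic, not measurements.

```cpp
#include <cmath>

// Assuming T(n) = c * n^k, estimate k from two (size, seconds) samples.
double estimate_exponent(double n1, double t1, double n2, double t2) {
    return std::log(t2 / t1) / std::log(n2 / n1);
}
```

With real data, it is wiser to fit over several sizes, since a single pair of timings is sensitive to noise and to fixed per-invocation overhead.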
You could adapt my manydl.c program (which generates more or less random C code, compiles it as several plugins, then dlopen-s them on Linux) to emit C++. You'll then be able to do some GCC compiler benchmarks, since that manydl program is able to generate hundreds of thousands of plugins containing lots of more or less random C functions. See Drepper's paper How To Write Shared Libraries and be aware of libgccjit.
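For readers who want the flavour of such a generator in C++, here is a hypothetical sketch; it is NOT the actual manydl.c code, whose internals are not reproduced here. `make_random_plugin` and the emitted `p<id>_f<k>` names are invented for illustration. A fixed seed makes the output reproducible for a given standard library (the distribution's exact values may differ between libraries).

```cpp
#include <cstddef>
#include <random>
#include <sstream>
#include <string>

// Hypothetical sketch: emits the source of one plugin holding `count` small
// pseudo-random functions. extern "C" avoids name mangling, so the symbols
// can later be looked up by their plain names.
std::string make_random_plugin(std::size_t plugin_id, std::size_t count,
                               unsigned seed) {
    std::mt19937 rng(seed);
    std::uniform_int_distribution<unsigned> coeff(1, 9);
    std::ostringstream out;
    for (std::size_t f = 0; f < count; ++f) {
        unsigned a = coeff(rng);
        unsigned b = coeff(rng);
        out << "extern \"C\" unsigned p" << plugin_id << "_f" << f
            << "(unsigned x) {\n"
            << "    return x * " << a << "u + " << b << "u;\n"
            << "}\n";
    }
    return out.str();
}
```

Each generated string could be compiled with something like `g++ -O2 -fPIC -shared` into a shared object, then loaded with `dlopen` and its functions retrieved with `dlsym`.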
See also the blog of the late Jacques Pitrat (1934 - October 2019) for an example of a C program generating half a million lines of its own C code, whose design is explained in this paper and that book.
A well-formed program can have undefined behavior.
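A classic case is signed integer overflow: the code below is well-formed and compiles without any required diagnostic, yet calling `add_unchecked` with a result outside the range of `int` is undefined behavior. A minimal sketch; both function names are hypothetical. The checked variant tests the bounds before adding, so it never executes the overflowing operation.

```cpp
#include <limits>

// Well-formed, no diagnostic required; UB if a + b overflows int.
int add_unchecked(int a, int b) {
    return a + b;
}

// Returns false instead of overflowing; the subtractions below cannot
// themselves overflow because of the sign checks on b.
bool add_checked(int a, int b, int& result) {
    if ((b > 0 && a > std::numeric_limits<int>::max() - b) ||
        (b < 0 && a < std::numeric_limits<int>::min() - b)) {
        return false;  // a + b would be out of range
    }
    result = a + b;
    return true;
}
```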
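This is in a note, and thus not technically normative, but it seems intended that terminating compilation (or "translation", as the standard calls it) is within the range of possible UB: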
[intro.defs]
undefined behavior
behavior for which this document imposes no requirements
[ Note: Undefined behavior may be expected when this document omits any explicit definition of behavior or when a program uses an erroneous construct or erroneous data.
Permissible undefined behavior ranges from ignoring the situation completely with unpredictable results, to behaving during translation or program execution in a documented manner characteristic of the environment (with or without the issuance of a diagnostic message), to terminating a translation or execution (with the issuance of a diagnostic message).
Many erroneous program constructs do not engender undefined behavior; they are required to be diagnosed.
Evaluation of a constant expression never exhibits behavior explicitly specified as undefined in [intro] through [cpp] of this document ([expr.const]).
— end note
]
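The last two sentences of the note can be illustrated, including the ambiguity case from the question. A minimal sketch; `add`, `pick`, `three` and `resolved` are hypothetical names. The commented-out lines are the ill-formed ones: uncommenting either should make a conforming compiler issue a diagnostic, because an ambiguous call breaks a diagnosable semantic rule and an overflow during constant evaluation makes the initializer not a constant expression.

```cpp
#include <climits>  // for INT_MAX in the commented-out line

constexpr int add(int a, int b) { return a + b; }

// Fine: evaluated at compile time, no overflow.
constexpr int three = add(1, 2);
static_assert(three == 3, "constant evaluation works");

// Ill-formed, diagnostic required: the overflow that is only UB at run time
// is rejected during constant evaluation.
// constexpr int bad = add(INT_MAX, 1);

int pick(int)  { return 1; }
int pick(long) { return 2; }

// Ill-formed, diagnostic required: unsigned -> int and unsigned -> long are
// both integral conversions of the same rank, so overload resolution is
// ambiguous.
// int amb = pick(0u);

// An explicit conversion removes the ambiguity.
int resolved = pick(static_cast<long>(0u));
```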
There are also practical implementation limits:
[implimits]
Because computers are finite, C++ implementations are inevitably limited in the size of the programs they can successfully process.
Every implementation shall document those limitations where known. This documentation may cite fixed limits where they exist, say how to compute variable limits as a function of available resources, or say that fixed limits do not exist or are unknown.
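Recursive template instantiation is one place where such a limit shows up in everyday code. In the sketch below (`Factorial` is a hypothetical name), each `Factorial<N>` instantiates `Factorial<N - 1>`, so the nesting depth grows linearly with `N`. GCC and Clang let you adjust their cap with `-ftemplate-depth=`; with a large enough `N`, this otherwise well-formed code stops compiling with the default settings.

```cpp
#include <cstdint>

// Instantiation depth grows linearly with N. Values wrap modulo 2^64 past
// 20!, which is well-defined for unsigned types and harmless for a depth
// experiment.
template <unsigned N>
struct Factorial {
    static constexpr std::uint64_t value = N * Factorial<N - 1>::value;
};

// Base case stops the recursion.
template <>
struct Factorial<0> {
    static constexpr std::uint64_t value = 1;
};

static_assert(Factorial<10>::value == 3628800, "10! computed at compile time");
```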
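Furthermore, compilers can, and do, have bugs. Well-formed merely means that a conforming compiler should compile it (within the limits above). A buggy compiler is not necessarily conforming.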
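Finally, the standard document itself is not perfect. If there is disagreement about what a rule means, then a program may be well-formed under one interpretation and ill-formed under another.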
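If a compiler disagrees with the programmer or with another compiler, it may fail to compile a program that the other party considers well-formed.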