C 和 C++ 中几乎相同的代码在执行时间上存在巨大差异 (x9)

Question

我试图从 www.spoj.com 解决这个练习：FCTRL - Factorial

你不必真正阅读它，如果你很好奇就去做吧:)

首先我用 C++ 实现了它（这是我的解决方案）：

#include <iostream>
using namespace std;

int main() {
    unsigned int num_of_inputs;
    unsigned int fact_num;
    unsigned int num_of_trailing_zeros;

    std::ios_base::sync_with_stdio(false); // turn off synchronization with the C library’s stdio buffers (from 

    cin >> num_of_inputs;

    while (num_of_inputs--)
    {
        cin >> fact_num;

        num_of_trailing_zeros = 0;

        for (unsigned int fives = 5; fives <= fact_num; fives *= 5)
            num_of_trailing_zeros += fact_num/fives;

        cout << num_of_trailing_zeros << "\n";
    }

    return 0;
}

我上传它作为 g++ 5.1

的解决方案

结果是：时间 0.18 内存 3.3M

但是后来看到一些评论说他们的time execution不到0.1。由于我无法考虑更快的算法，因此我尝试在 C:

中实现相同的代码

#include <stdio.h>

int main() {
    unsigned int num_of_inputs;
    unsigned int fact_num;
    unsigned int num_of_trailing_zeros;

    scanf("%d", &num_of_inputs);

    while (num_of_inputs--)
    {
        scanf("%d", &fact_num);

        num_of_trailing_zeros = 0;

        for (unsigned int fives = 5; fives <= fact_num; fives *= 5)
            num_of_trailing_zeros += fact_num/fives;

        printf("%d", num_of_trailing_zeros);
        printf("%s","\n");
    }

    return 0;
}

我上传它作为 gcc 5.1

的解决方案

这次的结果是：Time0.02Mem2.1M

现在代码 几乎相同，我按照建议 here 将 std::ios_base::sync_with_stdio(false); 添加到 C++ 代码以关闭与 C 的同步库的 stdio 缓冲区。我还将 printf("%d\n", num_of_trailing_zeros); 拆分为 printf("%d", num_of_trailing_zeros); printf("%s","\n"); 以补偿 cout << num_of_trailing_zeros << "\n";.

中 operator<< 的双重调用

但我仍然看到 x9 更好的性能 和 C 代码与 C++ 代码相比更低的内存使用率。

这是为什么？

编辑

我在 C 代码中将 unsigned long 修改为 unsigned int。它应该是 unsigned int，上面显示的结果与新的 (unsigned int) 版本有关。

Answer 1

两个程序做的事情完全一样。它们使用完全相同的算法，并且鉴于其低复杂性，它们的性能主要取决于输入和输出处理的效率。

一侧使用 scanf("%d", &fact_num); 扫描输入，另一侧使用 cin >> fact_num; 扫描输入，这两种方式的成本似乎都不是很高。事实上，在 C++ 中它的成本应该更低，因为转换类型在编译时是已知的，并且 C++ 编译器可以直接调用正确的解析器。这同样适用于输出。您甚至特意为 printf("%s","\n"); 编写单独的调用，但 C 编译器足以将其编译为对 putchar('\n');.

的调用

所以从I/O和计算的复杂度来看，C++版本应该比C版本快。

完全禁用 stdout 的缓冲会减慢 C 实现的速度，甚至比 C++ 版本还要慢。 AlexLop 在最后一个 printf 之后使用 fflush(stdout); 进行的另一项测试产生了与 C++ 版本相似的性能。它不像完全禁用缓冲那么慢，因为输出是以小块而不是一次一个字节的形式写入系统的。

这似乎指向您的 C++ 库中的特定行为：我怀疑您的系统对 cin 和 cout 的实现在从 [ 请求输入时将输出刷新为 cout =17=]。一些 C 库也这样做，但通常只在 reading/writing 进出终端时才这样做。 www.spoj.com 站点进行的基准测试可能会将输入和输出重定向到文件。

AlexLop 做了另一个测试：一次读取向量中的所有输入，然后计算和写入所有输出有助于理解为什么 C++ 版本慢得多。它提高了 C 版本的性能，这证明了我的观点并消除了对 C++ 格式化代码的怀疑。

Blastfurnace 的另一项测试，将所有输出存储在 std::ostringstream 中，并在最后一次爆炸中刷新，确实将 C++ 性能提高到基本 C 版本的性能。 QED.

Interlacing input from cin and output to cout seems to cause very inefficient I/O handling, defeating the stream buffering scheme. reducing performance by a factor of 10.

PS：您的算法对于 fact_num >= UINT_MAX / 5 是不正确的，因为 fives *= 5 在变成 > fact_num 之前会溢出并回绕。如果这些类型之一大于 unsigned int，您可以通过将 fives 设为 unsigned long 或 unsigned long long 来更正此问题。还使用 %u 作为 scanf 格式。你很幸运 www.spoj.com 的人在他们的基准测试中并不太严格。

编辑：vitaux 后来解释说，这种行为确实是 C++ 标准强制要求的。 cin 默认绑定到 cout。来自 cin 的输入缓冲区需要重新填充的输入操作将导致 cout 刷新挂起的输出。在 OP 的实现中，cin 似乎系统地刷新 cout，这有点矫枉过正并且效率明显低下。

Ilya Popov 为此提供了一个简单的解决方案：cin 可以通过施展除 std::ios_base::sync_with_stdio(false); 之外的另一个魔法咒语从 cout 解开：

cin.tie(nullptr);

另请注意，当使用 std::endl 而不是 '\n' 在 cout 上生成行尾时，也会发生这种强制刷新。将输出行更改为更加 C++ 惯用和无辜的外观 cout << num_of_trailing_zeros << endl; 会以同样的方式降低性能。

Answer 2

问题是，引用 cppreference:

any input from std::cin, output to std::cerr, or program termination forces a call to std::cout.flush()

这很容易测试：如果你替换

cin >> fact_num;

和

scanf("%d", &fact_num);

与 cin >> num_of_inputs 相同，但保持 cout 您将在 C++ 版本（或者更确切地说 IOStream 版本）中获得与 C 版本中几乎相同的性能：

如果保留 cin 但替换

，也会发生同样的情况

cout << num_of_trailing_zeros << "\n";

和

printf("%d", num_of_trailing_zeros);
printf("%s","\n");

一个简单的解决方案是解开 cout 和 cin，如 Ilya Popov 所述：

cin.tie(nullptr);

在某些情况下，允许标准库实现省略对 flush 的调用，但并非总是如此。这是 C++14 27.7.2.1.3 的引述（感谢 chqrlie）：

Class basic_istream::sentry : First, if is.tie() is not a null pointer, the function calls is.tie()->flush() to synchronize the output sequence with any associated external C stream. Except that this call can be suppressed if the put area of is.tie() is empty. Further an implementation is allowed to defer the call to flush until a call of is.rdbuf()->underflow() occurs. If no such call occurs before the sentry object is destroyed, the call to flush may be eliminated entirely.

Answer 3

同时使用 cin 和 cout 时使 iostream 更快的另一个技巧是调用

cin.tie(nullptr);

默认情况下，当您从 cin 输入任何内容时，它会刷新 cout。如果您执行交错的输入和输出，它会显着损害性能。这是为命令行界面使用而完成的，您在其中显示一些提示然后等待数据：

std::string name;
cout << "Enter your name:";
cin >> name;

在这种情况下，您需要确保在开始等待输入之前确实显示了提示。使用上面的线打破平局，cin 和 cout 变得独立。

自 C++11 起，使用 iostream 实现更好性能的另一种方法是将 std::getline 与 std::stoi 一起使用，如下所示：

std::string line;
for (int i = 0; i < n && std::getline(std::cin, line); ++i)
{
    int x = std::stoi(line);
}

这种方式在性能上可以接近C风格，甚至可以超越scanf。使用 getchar 尤其是 getchar_unlocked 与手写解析一起使用仍然可以提供更好的性能。

PS。我写过a post比较C++中几种输入数字的方法，对在线评委有用，但是只有俄语，不好意思。然而，代码示例和最终 table 应该是可以理解的。

C 和 C++ 中几乎相同的代码在执行时间上存在巨大差异 (x9)

Big difference (x9) in the execution time between almost identical code in C and C++

c

c++

performance

gcc

iostream