为什么我的 python 3 实现比我用 C++ 编写的实现快得多?
Why is my python 3 implementation much faster than the one I wrote in C++?
我知道 C++ 应该比 Python3 快得多,因为它是编译语言而不是解释语言。
我写了2个
两个使用 Monte Carlo Simulation 计算 Pi 的程序,一个在 Python 3 中,另一个在 C++ 中。
Python 比 C++ 快大约 16 倍。如下图所示,重复值为 (10,000,000),Python 需要 8.5 秒,而 C++ 需要 137.4 秒。
我是 C++ 的新手,但我在网上找不到解释此行为的帖子。
根据 this post C++ 通常应该比 Python 快 10 到 100 倍,这显然不是我的情况。
请帮助我理解为什么 Python 在我的案例中比 C++ 快得多。
我的结果:
Monte Carlo C++ 中的模拟(Pi 的估计)
Monte Carlo Python 3 中的模拟(Pi 估计)
Python源代码:
from random import random
import time
import sys
class MonteCarloSimulator(object):
def __init__(self, value):
self.value = value
if sys.platform == "win32":
self.G = ''
self.R = ''
self.END = ''
else:
self.G = '3[92m'
self.R = '3[1;31m'
self.END = '3[0m'
def unit_circle(self, x, y):
if (x ** 2 + y ** 2) <= 1:
return True
else:
return False
def simulate(self):
print("\nProcessing calculations with a repetition value of " + self.R +
str(self.value) + self.END + " times.")
area_of_circle = 0
area_of_square = 0
start = time.clock()
for i in range(1, self.value):
x = random()
y = random()
if self.unit_circle(x, y):
area_of_circle += 1
area_of_square += 1
pi = (area_of_circle * 4) / area_of_square
runtime = time.clock() - start
print("\tCalculated Pi = " + self.G + str(pi) + self.END +
" ({0} seconds, {1} minutes)".format(round(runtime, 10),
round(runtime / 60, 10)))
print("Estimated Num of Pi is off by", abs(pi - 3.14159265359))
def main():
values = [1000, 10000, 100000, 1000000, 10000000, 100000000,1000000000, 10000000000]
for value in values: MonteCarloSimulator(value).simulate()
if __name__ == "__main__":
try:
main()
except KeyboardInterrupt:
print("\nQuitting...")
sys.exit(1)
C++ 源代码:
#include <iostream> // std library
#include <random> // random number generator
#include <ctime> // calculating runtime
#include <cmath> // absolute value function
#include "MonteCarloSimmulation.hpp" // function prototypes
using namespace std;
const double g_PI {3.141592653589793238463};
int main()
{
// repitition values
long values[5] = {1000, 10000, 100000, 1000000, 10000000};//, 100000000, 1000000000, 10000000000};
// runs the simulation with the different repetition values
for (auto value : values)
simulate(value);
cout << "\nPress return to exit";
cin.get();
return 0;
}
/**
* The actual simulation
*/
void simulate(unsigned long value)
{
// start time for calculating runtime
const clock_t startTime = clock();
// area's variables
unsigned long area_of_circle = 0;
unsigned long area_of_square = 0;
// print the repitiion value
cout << "\nProcessing calculations with a repetition value of " << value <<
" times." << endl;
for (unsigned long i = 0; i != value; i++)
{
// gets random values from 0 to 1 for (x) and (y)
float x = randomFloat();
float y = randomFloat();
// checks if (x, y) are in a unit circle, if so increment circle area
if (unit_circle(x, y))
area_of_circle++;
area_of_square++;
}
// pi = area of circle * 4 / area of square
double calculatedPi = static_cast<double>(area_of_circle * 4) / area_of_square;
float endTime = static_cast<float>(clock() - startTime) / CLOCKS_PER_SEC;
// prints the value of calculated pi
cout << "\tCalculated Value of Pi: " << calculatedPi <<
" (" << endTime << " seconds, " << endTime/60 << " minutes)" << endl;
// difference between the calc value and pi
cout << "Estimated Num of Pi is off by " << abs(calculatedPi - g_PI) << '\n';
}
/**
* returns a random number from 0 to 1
*/
float randomFloat()
{
random_device rd;
default_random_engine generator(rd()); // rd() provides a random seed
uniform_real_distribution<float> distribution(0,1);
float x = distribution(generator);
return x;
}
/**
* checks if the two input parameters are inside a unit circle
*/
bool unit_circle(float x, float y)
{
if ((x*x + y*y) <= 1)
return true;
else
return false;
}
主要问题是您正在为 C++ 代码中的每个随机数重新播种随机数生成器。此外,您没有在启用优化的情况下进行编译 (-O3
)。
我将随机数生成器的初始化移到了 randomFloat
函数之外(同样,您可以在函数内部使用 static
变量):
random_device rd;
default_random_engine generator(rd()); // rd() provides a random seed
uniform_real_distribution<float> distribution(0,1);
float randomFloat() {
float x = distribution(generator);
return x;
}
并使用 -O3
编译,现在 C++ 比 Python
快得多
另一种可能是 python 和 C++ 代码使用不同的随机数生成器。 Python random
module (C code here) uses a MT19937 Mersenne Twister random number generator that is a fast PRNG optimized specifically for numerical problems such as Monte Carlo; the algorithm of default_random_engine
in C++ is implementation-defined. As pointed out by Melak47, you can force the use of MT19937 C++ 中的 PRNG 具有:
mt19937 generator(rd());
或
mt19937_64 generator(rd());
P.S., Python 优于 C++ 并非闻所未闻; C++ 算法重视通用性,而 Python 算法通常针对某些用例进行了相当优化。例如,请参阅 .
上的这个问题
主要成本是您的 randomFloat() C++ 方法。
构建 random_device、default_random_engine 和 uniform_real_distribution 每次迭代都非常浪费。
通过使这些静态化,我能够将 C++ 实现的速度提高 100 倍以上。但是你最好注入它们,或者将其包装在 class 中并制作它们实例成员。
#include <iostream> // std library
#include <random> // random number generator
#include <ctime> // calculating runtime
#include <cmath> // absolute value function
using namespace std;
const double g_PI {3.141592653589793238463};
void simulate(unsigned long value);
float randomFloat();
bool unit_circle(float x, float y);
int main()
{
// repitition values
long values[5] = {1000, 10000, 100000, 1000000, 10000000};//, 100000000, 1000000000, 10000000000};
// runs the simulation with the different repetition values
for (auto value : values)
simulate(value);
cout << "\nPress return to exit";
cin.get();
return 0;
}
/**
* The actual simulation
*/
void simulate(unsigned long value)
{
// start time for calculating runtime
const clock_t startTime = clock();
// area's variables
unsigned long area_of_circle = 0;
unsigned long area_of_square = 0;
// print the repitiion value
cout << "\nProcessing calculations with a repetition value of " << value <<
" times." << endl;
for (unsigned long i = 0; i != value; i++)
{
// gets random values from 0 to 1 for (x) and (y)
float x = randomFloat();
float y = randomFloat();
// checks if (x, y) are in a unit circle, if so increment circle area
if (unit_circle(x, y))
area_of_circle++;
area_of_square++;
}
// pi = area of circle * 4 / area of square
double calculatedPi = static_cast<double>(area_of_circle * 4) / area_of_square;
float endTime = static_cast<float>(clock() - startTime) / CLOCKS_PER_SEC;
// prints the value of calculated pi
cout << "\tCalculated Value of Pi: " << calculatedPi <<
" (" << endTime << " seconds, " << endTime/60 << " minutes)" << endl;
// difference between the calc value and pi
cout << "Estimated Num of Pi is off by " << abs(calculatedPi - g_PI) << '\n';
}
/**
* returns a random number from 0 to 1
*/
float randomFloat()
{
static random_device rd;
static default_random_engine generator(rd()); // rd() provides a random seed
static uniform_real_distribution<float> distribution(0,1);
float x = distribution(generator);
return x;
}
/**
* checks if the two input parameters are inside a unit circle
*/
bool unit_circle(float x, float y)
{
if ((x*x + y*y) <= 1)
return true;
else
return false;
}
原始实现日志
Processing calculations with a repetition value of 1000 times.
Calculated Value of Pi: 3.08 (0.019227 seconds, 0.00032045 minutes)
Estimated Num of Pi is off by 0.0615927
Processing calculations with a repetition value of 10000 times.
Calculated Value of Pi: 3.124 (0.162044 seconds, 0.00270073 minutes)
Estimated Num of Pi is off by 0.0175927
Processing calculations with a repetition value of 100000 times.
Calculated Value of Pi: 3.14568 (1.72181 seconds, 0.0286968 minutes)
Estimated Num of Pi is off by 0.00408735
//Couldn't be bothered to wait :P
使用静态随机生成器
Processing calculations with a repetition value of 1000 times.
Calculated Value of Pi: 3.136 (0.000144 seconds, 2.4e-06 minutes)
Estimated Num of Pi is off by 0.00559265
Processing calculations with a repetition value of 10000 times.
Calculated Value of Pi: 3.1824 (0.000596 seconds, 9.93333e-06 minutes)
Estimated Num of Pi is off by 0.0408073
Processing calculations with a repetition value of 100000 times.
Calculated Value of Pi: 3.14044 (0.005889 seconds, 9.815e-05 minutes)
Estimated Num of Pi is off by 0.00115265
Processing calculations with a repetition value of 1000000 times.
Calculated Value of Pi: 3.14278 (0.058896 seconds, 0.0009816 minutes)
Estimated Num of Pi is off by 0.00118335
Processing calculations with a repetition value of 10000000 times.
Calculated Value of Pi: 3.14165 (0.589034 seconds, 0.00981723 minutes)
Estimated Num of Pi is off by 6.09464e-05
并不是要回答您为什么 python 更快的问题,只是为了表明 python 可以更快、更整洁地获取此问题的事件。
在 python 中加快速度的可能性:
使用 numpy 向量化:
import numpy as np
def pi(N):
x, y = np.random.uniform(-1, 1, size=(2, N))
in_circle = np.count_nonzero(x**2 + y**2 <= 1)
return 4 * in_circle / N
和/或 numba 即时编译:
from numba import jit
import random
@jit
def pi(N):
in_circle = 0
for i in range(N):
x = 2 * random.random() - 1
y = 2 * random.random() - 1
if x**2 + y**2 <= 1:
in_circle += 1
return 4 * in_circle / N
我知道 C++ 应该比 Python3 快得多,因为它是编译语言而不是解释语言。
我写了2个 两个使用 Monte Carlo Simulation 计算 Pi 的程序,一个在 Python 3 中,另一个在 C++ 中。
Python 比 C++ 快大约 16 倍。如下图所示,重复值为 (10,000,000),Python 需要 8.5 秒,而 C++ 需要 137.4 秒。
我是 C++ 的新手,但我在网上找不到解释此行为的帖子。
根据 this post C++ 通常应该比 Python 快 10 到 100 倍,这显然不是我的情况。
请帮助我理解为什么 Python 在我的案例中比 C++ 快得多。
我的结果:
Monte Carlo C++ 中的模拟(Pi 的估计)
Monte Carlo Python 3 中的模拟(Pi 估计)
Python源代码:
from random import random
import time
import sys
class MonteCarloSimulator(object):
def __init__(self, value):
self.value = value
if sys.platform == "win32":
self.G = ''
self.R = ''
self.END = ''
else:
self.G = '3[92m'
self.R = '3[1;31m'
self.END = '3[0m'
def unit_circle(self, x, y):
if (x ** 2 + y ** 2) <= 1:
return True
else:
return False
def simulate(self):
print("\nProcessing calculations with a repetition value of " + self.R +
str(self.value) + self.END + " times.")
area_of_circle = 0
area_of_square = 0
start = time.clock()
for i in range(1, self.value):
x = random()
y = random()
if self.unit_circle(x, y):
area_of_circle += 1
area_of_square += 1
pi = (area_of_circle * 4) / area_of_square
runtime = time.clock() - start
print("\tCalculated Pi = " + self.G + str(pi) + self.END +
" ({0} seconds, {1} minutes)".format(round(runtime, 10),
round(runtime / 60, 10)))
print("Estimated Num of Pi is off by", abs(pi - 3.14159265359))
def main():
values = [1000, 10000, 100000, 1000000, 10000000, 100000000,1000000000, 10000000000]
for value in values: MonteCarloSimulator(value).simulate()
if __name__ == "__main__":
try:
main()
except KeyboardInterrupt:
print("\nQuitting...")
sys.exit(1)
C++ 源代码:
#include <iostream> // std library
#include <random> // random number generator
#include <ctime> // calculating runtime
#include <cmath> // absolute value function
#include "MonteCarloSimmulation.hpp" // function prototypes
using namespace std;
const double g_PI {3.141592653589793238463};
int main()
{
// repitition values
long values[5] = {1000, 10000, 100000, 1000000, 10000000};//, 100000000, 1000000000, 10000000000};
// runs the simulation with the different repetition values
for (auto value : values)
simulate(value);
cout << "\nPress return to exit";
cin.get();
return 0;
}
/**
* The actual simulation
*/
void simulate(unsigned long value)
{
// start time for calculating runtime
const clock_t startTime = clock();
// area's variables
unsigned long area_of_circle = 0;
unsigned long area_of_square = 0;
// print the repitiion value
cout << "\nProcessing calculations with a repetition value of " << value <<
" times." << endl;
for (unsigned long i = 0; i != value; i++)
{
// gets random values from 0 to 1 for (x) and (y)
float x = randomFloat();
float y = randomFloat();
// checks if (x, y) are in a unit circle, if so increment circle area
if (unit_circle(x, y))
area_of_circle++;
area_of_square++;
}
// pi = area of circle * 4 / area of square
double calculatedPi = static_cast<double>(area_of_circle * 4) / area_of_square;
float endTime = static_cast<float>(clock() - startTime) / CLOCKS_PER_SEC;
// prints the value of calculated pi
cout << "\tCalculated Value of Pi: " << calculatedPi <<
" (" << endTime << " seconds, " << endTime/60 << " minutes)" << endl;
// difference between the calc value and pi
cout << "Estimated Num of Pi is off by " << abs(calculatedPi - g_PI) << '\n';
}
/**
* returns a random number from 0 to 1
*/
float randomFloat()
{
random_device rd;
default_random_engine generator(rd()); // rd() provides a random seed
uniform_real_distribution<float> distribution(0,1);
float x = distribution(generator);
return x;
}
/**
* checks if the two input parameters are inside a unit circle
*/
bool unit_circle(float x, float y)
{
if ((x*x + y*y) <= 1)
return true;
else
return false;
}
主要问题是您正在为 C++ 代码中的每个随机数重新播种随机数生成器。此外,您没有在启用优化的情况下进行编译 (-O3
)。
我将随机数生成器的初始化移到了 randomFloat
函数之外(同样,您可以在函数内部使用 static
变量):
random_device rd;
default_random_engine generator(rd()); // rd() provides a random seed
uniform_real_distribution<float> distribution(0,1);
float randomFloat() {
float x = distribution(generator);
return x;
}
并使用 -O3
编译,现在 C++ 比 Python
另一种可能是 python 和 C++ 代码使用不同的随机数生成器。 Python random
module (C code here) uses a MT19937 Mersenne Twister random number generator that is a fast PRNG optimized specifically for numerical problems such as Monte Carlo; the algorithm of default_random_engine
in C++ is implementation-defined. As pointed out by Melak47, you can force the use of MT19937 C++ 中的 PRNG 具有:
mt19937 generator(rd());
或
mt19937_64 generator(rd());
P.S., Python 优于 C++ 并非闻所未闻; C++ 算法重视通用性,而 Python 算法通常针对某些用例进行了相当优化。例如,请参阅
主要成本是您的 randomFloat() C++ 方法。
构建 random_device、default_random_engine 和 uniform_real_distribution 每次迭代都非常浪费。
通过使这些静态化,我能够将 C++ 实现的速度提高 100 倍以上。但是你最好注入它们,或者将其包装在 class 中并制作它们实例成员。
#include <iostream> // std library
#include <random> // random number generator
#include <ctime> // calculating runtime
#include <cmath> // absolute value function
using namespace std;
const double g_PI {3.141592653589793238463};
void simulate(unsigned long value);
float randomFloat();
bool unit_circle(float x, float y);
int main()
{
// repitition values
long values[5] = {1000, 10000, 100000, 1000000, 10000000};//, 100000000, 1000000000, 10000000000};
// runs the simulation with the different repetition values
for (auto value : values)
simulate(value);
cout << "\nPress return to exit";
cin.get();
return 0;
}
/**
* The actual simulation
*/
void simulate(unsigned long value)
{
// start time for calculating runtime
const clock_t startTime = clock();
// area's variables
unsigned long area_of_circle = 0;
unsigned long area_of_square = 0;
// print the repitiion value
cout << "\nProcessing calculations with a repetition value of " << value <<
" times." << endl;
for (unsigned long i = 0; i != value; i++)
{
// gets random values from 0 to 1 for (x) and (y)
float x = randomFloat();
float y = randomFloat();
// checks if (x, y) are in a unit circle, if so increment circle area
if (unit_circle(x, y))
area_of_circle++;
area_of_square++;
}
// pi = area of circle * 4 / area of square
double calculatedPi = static_cast<double>(area_of_circle * 4) / area_of_square;
float endTime = static_cast<float>(clock() - startTime) / CLOCKS_PER_SEC;
// prints the value of calculated pi
cout << "\tCalculated Value of Pi: " << calculatedPi <<
" (" << endTime << " seconds, " << endTime/60 << " minutes)" << endl;
// difference between the calc value and pi
cout << "Estimated Num of Pi is off by " << abs(calculatedPi - g_PI) << '\n';
}
/**
* returns a random number from 0 to 1
*/
float randomFloat()
{
static random_device rd;
static default_random_engine generator(rd()); // rd() provides a random seed
static uniform_real_distribution<float> distribution(0,1);
float x = distribution(generator);
return x;
}
/**
* checks if the two input parameters are inside a unit circle
*/
bool unit_circle(float x, float y)
{
if ((x*x + y*y) <= 1)
return true;
else
return false;
}
原始实现日志
Processing calculations with a repetition value of 1000 times.
Calculated Value of Pi: 3.08 (0.019227 seconds, 0.00032045 minutes)
Estimated Num of Pi is off by 0.0615927
Processing calculations with a repetition value of 10000 times.
Calculated Value of Pi: 3.124 (0.162044 seconds, 0.00270073 minutes)
Estimated Num of Pi is off by 0.0175927
Processing calculations with a repetition value of 100000 times.
Calculated Value of Pi: 3.14568 (1.72181 seconds, 0.0286968 minutes)
Estimated Num of Pi is off by 0.00408735
//Couldn't be bothered to wait :P
使用静态随机生成器
Processing calculations with a repetition value of 1000 times.
Calculated Value of Pi: 3.136 (0.000144 seconds, 2.4e-06 minutes)
Estimated Num of Pi is off by 0.00559265
Processing calculations with a repetition value of 10000 times.
Calculated Value of Pi: 3.1824 (0.000596 seconds, 9.93333e-06 minutes)
Estimated Num of Pi is off by 0.0408073
Processing calculations with a repetition value of 100000 times.
Calculated Value of Pi: 3.14044 (0.005889 seconds, 9.815e-05 minutes)
Estimated Num of Pi is off by 0.00115265
Processing calculations with a repetition value of 1000000 times.
Calculated Value of Pi: 3.14278 (0.058896 seconds, 0.0009816 minutes)
Estimated Num of Pi is off by 0.00118335
Processing calculations with a repetition value of 10000000 times.
Calculated Value of Pi: 3.14165 (0.589034 seconds, 0.00981723 minutes)
Estimated Num of Pi is off by 6.09464e-05
并不是要回答您为什么 python 更快的问题,只是为了表明 python 可以更快、更整洁地获取此问题的事件。
在 python 中加快速度的可能性:
使用 numpy 向量化:
import numpy as np
def pi(N):
x, y = np.random.uniform(-1, 1, size=(2, N))
in_circle = np.count_nonzero(x**2 + y**2 <= 1)
return 4 * in_circle / N
和/或 numba 即时编译:
from numba import jit
import random
@jit
def pi(N):
in_circle = 0
for i in range(N):
x = 2 * random.random() - 1
y = 2 * random.random() - 1
if x**2 + y**2 <= 1:
in_circle += 1
return 4 * in_circle / N