C++ 程序字符串搜索速度可以达到 and/or 比 python 快吗?
Can C++ program string search as fast as and/or faster than python?
我不确定为什么我在用 python 编写的程序中搜索字符串比在 C++ 中编写的程序更快。有没有我遗漏的技巧?
生成用例
这是针对单行用例,但在实际用例中我关心多行。
#include "tchar.h"
#include "stdio.h"
#include "stdlib.h"
#include <string>
#include <sstream>
#include <iostream>
#include <fstream>
#include <ctime>
using namespace std;
void main(void){
ofstream testfile;
unsigned int line_idx = 0;
testfile.open("testfile.txt");
for(line_idx = 0; line_idx < 50000u; line_idx++)
{
if(line_idx != 43268u )
{
testfile << line_idx << " dontcare" << std::endl;
}
else
{
testfile << line_idx << " care" << std::endl;
}
}
testfile.close();
}
正则表达式
使用正则表达式 ^(\d*)\s(care)$
C++ 程序 需要 13.954 秒
#include "tchar.h"
#include "stdio.h"
#include "stdlib.h"
#include <string>
#include <sstream>
#include <iostream>
#include <fstream>
#include <ctime>
using namespace std;
void main(void){
double duration;
std::clock_t start;
ifstream testfile("testfile.txt", ios_base::in);
unsigned int line_idx = 0;
bool found = false;
string line;
regex ptrn("^(\d*)\s(care)$");
start = std::clock(); /* Debug time */
while (getline(testfile, line))
{
std::smatch matches;
if(regex_search(line, matches, ptrn))
{
found = true;
}
}
testfile.close();
duration = ( std::clock() - start ) / (double) CLOCKS_PER_SEC;
std::cout << "Found? " << (found ? "yes" : "no") << std::endl;
std::cout << " Total time: " << duration << std::endl;
}
Python 程序 需要 0.02200 秒
import sys, os # to navigate and open files
import re # to search file
import time # to benchmark
ptrn = re.compile(r'^(\d*)\s(care)$', re.MULTILINE)
start = time.time()
with open('testfile.txt','r') as testfile:
filetext = testfile.read()
matches = re.findall(ptrn, filetext)
print("Found? " + "Yes" if len(matches) == 1 else "No")
end = time.time()
print("Total time", end - start)
将 Ratah 的建议执行到 8.923
大约 5 秒的改进,通过将文件读取为单个字符串
double duration;
std::clock_t start;
ifstream testfile("testfile.txt", ios_base::in);
unsigned int line_idx = 0;
bool found = false;
string line;
regex ptrn("^(\d*)\s(care)$");
std::smatch matches;
start = std::clock(); /* Debug time */
std::string test_str((std::istreambuf_iterator<char>(testfile)),
std::istreambuf_iterator<char>());
if(regex_search(test_str, matches, ptrn))
{
found = true;
}
testfile.close();
duration = ( std::clock() - start ) / (double) CLOCKS_PER_SEC;
std::cout << "Found? " << (found ? "yes" : "no") << std::endl;
std::cout << " Total time: " << duration << std::endl;
根据 UKMonkey 的说明,将项目重新配置为发布,其中还包括 \O2 并将其降低到 0.086 秒
感谢 Jean-Francois Fabre、Ratah、UKMonkey
我不确定为什么我在用 python 编写的程序中搜索字符串比在 C++ 中编写的程序更快。有没有我遗漏的技巧?
生成用例
这是针对单行用例,但在实际用例中我关心多行。
#include "tchar.h"
#include "stdio.h"
#include "stdlib.h"
#include <string>
#include <sstream>
#include <iostream>
#include <fstream>
#include <ctime>
using namespace std;
void main(void){
ofstream testfile;
unsigned int line_idx = 0;
testfile.open("testfile.txt");
for(line_idx = 0; line_idx < 50000u; line_idx++)
{
if(line_idx != 43268u )
{
testfile << line_idx << " dontcare" << std::endl;
}
else
{
testfile << line_idx << " care" << std::endl;
}
}
testfile.close();
}
正则表达式
使用正则表达式 ^(\d*)\s(care)$
C++ 程序 需要 13.954 秒
#include "tchar.h"
#include "stdio.h"
#include "stdlib.h"
#include <string>
#include <sstream>
#include <iostream>
#include <fstream>
#include <ctime>
using namespace std;
void main(void){
double duration;
std::clock_t start;
ifstream testfile("testfile.txt", ios_base::in);
unsigned int line_idx = 0;
bool found = false;
string line;
regex ptrn("^(\d*)\s(care)$");
start = std::clock(); /* Debug time */
while (getline(testfile, line))
{
std::smatch matches;
if(regex_search(line, matches, ptrn))
{
found = true;
}
}
testfile.close();
duration = ( std::clock() - start ) / (double) CLOCKS_PER_SEC;
std::cout << "Found? " << (found ? "yes" : "no") << std::endl;
std::cout << " Total time: " << duration << std::endl;
}
Python 程序 需要 0.02200 秒
import sys, os # to navigate and open files
import re # to search file
import time # to benchmark
ptrn = re.compile(r'^(\d*)\s(care)$', re.MULTILINE)
start = time.time()
with open('testfile.txt','r') as testfile:
filetext = testfile.read()
matches = re.findall(ptrn, filetext)
print("Found? " + "Yes" if len(matches) == 1 else "No")
end = time.time()
print("Total time", end - start)
将 Ratah 的建议执行到 8.923
大约 5 秒的改进,通过将文件读取为单个字符串
double duration;
std::clock_t start;
ifstream testfile("testfile.txt", ios_base::in);
unsigned int line_idx = 0;
bool found = false;
string line;
regex ptrn("^(\d*)\s(care)$");
std::smatch matches;
start = std::clock(); /* Debug time */
std::string test_str((std::istreambuf_iterator<char>(testfile)),
std::istreambuf_iterator<char>());
if(regex_search(test_str, matches, ptrn))
{
found = true;
}
testfile.close();
duration = ( std::clock() - start ) / (double) CLOCKS_PER_SEC;
std::cout << "Found? " << (found ? "yes" : "no") << std::endl;
std::cout << " Total time: " << duration << std::endl;
根据 UKMonkey 的说明,将项目重新配置为发布,其中还包括 \O2 并将其降低到 0.086 秒
感谢 Jean-Francois Fabre、Ratah、UKMonkey