Can a C++ program search strings as fast as, or faster than, Python?

I'm not sure why searching for a string is faster in the program I wrote in Python than in the one I wrote in C++. Is there a trick I'm missing?

Generating the use case

This is a single-line use case, but in my real use case I care about multiple lines.

#include <fstream>

// Writes 50,000 numbered lines to testfile.txt; only line 43268 ends in "care".
int main() {
   std::ofstream testfile("testfile.txt");
   for (unsigned int line_idx = 0; line_idx < 50000u; line_idx++)
   {
      if (line_idx != 43268u)
      {
         testfile << line_idx << " dontcare" << std::endl;
      }
      else
      {
         testfile << line_idx << " care" << std::endl;
      }
   }
   testfile.close();
   return 0;
}

Regex

Using the regex ^(\d*)\s(care)$

The C++ program takes 13.954 seconds

#include <ctime>
#include <fstream>
#include <iostream>
#include <regex>
#include <string>

int main() {
   std::ifstream testfile("testfile.txt", std::ios_base::in);
   bool found = false;
   std::string line;
   // Raw string literal so that \d and \s reach the regex engine unescaped.
   std::regex ptrn(R"(^(\d*)\s(care)$)");

   std::clock_t start = std::clock();   /* Debug time */
   while (std::getline(testfile, line))
   {
      std::smatch matches;
      if (std::regex_search(line, matches, ptrn))
      {
         found = true;
      }
   }
   testfile.close();
   double duration = (std::clock() - start) / (double) CLOCKS_PER_SEC;
   std::cout << "Found? " << (found ? "yes" : "no") << std::endl;
   std::cout << " Total time: " << duration << std::endl;
   return 0;
}
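
Because the pattern is anchored at both ends with ^ and $, the per-line loop above could also be written with std::regex_match, which only succeeds when the whole line matches. A minimal sketch of that variant, assuming a hypothetical helper name count_care_lines and an arbitrary std::istream as input (neither is part of the original program); it makes no claim about being faster:

```cpp
#include <cstddef>
#include <istream>
#include <regex>
#include <sstream>
#include <string>

// Hypothetical helper (not from the original post): counts the lines of `in`
// that consist of optional digits, one whitespace character, and "care".
std::size_t count_care_lines(std::istream& in) {
   // Built once, outside the loop. regex_match must consume the whole line,
   // so the ^ and $ of the original pattern are dropped here.
   static const std::regex ptrn(R"((\d*)\s(care))");
   std::size_t count = 0;
   std::string line;
   while (std::getline(in, line)) {
      if (std::regex_match(line, ptrn)) {
         ++count;
      }
   }
   return count;
}

// Convenience overload for in-memory text, e.g. for a quick check.
std::size_t count_care_lines(const std::string& text) {
   std::istringstream in(text);
   return count_care_lines(in);
}
```

With the test file it would be called as count_care_lines(testfile) on the already opened std::ifstream.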

The Python program takes 0.02200 seconds

import re            # to search file
import time          # to benchmark

ptrn  = re.compile(r'^(\d*)\s(care)$', re.MULTILINE)

start = time.time()
with open('testfile.txt','r') as testfile:
   filetext = testfile.read()
   matches = ptrn.findall(filetext)
   # Parentheses are needed: without them the conditional applies to the
   # whole concatenation and the "No" branch loses its "Found? " prefix.
   print("Found? " + ("Yes" if len(matches) == 1 else "No"))

end = time.time()
print("Total time", end - start)
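
One thing to keep in mind when comparing the two numbers: by the C++ standard std::clock() reports processor time, while Python's time.time() reports wall-clock time. A minimal sketch of a wall-clock timer for the C++ side, assuming a hypothetical helper name time_seconds (not from the original post):

```cpp
#include <chrono>
#include <utility>

// Hypothetical helper: runs `work` once and returns the elapsed wall-clock
// time in seconds, measured with a monotonic clock.
template <class Work>
double time_seconds(Work&& work) {
   const auto start = std::chrono::steady_clock::now();
   std::forward<Work>(work)();
   const auto stop = std::chrono::steady_clock::now();
   return std::chrono::duration<double>(stop - start).count();
}
```

The search loop would go inside a lambda, for example double t = time_seconds([&] { /* read and search the file */ });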

Implementing Ratah's suggestion brings it down to 8.923 seconds

An improvement of about 5 seconds, from reading the file into a single string

   // Body of main(); needs #include <regex> and <iterator> in addition to the headers above.
   double duration;
   std::clock_t start;
   std::ifstream testfile("testfile.txt", std::ios_base::in);
   bool found = false;
   // Raw string literal so that \d and \s reach the regex engine unescaped.
   std::regex ptrn(R"(^(\d*)\s(care)$)");
   std::smatch matches;

   start = std::clock();   /* Debug time */
   // Read the whole file into a single string.
   std::string test_str((std::istreambuf_iterator<char>(testfile)),
                 std::istreambuf_iterator<char>());

   if (std::regex_search(test_str, matches, ptrn))
   {
      found = true;
   }
   testfile.close();
   duration = (std::clock() - start) / (double) CLOCKS_PER_SEC;
   std::cout << "Found? " << (found ? "yes" : "no") << std::endl;
   std::cout << " Total time: " << duration << std::endl;
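
Note that regex_search on the whole string stops at the first match, whereas the Python version uses findall. A rough sketch of a closer equivalent using std::sregex_iterator follows. The helper name find_care_line_numbers is hypothetical, and the pattern is an adaptation rather than the original one: it spells out the line boundaries with \n instead of relying on ^ and $ matching at line breaks, since that behaviour may differ between standard library implementations.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper (not from the original post): returns the digits in
// front of every "care" line found in `text`, similar to Python's re.findall.
std::vector<std::string> find_care_line_numbers(const std::string& text) {
   // Adapted pattern (an assumption, not the original one): a match must start
   // at the beginning of the text or right after '\n', and must be followed by
   // a line break or the end of the text. [ \t] replaces \s so that the
   // separator itself cannot match a line break.
   static const std::regex ptrn(R"((?:^|\n)(\d*)[ \t](care)(?=\r?\n|$))");
   std::vector<std::string> numbers;
   for (std::sregex_iterator it(text.begin(), text.end(), ptrn), end;
        it != end; ++it) {
      numbers.push_back((*it)[1].str());   // capture group 1: the digits
   }
   return numbers;
}
```

Applied to the single-string version above, find_care_line_numbers(test_str).size() == 1 would correspond to the len(matches) == 1 check in the Python program.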

Following UKMonkey's comment, I reconfigured the project as a Release build, which also enables /O2, and that brought it down to 0.086 seconds

Thanks to Jean-Francois Fabre, Ratah, and UKMonkey