How to not capture whitespace after a newline with a regex in C++

Question

I am trying to capture comments from C/C++/Java files, but I cannot find a way to skip the whitespace that may follow a newline. My regex pattern is

regex reg("(//.*|/\\*(.|\n)*?\\*/)");

For example, in the code below (ignore the random code snippets, they could be anything...) I capture the comments correctly:

// my  program in C++
#include <iostream>
/** playing around in
a new programming language **/
using namespace std;

The output is:

// my  program in C++
/** playing around in
a new programming language **/

However, when the continuation lines of a multiline comment are indented, for example:

int main(){
        /* start always points to the first node of the linked list.
           temp is used to point to the last node of the linked list.*/
        node *start,*temp;
        start = (node *)malloc(sizeof(node));
        temp = start;
        temp -> next = NULL;
        temp -> prev = NULL;
        /* Here in this code, we take the first node as a dummy node.
           The first node does not contain data, but it used because to avoid handling special cases
           in insert and delete functions.
         */
        printf("1. Insert\n");

I capture:

/* start always points to the first node of the linked list.
           temp is used to point to the last node of the linked list.*/
/* Here in this code, we take the first node as a dummy node.
           The first node does not contain data, but it used because to avoid handling special cases
           in insert and delete functions.
         */

instead of:

/* start always points to the first node of the linked list.
temp is used to point to the last node of the linked list.*/
/* Here in this code, we take the first node as a dummy node.
The first node does not contain data, but it used because to avoid handling special cases
in insert and delete functions.
*/

How can I work around this in the regex pattern?

Note: if possible, I would like to avoid string manipulators and the like, and only modify the regex.

Answer 1

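Converting my comment above into an answer.

A regex cannot match discontinuous text: a single match is always one contiguous span of the input. Instead, you can match part of the text with a regex and then post-process the matched (or captured) value with another regex or with string operations.

Here is an example (not the best one, just to show the concept):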

#include <iostream>
#include <regex>
#include <string>

int main() {
    std::string data("int main(){// Singleline content\n        /* start always points to the first node of the linked list.\n           temp is used to point to the last node of the linked list.*/\n        node *start,*temp;\n        start = (node *)malloc(sizeof(node));\n        temp = start;\n        temp -> next = NULL;\n        temp -> prev = NULL;\n        /* Here in this code, we take the first node as a dummy node.\n           The first node does not contain data, but it used because to avoid handling special cases\n           in insert and delete functions.\n         */\n        printf(\"1. Insert\\n\");");
    // A // comment, or a /* ... */ comment in unrolled-loop form.
    std::regex pattern(R"(//.*|/\*[^*]*\*+(?:[^/*][^*]*\*+)*/)");
    // Horizontal whitespace at the start of the text or right after a newline.
    std::regex indent(R"((^|\n)[^\S\r\n]+)");
    std::smatch result;

    while (std::regex_search(data, result, pattern)) {
        // "$1" puts the captured newline back, so only the indentation is removed.
        std::cout << std::regex_replace(result[0].str(), indent, "$1") << std::endl;
        data = result.suffix().str();
    }
}

See the IDEONE demo.

Note: raw string literals simplify the regex definition, because backslashes do not have to be doubled.

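R"(//.*|/\*[^*]*\*+(?:[^/*][^*]*\*+)*/)" has two alternatives. //.* matches // followed by 0+ characters other than a newline (a single-line comment). /\*[^*]*\*+(?:[^/*][^*]*\*+)*/ matches /* followed by 0+ characters other than *, then 1+ *s, then 0+ sequences of one character other than / and *, 0+ characters other than *, and 1+ *s, and finally the closing / (a multiline comment). This multiline comment pattern is more efficient than yours because it is written according to the unroll-the-loop technique.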

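I remove the leading horizontal whitespace on each line with regex_replace(result[0].str(), std::regex(R"((^|\n)[^\S\r\n]+)"), "$1"): (^|\n) matches and captures either the start-of-string anchor or a newline, and [^\S\r\n]+ matches 1+ characters other than non-whitespace, CR and LF, that is, whitespace that is not a line break. The replacement $1 writes the captured newline back, so only the indentation is dropped and the line breaks are kept.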

c++

regex

newline

removing-whitespace