How to not capture whitespace after a new line with regex in C++
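I am trying to capture comments from C/C++/Java files, but I cannot find a way to skip the whitespace that may follow a line break.
My regex pattern is
regex reg("(//.*|/\\*(.|\n)*?\\*/)");
For example, in the following code (never mind the random code snippets, they could be anything...) I capture the comments correctly: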
// my program in C++
#include <iostream>
/** playing around in
a new programming language **/
using namespace std;
The output is:
// my program in C++
/** playing around in
a new programming language **/
However, when the multiline comments in my code contain leading whitespace after a line break, for example:
int main(){
    /* start always points to the first node of the linked list.
       temp is used to point to the last node of the linked list.*/
    node *start,*temp;
    start = (node *)malloc(sizeof(node));
    temp = start;
    temp -> next = NULL;
    temp -> prev = NULL;
    /* Here in this code, we take the first node as a dummy node.
       The first node does not contain data, but it used because to avoid handling special cases
       in insert and delete functions.
     */
    printf("1. Insert\n");
I capture:
/* start always points to the first node of the linked list.
       temp is used to point to the last node of the linked list.*/
/* Here in this code, we take the first node as a dummy node.
       The first node does not contain data, but it used because to avoid handling special cases
       in insert and delete functions.
     */
instead of:
/* start always points to the first node of the linked list.
temp is used to point to the last node of the linked list.*/
/* Here in this code, we take the first node as a dummy node.
The first node does not contain data, but it used because to avoid handling special cases
in insert and delete functions.
*/
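How can I work around this in the regex pattern?
Note: if possible, I would like to avoid string manipulators and the like, and only modify the regex.
Converting my comment above into an answer.
A regex cannot match non-contiguous text. Instead, you can match a part of the text with one regex and then post-process the matched (or captured) value with another regex or with string manipulation.
Here is an example (not the best one, just to show the concept):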
#include <iostream>
#include <regex>
#include <string>
int main() {
    std::string data("int main(){// Singleline content\n /* start always points to the first node of the linked list.\n temp is used to point to the last node of the linked list.*/\n node *start,*temp;\n start = (node *)malloc(sizeof(node));\n temp = start;\n temp -> next = NULL;\n temp -> prev = NULL;\n /* Here in this code, we take the first node as a dummy node.\n The first node does not contain data, but it used because to avoid handling special cases\n in insert and delete functions.\n */\n printf(\"1. Insert\\n\");");
    // Single-line comments, or multiline comments in unrolled-loop form
    std::regex pattern(R"(//.*|/\*[^*]*\*+(?:[^/*][^*]*\*+)*/)");
    std::smatch result;
    while (std::regex_search(data, result, pattern)) {
        // "$1" puts the captured newline back, so only the indentation is removed
        std::cout << std::regex_replace(result[0].str(), std::regex(R"((^|\n)[^\S\r\n]+)"), "$1") << std::endl;
        data = result.suffix().str();  // continue searching after this match
    }
}
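Note: the raw string literal simplifies the regex definition. In R"(//.*|/\*[^*]*\*+(?:[^/*][^*]*\*+)*/)", the first alternative, //.*, matches // followed by any 0+ characters other than line breaks (a single-line comment). The second alternative, /\*[^*]*\*+(?:[^/*][^*]*\*+)*/, matches /*, then 0+ characters other than *, then 1+ * characters, then 0+ sequences of a character other than / and * followed by 0+ non-* characters and 1+ * characters, and finally the closing / (a multiline comment). This multiline comment pattern is more efficient than yours because it is written according to the unroll-the-loop technique.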
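To see that the unrolled pattern accepts the same comments as the lazy one, the two can be compared on a small input. This is a minimal sketch: first_match, lazy_comment and unrolled_comment are names invented here for illustration, and the lazy pattern is the multiline alternative from the question written as a raw string.

```cpp
#include <regex>
#include <string>

// Returns the first match of `pattern` in `text`, or "" if there is none.
std::string first_match(const std::string& text, const std::string& pattern) {
    std::smatch m;
    const std::regex re(pattern);
    return std::regex_search(text, m, re) ? m.str() : std::string();
}

// The asker's lazy form and the answer's unrolled form, as raw strings.
const std::string lazy_comment = R"(/\*(.|\n)*?\*/)";
const std::string unrolled_comment = R"(/\*[^*]*\*+(?:[^/*][^*]*\*+)*/)";
```

Both patterns should stop at the first */ even when the comment contains stray * or / characters. The unrolled form avoids the per-character alternation (.|\n) under a lazy quantifier, which is where the expected efficiency gain comes from. How large that gain is depends on the standard library implementation and the input, so it is worth measuring on real files.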
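I remove the leading horizontal whitespace on each line with regex_replace(result[0].str(), std::regex(R"((^|\n)[^\S\r\n]+)"), "$1"): (^|\n)[^\S\r\n]+ matches and captures the start-of-string anchor or a newline, followed by 1+ characters other than non-whitespace, CR and LF (that is, 1+ horizontal whitespace characters). The $1 replacement puts the captured newline back, so only the indentation is removed.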
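The role of the capturing group is easiest to see by running the same replacement with two different replacement strings. The sketch below assumes a made-up helper, strip_indent, that takes the replacement text as a parameter. It is for illustration only and is not part of the original answer.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: removes spaces/tabs at the start of the text and after
// each "\n", replacing the match with `replacement` ("$1" keeps the newline).
std::string strip_indent(const std::string& text, const std::string& replacement) {
    static const std::regex indent(R"((^|\n)[^\S\r\n]+)");
    return std::regex_replace(text, indent, replacement);
}
```

With "$1" the line breaks survive and only the spaces and tabs after them are dropped. With an empty replacement the newline is removed together with the indentation, so the lines are joined. Because [^\S\r\n] excludes CR and LF, empty lines inside a comment are left untouched.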