Reducing number of required pointer variables in regex code

Question

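I finally managed to get my regex function working, but I'd like to know whether I can reduce the number of pointer declarations in the main function to one. For example, I want to convert: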

int main(){
    regex* r=calloc(1,300000);
    regex* rr=r;
    if (regexmatch("T/2/b","([A-Z]+)/([0-9]+)/([a-z]+)",10,&rr)==0){
    printres(r);
    }
    free(r);
    return 0;
}

into something like:

int main(){
    regex* r=calloc(1,300000);
    if (regexmatch("T/2/b","([A-Z]+)/([0-9]+)/([a-z]+)",10,&r)==0){
    printres(r);
    }
    free(r);
    return 0;
}

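But as it stands, this doesn't work, because the regexmatch function seems to change the address stored in the variable, which makes the program crash at free(r);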

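I even tried adding reg=rp; just before the last return statement in the function, hoping to reset the struct variable's address to what it was when the function was first called, but that didn't work.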

What can I do to fix this? Or is using two pointers in the main function my only option?

Here is my code:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <regex.h>

typedef struct{
    char str[1000];
} regex;

long regexmatch(const char* str,const char* regexs,const size_t nummatch,regex** rp){
    regex** reg=rp;
    regex_t r;regmatch_t match[nummatch];
    if (regcomp(&r,regexs,REG_EXTENDED) != 0){return -1;}
    if (regexec(&r,str,nummatch,match,0)!=0){regfree(&r);return -1;}
    regfree(&r);size_t i=0;
    for (i=0;i<nummatch;i++){
        if (match[i].rm_so > -1){
            unsigned long sz=match[i].rm_eo-match[i].rm_so;
            if (sz > 0 && sz < 1000){
            memcpy((**reg).str,(char*)(str+match[i].rm_so),sz);
            (*reg)++;
            }
        }
    }
    (**reg).str[0]='\0';
    return 0;
}

void printres(regex* r){
    printf("Matches\n");
    while (r->str[0] != '\0'){
    printf("%s\n",r->str);
    r++;
    }
}

int main(){
    regex* r=calloc(1,300000);
    regex* rr=r;
    if (regexmatch("T/2/b","([A-Z]+)/([0-9]+)/([a-z]+)",10,&rr)==0){
    printres(r);
    }
    free(r);
    return 0;
}

Answer 1

In regexmatch, use this in place of regex** reg=rp;: regex* rp2=*rp;regex** reg=&rp2;

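In your code, (*reg)++; modifies the pointer that rp points to. Because of regex** reg=rp;, it is equivalent to (*rp)++. rp is the address of the pointer to your calloc'd memory, set by &r in your regexmatch call. You don't want to change that pointer, so we use another pointer, rp2.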
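As a rough sketch of that fix (not the answerer's exact code), the function below copies *rp into a local cursor and advances only the cursor, so the caller's pointer is left alone. The name cur, the precondition assert, and the explicit nul-termination are additions made here for illustration; the caller is assumed to provide room for nummatch + 1 elements.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>

typedef struct {
    char str[1000];
} regex;

/* Same interface as the question's regexmatch, but the results are written
 * through a local cursor, so the caller's pointer (*rp) is never modified. */
long regexmatch(const char *str, const char *regexs, size_t nummatch, regex **rp) {
    assert(rp != NULL && *rp != NULL && nummatch > 0);

    regex *cur = *rp;   /* local copy; plays the role of rp2 */
    regex_t r;
    regmatch_t match[nummatch];

    if (regcomp(&r, regexs, REG_EXTENDED) != 0) return -1;
    if (regexec(&r, str, nummatch, match, 0) != 0) { regfree(&r); return -1; }
    regfree(&r);

    for (size_t i = 0; i < nummatch; i++) {
        if (match[i].rm_so > -1) {
            size_t sz = (size_t)(match[i].rm_eo - match[i].rm_so);
            if (sz > 0 && sz < sizeof cur->str) {
                memcpy(cur->str, str + match[i].rm_so, sz);
                cur->str[sz] = '\0';  /* do not rely on the buffer being zeroed */
                cur++;                /* advances the copy only */
            }
        }
    }
    cur->str[0] = '\0';  /* empty string marks the end of the results */
    return 0;
}
```

With this version, main can pass &r directly and still call free(r) afterwards, because r holds the same address before and after the call.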

Answer 2

Why pass rp by reference? You clearly don't want the value in the caller to change, so it would be simpler to pass it directly.

In fact, the argument you really want is an array of regex objects (see Note 1). So instead of using the prototype

long regexmatch(const char* str,
                const char* regexs,
                const size_t nummatch, /* This const is pointless */
                regex** rp);

it would make more sense to use:

long regexmatch(const char* str,
                const char* regexs,
                size_t nummatch,
                regex rp[nummatch]);

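(In practice, this is the same as using regex* rp as the parameter, but writing it as rp[nummatch] is more self-documenting. Since you use an empty string as the terminator, which means you can't handle zero-length captures, you actually need nummatch to be at least one greater than the number of captures in the pattern, so it's not 100% self-documenting.)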

Once you've changed the prototype, you need to remove one level of indirection in the function:

long regexmatch(const char* str,
                const char* regexs,
                size_t nummatch,
                regex reg[nummatch]){
    /* Compiling the regex is the same as in your code. I removed
     * the assignment of reg from rp, since the parameter is now
     * called reg.
     */

    size_t i=0;
    for (i=0;i<nummatch;i++){
        if (match[i].rm_so > -1){
            unsigned long sz=match[i].rm_eo-match[i].rm_so;
            if (sz > 0 && sz < 1000){
                memcpy(reg->str, (char*)(str+match[i].rm_so), sz);
                /* The above memcpy doesn't nul-terminate the string,
                 * so I added an explicit nul-termination.
                 */
                reg->str[sz] = 0;
                /* I think this should be outside the if statement. Personally,
                 * I'd put it in the increment clause of the for loop.
                 * See Note 2.
                 */
                reg++;  
            }
        }
    }
    reg->str[0] = 0;
    return 0;
}

(See it live on ideone.)
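For reference, here is one way the array-parameter version might look with the elided compile step filled in. This is a sketch under assumptions, not the answerer's code: the out index and the rule that the loop stops one slot short of nummatch (so the empty-string terminator always fits inside the array) are choices made here for illustration.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>

typedef struct {
    char str[1000];
} regex;

/* Stores non-empty captures in reg, followed by an empty-string terminator.
 * At most nummatch - 1 captures are stored so the terminator stays in bounds. */
long regexmatch(const char *str, const char *regexs,
                size_t nummatch, regex reg[nummatch]) {
    assert(nummatch > 0);

    regex_t r;
    regmatch_t match[nummatch];

    if (regcomp(&r, regexs, REG_EXTENDED) != 0) return -1;
    if (regexec(&r, str, nummatch, match, 0) != 0) { regfree(&r); return -1; }
    regfree(&r);

    size_t out = 0;  /* next free slot in reg */
    for (size_t i = 0; i < nummatch && out + 1 < nummatch; i++) {
        if (match[i].rm_so < 0) continue;                    /* unset capture */
        size_t sz = (size_t)(match[i].rm_eo - match[i].rm_so);
        if (sz == 0 || sz >= sizeof reg[out].str) continue;  /* empty or too long */
        memcpy(reg[out].str, str + match[i].rm_so, sz);
        reg[out].str[sz] = '\0';
        out++;
    }
    reg[out].str[0] = '\0';  /* terminator */
    return 0;
}
```

The caller then needs only one pointer: allocate (or declare) an array of nummatch elements, pass it as the last argument, and free that same pointer afterwards.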

Notes

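1. I find it confusing to call what is basically a fixed-length string representing a regex capture a regex. Also, I don't see the point of wrapping the fixed-length character allocation in a struct, unless you plan to assign to values of type regex. But all of that is irrelevant to your basic question.
2. When you copy the captures into your regex array, you simply ignore captures that are empty, unset, or too long. This means you can't access capture i by looking at the i-th element of the array. If an earlier capture happened to be empty, the capture will be at position i - 1 (or earlier, if several captures were empty), and the empty capture can't be accessed at all. I'm not sure exactly what your goal is, but this seems hard to use. However, since you use an empty string to indicate the end of the capture list, you can't insert empty captures into the list. So you might really want to rethink the API.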

A cleaner approach would be to use an array of pointers to strings (like argv).
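One possible shape for such an API is sketched below. Everything here is hypothetical: the names regexcaptures and freecaptures, the argc/argv-style count plus array, and the convention that an unset capture is NULL while an empty capture is "" are assumptions chosen for illustration. The sketch relies on the POSIX re_nsub member of regex_t to size the array, so element i always corresponds to capture i.

```c
#include <assert.h>
#include <regex.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical helper: returns a malloc'd array of *count strings, where
 * element i is capture i (0 is the whole match). Unset captures are NULL,
 * empty captures are "". Returns NULL on a bad pattern, no match, or
 * allocation failure. */
char **regexcaptures(const char *str, const char *pattern, size_t *count) {
    assert(str != NULL && pattern != NULL && count != NULL);

    regex_t r;
    if (regcomp(&r, pattern, REG_EXTENDED) != 0) return NULL;

    size_t n = r.re_nsub + 1;  /* whole match plus parenthesised groups */
    regmatch_t *match = malloc(n * sizeof *match);
    char **out = malloc(n * sizeof *out);
    if (out) for (size_t i = 0; i < n; i++) out[i] = NULL;

    int ok = match && out && regexec(&r, str, n, match, 0) == 0;
    regfree(&r);

    for (size_t i = 0; ok && i < n; i++) {
        if (match[i].rm_so < 0) continue;  /* group did not participate */
        size_t sz = (size_t)(match[i].rm_eo - match[i].rm_so);
        out[i] = malloc(sz + 1);
        if (!out[i]) { ok = 0; break; }
        memcpy(out[i], str + match[i].rm_so, sz);
        out[i][sz] = '\0';
    }
    free(match);

    if (!ok) {  /* release anything allocated so far */
        if (out) for (size_t i = 0; i < n; i++) free(out[i]);
        free(out);
        return NULL;
    }
    *count = n;
    return out;
}

/* Frees an array returned by regexcaptures. */
void freecaptures(char **caps, size_t count) {
    if (!caps) return;
    for (size_t i = 0; i < count; i++) free(caps[i]);
    free(caps);
}
```

Because each string is allocated to its exact length, there is no 1000-byte limit, and empty captures keep their position instead of shifting later ones down.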
