How to initialize a pointer to the length of an output buffer?
I am using this code to do a regex substitution with the pcre2 library:
PCRE2_SIZE outlengthptr=256; //this line
PCRE2_UCHAR* output_buffer; //this line
output_buffer=(PCRE2_UCHAR*)malloc(outlengthptr); //this line
uint32_t rplopts=PCRE2_SUBSTITUTE_GLOBAL;
int ret=pcre2_substitute(
re1234, /*Points to the compiled pattern*/
subject, /*Points to the subject string*/
subject_length, /*Length of the subject string*/
0, /*Offset in the subject at which to start matching*/
rplopts, /*Option bits*/
0, /*Points to a match data block, or is NULL*/
0, /*Points to a match context, or is NULL*/
replace, /*Points to the replacement string*/
replace_length, /*Length of the replacement string*/
output_buffer, /*Points to the output buffer*/
&outlengthptr /*Points to the length of the output buffer*/
);
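But I can't seem to understand how to correctly define output_buffer and the pointer to its length (outlengthptr).
When I give outlengthptr a fixed value, the code works, but the value stays fixed, i.e. it does not change to the new length of output_buffer. According to the pcre2_substitute() specification, however, it should be updated to the new length of output_buffer: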
The length, startoffset and rlength values are code units, not characters, as is the contents of the variable pointed at by outlengthptr, which is updated to the actual length of the new string.
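The problems are:
- When I set outlengthptr to a fixed value, the final string is truncated at that fixed length.
- If I don't initialize the variable outlengthptr, I get a segmentation fault.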
Here is the function's prototype:
int pcre2_substitute(const pcre2_code *code, PCRE2_SPTR subject, PCRE2_SIZE length, PCRE2_SIZE startoffset, uint32_t options, pcre2_match_data *match_data, pcre2_match_context *mcontext, PCRE2_SPTR replacement, PCRE2_SIZE rlength, PCRE2_UCHAR *outputbuffer, PCRE2_SIZE *outlengthptr);
The pcre2api page says the following (emphasis mine):
The function returns the number of replacements that were made. This may be zero if no matches were found, and is never greater than 1 unless PCRE2_SUBSTITUTE_GLOBAL is set. In the event of an error, a negative error code is returned. Except for PCRE2_ERROR_NOMATCH (which is never returned), any errors from pcre2_match() or the substring copying functions are passed straight back. PCRE2_ERROR_BADREPLACEMENT is returned for an invalid replacement string (unrecognized sequence following a dollar sign), and PCRE2_ERROR_NOMEMORY is returned if the output buffer is not big enough.
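So, start with an initial buffer that should be able to hold most results: neither too large nor too small. What that means depends on your application.
For example, you could try starting with 120% of the input string's length as a heuristic, since this seems a reasonable choice for most common regex substitution uses.
Then call the function with this buffer, passing it the buffer's size.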
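- If you get a positive result (or zero), you're done.
- If you get PCRE2_ERROR_NOMEMORY, double the buffer size and try again (repeat this step as many times as needed).
- If you get a different error code, handle it accordingly as a genuine error case.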
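The steps above can be sketched roughly as follows. This is only an illustration under stated assumptions, not code from the PCRE2 documentation: to keep it compilable without the library, the real call is hidden behind a hypothetical callback type (substitute_fn), DEMO_ERROR_NOMEMORY stands in for PCRE2_ERROR_NOMEMORY, and demo_substitute is a toy replacement for pcre2_substitute() that only mimics its buffer contract. The names initial_capacity and substitute_with_retry are made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for PCRE2_ERROR_NOMEMORY; real code would use the constant
   from <pcre2.h> instead of this hypothetical one. */
#define DEMO_ERROR_NOMEMORY (-48)

/* Hypothetical callback shape: *outlen holds the buffer capacity on entry
   and the result length on success. Returns >= 0 on success, < 0 on error. */
typedef int (*substitute_fn)(void *ctx, unsigned char *out, size_t *outlen);

/* Heuristic from the answer: about 120% of the subject length,
   plus one unit for a trailing zero. */
size_t initial_capacity(size_t subject_len)
{
    return subject_len + subject_len / 5 + 1;
}

/* Calls fn with a growing buffer until the result fits.
   On success returns fn's result and hands the malloc'ed buffer and its
   length to the caller; on failure returns a negative code and frees it. */
int substitute_with_retry(substitute_fn fn, void *ctx, size_t initial_cap,
                          unsigned char **out_buf, size_t *out_len)
{
    assert(fn && out_buf && out_len);

    size_t cap = initial_cap ? initial_cap : 16;
    unsigned char *buf = NULL;

    *out_buf = NULL;
    *out_len = 0;
    for (;;) {
        unsigned char *grown = realloc(buf, cap);
        if (grown == NULL) { free(buf); return DEMO_ERROR_NOMEMORY; }
        buf = grown;

        size_t len = cap;   /* reset to the capacity before every call */
        int rc = fn(ctx, buf, &len);
        if (rc >= 0) {      /* zero or more replacements: done */
            *out_buf = buf;
            *out_len = len;
            return rc;
        }
        /* Any other error is a genuine failure; also stop if doubling
           the capacity would overflow size_t. */
        if (rc != DEMO_ERROR_NOMEMORY || cap > (size_t)-1 / 2) {
            free(buf);
            return rc;
        }
        cap *= 2;           /* buffer too small: double and retry */
    }
}

/* Toy stand-in for pcre2_substitute(): "produces" the string passed in ctx
   and fails when it, plus a trailing zero, does not fit in the buffer. */
static int demo_substitute(void *ctx, unsigned char *out, size_t *outlen)
{
    const char *result = ctx;
    size_t need = strlen(result);

    if (*outlen < need + 1) return DEMO_ERROR_NOMEMORY;
    memcpy(out, result, need + 1);
    *outlen = need;         /* report the actual length of the new string */
    return 1;               /* pretend one replacement was made */
}
```

With PCRE2 itself, the callback would simply forward out and outlen as the last two arguments of pcre2_substitute(), and the comparison would be against PCRE2_ERROR_NOMEMORY. The detail that matters for the question is the reset of len before each call: the variable behind outlengthptr must hold the buffer's capacity on input and is overwritten with the result length on success, so it has to be initialized and cannot be reused unchanged across retries. The sketch assumes 8-bit code units; for other widths the allocation would need cap * sizeof(PCRE2_UCHAR) bytes.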