What is the best way to match a string to a specified format?

Question

The string I want to match has the format "from:<%s>" or "FROM:<%s>", where %s is any number of characters representing an email address.

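I have been using sscanf(input, "%*[fromFROM:<]%[@:-,.A-Za-z0-9]>", output). However, it does not catch the case where the final ">" is missing. Is there a clean way to check that the input string has the correct format?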

Answer 1

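You cannot directly tell whether trailing literal characters in the format string matched; sscanf() has no way of reporting their absence. However, there are a couple of tricks that do the job: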

Option 1:

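int n = 0;
if (sscanf(input, "%*[fromFROM:<]%[@:-,.A-Za-z0-9]>%n", email, &n) != 1) {
    /* error: the prefix or the address did not match */
} else if (n == 0) {
    /* error: the address matched, but the closing > is missing */
}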

Option 2:

char c = '\0';
if (sscanf(input, "%*[fromFROM:<]%[@:-,.A-Za-z0-9]%c", email, &c) != 2) {
    /* error: malformed prefix, or nothing (no >) after the address */
} else if (c != '>') {
    /* error: something other than > follows the email address */
}

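Note that the 'from' scan set will match ROFF, MorfROM or <FROM:morf as a prefix to the email address. That is far too generous. In fact, given from:<foofoomoo@example.com>, it would consume from:<foofoomoo as the prefix and leave only @example.com as the address, which is a much more serious problem, especially since you throw all the matched material away. You should probably capture the value and be more specific: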

char c = '\0';
char from[5];
if (sscanf(input, "%4[fromFROM]:<%[@:-,.A-Za-z0-9]%c", from, email, &c) != 3) {
    /* error */
} else if (strcasecmp(from, "FROM") != 0) {
    /* prefix is not "from" in any letter case */
} else if (c != '>') {
    /* missing > */
}

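Alternatively, you can compare from with strcmp() against "from" and "FROM" if only those two spellings should be accepted. There are lots of options here. Note that strcasecmp() is a POSIX function (declared in <strings.h>); Microsoft provides the equivalent _stricmp() (also available under the older name stricmp()).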

Answer 2

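Regarding the first part of the string: if you only want to accept FROM:< or from:<, you can simply use the function strncmp to test for both possibilities. Note, however, that this means that, for example, From:< will not be accepted. Your question implies that this is how you want your program to behave, but I am not sure that is really the case.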

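In general, I do not recommend using sscanf for such a complex task, because the function is not very flexible. Also, ISO C does not guarantee that character ranges are supported in the %[] conversion specifier (their meaning is implementation-defined, although most common platforms support them). Therefore, I recommend checking the parts of the string "manually":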

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <ctype.h>
#include <stdbool.h>

bool is_valid_string( const char *line )
{
    const char *p;

    //verify that string starts with "from:<" or "FROM:<"
    if (
        strncmp( line, "from:<", 6 ) != 0
        &&
        strncmp( line, "FROM:<", 6 ) != 0
    )
    {
        return false;
    }

    //verify that there are no invalid characters before the `>`
    for ( p = line + 6; *p != '>'; p++ )
    {
        if ( *p == '\0' )
            return false;

        if ( isalpha( (unsigned char)*p ) )
            continue;

        if ( isdigit( (unsigned char)*p ) )
            continue;

        if ( strchr( "@:-,.", *p) != NULL )
            continue;

        return false;
    }

    //jump past the '>' character
    p++;

    //verify that we are now at the end of the string
    if ( *p != '\0' )
        return false;

    return true;
}

int main( void )
{
    char line[200];

    //read one line of input
    if ( fgets( line, sizeof line, stdin ) == NULL )
    {
        printf( "Input failure!\n" );
        exit( EXIT_FAILURE );
    }

    //remove newline character
    line[strcspn(line,"\n")] = '\0';

    //call function and print result
    if ( is_valid_string ( line ) )
        printf( "VALID\n" );
    else
        printf( "INVALID\n" );
}

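Running this program once for each of the following inputs gives these results (input line first, then the output):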

This is an invalid string.
INVALID

from:<john.doe@example.com
INVALID

from:<john.doe@example.com>
VALID

FROM:<john.doe@example.com
INVALID

FROM:<john.doe@example.com>
VALID

FROM:<john.doe@example!!!!.com>            
INVALID

FROM:<john.doe@example.com>invalid
INVALID

Answer 3

Use "%n". It records the offset into input[] that the scan reached, provided the scan got that far.

Use it to:

detect a successful scan that includes the >;
detect extra junk after the >.

There is no need to check the return value of sscanf().

Also use a width limit.

char output[100];
int n = 0;
// sscanf(input, "%*[fromFROM:<]%[@:-,.A-Za-z0-9]>", output);
sscanf(input, "%*[fromFROM]:<%99[@:-,.A-Za-z0-9]>%n", output, &n);
//                            ^^ width           ^^
if (n == 0 || input[n] != '\0') {
  puts("Error, scan incomplete or extra junk");
} else {
  puts("Success");
}

If trailing white space such as '\n' is acceptable, use " %n" instead.
