Rails 6 - RFC5322 compliant email validation

Here is a PCRE regular expression, https://regex101.com/r/gJ7pU0/1, that validates email addresses.

Does Ruby have an RFC5322 compliant regular expression? Ruby has URI::MailTo::EMAIL_REGEXP, but I don't think it is RFC5322 compliant.
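To make the gap concrete, here is a minimal sketch. The sample addresses are my own illustrations, not taken from any spec test suite. As far as I can tell, URI::MailTo::EMAIL_REGEXP follows the simpler HTML5-style pattern, so it should accept the plain form and reject a quoted local part and a domain literal, both of which the RFC5322 addr-spec grammar allows.

```ruby
require 'uri'

# Sample addresses chosen for illustration only (assumptions, not from the post)
samples = [
  'user@example.com',        # plain dot-atom form
  '"john doe"@example.com',  # quoted-string local part, allowed by RFC5322
  'user@[192.168.0.1]'       # domain-literal, allowed by RFC5322
]

# Map each address to whether the built-in regexp accepts it
results = samples.map { |addr| [addr, URI::MailTo::EMAIL_REGEXP.match?(addr)] }.to_h

results.each { |addr, ok| puts "#{addr.ljust(26)} #{ok}" }
```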

Another post mentioned the 'mail' gem, but I didn't see a way to validate an email address with it.

https://github.com/mikel/mail/tree/6b0ebb142c476bf7c00524effe513a4f151f59ab

RFC5322 compliant PCRE regex:

(?(DEFINE)
    (?<addr_spec> (?&local_part) @ (?&domain) )
    (?<local_part> (?&dot_atom) | (?&quoted_string) | (?&obs_local_part) )
    (?<domain> (?&dot_atom) | (?&domain_literal) | (?&obs_domain) )
    (?<domain_literal> (?&CFWS)? \[ (?: (?&FWS)? (?&dtext) )* (?&FWS)? \] (?&CFWS)? )
    (?<dtext> [\x21-\x5a] | [\x5e-\x7e] | (?&obs_dtext) )
    (?<quoted_pair> \\ (?: (?&VCHAR) | (?&WSP) ) | (?&obs_qp) )
    (?<dot_atom> (?&CFWS)? (?&dot_atom_text) (?&CFWS)? )
    (?<dot_atom_text> (?&atext) (?: \. (?&atext) )* )
    (?<atext> [a-zA-Z0-9!#$%&'*+\/=?^_`{|}~-]+ )
    (?<atom> (?&CFWS)? (?&atext) (?&CFWS)? )
    (?<word> (?&atom) | (?&quoted_string) )
    (?<quoted_string> (?&CFWS)? " (?: (?&FWS)? (?&qcontent) )* (?&FWS)? " (?&CFWS)? )
    (?<qcontent> (?&qtext) | (?&quoted_pair) )
    (?<qtext> \x21 | [\x23-\x5b] | [\x5d-\x7e] | (?&obs_qtext) )
    # comments and whitespace
    (?<FWS> (?: (?&WSP)* \r\n )? (?&WSP)+ | (?&obs_FWS) )
    (?<CFWS> (?: (?&FWS)? (?&comment) )+ (?&FWS)? | (?&FWS) )
    (?<comment> \( (?: (?&FWS)? (?&ccontent) )* (?&FWS)? \) )
    (?<ccontent> (?&ctext) | (?&quoted_pair) | (?&comment) )
    (?<ctext> [\x21-\x27] | [\x2a-\x5b] | [\x5d-\x7e] | (?&obs_ctext) )
    # obsolete tokens
    (?<obs_domain> (?&atom) (?: \. (?&atom) )* )
    (?<obs_local_part> (?&word) (?: \. (?&word) )* )
    (?<obs_dtext> (?&obs_NO_WS_CTL) | (?&quoted_pair) )
    (?<obs_qp> \\ (?: \x00 | (?&obs_NO_WS_CTL) | \n | \r ) )
    (?<obs_FWS> (?&WSP)+ (?: \r\n (?&WSP)+ )* )
    (?<obs_ctext> (?&obs_NO_WS_CTL) )
    (?<obs_qtext> (?&obs_NO_WS_CTL) )
    (?<obs_NO_WS_CTL> [\x01-\x08] | \x0b | \x0c | [\x0e-\x1f] | \x7f )
    # character class definitions
    (?<VCHAR> [\x21-\x7E] )
    (?<WSP> [ \t] )
)
^(?&addr_spec)$

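Converting a PCRE regex that uses recursion/subroutines to Onigmo is straightforward:

  • Remove the unsupported (?(DEFINE)...) construct
  • Put all the named groups that define the consuming patterns at the start of the regex and apply a {0} quantifier to each of them, so that they match nothing on their own
  • Replace (?&...) with the \g<...> syntax (I just replaced \(\?&(\w+)\) with \\g<$1> in Notepad++).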
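The same steps can be sketched in Ruby on a much smaller pattern. Everything here is hypothetical and only for illustration: the `octet`/`ipv4` groups and the `to_onigmo` helper are not part of the email regex, and the sketch assumes one group definition per line, already taken out of the (?(DEFINE)...) wrapper.

```ruby
# Hypothetical PCRE-style definitions (single-quoted heredoc keeps backslashes literal)
pcre_defs = <<~'PCRE'
  (?<octet> 25[0-5] | 2[0-4]\d | 1\d\d | [1-9]?\d )
  (?<ipv4> (?&octet) (?: \. (?&octet) ){3} )
PCRE
pcre_main = '\A (?&ipv4) \z'

# Rewrite PCRE subroutine calls (?&name) into Onigmo calls \g<name>
def to_onigmo(src)
  src.gsub(/\(\?&(\w+)\)/) { "\\g<#{Regexp.last_match(1)}>" }
end

# Append {0} to each definition so the group is declared but consumes nothing
defs = pcre_defs.lines.map { |line| to_onigmo(line.chomp) + '{0}' }.join("\n")

# Free-spacing mode (x) so the whitespace between tokens is ignored
re_ipv4 = Regexp.new("#{defs}\n#{to_onigmo(pcre_main)}", Regexp::EXTENDED)

p re_ipv4.match?('192.168.0.1')
p re_ipv4.match?('256.1.1.1')
```

The {0} quantifier is what stands in for (?(DEFINE)...): each group is compiled and can be called through \g<name>, but at its own position in the pattern it matches zero times.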

The final expression that works in Ruby looks like

re = /(?<addr_spec> \g<local_part> @ \g<domain> ){0}
(?<local_part> \g<dot_atom> | \g<quoted_string> | \g<obs_local_part> ){0}
(?<domain> \g<dot_atom> | \g<domain_literal> | \g<obs_domain> ){0}
(?<domain_literal> \g<CFWS>? \[ (?: \g<FWS>? \g<dtext> )* \g<FWS>? \] \g<CFWS>? ){0}
(?<dtext> [\x21-\x5a] | [\x5e-\x7e] | \g<obs_dtext> ){0}
(?<quoted_pair> \\ (?: \g<VCHAR> | \g<WSP> ) | \g<obs_qp> ){0}
(?<dot_atom> \g<CFWS>? \g<dot_atom_text> \g<CFWS>? ){0}
(?<dot_atom_text> \g<atext> (?: \. \g<atext> )* ){0}
(?<atext> [a-zA-Z0-9!#$%&'*+\/=?^_`{|}~-]+ ){0}
(?<atom> \g<CFWS>? \g<atext> \g<CFWS>? ){0}
(?<word> \g<atom> | \g<quoted_string> ){0}
(?<quoted_string> \g<CFWS>? " (?: \g<FWS>? \g<qcontent> )* \g<FWS>? " \g<CFWS>? ){0}
(?<qcontent> \g<qtext> | \g<quoted_pair> ){0}
(?<qtext> \x21 | [\x23-\x5b] | [\x5d-\x7e] | \g<obs_qtext> ){0}
# comments and whitespace
(?<FWS> (?: \g<WSP>* \r\n )? \g<WSP>+ | \g<obs_FWS> ){0}
(?<CFWS> (?: \g<FWS>? \g<comment> )+ \g<FWS>? | \g<FWS> ){0}
(?<comment> \( (?: \g<FWS>? \g<ccontent> )* \g<FWS>? \) ){0}
(?<ccontent> \g<ctext> | \g<quoted_pair> | \g<comment> ){0}
(?<ctext> [\x21-\x27] | [\x2a-\x5b] | [\x5d-\x7e] | \g<obs_ctext> ){0}
# obsolete tokens
(?<obs_domain> \g<atom> (?: \. \g<atom> )* ){0}
(?<obs_local_part> \g<word> (?: \. \g<word> )* ){0}
(?<obs_dtext> \g<obs_NO_WS_CTL> | \g<quoted_pair> ){0}
(?<obs_qp> \\ (?: \x00 | \g<obs_NO_WS_CTL> | \n | \r ) ){0}
(?<obs_FWS> \g<WSP>+ (?: \r\n \g<WSP>+ )* ){0}
(?<obs_ctext> \g<obs_NO_WS_CTL> ){0}
(?<obs_qtext> \g<obs_NO_WS_CTL> ){0}
(?<obs_NO_WS_CTL> [\x01-\x08] | \x0b | \x0c | [\x0e-\x1f] | \x7f ){0}
# character class definitions
(?<VCHAR> [\x21-\x7E] ){0}
(?<WSP> [ \t] ){0}
^\g<addr_spec>$/x

See a Ruby test:

p re.match?('+1~1+@iana.org')           # => true
p re.match?('test@[123.123.123.123')    # => false