Rails 6 - RFC5322 compliant Email validation
Here is a PCRE regex that validates email addresses: https://regex101.com/r/gJ7pU0/1
Does Ruby have an RFC5322 compliant regex? Ruby has URI::MailTo::EMAIL_REGEXP, but I don't think it is RFC5322 compliant.
Another post mentions the 'mail' gem, but I don't see a way to validate an email address with it.
https://github.com/mikel/mail/tree/6b0ebb142c476bf7c00524effe513a4f151f59ab
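To make the doubt about URI::MailTo::EMAIL_REGEXP concrete, here is a small sketch that runs it against two address forms RFC 5322 permits: a quoted-string local part and a domain literal. The sample addresses and the `results` name are chosen only for illustration.

```ruby
require 'uri'

# Illustrative samples: one ordinary address, plus a quoted-string local part
# and a domain literal, both of which the RFC 5322 addr-spec grammar allows.
samples = [
  'user@example.com',
  '"john doe"@example.com',
  'user@[127.0.0.1]'
]

# Map each sample to whether the stdlib pattern accepts it.
results = samples.to_h { |addr| [addr, URI::MailTo::EMAIL_REGEXP.match?(addr)] }

results.each { |addr, ok| puts "#{addr}: #{ok}" }
```

Only the first sample is accepted, because the stdlib pattern allows neither `"` nor `[` in those positions.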
PCRE, RFC5322 compliant:
(?(DEFINE)
(?<addr_spec> (?&local_part) @ (?&domain) )
(?<local_part> (?&dot_atom) | (?&quoted_string) | (?&obs_local_part) )
(?<domain> (?&dot_atom) | (?&domain_literal) | (?&obs_domain) )
(?<domain_literal> (?&CFWS)? \[ (?: (?&FWS)? (?&dtext) )* (?&FWS)? \] (?&CFWS)? )
(?<dtext> [\x21-\x5a] | [\x5e-\x7e] | (?&obs_dtext) )
(?<quoted_pair> \\ (?: (?&VCHAR) | (?&WSP) ) | (?&obs_qp) )
(?<dot_atom> (?&CFWS)? (?&dot_atom_text) (?&CFWS)? )
(?<dot_atom_text> (?&atext) (?: \. (?&atext) )* )
(?<atext> [a-zA-Z0-9!#$%&'*+\/=?^_`{|}~-]+ )
(?<atom> (?&CFWS)? (?&atext) (?&CFWS)? )
(?<word> (?&atom) | (?&quoted_string) )
(?<quoted_string> (?&CFWS)? " (?: (?&FWS)? (?&qcontent) )* (?&FWS)? " (?&CFWS)? )
(?<qcontent> (?&qtext) | (?&quoted_pair) )
(?<qtext> \x21 | [\x23-\x5b] | [\x5d-\x7e] | (?&obs_qtext) )
# comments and whitespace
(?<FWS> (?: (?&WSP)* \r\n )? (?&WSP)+ | (?&obs_FWS) )
(?<CFWS> (?: (?&FWS)? (?&comment) )+ (?&FWS)? | (?&FWS) )
(?<comment> \( (?: (?&FWS)? (?&ccontent) )* (?&FWS)? \) )
(?<ccontent> (?&ctext) | (?&quoted_pair) | (?&comment) )
(?<ctext> [\x21-\x27] | [\x2a-\x5b] | [\x5d-\x7e] | (?&obs_ctext) )
# obsolete tokens
(?<obs_domain> (?&atom) (?: \. (?&atom) )* )
(?<obs_local_part> (?&word) (?: \. (?&word) )* )
(?<obs_dtext> (?&obs_NO_WS_CTL) | (?&quoted_pair) )
(?<obs_qp> \\ (?: \x00 | (?&obs_NO_WS_CTL) | \n | \r ) )
(?<obs_FWS> (?&WSP)+ (?: \r\n (?&WSP)+ )* )
(?<obs_ctext> (?&obs_NO_WS_CTL) )
(?<obs_qtext> (?&obs_NO_WS_CTL) )
(?<obs_NO_WS_CTL> [\x01-\x08] | \x0b | \x0c | [\x0e-\x1f] | \x7f )
# character class definitions
(?<VCHAR> [\x21-\x7E] )
(?<WSP> [ \t] )
)
^(?&addr_spec)$
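Converting a PCRE recursion/subroutine regex to Onigmo is straightforward:
- Remove the unsupported (?(DEFINE)...) construct
- Put all the named groups that define consuming patterns at the start of the regex and apply a {0} quantifier to each of them, so that they match nothing
- Replace (?&...) with the \g<...> syntax (I just did it in Notepad++ by replacing \(\?&(\w+)\) with \\g<$1>).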
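The three steps above can be sketched in Ruby on a deliberately tiny grammar. The `pair`/`key`/`value` groups and the helper names `to_onigmo_call` and `convert` are hypothetical, made up for this illustration; they are not part of the email pattern.

```ruby
# Hypothetical toy grammar in PCRE style: the contents of a (?(DEFINE)...)
# block, already unwrapped (step 1), plus the main pattern that calls it.
PCRE_DEFINITIONS = [
  '(?<pair> (?&key) = (?&value) )',
  '(?<key> [a-z]+ )',
  '(?<value> [0-9]+ )'
].freeze
PCRE_MAIN = '^(?&pair)$'

# Step 3: rewrite PCRE subroutine calls (?&name) as Onigmo calls \g<name>.
def to_onigmo_call(pattern)
  pattern.gsub(/\(\?&(\w+)\)/) { "\\g<#{Regexp.last_match(1)}>" }
end

# Step 2: append {0} to every definition so it is declared but consumes
# nothing, then add the main pattern and compile in extended (x) mode.
def convert(definitions, main)
  groups = definitions.map { |d| "#{to_onigmo_call(d)}{0}" }
  Regexp.new((groups + [to_onigmo_call(main)]).join("\n"), Regexp::EXTENDED)
end

SAMPLE_RE = convert(PCRE_DEFINITIONS, PCRE_MAIN)

p SAMPLE_RE.match?('abc=123') # => true
p SAMPLE_RE.match?('abc=')    # => false
```

The block form of `gsub` is used so the backslash in `\g` does not need the extra escaping that a replacement string would require.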
The final expression that works in Ruby looks like
re = /(?<addr_spec> \g<local_part> @ \g<domain> ){0}
(?<local_part> \g<dot_atom> | \g<quoted_string> | \g<obs_local_part> ){0}
(?<domain> \g<dot_atom> | \g<domain_literal> | \g<obs_domain> ){0}
(?<domain_literal> \g<CFWS>? \[ (?: \g<FWS>? \g<dtext> )* \g<FWS>? \] \g<CFWS>? ){0}
(?<dtext> [\x21-\x5a] | [\x5e-\x7e] | \g<obs_dtext> ){0}
(?<quoted_pair> \\ (?: \g<VCHAR> | \g<WSP> ) | \g<obs_qp> ){0}
(?<dot_atom> \g<CFWS>? \g<dot_atom_text> \g<CFWS>? ){0}
(?<dot_atom_text> \g<atext> (?: \. \g<atext> )* ){0}
(?<atext> [a-zA-Z0-9!#$%&'*+\/=?^_`{|}~-]+ ){0}
(?<atom> \g<CFWS>? \g<atext> \g<CFWS>? ){0}
(?<word> \g<atom> | \g<quoted_string> ){0}
(?<quoted_string> \g<CFWS>? " (?: \g<FWS>? \g<qcontent> )* \g<FWS>? " \g<CFWS>? ){0}
(?<qcontent> \g<qtext> | \g<quoted_pair> ){0}
(?<qtext> \x21 | [\x23-\x5b] | [\x5d-\x7e] | \g<obs_qtext> ){0}
# comments and whitespace
(?<FWS> (?: \g<WSP>* \r\n )? \g<WSP>+ | \g<obs_FWS> ){0}
(?<CFWS> (?: \g<FWS>? \g<comment> )+ \g<FWS>? | \g<FWS> ){0}
(?<comment> \( (?: \g<FWS>? \g<ccontent> )* \g<FWS>? \) ){0}
(?<ccontent> \g<ctext> | \g<quoted_pair> | \g<comment> ){0}
(?<ctext> [\x21-\x27] | [\x2a-\x5b] | [\x5d-\x7e] | \g<obs_ctext> ){0}
# obsolete tokens
(?<obs_domain> \g<atom> (?: \. \g<atom> )* ){0}
(?<obs_local_part> \g<word> (?: \. \g<word> )* ){0}
(?<obs_dtext> \g<obs_NO_WS_CTL> | \g<quoted_pair> ){0}
(?<obs_qp> \\ (?: \x00 | \g<obs_NO_WS_CTL> | \n | \r ) ){0}
(?<obs_FWS> \g<WSP>+ (?: \r\n \g<WSP>+ )* ){0}
(?<obs_ctext> \g<obs_NO_WS_CTL> ){0}
(?<obs_qtext> \g<obs_NO_WS_CTL> ){0}
(?<obs_NO_WS_CTL> [\x01-\x08] | \x0b | \x0c | [\x0e-\x1f] | \x7f ){0}
# character class definitions
(?<VCHAR> [\x21-\x7E] ){0}
(?<WSP> [ \t] ){0}
^\g<addr_spec>$/x
See a Ruby test:
p re.match?('+1~1+@iana.org') # => true
p re.match?('test@[123.123.123.123') # => false
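One detail to keep in mind when reusing the expression in a Rails format validation: in Ruby, ^ and $ anchor at line boundaries rather than at the start and end of the string, and Rails' format validator raises an ArgumentError for such a pattern unless multiline: true is passed. The sketch below shows the difference with a deliberately simplified stand-in pattern, not the full RFC 5322 expression; the constant names are made up for illustration.

```ruby
# Simplified stand-in patterns (NOT the RFC 5322 regex), differing only in
# their anchors.
LINE_ANCHORED   = /^[a-z]+@[a-z]+\.[a-z]+$/
STRING_ANCHORED = /\A[a-z]+@[a-z]+\.[a-z]+\z/

# A valid-looking address followed by a second line of arbitrary content.
input = "user@example.com\n<script>"

p LINE_ANCHORED.match?(input)   # => true  (the first line alone matches)
p STRING_ANCHORED.match?(input) # => false (the whole string must match)
```

Swapping the final ^\g<addr_spec>$ for \A\g<addr_spec>\z would apply the same whole-string anchoring to the converted expression.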