How to validate e-mail addresses with C++ using CAtlRegExp
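I need to be able to validate international e-mail addresses in various formats in C++. Many of the answers I found online don't cut it, so after finding a solution that works for me I thought I would share it for anyone who is using the ATL Server Library.
Some background. I started with the post "Using a regular expression to validate an email address", which pointed to http://emailregex.com/. That site lists a regular expression, in various languages, that supports the RFC 5322 official standard of the Internet Message Format.
The regular expression provided is: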
(?:[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*|"(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])*")@(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|\[(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?|[a-z0-9-]*[a-z0-9]:(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x53-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])+)\])
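I am using C++ with the ATL Server Library, which once upon a time was part of Visual Studio. Microsoft has since put it on CodePlex as open source. We still use it for some of the template libraries. My goal is to modify this regular expression so that it works with CAtlRegExp.
The regular expression engine in ATL (CAtlRegExp) is pretty basic. I was able to modify the regular expression as follows: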
^{([a-z0-9!#$%&'*+/=?^_`{|}~\-]+(\.([a-z0-9!#$%&'*+/=?^_`{|}~\-]+))*)@((([a-z0-9]([a-z0-9\-]*[a-z0-9])?\.)+[a-z0-9]([a-z0-9\-]*[a-z0-9])?)|(\[(((2((5[0-5])|([0-4][0-9])))|(1[0-9][0-9])|([1-9]?[0-9]))\.)(((2((5[0-5])|([0-4][0-9])))|(1[0-9][0-9])|([1-9]?[0-9]))\.)(((2((5[0-5])|([0-4][0-9])))|(1[0-9][0-9])|([1-9]?[0-9]))\.)((2((5[0-5])|([0-4][0-9])))|(1[0-9][0-9])|([1-9]?[0-9]))\]))}$
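As a one-liner the adapted pattern is hard to review, so here is a minimal sketch of one way to assemble the same string from named pieces. The helper name BuildAtlEmailPattern and the local variable names are hypothetical, not part of ATL. The sketch only builds the string and does not call CAtlRegExp. It uses C++11 raw string literals so that every backslash reaches the regex engine as written, and it repeats the dotted-octet group three times by hand, as the adapted pattern does (braces denote match groups in this flavour, so the {3} of the original is not used).

```cpp
#include <string>

// Hypothetical helper: assembles the CAtlRegExp-flavoured e-mail pattern
// from named sub-patterns. It only builds a string; nothing here calls ATL.
std::wstring BuildAtlEmailPattern()
{
    // One or more characters allowed in an unquoted local-part atom.
    const std::wstring atom  = LR"re([a-z0-9!#$%&'*+/=?^_`{|}~\-]+)re";
    // A DNS label: alphanumeric at both ends, hyphens allowed in between.
    const std::wstring label = LR"re([a-z0-9]([a-z0-9\-]*[a-z0-9])?)re";
    // A decimal octet in the range 0-255.
    const std::wstring octet =
        LR"re(((2((5[0-5])|([0-4][0-9])))|(1[0-9][0-9])|([1-9]?[0-9])))re";
    // An escaped, literal dot.
    const std::wstring dot   = LR"re(\.)re";

    // atom(.atom)*
    const std::wstring local  = L"(" + atom + L"(" + dot + L"(" + atom + L"))*)";
    // (label.)+label
    const std::wstring domain = L"((" + label + dot + L")+" + label + L")";
    // [octet.octet.octet.octet], with the repetition written out by hand.
    const std::wstring dottedOctet = L"(" + octet + dot + L")";
    const std::wstring ipv4 = LR"re((\[)re" + dottedOctet + dottedOctet +
                              dottedOctet + octet + LR"re(\]))re";

    // The outer {...} is the single match group around the whole address.
    return L"^{" + local + L"@(" + domain + L"|" + ipv4 + L")}$";
}
```

The returned string could then be passed to CAtlRegExp::Parse through c_str() in place of the long literal.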
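The only thing that seems to be lost is Unicode support in domain names, which I was able to solve by following the C# example in the MSDN article "How to: Verify that Strings Are in Valid Email Format" and using IdnToAscii.
In this approach the user name and the domain name are extracted from the e-mail address. The domain name is converted to ASCII using IdnToAscii, the two are put back together, and the result is run through the regular expression.
Please note that error handling has been omitted for readability. Code is needed to make sure there are no buffer overruns, along with other error handling. Passing an e-mail address that does not fit in the 255-character buffers will cause this example to crash.
Code: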
    #include <Windows.h>
    #include <atlrx.h>   // CAtlRegExp, CAtlREMatchContext (ATL Server Library)

    bool WINAPI LocalLooksLikeEmailAddress(LPCWSTR lpszEmailAddress)
    {
        bool bRetVal = true ;
        const int ccbEmailAddressMaxLen = 255 ;
        wchar_t achANSIEmailAddress[ccbEmailAddressMaxLen] = { L'\0' } ;
        ATL::CAtlRegExp<> regexp ;
        ATL::CAtlREMatchContext<> regexpMatch ;

        // Step 1: split the address into user name (group 0) and domain name (group 1).
        ATL::REParseError status = regexp.Parse(L"^{.+}@{.+}$", FALSE) ;
        if (status == REPARSE_ERROR_OK) {
            if (regexp.Match(lpszEmailAddress, &regexpMatch) && regexpMatch.m_uNumGroups == 2) {
                const CAtlREMatchContext<>::RECHAR* szStart = 0 ;
                const CAtlREMatchContext<>::RECHAR* szEnd = 0 ;

                // Copy the user name into the buffer that will hold the ASCII address.
                regexpMatch.GetMatch(0, &szStart, &szEnd) ;
                ::wcsncpy_s(achANSIEmailAddress, szStart, (size_t)(szEnd - szStart)) ;

                // Copy the domain name so that it can be converted to Punycode.
                regexpMatch.GetMatch(1, &szStart, &szEnd) ;
                wchar_t achDomainName[ccbEmailAddressMaxLen] = { L'\0' } ;
                ::wcsncpy_s(achDomainName, szStart, (size_t)(szEnd - szStart)) ;

                if (bRetVal) {
                    // Step 2: convert the (possibly Unicode) domain name to ASCII
                    // and append it to the user name.
                    wchar_t achPunycode[ccbEmailAddressMaxLen] = { L'\0' } ;
                    if (IdnToAscii(0, achDomainName, -1, achPunycode, ccbEmailAddressMaxLen) == 0)
                        bRetVal = false ;
                    else {
                        ::wcscat_s(achANSIEmailAddress, L"@") ;
                        ::wcscat_s(achANSIEmailAddress, achPunycode) ;
                    }
                }
            }
        }

        if (bRetVal) {
            // Step 3: run the ASCII address through the adapted regular expression.
            // Backslashes are doubled because this is a C++ string literal.
            // If the split above failed, the buffer is empty and the match fails.
            status = regexp.Parse(
                L"^{([a-z0-9!#$%&'*+/=?^_`{|}~\\-]+(\\.([a-z0-9!#$%&'*+/=?^_`{|}~\\-]+))*)@((([a-z0-9]([a-z0-9\\-]*[a-z0-9])?\\.)+[a-z0-9]([a-z0-9\\-]*[a-z0-9])?)|(\\[(((2((5[0-5])|([0-4][0-9])))|(1[0-9][0-9])|([1-9]?[0-9]))\\.)(((2((5[0-5])|([0-4][0-9])))|(1[0-9][0-9])|([1-9]?[0-9]))\\.)(((2((5[0-5])|([0-4][0-9])))|(1[0-9][0-9])|([1-9]?[0-9]))\\.)((2((5[0-5])|([0-4][0-9])))|(1[0-9][0-9])|([1-9]?[0-9]))\\]))}$"
                , FALSE) ;
            // Treat a pattern that fails to parse as "not valid" rather than as a pass.
            bRetVal = (status == REPARSE_ERROR_OK)
                   && regexp.Match(achANSIEmailAddress, &regexpMatch) != 0 ;
        }
        return bRetVal ;
    }
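The bounds checking that the sample leaves out could start with something like the following. This is a minimal sketch in portable C++, not the code I run in production: the name SplitEmailAddress and the 254-character limit (chosen so that the text plus its terminator fits a 255-element buffer) are assumptions, and it splits at the last '@' with std::wstring instead of using CAtlRegExp.

```cpp
#include <cstddef>
#include <string>

// Assumed limit: 254 characters plus the terminator fit a 255-element buffer.
const std::size_t kMaxAddressChars = 254;

// Hypothetical helper: rejects over-long input and splits the address at the
// last '@'. Returns false if either part would be empty.
bool SplitEmailAddress(const std::wstring& address,
                       std::wstring& localPart,
                       std::wstring& domain)
{
    if (address.empty() || address.size() > kMaxAddressChars)
        return false;

    const std::wstring::size_type at = address.rfind(L'@');
    if (at == std::wstring::npos || at == 0 || at + 1 == address.size())
        return false;

    localPart = address.substr(0, at);
    domain    = address.substr(at + 1);
    return true;
}
```

A length check on the input alone is not sufficient, because the Punycode form produced by IdnToAscii can be longer than the Unicode domain it came from. The recombined address needs its own check before it is copied into a fixed-size buffer.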
It's worth mentioning that this approach did not agree with the results in the C# MSDN article for two of the e-mail addresses. Looking at the original regular expression listed on http://emailregex.com suggests that the MSDN article got it wrong, unless the specification has recently been changed. I decided to go with the regular expression mentioned on http://emailregex.com.
Here are my unit tests, using the same e-mail addresses from the MSDN article:
    #include <Windows.h>
    #include <crtdbg.h>   // _ASSERTE

    #if _DEBUG
    #define TESTEXPR(expr) _ASSERTE(expr)
    #else
    #define TESTEXPR(expr) if (!(expr)) throw ;
    #endif

    int main()
    {
        LPCWSTR validEmailAddresses[] = { L"david.jones@proseware.com",
                                          L"d.j@server1.proseware.com",
                                          L"jones@ms1.proseware.com",
                                          L"j@proseware.com9",
                                          L"js#internal@proseware.com",
                                          L"j_9@[129.126.118.1]",
                                          L"js*@proseware.com",   // <== according to https://msdn.microsoft.com/en-us/library/01escwtf(v=vs.110).aspx this is invalid,
                                                                  //     but according to http://emailregex.com/, which claims to support the RFC 5322 official standard, it is valid.
                                                                  //     I'm going with valid.
                                          L"js@proseware.com9",
                                          L"j.s@server1.proseware.com",
                                          L"js@contoso.中国",
                                          NULL } ;
        LPCWSTR invalidEmailAddresses[] = { L"j.@server1.proseware.com",
                                            L"\"j\\\"s\\\"\"@proseware.com",   // <== according to https://msdn.microsoft.com/en-us/library/01escwtf(v=vs.110).aspx this is valid,
                                                                               //     but according to http://emailregex.com/, which claims to support the RFC 5322 official standard, it is invalid.
                                                                               //     I'm going with invalid.
                                            L"j..s@proseware.com",
                                            L"js@proseware..com",
                                            NULL } ;

        // Every address in the first list must be accepted.
        for (LPCWSTR* emailAddress = validEmailAddresses ; *emailAddress != NULL ; ++emailAddress)
        {
            TESTEXPR(LocalLooksLikeEmailAddress(*emailAddress)) ;
        }
        // Every address in the second list must be rejected.
        for (LPCWSTR* emailAddress = invalidEmailAddresses ; *emailAddress != NULL ; ++emailAddress)
        {
            TESTEXPR(!LocalLooksLikeEmailAddress(*emailAddress)) ;
        }
        return 0 ;
    }
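The test above stops at the first failing address. When adjusting the pattern it can be handy to see every address that disagrees with the expectation, so here is a minimal sketch of a helper that collects them. The name CollectMismatches is hypothetical, and the validator is a template parameter so that any callable taking a const wchar_t* and returning bool can be passed in.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: runs `validator` over a null-terminated array of
// addresses and returns every address whose result differs from `expected`.
template <typename Validator>
std::vector<std::wstring> CollectMismatches(const wchar_t* const* addresses,
                                            bool expected,
                                            Validator validator)
{
    std::vector<std::wstring> mismatches;
    for (; *addresses != nullptr; ++addresses) {
        if (static_cast<bool>(validator(*addresses)) != expected)
            mismatches.push_back(*addresses);
    }
    return mismatches;
}
```

With the arrays from the unit test, CollectMismatches(validEmailAddresses, true, LocalLooksLikeEmailAddress) would return the addresses that were wrongly rejected, and passing invalidEmailAddresses with false would return the ones that were wrongly accepted.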