JavaScript regex for foreign names
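I wrote a simple regex to validate names. What I didn't anticipate is that some users have two names (or more, though that is less common) and that their names contain accented characters such as áéíóú.
I'm also thinking about other characters, such as the Spanish ñ or ç.
This is my validation code so far: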
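function validateForm(element) {
    // Accepts only ASCII letters and hyphens, so names with accents or spaces fail.
    var regex = /^[a-zA-Z\-]+$/;
    var ctrl = document.getElementById(element).value;
    // Nothing to validate when the field is empty.
    if (ctrl == null || ctrl == '')
        return;
    if (!regex.test(ctrl)) {
        alert(element + ' not valid');
        document.getElementById(element).focus();
    }
}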
I'm not sure how to proceed. Any tips?
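Names are very hard to validate because they vary so much. As you point out, names can be hyphenated, separated by spaces, or written in scripts that most English names don't use, which makes it nearly impossible to account for every possibility.
That being said...
There are a few simple facts that hold for any name, foreign or otherwise, and we can test for those conditions.
Here are some things you may or may not want to exclude.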
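- Empty names
- Names containing digits
- Names containing nonsensical characters (digits, + _ ! @ # $ % ^ & *, and so on)
- Names that start or end with a special character such as a space, hyphen, or apostrophe
- Names containing consecutive, identical special characters (e.g. -- or -')
- Probably more that I haven't mentioned (if you have any ideas, please leave a comment; I do a lot of this kind of validation and would be keen to hear other opinions)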
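Note: by special characters I mean 'pseudo-alphanumeric' symbols such as hyphens, spaces, and apostrophes: characters that are allowed in a name but are not alphanumeric.
My advice is to run a separate regex for each condition you want to test, and to accept the name only if all of them pass. Running them separately also lets you determine exactly what made the entered name invalid.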
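A minimal sketch of the one-regex-per-condition idea, in JavaScript. The rule list, the names `nameRules` and `checkName`, and the choice of separators (space, apostrophe, hyphen) are assumptions made for illustration, not the answerer's actual code. It relies on Unicode property escapes (`\p{L}`, `\p{M}`, `\p{N}`), which need the `u` flag and an ES2018-capable engine, so letters such as á, ñ and ç pass.

```javascript
// Hypothetical rule set: each rule is one regex that, when it matches,
// marks the name as invalid for the stated reason.
const nameRules = [
  { reject: /^\s*$/u, message: 'empty name' },
  { reject: /\p{N}/u, message: 'contains a digit' },
  // Anything that is not a letter, combining mark, digit, or separator.
  // Digits are allowed here so they are reported only by the rule above.
  { reject: /[^\p{L}\p{M}\p{N} '-]/u, message: 'contains an unexpected symbol' },
  { reject: /^[ '-]|[ '-]$/u, message: 'starts or ends with a separator' },
  { reject: /[ '-]{2}/u, message: 'contains consecutive separators' },
];

// Returns the messages of every rule the name breaks; an empty array means
// the name passed all checks.
function checkName(name) {
  return nameRules
    .filter((rule) => rule.reject.test(name))
    .map((rule) => rule.message);
}
```

With these rules, `checkName("Anne--Marie")` returns a single message for the consecutive separators, while `checkName("José-María O'Neill")` returns an empty array. Rules like these will still reject some real names (for example, one typed with a typographic apostrophe), so treat the list as adjustable rather than complete.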
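I have about 15,000 users on something I built with similarly restrictive validation, and I have never had a problem with this approach.
Edit: It has become clear that my approach may exclude people who have digits or symbols in their names. The more conditions you impose, the more you inconvenience those people.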
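Please disregard everything above this line.
On this question, I honestly and humbly believed that validating 'junk' out of names was sensible, so it would feel dishonest of me to delete my original answer.
I would still recommend it as a technique for validating more uniform input with better-defined criteria, such as dates, times, and local phone numbers.
Instead, what I want to say is that after the debate in the comments and after reading the other answers, I too have come to the conclusion that you shouldn't impose any conditions on names.
As long as you sanitize the input before it goes into the database, anything goes; it's none of my business, and I don't even know why I cared.
Now, if you'll excuse me, I have some code to fix...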
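I won't quote the whole of Falsehoods Programmers Believe About Names, but suffice it to say that a simple regex cannot accurately capture the complexity of people's names.
For every rule you come up with, there are guaranteed to be hundreds of exceptions. Be it "names can't have spaces" except "van Buren", names have only letters except "O'Reily", names have only one capital letter except "McDonnell", or even that a person will have a first name and a last name except "Cher", "Prince", "Bono", and so on.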
As stated in the first comment, the only possible regex for a name is:
/^.+$/
And even that is problematic, since it implies that a name has a written form to begin with.
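Taken literally, that regex reduces validation to checking that something was entered. A minimal sketch, assuming a hypothetical helper named `acceptName`; trimming the surrounding whitespace is an added assumption and not part of the regex itself.

```javascript
// Hypothetical helper: accept any non-blank input, as /^.+$/ suggests.
function acceptName(input) {
  // Assumption: leading and trailing whitespace is not part of the name.
  const name = typeof input === 'string' ? input.trim() : '';
  // Without the `s` flag, `.` does not match line breaks, so multi-line
  // input is rejected along with the empty string.
  return /^.+$/.test(name) ? name : null;
}
```

Here `acceptName('  Cher ')` returns `'Cher'`, and a blank or whitespace-only value returns `null`.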
For posterity, I'll include the list of falsehoods from the article here:
Falsehoods Programmers Believe About Names
- People have exactly one canonical full name.
- People have exactly one full name which they go by.
- People have, at this point in time, exactly one canonical full name.
- People have, at this point in time, one full name which they go by.
- People have exactly N names, for any value of N.
- People’s names fit within a certain defined amount of space.
- People’s names do not change.
- People’s names change, but only at a certain enumerated set of events.
- People’s names are written in ASCII.
- People’s names are written in any single character set.
- People’s names are all mapped in Unicode code points.
- People’s names are case sensitive.
- People’s names are case insensitive.
- People’s names sometimes have prefixes or suffixes, but you can safely ignore those.
- People’s names do not contain numbers.
- People’s names are not written in ALL CAPS.
- People’s names are not written in all lower case letters.
- People’s names have an order to them. Picking any ordering scheme will automatically result in consistent ordering among all systems, as long as both use the same ordering scheme for the same name.
- People’s first names and last names are, by necessity, different.
- People have last names, family names, or anything else which is shared by folks recognized as their relatives.
- People’s names are globally unique.
- People’s names are almost globally unique.
- Alright alright but surely people’s names are diverse enough such that no million people share the same name.
- My system will never have to deal with names from China.
- Or Japan.
- Or Korea.
- Or Ireland, the United Kingdom, the United States, Spain, Mexico, Brazil, Peru, Russia, Sweden, Botswana, South Africa, Trinidad, Haiti, France, or the Klingon Empire, all of which have “weird” naming schemes in common use.
- That Klingon Empire thing was a joke, right?
- Confound your cultural relativism! People in my society, at least, agree on one commonly accepted standard for names.
- There exists an algorithm which transforms names and can be reversed losslessly. (Yes, yes, you can do it if your algorithm returns the input. You get a gold star.)
- I can safely assume that this dictionary of bad words contains no people’s names in it.
- People’s names are assigned at birth.
- OK, maybe not at birth, but at least pretty close to birth.
- Alright, alright, within a year or so of birth.
- Five years?
- You’re kidding me, right?
- Two different systems containing data about the same person will use the same name for that person.
- Two different data entry operators, given a person’s name, will by necessity enter bitwise equivalent strings on any single system, if the system is well-designed.
- People whose names break my system are weird outliers. They should have had solid, acceptable names, like 田中太郎.
- People have names.