How do I mimic a Unicode JS regular expression in Lucee?
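I'm trying to write a regular expression in Lucee that mimics the one in our front-end JS. Lucee's regex support doesn't seem to handle Unicode, so how should I do this?
This is the JS: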
function charTest(k){
    var regexp = /^[\u00C0-\u00ff\s -\~]+$/;
    return regexp.test(k);
}
if(!charTest(thisKey)){
    alert("Please Use Latin Characters Only");
    return false;
}
This is what I tried in Lucee:
regexp = '[\u00C0-\u00ff\s -\~]+/';
writeDump(reFind(regexp,"测"));
writeDump(reFind(regexp,"test"));
I also tried:
regexp = "[\p{L}]";
but the dump is always 0.
EDIT: Give me a second. I think I misread your original JS regex. Fixing it.
EDIT 2: That took more than a second. Your original JS regex is:
"/^[\u00C0-\u00ff\s -\~]+$/"
It breaks down as follows:
Basic parts of the regex:
"/.../" == delimits the start and end of the regex literal.
"^"     == anchors the match at the start of the string. (A "^" means
           NOT only when it is the first character inside the brackets,
           as in "[^...]".)
"[...]" == matches any single character listed in the group
"+"     == signifies at least one of the previous
"$"     == anchors the match at the end of the string
Identifiers in the regex:
"\u00c0-\u00ff" == Unicode character range from Character 192 (À)
                   to Character 255 (ÿ). This is the upper part of the
                   Latin-1 Supplement block: accented letters, plus
                   the × and ÷ signs.
"\s"    == signifies any whitespace character (space, tab, line breaks)
" -\~"  == signifies the range from the space character to the
           (escaped) tilde character (~). This is ASCII 32-126, which
           covers the printable characters of ASCII and stops just
           below the DEL character (127). This includes alphanumerics
           and most punctuation.
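To make the breakdown concrete, here is a small JavaScript sketch that probes each part of the character class with a single character. The helper name `inClass` and the sample characters are my own, chosen for illustration; only the pattern comes from the question.

```javascript
// Pattern from the question, unchanged.
const latinOnly = /^[\u00C0-\u00ff\s -\~]+$/;

// Hypothetical helper: does this single character belong to the class?
function inClass(ch) {
  return latinOnly.test(ch);
}

console.log(inClass("A"));      // true: printable ASCII (65)
console.log(inClass("~"));      // true: top of the " -\~" range (126)
console.log(inClass("À"));      // true: bottom of the \u00C0-\u00ff range (192)
console.log(inClass("\t"));     // true: tab is allowed only because of \s
console.log(inClass("\u007F")); // false: DEL (127) sits just above the ASCII range
console.log(inClass("测"));     // false: far outside both ranges
```

Note that the tab passes only because of `\s`; the two explicit ranges alone would reject it.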
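I had missed the second half of your character set, the printable Basic Latin characters. I've updated my regex and tests to include it. There are ways to shorthand some of these identifiers, but I wanted it to be explicit.
You can try this: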
<cfscript>
//http://www.asciitable.com/
//https://en.wikipedia.org/wiki/List_of_Unicode_characters
//https://en.wikipedia.org/wiki/Latin_script_in_Unicode
// Returns the error message if any character falls outside
// chr(32)-chr(126) (printable ASCII) and chr(192)-chr(255) (À to ÿ);
// returns an empty string otherwise. Unlike the JS pattern, \s is not
// part of the class, so tabs and line breaks are rejected.
function charTest(k) {
    return REfind("[^"
            & chr(32) & "-" & chr(126)
            & chr(192) & "-" & chr(255)
            & "]", arguments.k)
        ? "Please Use Latin Characters Only"
        : "";
}
// TESTS
writeDump(charTest("测")); // Not Latin
writeDump(charTest("test")); // All characters between 32 and 126
writeDump(charTest("À")); // Character 192 (in range)
writeDump(charTest("À ")); // Character 192 and Space
writeDump(charTest(" ")); // Space Characters
writeDump(charTest("12345")); // Digits ( character 48-57 )
writeDump(charTest("ð")); // Character 240 (in range)
writeDump(charTest("ℿ")); // Character 8511 (outside range)
writeDump(charTest(chr(199))); // CF Character (in range)
writeDump(charTest(chr(10))); // CF Line Feed Character (outside range)
writeDump(charTest(chr(1000))); // CF Character (outside range)
writeDump(charTest("
")); // CRLF (outside range)
writeDump(charTest(URLDecode("%00", "utf-8"))); // CF Null character (outside range)
//writeDump(asc("测"));
//writeDump(asc("test"));
//writeDump(asc("À"));
//writeDump(asc("ð"));
//writeDump(asc("ℿ"));
</cfscript>
https://trycf.com/gist/05d27baaed2b8fc269f90c7c80a1aa82/lucee5?theme=monokai
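All the regex does is scan your input string for any character that falls outside the ranges chr(32) to chr(126) and chr(192) to chr(255). If it finds one, the function returns the string you chose; otherwise it returns an empty string.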
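The Lucee function inverts the front-end logic: rather than requiring every character to be in the class, it searches for one character that is not. The JavaScript sketch below puts both forms side by side to show where they agree and where they differ. The name `charTestNegated` and the sample list are illustrative, and the negated pattern is my transcription of the class that the chr() calls build, not code from the original post.

```javascript
// Front-end form from the question: every character must be in the class.
function charTest(k) {
  return /^[\u00C0-\u00ff\s -\~]+$/.test(k);
}

// Negated form, mirroring the class built from chr(32)-chr(126) and
// chr(192)-chr(255): true when no disallowed character is found.
function charTestNegated(k) {
  return !/[^ -~\u00C0-\u00ff]/.test(k);
}

// Print each sample with the result of both forms.
const samples = ["test", "À ", "12345", "测", "", "a\nb"];
for (const s of samples) {
  console.log(JSON.stringify(s), charTest(s), charTestNegated(s));
}
```

The two forms agree on the first four samples and disagree on the last two: the empty string fails the `+` in the original but contains nothing for the negated search to find, and the line break passes the original through `\s` but is rejected by the negated class. If the server check must match the front end exactly, those two cases need an explicit decision.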
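I think you can access Unicode characters below 255 directly. I'll have to test that.
Does this function need to alert, like the JavaScript does? If not, you could just return a 1 or 0 to indicate whether the function actually found the characters you're looking for.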
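A sketch of the flag-returning idea, written in JavaScript so it can be compared with the front-end code. `hasDisallowedChar` and `validationMessage` are hypothetical names; the point is only that the check returns 1 or 0 and the caller owns the message.

```javascript
// Returns 1 when a character outside the allowed ranges is found, else 0.
function hasDisallowedChar(k) {
  return /[^ -~\u00C0-\u00ff]/.test(k) ? 1 : 0;
}

// The caller decides what to do with the flag.
function validationMessage(k) {
  return hasDisallowedChar(k) === 1 ? "Please Use Latin Characters Only" : "";
}

console.log(hasDisallowedChar("test")); // 0
console.log(hasDisallowedChar("测"));   // 1
console.log(validationMessage("测"));   // Please Use Latin Characters Only
```

Keeping the message out of the check lets the same test back an alert on the client and a different response on the server.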