Why do Unicode emoji property escapes match numbers?
I found this neat way to detect emojis with a regex that relies on a Unicode property escape:
console.log(/\p{Emoji}/u.test('flowers \u{1F338}')) // true (U+1F338 is the cherry blossom emoji)
console.log(/\p{Emoji}/u.test('flowers')) // false
But when I shared this knowledge in this answer, @Bronzdragon noticed that \p{Emoji} also matches numbers! Why is that? Numbers are not emojis, are they?
console.log(/\p{Emoji}/u.test('flowers 123')) // unexpectedly true
// regex-only workaround by @Bronzdragon
const regex = /(?=\p{Emoji})(?!\p{Number})/u;
console.log(
  regex.test('flowers'), // false, as expected
  regex.test('flowers 123'), // false, as expected
  regex.test('flowers 123 \u{1F338}'), // true, as expected
  regex.test('flowers \u{1F338}'), // true, as expected
)
// more readable workaround
const hasEmoji = str => {
  const nbEmojiOrNumber = (str.match(/\p{Emoji}/gu) || []).length;
  const nbNumber = (str.match(/\p{Number}/gu) || []).length;
  return nbEmojiOrNumber > nbNumber;
}
console.log(
  hasEmoji('flowers'), // false, as expected
  hasEmoji('flowers 123'), // false, as expected
  hasEmoji('flowers 123 \u{1F338}'), // true, as expected
  hasEmoji('flowers \u{1F338}'), // true, as expected
)
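According to this post, digits, #, * and some other characters have the Emoji property set to Yes, which means digits are considered valid emoji characters. They are listed in the Unicode emoji data among the Emoji_Component entries, next to ZWJ and the other building blocks of emoji sequences: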
0023 ; Emoji_Component # 1.1 [1] (#️) number sign
002A ; Emoji_Component # 1.1 [1] (*️) asterisk
0030..0039 ; Emoji_Component # 1.1 [10] (0️..9️) digit zero..digit nine
200D ; Emoji_Component # 1.1 [1] () zero width joiner
20E3 ; Emoji_Component # 3.0 [1] (⃣) combining enclosing keycap
FE0F ; Emoji_Component # 3.2 [1] () VARIATION SELECTOR-16
1F1E6..1F1FF ; Emoji_Component # 6.0 [26] (..) regional indicator symbol letter a..regional indicator symbol letter z
1F3FB..1F3FF ; Emoji_Component # 8.0 [5] (..) light skin tone..dark skin tone
1F9B0..1F9B3 ; Emoji_Component # 11.0 [4] (..) red-haired..white-haired
E0020..E007F ; Emoji_Component # 3.1 [96] (..) tag space..cancel tag
For example, 1 is a digit, but it becomes an emoji when combined with the U+FE0F and U+20E3 characters: 1️⃣:
console.log("1\uFE0F\u20E3 2\uFE0F\u20E3 3\uFE0F\u20E3 4\uFE0F\u20E3 5\uFE0F\u20E3 6\uFE0F\u20E3 7\uFE0F\u20E3 8\uFE0F\u20E3 9\uFE0F\u20E3 0\uFE0F\u20E3")
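To see why the bare digit already matches, it can help to inspect a keycap sequence one code point at a time. The following is only an illustrative sketch: describeCodePoints is a hypothetical helper name, not something from the post, and it assumes a runtime that supports the \p{Emoji} and \p{Emoji_Component} property escapes.

```javascript
// Hypothetical helper: split a string into code points and report,
// for each one, whether it matches two emoji-related properties.
const describeCodePoints = (str) =>
  [...str].map((ch) => ({
    codePoint: 'U+' + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, '0'),
    emoji: /\p{Emoji}/u.test(ch),
    emojiComponent: /\p{Emoji_Component}/u.test(ch),
  }));

// The keycap "1" sequence: DIGIT ONE, VARIATION SELECTOR-16, COMBINING ENCLOSING KEYCAP.
const keycapOne = describeCodePoints('1\uFE0F\u20E3');
console.log(keycapOne);
```

The first entry is the plain digit U+0031, and it is the one that makes /\p{Emoji}/u match a string such as 'flowers 123'.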
If you want to avoid matching digits, use the Extended_Pictographic Unicode property:
The Extended_Pictographic characters contain all the Emoji characters except for some Emoji_Components.
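So, you may use /\p{Extended_Pictographic}/gu to match most emojis, or /\p{Extended_Pictographic}/u to test for a single emoji character, or use /[\p{Extended_Pictographic}\u{1F3FB}-\u{1F3FF}\u{1F9B0}-\u{1F9B3}]/u to match emojis proper plus the light skin tone to dark skin tone modifier characters and the red-haired to white-haired characters: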
const regex_emoji = /[\p{Extended_Pictographic}\u{1F3FB}-\u{1F3FF}\u{1F9B0}-\u{1F9B3}]/u;
console.log( regex_emoji.test('flowers 123') ); // => false
console.log( regex_emoji.test('flowers \u{1F338}') ); // => true
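As a rough sketch of how this could be wrapped for reuse, the helpers below extract and detect pictographic code points. The names PICTOGRAPHIC, findPictographs and hasPictograph are made up for illustration, and the emoji are written as \u{...} escapes only to keep the sample plain ASCII.

```javascript
// Hypothetical helpers (not from the original answer), built on \p{Extended_Pictographic}.
const PICTOGRAPHIC = /\p{Extended_Pictographic}/gu;

// Returns every pictographic code point found in the string (empty array if none).
const findPictographs = (str) => str.match(PICTOGRAPHIC) || [];

// True when the string contains at least one pictographic code point.
const hasPictograph = (str) => findPictographs(str).length > 0;

console.log(hasPictograph('flowers 123'));       // false: digits are not Extended_Pictographic
console.log(hasPictograph('flowers \u{1F338}')); // true: U+1F338 is the cherry blossom emoji
console.log(hasPictograph('1\uFE0F\u20E3'));     // false: a keycap sequence has no pictographic code point
console.log(findPictographs('a\u{1F338}b\u{1F33A}').length); // 2
```

One trade-off of this approach: keycap sequences such as 1️⃣ are no longer detected, so if those matter they would need to be handled separately.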