Compare Unicode code point range in Python3

Question

I want to check whether a character falls within a certain Unicode range, but I can't seem to get the expected answer.

char = "？" # the unicode value is 0xff1f
print(hex(ord(char)))
if hex(ord(char)) in range(0xff01, 0xff60):
    print("in range")
else:
    print("not in range")

It should print "in range", but it prints "not in range". What am I doing wrong?

Answer 1

hex() returns a string. To compare integers, simply use ord:

if ord(char) in range(0xff01, 0xff60):
    print("in range")

You could also write it as:

if 0xff01 <= ord(char) < 0xff60:
    print("in range")
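As a sketch (the helper names and constants below are hypothetical, chosen only for illustration), the two forms can be wrapped side by side to show they accept exactly the same characters, since both use the same half-open interval [0xff01, 0xff60):

```python
# Hypothetical names; the bounds are the ones used in the question.
RANGE_START = 0xFF01  # inclusive lower bound
RANGE_STOP = 0xFF60   # exclusive upper bound, matching range() semantics


def in_range_membership(char: str) -> bool:
    """Membership test: an int code point against a range of ints."""
    return ord(char) in range(RANGE_START, RANGE_STOP)


def in_range_chained(char: str) -> bool:
    """Chained comparison with the same half-open bounds."""
    return RANGE_START <= ord(char) < RANGE_STOP


for c in ["？", "!", "\uff01", "\uff5f", "\uff60"]:
    # Both checks agree for every input because the intervals are identical.
    print(repr(c), hex(ord(c)), in_range_membership(c), in_range_chained(c))
```

The chained comparison avoids building a range object, while the range form reads naturally when the bounds already come from range() elsewhere; either is fine for a single check.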

Answer 2

Just use ord:

if ord(char) in range(0xff01, 0xff60):
    ...

hex() isn't needed.

As the docs say about hex():

Convert an integer number to a lowercase hexadecimal string prefixed with “0x”.

As that clearly states, the result is a string rather than what we want, an integer.

The ord function, on the other hand, does what we want, as the docs describe:

Given a string representing one Unicode character, return an integer representing the Unicode code point of that character. For example, ord('a') returns the integer 97 and ord('€') (Euro sign) returns 8364. This is the inverse of chr().
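A short sketch of the documented behaviour: the first two values are the examples quoted from the docs, and the rest applies ord() and chr() to the character from the question.

```python
# Examples quoted from the documentation above.
print(ord("a"))   # 97
print(ord("€"))   # 8364

char = "？"
code_point = ord(char)   # an int, so it works with range() and <, <=
print(type(code_point).__name__, code_point, hex(code_point))  # int 65311 0xff1f

# chr() is the inverse of ord(): it maps the code point back to the character.
print(chr(code_point) == char)  # True
```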

Answer 3

For problems like this in general, try checking the types of your variables.
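For instance, a minimal sketch that inspects both operands of the `in` test from the question (the variable names `left` and `right` are only illustrative):

```python
char = "？"

left = hex(ord(char))            # the value the question's code tests for membership
right = range(0xff01, 0xff60)    # a range of ints

# Inspecting the types exposes the mismatch.
print(type(left), repr(left))    # <class 'str'> '0xff1f'
print(type(right[0]))            # <class 'int'>

# A str is never equal to any int, so membership is simply False (no error is raised).
print(left in right)             # False
print(ord(char) in right)        # True
```

Note that Python does not raise an error here; the mismatched comparison quietly returns False, which is why the bug is easy to miss.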

Typing 0xff01 without quotes gives you a number (an int).

list(range(0xff01, 0xff60)) gives you a list of integers, [65281, 65282, ..., 65375]. range(0xff01, 0xff60) == range(65281, 65376) evaluates to True.
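These claims can be checked directly; a quick sketch:

```python
r = range(0xff01, 0xff60)

print(r == range(65281, 65376))  # True: hex literals are just ints
print(r[0], r[-1], len(r))       # 65281 65375 95
print(0xff60 in r)               # False: the stop value is excluded
```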

ord('？') gives you the integer 65311.

hex() takes an integer and converts it to a string such as '0xff01'.
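A small sketch contrasting the two results for the character in the question (U+FF1F):

```python
char = "？"  # U+FF1F, the character from the question

n = ord(char)   # integer code point
s = hex(n)      # its hexadecimal text representation

print(n, type(n).__name__)   # 65311 int
print(s, type(s).__name__)   # 0xff1f str
print(hex(0xff01))           # 0xff01
```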

So you only need ord(); hex() is not needed.
