What is the difference between unicodedata.digit and unicodedata.numeric?
From the unicodedata documentation:
unicodedata.digit(chr[, default]) Returns the digit value assigned to
the character chr as integer. If no such value is defined, default is
returned, or, if not given, ValueError is raised.
unicodedata.numeric(chr[, default]) Returns the numeric value assigned
to the character chr as float. If no such value is defined, default is
returned, or, if not given, ValueError is raised.
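Can anyone explain the difference between these two functions?
One can read the implementation of both functions, but a quick look does not make the difference evident to me, because I am not familiar with the CPython implementation.
EDIT 1:
An example that shows the difference would be nice.
EDIT 2:
Examples that complement the comments and the excellent answer from @user2357112: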
import unicodedata
print(unicodedata.digit('1'))    # Decimal digit one.
print(unicodedata.digit('١'))    # ARABIC-INDIC DIGIT ONE.
print(unicodedata.digit('¼'))    # Not a digit, so "ValueError: not a digit" is raised.
print(unicodedata.numeric('Ⅱ'))  # ROMAN NUMERAL TWO.
print(unicodedata.numeric('¼'))  # Fraction representing one quarter.
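Short answer:
If a character represents a decimal digit, so things like 1, ¹ (SUPERSCRIPT ONE), ① (CIRCLED DIGIT ONE), ١ (ARABIC-INDIC DIGIT ONE), unicodedata.digit will return the digit the character represents as an int (so 1 for all of these examples).
If the character represents any numeric value, so things like ⅐ (VULGAR FRACTION ONE SEVENTH) and all the decimal digit examples, unicodedata.numeric will give that character's numeric value as a float.
For technical reasons, more recent digit characters like 🄌 (U+1F10C, DINGBAT NEGATIVE CIRCLED SANS-SERIF DIGIT ZERO) may raise a ValueError from unicodedata.digit.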
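As a rough illustration of the short answer, the sketch below runs both functions over the characters mentioned above. The helper name `describe` is made up for this example, and `None` is passed as the `default` argument so that a missing value shows up as `None` rather than as a `ValueError`. The exact results depend on the Unicode database bundled with your Python version.

```python
import unicodedata

# Characters from the examples above. The last one is written as an escape
# (U+1F10C) because many fonts cannot display it.
samples = ["1", "¹", "①", "١", "⅐", "\U0001F10C"]

def describe(ch):
    """Return (name, digit value or None, numeric value or None) for ch."""
    name = unicodedata.name(ch, "<unnamed>")
    digit = unicodedata.digit(ch, None)      # None instead of ValueError
    numeric = unicodedata.numeric(ch, None)  # None instead of ValueError
    return name, digit, numeric

for ch in samples:
    name, digit, numeric = describe(ch)
    print(f"U+{ord(ch):04X} {name}: digit={digit!r}, numeric={numeric!r}")
```

The first four characters should report a digit value of 1, while the fraction and the dingbat zero should only have a numeric value.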
Long answer:
Unicode characters all have a Numeric_Type property. This property can have 4 possible values: Numeric_Type=Decimal, Numeric_Type=Digit, Numeric_Type=Numeric, or Numeric_Type=None.
Quoting the Unicode standard, version 10.0.0, section 4.6,
The Numeric_Type=Decimal property value (which is correlated with the General_Category=Nd
property value) is limited to those numeric characters that are used in decimal-radix
numbers and for which a full set of digits has been encoded in a contiguous range,
with ascending order of Numeric_Value, and with the digit zero as the first code point in
the range.
Numeric_Type=Decimal characters are thus decimal digits that also fit a few other specific technical requirements.
Decimal digits, as defined in the Unicode Standard by these property assignments, exclude
some characters, such as the CJK ideographic digits (see the first ten entries in Table 4-5),
which are not encoded in a contiguous sequence. Decimal digits also exclude the compatibility
subscript and superscript digits, to prevent simplistic parsers from misinterpreting
their values in context. (For more information on superscript and subscripts, see
Section 22.4, Superscript and Subscript Symbols.) Traditionally, the Unicode Character
Database has given these sets of noncontiguous or compatibility digits the value Numeric_Type=Digit, to recognize the fact that they consist of digit values but do not necessarily
meet all the criteria for Numeric_Type=Decimal. However, the distinction between
Numeric_Type=Digit and the more generic Numeric_Type=Numeric has proven not to be
useful in implementations. As a result, future sets of digits which may be added to the standard
and which do not meet the criteria for Numeric_Type=Decimal will simply be
assigned the value Numeric_Type=Numeric.
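The exclusions described in the quoted passage can be observed from Python. The sketch below is only an illustration: `probe` is a made-up helper, and the three sample characters (an ARABIC-INDIC digit, a superscript digit, and a CJK ideographic digit) were picked for this example. It relies on the `str.isdecimal`, `str.isdigit` and `str.isnumeric` methods, which the Python documentation describes in terms of these Numeric_Type values, and on `unicodedata.decimal`, which only succeeds for decimal characters.

```python
import unicodedata

def probe(ch):
    """Return which str predicates and unicodedata lookups accept ch."""
    return {
        "isdecimal": ch.isdecimal(),
        "isdigit": ch.isdigit(),
        "isnumeric": ch.isnumeric(),
        "decimal": unicodedata.decimal(ch, None),  # None instead of ValueError
        "digit": unicodedata.digit(ch, None),
        "numeric": unicodedata.numeric(ch, None),
    }

# ARABIC-INDIC DIGIT TWO, SUPERSCRIPT TWO, CJK ideograph for two
for ch in ["٢", "²", "二"]:
    print(f"U+{ord(ch):04X}", probe(ch))

# int() only parses decimal digits, so the superscript is rejected
# instead of being silently read as the number 2.
print(int("١٢"))
try:
    int("²")
except ValueError as exc:
    print("int('²') failed:", exc)
```

Only the first character should be accepted everywhere, the superscript should pass the digit and numeric checks but not the decimal ones, and the ideograph should only be numeric.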
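So Numeric_Type=Digit was historically used for other digits that do not fit the technical requirements of Numeric_Type=Decimal, but they decided that was not useful, and since Unicode 6.3.0, digit characters that do not meet the Numeric_Type=Decimal requirements have simply been assigned Numeric_Type=Numeric. For example, 🄌 (U+1F10C, DINGBAT NEGATIVE CIRCLED SANS-SERIF DIGIT ZERO), introduced in Unicode 7.0, has Numeric_Type=Numeric.
Numeric_Type=Numeric is for all characters that represent numbers and do not fit in the other categories, and Numeric_Type=None is for characters that do not represent numbers (or at least, do not under normal usage).
All characters with a non-None Numeric_Type property have a Numeric_Value property representing their numeric value. unicodedata.digit will return that value as an int for characters with Numeric_Type=Decimal or Numeric_Type=Digit, and unicodedata.numeric will return that value as a float for characters with any non-None Numeric_Type.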
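The unicodedata module does not expose Numeric_Type directly, but the rules above suggest a way to infer it from which lookups succeed. The following is a minimal sketch under that assumption; `numeric_type` is a hypothetical helper, not part of the standard library, and its answers reflect whatever Unicode version your Python ships (see `unicodedata.unidata_version`).

```python
import unicodedata

def numeric_type(ch):
    """Guess the Numeric_Type of ch from which unicodedata lookups succeed.

    Hypothetical helper: decimal() is assumed to succeed only for
    Numeric_Type=Decimal, digit() for Decimal or Digit, and numeric()
    for any non-None Numeric_Type.
    """
    # Compare against None explicitly, because a value of 0 is falsy.
    if unicodedata.decimal(ch, None) is not None:
        return "Decimal"
    if unicodedata.digit(ch, None) is not None:
        return "Digit"
    if unicodedata.numeric(ch, None) is not None:
        return "Numeric"
    return "None"

for ch in ["0", "٧", "²", "①", "¼", "Ⅱ", "\U0001F10C", "a"]:
    print(f"U+{ord(ch):04X} -> Numeric_Type={numeric_type(ch)}")
```

The checks run from the most specific type to the most general one, since every character accepted by `decimal` is also accepted by `digit` and `numeric`.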