为什么 Google Vision 可以识别 'und' 文本中仅包含数字的图像的语言环境?

Why does Google Vision recognize 'und' locale for images containing just numbers in text?

我已将图像提供给 Google Cloud Vision OCR API 进行注释。该图像仅包含一个 phone 数字。

Google Cloud Vision 表示文本的语言环境是 'und'。这是否意味着未定义?我没有在文档中找到任何信息。

确实,'und' 不在 documentation 语言的代码中。由于图像不包含数字,因此它不会检测到语言。

但是,文档还指出 Vision API 使用 BCP-47 标识符,并且 'und' 被列为 Non-specific Language Tag. You can also find the clarification 即 "The special value 'und' (Undetermined) has a 'Scope' of 'special'"。被特殊定义为:

'special' - Indicates a special language code. These are subtags used for identifying linguistic attributes not particularly associated with a concrete language. These include codes for when the language is undetermined or for non-linguistic content.

因此,"the 'und' (Undetermined) primary language subtag identifies linguistic content whose language is not determined".