Best practice for using i18n language codes to identify language and region?

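I plan to use language/region codes in my current web project to identify language and region, e.g. 'en-US' or 'de-CH'. Is it valid to use a code such as 'en-IN' to identify content for the region India with English text?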

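The following is about specifying the language in the actual HTML document. Server-side code may have different best practices, which may depend on the programming technology used.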
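For concreteness, declaring the language in an HTML document usually means setting the `lang` attribute on the root `html` element. The following is only a minimal sketch of that idea in Python; the helper name `html_open_tag` is hypothetical and not part of any library.

```python
from html import escape


def html_open_tag(lang):
    """Build the opening <html> tag carrying the given language tag.

    `html_open_tag` is a hypothetical helper used for illustration only.
    """
    # Escape the value so it is safe to place inside a quoted attribute.
    return '<html lang="{}">'.format(escape(lang, quote=True))


print(html_open_tag("en-IN"))  # <html lang="en-IN">
```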

According to the W3C [1], one should choose the language code based on Best Current Practice 47, "Tags for Identifying Languages" (RFC 5646) [2].

How to declare the language in (X)HTML documents

The recommended format is as follows:

langtag   = language
           ["-" script]
           ["-" region]
           *("-" variant)
           *("-" extension)
           ["-" privateuse]

where the relevant parts are defined as

language      = 2*3ALPHA            ; shortest ISO 639 code
region        = 2ALPHA              ; ISO 3166-1 code
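As a rough illustration of the grammar excerpt above, a tag such as 'en-IN' can be checked syntactically with a regular expression. This is a sketch under stated assumptions: it covers only the language, script and region parts as quoted here (script assumed to be four letters), ignores variants, extensions and private use, and does not check that the subtags are actually registered codes. The names `LANGTAG_RE` and `parse_langtag` are made up for this example.

```python
import re

# Simplified subset of the langtag grammar quoted above:
# language (2-3 letters), optional script (4 letters), optional region (2 letters).
# Variants, extensions and private-use subtags are deliberately not handled.
LANGTAG_RE = re.compile(
    r"(?P<language>[A-Za-z]{2,3})"
    r"(?:-(?P<script>[A-Za-z]{4}))?"
    r"(?:-(?P<region>[A-Za-z]{2}))?"
)


def parse_langtag(tag):
    """Return a dict of subtags, or None if the tag is outside this subset."""
    match = LANGTAG_RE.fullmatch(tag)
    return match.groupdict() if match else None


print(parse_langtag("en-IN"))
print(parse_langtag("en-US-GB"))  # two regions do not fit the grammar
```

A tag that passes this check is only well-formed in shape; whether 'IN' or 'en' are valid codes still has to be checked against the code lists.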

Regarding the use of the region part, it says

Region subtags are used to indicate linguistic variations associated with or appropriate to a specific country, territory, or region. Typically, a region subtag is used to indicate variations such as regional dialects or usage, or region-specific spelling conventions. It can also be used to indicate that content is expressed in a way that is appropriate for use throughout a region, for instance, Spanish content tailored to be useful throughout Latin America.

The following rules apply to the region subtags:

  1. Region subtags MUST follow any primary language, extended language, or script subtags and MUST precede any other type of subtag.

[...]

  1. There MUST be at most one region subtag in a language tag and the region subtag MAY be omitted, as when it adds no distinguishing value to the tag.

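Nothing here says that the language has to be an official language of the country, or even one in common use there. You may omit the region when it adds little distinguishing value, but you do not have to.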

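Note that by convention the language subtag is written in lowercase and the region subtag in uppercase. RFC 5646 treats tags as case-insensitive, so this is a formatting recommendation and not a validity requirement.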
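The casing convention can be applied mechanically. The sketch below assumes simple tags of the form language, optional script, optional region; it lowercases the language, uppercases two-letter subtags and, following the RFC 5646 formatting recommendation, title-cases four-letter script subtags. It does not handle extension or private-use sections, and `format_langtag` is a hypothetical name.

```python
def format_langtag(tag):
    """Apply the conventional casing to a simple language tag.

    Assumes the tag contains no extension or private-use subtags.
    """
    parts = tag.split("-")
    formatted = [parts[0].lower()]  # the language subtag is lowercase
    for part in parts[1:]:
        if len(part) == 2 and part.isalpha():
            formatted.append(part.upper())   # region, e.g. "IN"
        elif len(part) == 4 and part.isalpha():
            formatted.append(part.title())   # script, e.g. "Hant"
        else:
            formatted.append(part.lower())   # anything else stays lowercase
    return "-".join(formatted)


print(format_langtag("EN-in"))  # en-IN
```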

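tl;dr As long as you use valid ISO codes for the language and the region, you are in line with best practice and the standards, for example when using en-IN.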

A list of the ISO 639-2 language codes can be found at https://www.loc.gov/standards/iso639-2/php/English_list.php, while the region codes are listed on Wikipedia, or you may use the search on the official ISO website.