Java 8 - Locale lookup behavior

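Java 8 introduced Locale.lookup(), based on RFC 4647, which lets you find the best match in a list of Locale objects according to a language priority list. I don't understand every corner case of this method, though. Here is one particular case I would like to have explained: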

// Create a collection of Locale objects to search
Collection<Locale> locales = new ArrayList<>();
locales.add(Locale.forLanguageTag("en-GB"));
locales.add(Locale.forLanguageTag("en"));

// Express the user's preferences with a Language Priority List
String ranges = "en-US;q=1.0,en-GB;q=1.0";
List<Locale.LanguageRange> languageRanges = Locale.LanguageRange.parse(ranges);

// Find the BEST match, and return just one result
Locale result = Locale.lookup(languageRanges,locales);
System.out.println(result.toString());

This prints en, whereas I would intuitively expect en-GB.

Can someone explain the rationale behind this behavior?

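If you supply alternative languages with the same priority, the order of the list becomes significant. This becomes apparent when you inspect the parsed list for "en-US;q=1.0,en-GB;q=1.0": it contains two entries, representing "en-US;q=1.0" followed by "en-GB;q=1.0".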
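One way to check this yourself is to print each parsed range with its weight. This is only a minimal sketch; the class and method names (ParsedRanges, describe) are made up for illustration, and it builds the string by hand rather than relying on LanguageRange.toString().

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Illustrative helper; ParsedRanges and describe are hypothetical names.
class ParsedRanges {
    // Returns each parsed range as "range;q=weight", in the parser's priority order.
    static List<String> describe(String ranges) {
        List<String> out = new ArrayList<>();
        for (Locale.LanguageRange r : Locale.LanguageRange.parse(ranges)) {
            out.add(r.getRange() + ";q=" + r.getWeight());
        }
        return out;
    }

    public static void main(String[] args) {
        // The parser lower-cases the ranges; entries with equal weight keep their input order.
        System.out.println(describe("en-US;q=1.0,en-GB;q=1.0"));
    }
}
```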

https://www.ietf.org/rfc/rfc4647.txt

3.4. Lookup

Lookup is used to select the single language tag that best matches the language priority list for a given request. When performing lookup, each language range in the language priority list is considered in turn, according to priority. … The first matching tag found, according to the user's priority, is considered the closest match and is the item returned. For example, if the language range is "de-ch", a lookup operation can produce content with the tags "de" or "de-CH" but never content with the tag "de-CH-1996". If no language tag matches the request, the "default" value is returned.

In the lookup scheme, the language range is progressively truncated from the end until a matching language tag is located. …

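The last sentence describes what the first paragraph already illustrated by example: a language range of de-CH may match de-CH or de. This fallback lookup is performed for each item in the list, stopping at the first item for which a match is found.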

In other words, specifying "en-US;q=1.0,en-GB;q=1.0" is like specifying "en-US,en,en-GB,en".
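To make that equivalence concrete, here is a rough sketch that expands each range into its truncation chain. It is a simplified illustration of the RFC 4647 truncation rule, not the JDK's code; FallbackChain and expand are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative only; not how the JDK represents the priority list internally.
class FallbackChain {
    // Expands every range into the sequence of ranges that lookup would try for it.
    static List<String> expand(List<String> ranges) {
        List<String> chain = new ArrayList<>();
        for (String range : ranges) {
            String r = range;
            while (true) {
                chain.add(r);
                int cut = r.lastIndexOf('-');
                if (cut < 0) {
                    break; // only the primary language subtag is left
                }
                r = r.substring(0, cut);
                // RFC 4647: a single-character subtag left at the end is removed as well.
                if (r.length() >= 2 && r.charAt(r.length() - 2) == '-') {
                    r = r.substring(0, r.length() - 2);
                }
            }
        }
        return chain;
    }

    public static void main(String[] args) {
        System.out.println(expand(Arrays.asList("en-US", "en-GB"))); // [en-US, en, en-GB, en]
    }
}
```

The first entry of that expanded list that is available wins, which is why en is found before en-GB is ever considered.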


Perhaps what you want is filtering; see

3.3. Filtering

Filtering is used to select the set of language tags that matches a given language priority list. …

In filtering, each language range represents the least specific language tag (that is, the language tag with fewest number of subtags) that is an acceptable match.

So, given your original list of available locales,
List<Locale> filtered = Locale.filter(
    Locale.LanguageRange.parse("en-US;q=1.0,en-GB;q=1.0"), locales);
System.out.println("filtered: "+filtered);

yields [en_GB], whereas

Collection<Locale> locales = Arrays.asList(Locale.forLanguageTag("en"),
    Locale.forLanguageTag("en-GB"), Locale.forLanguageTag("en-US"));
List<Locale> filtered = Locale.filter(
    Locale.LanguageRange.parse("en-US;q=1.0,en-GB;q=1.0"), locales);
System.out.println("filtered: "+filtered);

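yields [en_US, en_GB] (note the order of precedence and the absence of the en fallback). So, depending on the context, you might first try to select from the filtered list and resort to lookup only if the filtered list is empty.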
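A minimal sketch of that "filter first, then lookup" strategy could look like the following. The helper name choose and the class FilterThenLookup are assumptions for illustration; only Locale.filter, Locale.lookup and Locale.LanguageRange.parse are actual JDK calls.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.List;
import java.util.Locale;

// Illustrative only; FilterThenLookup and choose are hypothetical names.
class FilterThenLookup {
    // Prefers a locale that passes filtering; falls back to lookup otherwise.
    // May return null when neither filtering nor lookup finds anything.
    static Locale choose(String ranges, Collection<Locale> available) {
        List<Locale.LanguageRange> priorityList = Locale.LanguageRange.parse(ranges);
        List<Locale> filtered = Locale.filter(priorityList, available);
        if (!filtered.isEmpty()) {
            return filtered.get(0); // first entry follows the priority order
        }
        return Locale.lookup(priorityList, available);
    }

    public static void main(String[] args) {
        Collection<Locale> locales = Arrays.asList(
            Locale.forLanguageTag("en-GB"), Locale.forLanguageTag("en"));
        // Filtering finds en-GB, so the en fallback is never used.
        System.out.println(choose("en-US;q=1.0,en-GB;q=1.0", locales));
        // Nothing passes filtering here, so lookup falls back to en.
        System.out.println(choose("en-US", locales));
    }
}
```

Whether taking the first filtered entry is the right policy depends on your application; it is just one reasonable choice.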

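At the very least, the behavior of the Java implementation conforms to the specification. As you have already noted, changing the priorities, or changing the order when the priorities are equal, changes the result, in accordance with the specification.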

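Parsing the given ranges produces the Language Priority List:

  • For "en-US;q=1.0,en-GB;q=1.0", the priority list is [en-us;q=1.0, en-gb;q=1.0]

  • For "en-GB;q=1.0,en-US;q=1.0", the priority list is [en-gb;q=1.0, en-us;q=1.0]

  • For "en-US;q=0.9,en-GB;q=1.0", the priority list is [en-gb;q=1.0, en-us;q=0.9]

The lookup method then walks this priority list until it finds a matching locale (per RFC 4647):

  • For [en-us;q=1.0, en-gb;q=1.0], the algorithm takes en-us;q=1.0 first; its best-matching locale is en
  • For [en-gb;q=1.0, en-us;q=1.0], the algorithm takes en-gb;q=1.0 first; its best-matching locale is en-GB
  • For [en-gb;q=1.0, en-us;q=0.9], the algorithm takes en-gb;q=1.0 first; its best-matching locale is en-GB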
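The three cases can be reproduced with a short program. This is a sketch that assumes the question's locale list (en-GB and en); LookupByPriority and bestTag are hypothetical names.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.Locale;

// Illustrative only; LookupByPriority and bestTag are hypothetical names.
class LookupByPriority {
    // Runs Locale.lookup against the locales from the question and returns the
    // language tag of the result, or null when nothing matches.
    static String bestTag(String ranges) {
        Collection<Locale> locales = Arrays.asList(
            Locale.forLanguageTag("en-GB"), Locale.forLanguageTag("en"));
        Locale best = Locale.lookup(Locale.LanguageRange.parse(ranges), locales);
        return best == null ? null : best.toLanguageTag();
    }

    public static void main(String[] args) {
        String[] inputs = {
            "en-US;q=1.0,en-GB;q=1.0", // en-US is tried first and falls back to en
            "en-GB;q=1.0,en-US;q=1.0", // en-GB is tried first and matches directly
            "en-US;q=0.9,en-GB;q=1.0"  // the higher weight moves en-GB to the front
        };
        for (String ranges : inputs) {
            System.out.println(ranges + " -> " + bestTag(ranges));
        }
    }
}
```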

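For the first case, the steps that lead to this result are:

  1. Does en-US match en-GB? → No
  2. Does en-US match en? → No
  3. Truncate en-US to en
  4. Does en match en-GB? → No
  5. Does en match en? → Yes; a matching tag is found, so return it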
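The steps above can be traced with a simplified re-implementation of the lookup loop. To be clear, this is my own sketch and not the JDK's LocaleMatcher code: it ignores wildcards, weights and single-character subtags, and the names LookupTrace and lookup are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Simplified RFC 4647 lookup that records every comparison it makes.
class LookupTrace {
    // Returns the first tag matched by any range (after truncation), or null.
    // Each comparison and truncation is appended to the trace list.
    static String lookup(List<String> ranges, List<String> tags, List<String> trace) {
        for (String range : ranges) {
            String r = range;
            while (!r.isEmpty()) {
                for (String tag : tags) {
                    boolean match = tag.equalsIgnoreCase(r);
                    trace.add("Does " + r + " match " + tag + "? " + (match ? "Yes" : "No"));
                    if (match) {
                        return tag;
                    }
                }
                // Truncate from the end: drop the last subtag.
                int cut = r.lastIndexOf('-');
                String truncated = (cut < 0) ? "" : r.substring(0, cut);
                if (!truncated.isEmpty()) {
                    trace.add("Truncate " + r + " to " + truncated);
                }
                r = truncated;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        List<String> trace = new ArrayList<>();
        String result = lookup(
            Arrays.asList("en-US", "en-GB"), Arrays.asList("en-GB", "en"), trace);
        trace.forEach(System.out::println);
        System.out.println("Result: " + result);
    }
}
```

Note that the second range, en-GB, is never examined, because the truncated first range already produced a match.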

This is in accordance with RFC 4647:

3.4. Lookup

...

The first matching tag found, according to the user's priority, is considered the closest match and is the item returned.

...

In the lookup scheme, the language range is progressively truncated from the end until a matching language tag is located.

The core of the lookup algorithm is implemented in sun.util.locale.LocaleMatcher#lookupTag. You can take a look at the source code.