How to parse string datetime & timezone with Arabic-Hindu digits in Java 8?

Question

I want to parse a string date-time and time zone written with Arabic-Hindu digits, so I wrote this code:

    String dateTime = "٢٠٢١-١١-٠٨T٠٢:٢١:٠٨+٠٢:٠٠";
    char zeroDigit = '٠';
    Locale locale = Locale.forLanguageTag("ar");
    DateTimeFormatter pattern = DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ssXXX")
            .withLocale(locale)
            .withDecimalStyle(DecimalStyle.of(locale).withZeroDigit(zeroDigit));
    ZonedDateTime parsedDateTime = ZonedDateTime.parse(dateTime, pattern);
    assert parsedDateTime != null;

But I got this exception:

    java.time.format.DateTimeParseException: Text '٢٠٢١-١١-٠٨T٠٢:٢١:٠٨+٠٢:٠٠' could not be parsed at index 19

I checked a lot of questions on Stack Overflow, but I still don't understand what I am doing wrong.

It works fine when the time zone offset does not use Arabic-Hindu digits, as in dateTime = "٢٠٢١-١١-٠٨T٠٢:٢١:٠٨+02:00".

Answer 1

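Your dateTime string is wrong; it rests on a misunderstanding. It obviously tries to conform to the ISO 8601 format and fails, because ISO 8601 uses US-ASCII digits.

The classes of java.time (Instant, OffsetDateTime and ZonedDateTime) would parse your string without any formatter if only the digits were valid ISO 8601. In the vast majority of cases I would take your approach: try to parse the string as it is. Not in this case. To me it makes more sense to correct the string before parsing.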

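    // Needs: import java.nio.CharBuffer; import java.time.OffsetDateTime;
    String dateTime = "٢٠٢١-١١-٠٨T٠٢:٢١:٠٨+٠٢:٠٠";
    char[] dateTimeChars = dateTime.toCharArray();
    // Replace every Unicode decimal digit with its US-ASCII counterpart
    for (int index = 0; index < dateTimeChars.length; index++) {
        if (Character.isDigit(dateTimeChars[index])) {
            int digitValue = Character.getNumericValue(dateTimeChars[index]);
            dateTimeChars[index] = Character.forDigit(digitValue, 10);
        }
    }

    // CharBuffer is a CharSequence, so no conversion back to String is needed
    OffsetDateTime odt = OffsetDateTime.parse(CharBuffer.wrap(dateTimeChars));

    System.out.println(odt);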

Output:

    2021-11-08T02:21:08+02:00

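Edit: Of course, if you can educate the publisher of your string to use US-ASCII digits, that would be even better.

Edit: I am aware that the Wikipedia article that I link to below says: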

Representations must be written in a combination of Arabic numerals and the specific computer characters (such as "-", ":", "T", "W", "Z") that are assigned specific meanings within the standard; …

That is a conceivable cause of the confusion. The linked article, Arabic numerals, says:

Arabic numerals are the ten digits: 0, 1, 2, 3, 4, 5, 6, 7, 8 and 9.

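Edit: How I convert each digit: Character.getNumericValue() converts a char representing a digit to an int equal to the number that the digit represents, so '٠' to 0, '٢' to 2, and so on. It works for all digit characters (not only Arabic and ASCII ones). Character.forDigit() performs roughly the opposite conversion, but always to US-ASCII, so 0 to '0', 2 to '2', and so on.

Edit: Thanks to @Holger for drawing my attention to CharBuffer in this context. A CharBuffer implements CharSequence, the type that the parse methods of java.time require, so it saves us from converting the char array back to a String.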
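To illustrate the digit conversion described above, here is a minimal sketch that wraps the same two calls in a helper and applies it to digits from two scripts. The class name DigitNormalizer and the method name toAsciiDigits are made up for this illustration, and the digits are written as Unicode escapes only so that the file encoding does not matter.

```java
class DigitNormalizer {

    /**
     * Returns a copy of the text in which every Unicode decimal digit
     * (as reported by Character.isDigit) is replaced by its US-ASCII digit.
     * Works char by char, so digits outside the Basic Multilingual Plane
     * are left untouched.
     */
    static String toAsciiDigits(CharSequence text) {
        StringBuilder result = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char ch = text.charAt(i);
            if (Character.isDigit(ch)) {
                // e.g. U+0662 (Arabic-Indic two) -> 2 -> '2'
                int digitValue = Character.getNumericValue(ch);
                result.append(Character.forDigit(digitValue, 10));
            } else {
                // Separators such as '-', ':', 'T' and '+' pass through unchanged
                result.append(ch);
            }
        }
        return result.toString();
    }

    public static void main(String[] args) {
        // Arabic-Indic digits for 2021
        System.out.println(toAsciiDigits("\u0662\u0660\u0662\u0661"));
        // Devanagari digits for 2021
        System.out.println(toAsciiDigits("\u0968\u0966\u0968\u0967"));
    }
}
```

Both calls in main print 2021. The isDigit guard matters because Character.getNumericValue() also returns values for some non-digit characters, such as Latin letters.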

Links

Answer 2

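The error message says that the problem is at index 19 of the input string.

The character at index 19 is the + character. This means that the offset (represented by XXX in your pattern) could not be parsed.

The problem is not the + itself. The problem is that time zone offsets, like +05:00, are never localized.

The documentation does not say this, so I had to verify it in the source code of DateTimeFormatterBuilder.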

Inside that class is this inner class:

    static final class OffsetIdPrinterParser implements DateTimePrinterParser {

In that class, we can find the parse method, which calls the private parseHour, parseMinute, and parseSeconds methods.

Each of those methods delegates to the private parseDigits method. In that method, we can see that only ASCII digits are considered:

    char ch1 = parseText.charAt(pos++);
    char ch2 = parseText.charAt(pos++);
    if (ch1 < '0' || ch1 > '9' || ch2 < '0' || ch2 > '9') {
        return false;
    }

So the answer here is that time zone offsets must consist of ASCII digits, regardless of the locale.
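The behavior can be checked with a small experiment. The sketch below reuses the pattern from the question, with two changes that are assumptions of this example: it builds the DecimalStyle from DecimalStyle.STANDARD instead of a locale (to keep the result independent of locale data), and it generates the Arabic-Indic input from ASCII text with a made-up helper, toArabicIndic, instead of embedding the digits literally. The class name OffsetDigitsDemo and the method errorIndex are also hypothetical.

```java
import java.time.OffsetDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.time.format.DecimalStyle;

class OffsetDigitsDemo {

    // Same pattern as in the question; the zero digit is U+0660 (Arabic-Indic zero)
    static final DateTimeFormatter FORMATTER =
            DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ssXXX")
                    .withDecimalStyle(DecimalStyle.STANDARD.withZeroDigit('\u0660'));

    /** Maps each ASCII digit to the corresponding Arabic-Indic digit. */
    static String toArabicIndic(String ascii) {
        StringBuilder result = new StringBuilder(ascii.length());
        for (char ch : ascii.toCharArray()) {
            result.append(ch >= '0' && ch <= '9' ? (char) ('\u0660' + (ch - '0')) : ch);
        }
        return result.toString();
    }

    /** Returns -1 if the text parses, otherwise the error index reported by the parser. */
    static int errorIndex(String text) {
        try {
            OffsetDateTime.parse(text, FORMATTER);
            return -1;
        } catch (DateTimeParseException e) {
            return e.getErrorIndex();
        }
    }

    public static void main(String[] args) {
        String dateTimePart = toArabicIndic("2021-11-08T02:21:08");
        // Localized date and time, ASCII offset
        System.out.println(errorIndex(dateTimePart + "+02:00"));
        // Localized date and time, localized offset
        System.out.println(errorIndex(dateTimePart + toArabicIndic("+02:00")));
    }
}
```

The first call should report no error, matching the working case in the question, while the second should report an error at the position of the + sign, which the question's exception message gives as index 19.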
