Can Java String.toUpperCase() ever fail?

Question

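Situation: there is a Java ESB that receives input (a surname) from a Vaadin web form and is supposed to guarantee that the value is upper-cased before it is saved to the database.

I have been assigned to investigate a reported problem: lowercase characters sometimes end up in the database. I have learned that the program calls String.toUpperCase() before saving the data through the EntityManager (this is the only place where the received data is modified).

So what I want to know is whether that is sufficient. So far I have not found any well-known issues with toUpperCase(), but I want to be sure.

So the question: does String.toUpperCase() always work correctly? Or are there characters or situations in which it goes wrong and a letter is not upper-cased?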

Answer 1

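Without any further information about which characters were stored in the database (the ones you judged to be lowercase), I would guess the cause is similar to the cases explained in these blog posts:

by Heinz Kabutz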
http://www.javaspecialists.eu/archive/Issue209.html
http://www.javaspecialists.eu/archive/Issue211.html

by Elliotte Rusty Harold
http://cafe.elharo.com/blogroll/turkish/

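Edit: It could be that a character was stored in the database that looks (depending on the font) like a Latin character but has no uppercase form.

One example is GREEK LETTER YOT (U+03F3), which looks similar to LATIN SMALL LETTER J but has no uppercase counterpart in the Unicode versions supported up to Java 8. (Unicode 7.0 later added GREEK CAPITAL LETTER YOT, U+037F, so newer JDKs may upper-case it.)

A small snippet to demonstrate: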

// U+03F3 is GREEK LETTER YOT, U+006A is LATIN SMALL LETTER J
int[] codePoints = { 0x03F3, 0x006A };
for (int codePoint : codePoints) {
    // Both code points are in the BMP, so the cast to char is safe here
    char lowerCase = (char) Character.toLowerCase(codePoint);
    char upperCase = (char) Character.toUpperCase(codePoint);
    System.out.printf("Unicode name: %s%n", Character.getName(codePoint));
    System.out.printf("lowercase   : %s%n", lowerCase);
    System.out.printf("uppercase   : %s (%s)%n", upperCase,
        Character.isUpperCase(upperCase));
}

The output is:

Unicode name: GREEK LETTER YOT
lowercase   : ϳ
uppercase   : ϳ (false)
Unicode name: LATIN SMALL LETTER J
lowercase   : j
uppercase   : J (true)
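
To find out which stored characters actually survive upper-casing, a rough diagnostic along the following lines might help. It is only a sketch: the class name UppercaseAudit and the method findStillLowercase are made up for illustration, and U+0138 (LATIN SMALL LETTER KRA) is used as a sample lowercase letter that has no uppercase mapping.

```java
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

public class UppercaseAudit {

    /**
     * Upper-cases the input with a fixed locale and returns every code point
     * that Character.isLowerCase still reports as lowercase afterwards.
     */
    static List<Integer> findStillLowercase(String input) {
        String upper = input.toUpperCase(Locale.ROOT);
        return upper.codePoints()
                .filter(Character::isLowerCase)
                .boxed()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Sample data (assumption): a surname starting with U+0138 instead of Latin 'k'
        String suspicious = "\u0138ones";
        for (int cp : findStillLowercase(suspicious)) {
            System.out.printf("U+%04X %s%n", cp, Character.getName(cp));
        }
        // A plain ASCII surname leaves nothing behind
        System.out.println(findStillLowercase("jones").isEmpty());
    }
}
```

Running something like this over the offending database rows should show whether the "lowercase" letters are look-alike characters rather than unconverted Latin letters.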

Answer 2

Can Java String.toUpperCase() ever fail?

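That depends on whether you pass it a locale-sensitive string (see below).

In the implementation of java.lang.String, the no-argument version simply uses the default locale: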

public String toUpperCase() {
    return toUpperCase(Locale.getDefault());
}
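
As a hedged illustration of what that delegation means in practice, the sketch below temporarily changes the JVM default locale and calls the no-argument overload. The class DefaultLocaleDemo and the helper upperWithDefault are hypothetical names; the surname "smith" is sample data.

```java
import java.util.Locale;

public class DefaultLocaleDemo {

    /**
     * Calls the no-argument toUpperCase() while the given locale is the JVM
     * default, then restores the previous default.
     */
    static String upperWithDefault(String input, Locale temporaryDefault) {
        Locale previous = Locale.getDefault();
        Locale.setDefault(temporaryDefault);
        try {
            return input.toUpperCase();
        } finally {
            Locale.setDefault(previous);
        }
    }

    public static void main(String[] args) {
        String english = upperWithDefault("smith", Locale.ENGLISH);
        String turkish = upperWithDefault("smith", Locale.forLanguageTag("tr-TR"));
        System.out.println(english);
        System.out.println(turkish);
        // The two results differ: Turkish maps 'i' to U+0130 (dotted capital I)
        System.out.println(english.equals(turkish));
    }
}
```

The same input therefore produces different strings depending on how the server's JVM happens to be configured.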

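toUpperCase(Locale) converts all of the characters in this String to upper case using the rules of the given Locale. Case mapping is based on the Unicode Standard version specified by the Character class. Since case mappings are not always 1:1 char mappings, the resulting String may be a different length than the original String.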

This method is locale sensitive, and may produce unexpected results if used for strings that are intended to be interpreted locale independently. Examples are programming language identifiers, protocol keys, and HTML tags.

To obtain correct results for locale insensitive strings, use toUpperCase(Locale.ENGLISH).
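
A minimal sketch of the "protocol key" case from the documentation quoted above, assuming a hypothetical helper matchesKey and the sample key "id". It compares an explicit Turkish locale with Locale.ROOT, which more recent versions of the Javadoc suggest in place of Locale.ENGLISH.

```java
import java.util.Locale;

public class ProtocolKeyDemo {

    /** Returns true if the input, upper-cased with the given locale, equals the expected key. */
    static boolean matchesKey(String input, String expectedUpper, Locale locale) {
        return input.toUpperCase(locale).equals(expectedUpper);
    }

    public static void main(String[] args) {
        Locale turkish = Locale.forLanguageTag("tr-TR");
        // Locale-independent mapping: "id" becomes "ID"
        System.out.println(matchesKey("id", "ID", Locale.ROOT));
        // Turkish rules map 'i' to U+0130, so the comparison with "ID" fails
        System.out.println(matchesKey("id", "ID", turkish));
    }
}
```

Passing an explicit locale at the call site removes the dependency on the JVM default, whichever locale is chosen.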

If you are interested in how toUpperCase(Locale) is implemented:

public String toUpperCase(Locale locale) {
    if (locale == null) {
        throw new NullPointerException();
    }

    int firstLower;
    final int len = value.length;

    /* Now check if there are any characters that need to be changed. */
    scan: {
        for (firstLower = 0 ; firstLower < len; ) {
            int c = (int)value[firstLower];
            int srcCount;
            if ((c >= Character.MIN_HIGH_SURROGATE)
                    && (c <= Character.MAX_HIGH_SURROGATE)) {
                c = codePointAt(firstLower);
                srcCount = Character.charCount(c);
            } else {
                srcCount = 1;
            }
            int upperCaseChar = Character.toUpperCaseEx(c);
            if ((upperCaseChar == Character.ERROR)
                    || (c != upperCaseChar)) {
                break scan;
            }
            firstLower += srcCount;
        }
        return this;
    }

    /* result may grow, so i+resultOffset is the write location in result */
    int resultOffset = 0;
    char[] result = new char[len]; /* may grow */

    /* Just copy the first few upperCase characters. */
    System.arraycopy(value, 0, result, 0, firstLower);

    String lang = locale.getLanguage();
    boolean localeDependent =
            (lang == "tr" || lang == "az" || lang == "lt");
    char[] upperCharArray;
    int upperChar;
    int srcChar;
    int srcCount;
    for (int i = firstLower; i < len; i += srcCount) {
        srcChar = (int)value[i];
        if ((char)srcChar >= Character.MIN_HIGH_SURROGATE &&
            (char)srcChar <= Character.MAX_HIGH_SURROGATE) {
            srcChar = codePointAt(i);
            srcCount = Character.charCount(srcChar);
        } else {
            srcCount = 1;
        }
        if (localeDependent) {
            upperChar = ConditionalSpecialCasing.toUpperCaseEx(this, i, locale);
        } else {
            upperChar = Character.toUpperCaseEx(srcChar);
        }
        if ((upperChar == Character.ERROR)
                || (upperChar >= Character.MIN_SUPPLEMENTARY_CODE_POINT)) {
            if (upperChar == Character.ERROR) {
                if (localeDependent) {
                    upperCharArray =
                            ConditionalSpecialCasing.toUpperCaseCharArray(this, i, locale);
                } else {
                    upperCharArray = Character.toUpperCaseCharArray(srcChar);
                }
            } else if (srcCount == 2) {
                resultOffset += Character.toChars(upperChar, result, i + resultOffset) - srcCount;
                continue;
            } else {
                upperCharArray = Character.toChars(upperChar);
            }

            /* Grow result if needed */
            int mapLen = upperCharArray.length;
            if (mapLen > srcCount) {
                char[] result2 = new char[result.length + mapLen - srcCount];
                System.arraycopy(result, 0, result2, 0, i + resultOffset);
                result = result2;
            }
            for (int x = 0; x < mapLen; ++x) {
                result[i + resultOffset + x] = upperCharArray[x];
            }
            resultOffset += (mapLen - srcCount);
        } else {
            result[i + resultOffset] = (char)upperChar;
        }
    }
    return new String(result, 0, len + resultOffset);
}
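
The "result may grow" comments in the implementation correspond to case mappings that expand one char into several. A small sketch of that effect follows; the class LengthChangeDemo and the method describe are illustrative names, and the inputs are sample data.

```java
import java.util.Locale;

public class LengthChangeDemo {

    /** Formats the input and its upper-cased form together with their lengths. */
    static String describe(String input) {
        String upper = input.toUpperCase(Locale.ROOT);
        return String.format(Locale.ROOT, "%s (%d) -> %s (%d)",
                input, input.length(), upper, upper.length());
    }

    public static void main(String[] args) {
        // U+00DF (sharp s) upper-cases to the two chars "SS"
        System.out.println(describe("stra\u00DFe"));
        // U+FB01 (fi ligature) upper-cases to the two chars "FI"
        System.out.println(describe("\uFB01sh"));
    }
}
```

Code that assumes the upper-cased value has the same length as the input, for example when sizing a buffer or a column, may be caught out by such characters.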