Why can't I use special characters (code points above U+10000) in file names on an SD card?

Question

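Recently I was unable to copy a file from the internal SD card to the removable SD card on Android.

The file name contains the special character "snake" (http://www.fileformat.info/info/unicode/char/1f40d/index.htm).

The file can be saved to the internal SD card, but it cannot be copied to the removable SD card. I also tried copying it from my PC to the removable SD card, and that failed too.

So I wrote some code to replace the special characters with something else.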

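StringBuilder builder = new StringBuilder();
for (int i = 0; i < input.length(); i++) {
    if (input.codePointAt(i) > 0xFFFF) { // supplementary character (outside the BMP)?
        // replace the character with something else here
        i++; // skip the low surrogate of the pair
    } else {
        builder.append(input.charAt(i));
    }
}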

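It works, so my problem is solved. But I would like to know the reason:

Why can't I copy a file whose name contains a special character (above U+10000, such as U+1F40D) to the removable SD card?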

Answer 1

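Latest (and possibly final) answer:

Recently I formatted an SD card with an Android phone (Jellybean) just to check the freshly formatted file system. It turned out to be a FAT file system.

FAT file systems (any version) do not support UTF-16 long file names. They only support UCS-2 long file names. (The former supports surrogate pairs; the latter does not.)

From the second link: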

...whereas UCS-2 is limited to BMP characters. These encodings are practical because the length in units is the number of characters.
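
As a rough illustration of the UCS-2 limitation described above, the following sketch checks whether a file name contains only BMP characters and replaces anything outside the BMP. The class name `BmpFileNames`, its method names, and the choice of `_` as the replacement are assumptions made for this example; they are not part of any Android or FAT API.

```java
class BmpFileNames {
    /** Returns true if no code point in name lies above U+FFFF. */
    static boolean isBmpOnly(String name) {
        return name.codePoints().noneMatch(Character::isSupplementaryCodePoint);
    }

    /** Replaces each supplementary code point with a single replacement char. */
    static String replaceSupplementary(String name, char replacement) {
        StringBuilder sb = new StringBuilder(name.length());
        name.codePoints().forEach(cp -> {
            if (Character.isSupplementaryCodePoint(cp)) {
                sb.append(replacement);   // needs a surrogate pair, so not UCS-2
            } else {
                sb.append((char) cp);     // fits in one 16-bit unit
            }
        });
        return sb.toString();
    }

    public static void main(String[] args) {
        String snake = new String(Character.toChars(0x1F40D));
        String name = "test" + snake + ".txt";
        System.out.println(isBmpOnly(name));                  // false
        System.out.println(replaceSupplementary(name, '_'));  // test_.txt
    }
}
```

A check like this could be run before copying to a target that is suspected to be UCS-2 only, so the rename happens only when it is actually needed.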

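This would explain the 'Snake' character problem on your device. Can you confirm the file system for us?

Your internal file system, on the other hand, depends on the manufacturer. Check the 4th link to confirm this.

Previous answer (2):

I tried to reproduce the scenario on my PC, using the 'Snake' Unicode character in the source file name and the Files class for copying, and the copy succeeded.

Code used: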

import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.StandardCopyOption;

public class abc {
    /**
     * Creates a temp file whose name contains U+1F40D and copies it to C:\.
     *
     * @param arg unused
     * @throws IOException if the file cannot be created or copied
     */
    public static void main(String[] arg) throws IOException {
        StringBuilder sb = new StringBuilder();
        sb.appendCodePoint(128013); // 128013 == 0x1F40D, the 'Snake' character
        File tmpFile = File.createTempFile("testFile" + sb.toString(), ".txt");
        tmpFile.deleteOnExit();
        // The backslash must be escaped inside a Java string literal
        File dirTrgt = new File("C:\\" + tmpFile.getName());
        Files.copy(tmpFile.toPath(), dirTrgt.toPath(), StandardCopyOption.REPLACE_EXISTING);
    }
}

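Previous answer (1):

The range of valid code points is U+0000 to U+10FFFF.

In Java, a char is strictly 2 bytes --> 16 bits --> 0xFFFF (maximum value).

The (int) value you get from #codePointAt(int) supports all code points. (Unicode characters are referred to as code points.)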

~~So when you encounter a character > 0xFFFF, your input.charAt(i) (most probably) returns an overflowed char, i.e. != input.codePointAt(i)~~
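
A small sketch of how `charAt` and `codePointAt` behave on a string containing the 'Snake' character may help here: `charAt` returns the individual 16-bit surrogate values, while `codePointAt` combines a surrogate pair into one code point. The class name `CodePointWalk` and the helper `describe` are made up for this example.

```java
import java.util.ArrayList;
import java.util.List;

class CodePointWalk {
    /** Lists each code point of s as "U+XXXX", stepping over surrogate pairs. */
    static List<String> describe(String s) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            out.add(String.format("U+%04X", cp));
            i += Character.charCount(cp); // 2 for supplementary characters, else 1
        }
        return out;
    }

    public static void main(String[] args) {
        String s = "a" + new String(Character.toChars(0x1F40D));
        System.out.println(s.length());                        // 3 char values
        System.out.println(s.codePointCount(0, s.length()));   // 2 code points
        System.out.println(Integer.toHexString(s.charAt(1)));  // d83d (high surrogate)
        System.out.println(describe(s));                       // [U+0061, U+1F40D]
    }
}
```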

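A bit of background on UTF-16:

BMP - the Basic Multilingual Plane (BMP) is the code point range U+0000 to U+FFFF.
Any character outside that range is called a supplementary character. Such characters are represented using a pair of char values.
A supplementary character consists of a high surrogate (in the range 0xD800 to 0xDBFF) followed by a low surrogate (in the range 0xDC00 to 0xDFFF).

We can see this from the link you gave in the question. The UTF-16 value for Snake: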

UTF-16 (hex) -- 0xD83D 0xDC0D (d83ddc0d)
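
The surrogate values above can be reproduced by hand. The sketch below applies the standard UTF-16 arithmetic (subtract 0x10000, then split the remaining 20 bits into two 10-bit halves) and compares the result with `Character.toChars`. The class name `SurrogateMath` and the method `toSurrogates` are illustrative names, not an existing API.

```java
import java.util.Arrays;

class SurrogateMath {
    /** Splits a supplementary code point into its high and low surrogate. */
    static char[] toSurrogates(int codePoint) {
        int offset = codePoint - 0x10000;               // 20-bit value
        char high = (char) (0xD800 + (offset >> 10));   // top 10 bits
        char low  = (char) (0xDC00 + (offset & 0x3FF)); // bottom 10 bits
        return new char[] { high, low };
    }

    public static void main(String[] args) {
        char[] manual = toSurrogates(0x1F40D);
        char[] library = Character.toChars(0x1F40D);
        System.out.printf("%04X %04X%n", (int) manual[0], (int) manual[1]); // D83D DC0D
        System.out.println(Arrays.equals(manual, library));                 // true
    }
}
```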

For more information, see the links below:

Android Doc for Character Class

how-does-java-store-utf-16-characters-in-its-16-bit-char-type
